DEV Community

Trollgen Studios

OpenAI Text To Speech (TTS) Streaming with FastAPI + Python + React Frontend

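This code worked for me to stream the audio chunks from the FastAPI backend over a WebSocket:
`# Runs inside an async FastAPI WebSocket handler
await websocket.send_text('|AUDIO_START|')

# The sync client blocks the event loop while each chunk is read;
# fine for a quick test, but other requests wait in the meantime.
with openai_client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="nova",
    response_format="mp3",  # Changed to mp3 format
    input=text_message,
) as response:
    for chunk in response.iter_bytes(chunk_size=1024):
        await websocket.send_bytes(chunk)

await websocket.send_text('|AUDIO_END|')`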

The challenge was then handling the stream on the React frontend. This version buffers every chunk and plays the audio once the end marker arrives.

// Handle incoming messages
useEffect(() => {
  if (lastMessage?.data === '|AUDIO_START|') {
    setAudioBuffers([]); // Clear existing buffers
    setIsAudioStreaming(true);
    chunksRef.current = [];
  } else if (lastMessage?.data === '|AUDIO_END|') {
    setIsAudioStreaming(false);
    playAudioChunks();
  } else if (isAudioStreaming && lastMessage?.data) {
    // Binary chunk: collect it until the end marker arrives
    console.log('audio chunk received:', lastMessage.data);
    chunksRef.current.push(lastMessage.data);
    // playAudioChunk(lastMessage.data);
  }
  // Re-run once per incoming message. An empty dependency array would
  // run this effect only on mount and ignore every later message.
}, [lastMessage]);

// Join all buffered chunks into one MP3 blob and play it
const playAudioChunks = async () => {
  const audioBlob = new Blob(chunksRef.current, { type: 'audio/mpeg' });
  const audioUrl = URL.createObjectURL(audioBlob);
  const audio = new Audio(audioUrl);

  // Release the object URL once playback finishes
  audio.onended = () => {
    URL.revokeObjectURL(audioUrl);
  };

  try {
    await audio.play();
  } catch (err) {
    console.error('Error playing audio:', err);
    URL.revokeObjectURL(audioUrl); // Playback never started
  }
};
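The marker-based framing can also be tracked outside React state. The sketch below is one possible way to do it, assuming you attach it directly to a WebSocket `onmessage` handler so that every message is seen in order; a state value like `lastMessage` may skip a message if two arrive between renders. `createAudioCollector` and `handleMessage` are hypothetical names, not part of the code above.

```javascript
const AUDIO_START = '|AUDIO_START|';
const AUDIO_END = '|AUDIO_END|';

// Hypothetical helper: collects binary chunks between the two markers
// and calls onComplete(chunks) once the end marker arrives.
function createAudioCollector(onComplete) {
  let chunks = null; // null means "not inside an audio stream"

  return function handleMessage(data) {
    if (data === AUDIO_START) {
      chunks = []; // start a fresh buffer
    } else if (data === AUDIO_END) {
      if (chunks !== null) onComplete(chunks);
      chunks = null;
    } else if (chunks !== null && typeof data !== 'string') {
      chunks.push(data); // binary frame (Blob, ArrayBuffer or typed array)
    }
  };
}

// Usage in the browser (assumes an existing WebSocket named socket):
//   const handle = createAudioCollector((chunks) => {
//     const audio = new Audio(
//       URL.createObjectURL(new Blob(chunks, { type: 'audio/mpeg' }))
//     );
//     audio.play();
//   });
//   socket.onmessage = (event) => handle(event.data);
```

Because the collector keeps its own buffer, it does not depend on when React re-renders, and text messages or stray binary frames outside the two markers are ignored.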

Sorry for the bad formatting, but hope this helps if you are struggling.

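I am still trying to figure out the best way to play the audio chunk by chunk. If I play each chunk as it comes in, the audio sounds weird, so this code needs more work (a larger chunk size? a delay?). One likely cause is that a 1024-byte slice is not a standalone MP3 file: a chunk boundary can fall in the middle of an MP3 frame, and every new Audio element starts on its own with no gapless handoff to the next one. If you have any insights, feel free to share them with the community in the comments. For reference, this is the per-chunk version: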

// Play a single chunk on its own (this is the version that sounds wrong)
const playAudioChunk = async (chunk) => {
  const audioBlob = new Blob([chunk], { type: 'audio/mpeg' });
  const audioUrl = URL.createObjectURL(audioBlob);
  const audio = new Audio(audioUrl);

  // Release the object URL once playback finishes
  audio.onended = () => {
    URL.revokeObjectURL(audioUrl);
  };

  try {
    await audio.play();
  } catch (err) {
    console.error('Error playing audio:', err);
    URL.revokeObjectURL(audioUrl); // Playback never started
  }
};

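One approach that might work for chunk-by-chunk playback is the browser's Media Source Extensions API: instead of one Audio element per chunk, every chunk is appended to a single `SourceBuffer` that feeds a single audio element. This is a sketch under assumptions, not something I have verified against this backend. `AppendQueue` and `createStreamingPlayer` are hypothetical names, and MP3 support in `MediaSource` varies by browser, so check `MediaSource.isTypeSupported('audio/mpeg')` first.

```javascript
// Serializes appends to a SourceBuffer, which rejects appendBuffer()
// while a previous append is still being processed (updating === true).
class AppendQueue {
  constructor(sourceBuffer) {
    this.sourceBuffer = sourceBuffer;
    this.pending = [];
    // 'updateend' fires when the previous append has finished
    sourceBuffer.addEventListener('updateend', () => this.flush());
  }

  push(chunk) {
    this.pending.push(chunk);
    this.flush();
  }

  flush() {
    if (this.sourceBuffer.updating || this.pending.length === 0) return;
    this.sourceBuffer.appendBuffer(this.pending.shift());
  }
}

// Browser-only wiring (hypothetical helper). Chunks must be ArrayBuffers
// or typed arrays, not Blobs.
function createStreamingPlayer(mimeType = 'audio/mpeg') {
  const mediaSource = new MediaSource();
  const audio = new Audio(URL.createObjectURL(mediaSource));
  const early = []; // chunks that arrive before the SourceBuffer exists
  let queue = null;

  mediaSource.addEventListener('sourceopen', () => {
    queue = new AppendQueue(mediaSource.addSourceBuffer(mimeType));
    early.forEach((chunk) => queue.push(chunk));
  }, { once: true });

  return {
    audio,
    push: (chunk) => (queue ? queue.push(chunk) : early.push(chunk)),
  };
}
```

To try it, create the player when `|AUDIO_START|` arrives, call `player.audio.play()`, and pass each binary message to `player.push(...)`. WebSocket binary messages arrive as Blobs by default, so set `socket.binaryType = 'arraybuffer'` on the underlying WebSocket (or convert with `await blob.arrayBuffer()`) before pushing. After the last append has finished, calling `mediaSource.endOfStream()` lets the audio element fire its `ended` event; the sketch leaves that step out.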
