Learn how to configure speech synthesis to convert text into natural-sounding audio in your voice AI pipeline
Input frames:
- LLMTextFrames: Text from language model responses
- TTSSpeakFrames: Text to synthesize immediately

Output frames:
- TTSAudioRawFrames: Raw audio data for playback
- TTSTextFrames: Text that was actually spoken (for context updates)
- TTSStartedFrame/TTSStoppedFrame: Speech boundary markers

Set audio_out_sample_rate in PipelineParams
to match your TTS service’s requirements for
optimal quality. This is preferred to setting the sample_rate directly in the
TTS service, because PipelineParams ensures that all output sample rates match.

Use TTSSpeakFrames for immediate speech synthesis: