Text-to-speech service using OpenAI’s TTS API
Requires an OpenAI API key, provided through the `OPENAI_API_KEY` environment variable.
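A startup check can fail fast when the key is missing. Without it, the problem would only surface at the first synthesis request. The sketch below uses only the standard library. The helper name `require_openai_api_key` is hypothetical and is not part of Pipecat or the OpenAI SDK.

```python
import os


def require_openai_api_key() -> str:
    """Return OPENAI_API_KEY from the environment (hypothetical helper).

    Raises RuntimeError when the variable is unset or blank, so a
    misconfigured deployment stops before any pipeline is built.
    """
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before starting the pipeline")
    return key
```

You could pass the returned value explicitly as the service's `api_key` argument, so that a missing key fails at startup.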
Input frames:

- `TextFrame` - Text content to synthesize into speech
- `TTSSpeakFrame` - Text that should be spoken immediately
- `TTSUpdateSettingsFrame` - Runtime configuration updates
- `LLMFullResponseStartFrame` / `LLMFullResponseEndFrame` - LLM response boundaries

Output frames:

- `TTSStartedFrame` - Signals the start of synthesis
- `TTSAudioRawFrame` - Generated audio data (24 kHz PCM, mono)
- `TTSStoppedFrame` - Signals the completion of synthesis
- `ErrorFrame` - API or processing errors

Model | Description | Best For |
---|---|---|
gpt-4o-mini-tts | Latest GPT-based TTS model | Faster generation, improved prosody, recommended for most use cases |
tts-1 | Original TTS model | Standard quality speech synthesis |
tts-1-hd | High-definition TTS model | Premium quality speech with higher fidelity |
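The audio in `TTSAudioRawFrame` is raw 24 kHz mono PCM, so you can derive the playback duration from the byte count alone. Doing so helps with tasks such as sizing buffers or estimating how long an utterance will play. The sketch below assumes 16-bit samples (2 bytes each), which matches OpenAI's raw `pcm` output format. The function names are hypothetical and are not part of Pipecat.

```python
SAMPLE_RATE_HZ = 24_000  # OpenAI TTS output sample rate
NUM_CHANNELS = 1         # mono
BYTES_PER_SAMPLE = 2     # assumption: 16-bit signed PCM samples

BYTES_PER_SECOND = SAMPLE_RATE_HZ * NUM_CHANNELS * BYTES_PER_SAMPLE  # 48,000


def pcm_duration_seconds(num_bytes: int) -> float:
    """Playback duration, in seconds, of a raw PCM buffer of num_bytes bytes."""
    return num_bytes / BYTES_PER_SECOND


def bytes_for_duration(ms: int) -> int:
    """Bytes needed to hold `ms` milliseconds of 24 kHz mono 16-bit audio."""
    return BYTES_PER_SECOND * ms // 1000
```

For example, to get the total length of an utterance, sum `len(chunk)` over the received audio chunks and pass the result to `pcm_duration_seconds`.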
Voice | Description | Characteristics |
---|---|---|
alloy | Balanced, neutral | Professional, clear |
echo | Calm, measured | Thoughtful, deliberate |
fable | Warm, engaging | Storytelling, expressive |
onyx | Deep, authoritative | Commanding, confident |
nova | Bright, energetic | Enthusiastic, friendly |
shimmer | Soft, gentle | Soothing, approachable |
ash | Mature, sophisticated | Experienced, wise |
ballad | Smooth, melodic | Musical, flowing |
coral | Vibrant, lively | Dynamic, spirited |
sage | Wise, contemplative | Reflective, knowledgeable |
verse | Poetic, rhythmic | Artistic, expressive |
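A typo in a model or voice name normally surfaces only as an API error at synthesis time. If your settings come from user input or a config file, you could check them against the model and voice names in the tables above before creating the service. The following is a minimal sketch. The sets simply mirror the tables, and `validate_tts_settings` is a hypothetical helper, not a Pipecat API.

```python
OPENAI_TTS_MODELS = {"gpt-4o-mini-tts", "tts-1", "tts-1-hd"}
OPENAI_TTS_VOICES = {
    "alloy", "echo", "fable", "onyx", "nova", "shimmer",
    "ash", "ballad", "coral", "sage", "verse",
}


def validate_tts_settings(model: str, voice: str) -> tuple[str, str]:
    """Normalize and check a (model, voice) pair against the documented names.

    Voice names are lowercased and trimmed; model names are only trimmed.
    Raises ValueError for anything not listed in the tables.
    """
    model_norm = model.strip()
    voice_norm = voice.strip().lower()
    if model_norm not in OPENAI_TTS_MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(OPENAI_TTS_MODELS)}")
    if voice_norm not in OPENAI_TTS_VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {sorted(OPENAI_TTS_VOICES)}")
    return model_norm, voice_norm
```

The sets are hard-coded, so you would need to update them if the list of models or voices changes.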
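To use the service, create an `OpenAITTSService` instance and add it to a pipeline after the LLM, so that it receives the LLM's text output, and before the transport output that plays the audio.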
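Below is a minimal sketch of such a pipeline. It makes several assumptions:

- It targets a recent Pipecat release, where the class is importable from `pipecat.services.openai.tts`. Older releases exported it from `pipecat.services.openai`.
- The `transport`, `stt`, `llm`, and `context_aggregator` objects are created elsewhere. Their setup depends on your transport and LLM choice.
- The function name `build_pipeline` is hypothetical.

The Pipecat imports sit inside the function only so that the sketch loads without Pipecat installed.

```python
import os


def build_pipeline(transport, stt, llm, context_aggregator):
    """Assemble a voice pipeline with OpenAI TTS between the LLM and audio output.

    All four arguments are assumed to be constructed elsewhere, e.g.
    context_aggregator = llm.create_context_aggregator(context).
    """
    # Deferred imports (assumed module paths for recent Pipecat releases).
    from pipecat.pipeline.pipeline import Pipeline
    from pipecat.services.openai.tts import OpenAITTSService

    tts = OpenAITTSService(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="gpt-4o-mini-tts",
        voice="nova",
    )

    return Pipeline(
        [
            transport.input(),              # user audio in
            stt,                            # speech-to-text
            context_aggregator.user(),      # record user turns in the LLM context
            llm,                            # emits response boundaries and TextFrames
            tts,                            # TextFrames in, 24 kHz PCM audio frames out
            transport.output(),             # play synthesized audio to the user
            context_aggregator.assistant(), # record assistant turns in the context
        ]
    )
```

The service produces 24 kHz audio, so you may want to configure the pipeline or transport output at that sample rate to avoid resampling.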