OpenAI
Text-to-speech service using OpenAI’s TTS API
Overview
OpenAITTSService converts text to speech using OpenAI’s TTS API. It supports multiple voices and provides high-quality audio output at 24kHz using both the traditional TTS models and the GPT-4o TTS models.
Installation
To use OpenAITTSService, install the required dependencies:
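Assuming the Pipecat package layout with an openai extra (the exact extra name is an assumption, not stated above), installation looks like:

```bash
pip install "pipecat-ai[openai]"
```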
You’ll also need to set your OpenAI API key in the OPENAI_API_KEY environment variable:
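For example:

```bash
export OPENAI_API_KEY="your-api-key"
```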
Configuration
Constructor Parameters
- API key: your OpenAI API key
- Voice: the voice identifier. Options: "alloy", "echo", "fable", "onyx", "nova", "shimmer"
- Model: the model to use. Options: "gpt-4o-mini-tts", "tts-1", "tts-1-hd"
- Sample rate: the output audio sample rate in Hz. Only 24000 Hz is supported.
- Text filter: modifies text provided to the TTS. Learn more about the available filters.
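A minimal construction sketch, assuming this is Pipecat’s OpenAITTSService; the module path and keyword names (api_key, voice, model, sample_rate) are assumptions and may differ between versions:

```python
import os

# Assumed import path; older releases may expose the class elsewhere.
from pipecat.services.openai.tts import OpenAITTSService

tts = OpenAITTSService(
    api_key=os.getenv("OPENAI_API_KEY"),  # OpenAI API key
    voice="nova",                         # "alloy", "echo", "fable", "onyx", "nova", or "shimmer"
    model="gpt-4o-mini-tts",              # or "tts-1" / "tts-1-hd"
    sample_rate=24000,                    # only 24000 Hz is supported
)
```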
Output Frames
Control Frames
- Signals the start of audio generation
- Signals the completion of audio generation
Audio Frames
Contains generated audio data:
- PCM encoded audio
- 24kHz sample rate
- Mono channel
Error Frames
Contains error information if TTS fails
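As an illustration of consuming these frames, a pass-through processor placed after the TTS in a pipeline could log them. This is a sketch that assumes Pipecat’s frame and processor names (TTSStartedFrame, TTSAudioRawFrame, TTSStoppedFrame, ErrorFrame, FrameProcessor), which are not spelled out above:

```python
from pipecat.frames.frames import (
    ErrorFrame,
    TTSAudioRawFrame,
    TTSStartedFrame,
    TTSStoppedFrame,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class TTSFrameLogger(FrameProcessor):
    """Logs TTS output frames and passes everything downstream unchanged."""

    async def process_frame(self, frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, TTSStartedFrame):
            print("TTS started")
        elif isinstance(frame, TTSAudioRawFrame):
            # PCM audio at 24kHz, mono
            print(f"audio chunk: {len(frame.audio)} bytes @ {frame.sample_rate} Hz")
        elif isinstance(frame, TTSStoppedFrame):
            print("TTS stopped")
        elif isinstance(frame, ErrorFrame):
            print(f"TTS error: {frame.error}")

        await self.push_frame(frame, direction)
```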
Methods
See the TTS base class methods for additional functionality.
Models
| Model | Description | Best For |
|---|---|---|
| gpt-4o-mini-tts | Latest GPT-based TTS model | Faster generation, improved prosody; recommended for most use cases |
| tts-1 | Original TTS model | Standard quality speech |
| tts-1-hd | High-definition TTS model | Premium quality speech with higher fidelity |
Language Support
OpenAI TTS supports the following languages and regional variants:
| Language Code | Description | Service Code |
|---|---|---|
| Language.EN | English | en |
Usage Example
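A hedged end-to-end sketch, again assuming Pipecat’s pipeline API (Pipeline, PipelineTask, PipelineRunner, TTSSpeakFrame); a real application would place a transport output after the TTS so the audio frames are actually played:

```python
import asyncio
import os

from pipecat.frames.frames import EndFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.openai.tts import OpenAITTSService


async def main():
    tts = OpenAITTSService(
        api_key=os.getenv("OPENAI_API_KEY"),
        voice="alloy",
    )

    # In a real app, a transport output would follow the TTS to play the audio.
    pipeline = Pipeline([tts])
    task = PipelineTask(pipeline)

    # Queue a sentence to synthesize, then end the pipeline.
    await task.queue_frames([
        TTSSpeakFrame("Hello from OpenAI text to speech."),
        EndFrame(),
    ])

    await PipelineRunner().run(task)


if __name__ == "__main__":
    asyncio.run(main())
```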
Frame Flow
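In a pipeline, each text frame reaching the service produces a start-of-generation control frame, a stream of PCM audio frames, and a completion control frame; if synthesis fails, an error frame is pushed instead.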
Metrics Support
The service supports metrics collection:
- Time to First Byte (TTFB)
- TTS usage metrics
- Processing duration
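Metrics are typically switched on at the pipeline task level. Continuing from the usage example above, and assuming the standard PipelineParams flags, that might look like:

```python
from pipecat.pipeline.task import PipelineParams, PipelineTask

task = PipelineTask(
    pipeline,  # the Pipeline containing the TTS service from the usage example
    params=PipelineParams(
        enable_metrics=True,        # TTFB and processing duration
        enable_usage_metrics=True,  # TTS usage metrics
    ),
)
```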
Notes
- Outputs PCM audio at 24kHz
- Streams audio in 1KB chunks
- Supports multiple voices
- Uses GPT-4o Mini TTS by default for improved quality
- Includes metrics collection
- Thread-safe processing
- Handles empty text gracefully