OpenAI
Text-to-speech service using OpenAI’s TTS API
Overview
OpenAITTSService converts text to speech using OpenAI’s TTS API. It supports multiple voices and provides high-quality audio output at 24kHz.
Installation
To use OpenAITTSService, install the required dependencies.
You’ll also need to set your OpenAI API key in the OPENAI_API_KEY environment variable.
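As a minimal sketch of reading that variable at startup, the helper below fails early with a clear message when the key is missing. The function name `load_openai_api_key` is hypothetical and not part of the service; only the variable name OPENAI_API_KEY comes from this page.

```python
import os


def load_openai_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; the service itself falls back
    to the environment variable when no key is passed.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before starting the service")
    return key
```

Checking the key once at startup surfaces a missing variable before the first TTS request is made.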
Configuration
Constructor Parameters
OpenAI API key (defaults to environment variable)
Voice identifier. Options: "alloy", "echo", "fable", "onyx", "nova", "shimmer"
Model to use. Options: "tts-1", "tts-1-hd"
Output audio sample rate in Hz
Modifies text provided to the TTS. Learn more about the available filters.
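The options above can be checked before the service is constructed. The sketch below is a hypothetical settings container, not the service's own API: the class name `TTSSettings` and its field names are illustrative labels, and the 24000 Hz default is an assumption based on the 24kHz output described on this page.

```python
from dataclasses import dataclass
from typing import Optional

# Option lists copied from the parameter descriptions above.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
MODELS = {"tts-1", "tts-1-hd"}


@dataclass(frozen=True)
class TTSSettings:
    """Hypothetical container that validates TTS options up front."""

    voice: str
    model: str
    sample_rate: int = 24000  # assumption: matches the documented 24kHz output
    api_key: Optional[str] = None  # None means "use the environment variable"

    def __post_init__(self) -> None:
        if self.voice not in VOICES:
            raise ValueError(f"unknown voice {self.voice!r}; expected one of {sorted(VOICES)}")
        if self.model not in MODELS:
            raise ValueError(f"unknown model {self.model!r}; expected one of {sorted(MODELS)}")
        if self.sample_rate <= 0:
            raise ValueError("sample_rate must be a positive number of Hz")
```

Validating early turns a typo in a voice or model name into an immediate error rather than a failed API call.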
Output Frames
Control Frames
Signals start of audio generation
Signals completion of audio generation
Audio Frames
Contains generated audio data:
- PCM encoded audio
- 24kHz sample rate
- Mono channel
Error Frames
Contains error information if TTS fails
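The order in which these frames are emitted can be sketched as follows. The frame classes here (`Started`, `Audio`, `Stopped`, `Error`) are hypothetical stand-ins, not the library's frame types, and `fetch_audio` is an assumed callable that returns PCM chunks. Ending the stream with an error frame and no stop frame is also an assumption made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Union


@dataclass
class Started:
    """Signals start of audio generation (stand-in type)."""


@dataclass
class Audio:
    """One chunk of PCM audio (stand-in type)."""

    pcm: bytes
    sample_rate: int = 24000
    num_channels: int = 1


@dataclass
class Stopped:
    """Signals completion of audio generation (stand-in type)."""


@dataclass
class Error:
    """Carries error information if TTS fails (stand-in type)."""

    message: str


Frame = Union[Started, Audio, Stopped, Error]


def tts_frames(text: str, fetch_audio: Callable[[str], Iterable[bytes]]) -> Iterator[Frame]:
    """Yield a start frame, one audio frame per chunk, then a stop frame."""
    yield Started()
    try:
        for chunk in fetch_audio(text):
            yield Audio(pcm=chunk)
    except Exception as exc:
        # Assumption: a failure ends the stream with an error frame.
        yield Error(message=str(exc))
        return
    yield Stopped()
```

A consumer can rely on the start and stop frames to bracket the audio frames of one utterance.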
Methods
See the TTS base class methods for additional functionality.
Language Support
OpenAI TTS supports the following languages and regional variants:
Language Code | Description | Service Codes |
---|---|---|
Language.EN | English | en |
Usage Example
Transport Configuration
When using this service with DailyTransport, set the transport's audio output sample rate to match the service's output sample rate.
Frame Flow
Metrics Support
The service supports metrics collection:
- Time to First Byte (TTFB)
- TTS usage metrics
- Processing duration
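To illustrate what two of these metrics measure, the sketch below wraps a stream of audio chunks and records the time to the first chunk and the total processing time. `StreamTimer` is a hypothetical class for illustration only; the service collects its metrics through its own mechanism, which this page does not show.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


class StreamTimer:
    """Wraps a chunk stream and records TTFB and total duration in seconds."""

    def __init__(self, chunks: Iterable[bytes], clock: Callable[[], float] = time.perf_counter):
        self._chunks = chunks
        self._clock = clock
        self.ttfb: Optional[float] = None
        self.duration: Optional[float] = None

    def __iter__(self) -> Iterator[bytes]:
        start = self._clock()
        for chunk in self._chunks:
            if self.ttfb is None:
                # First chunk arrived: this is the time to first byte.
                self.ttfb = self._clock() - start
            yield chunk
        # Stream exhausted: record the total processing duration.
        self.duration = self._clock() - start
```

The clock is injectable so the timing logic can be tested deterministically; in normal use the default `time.perf_counter` is sufficient.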
Error Handling
Notes
- Outputs PCM audio at 24kHz
- Streams audio in 8KB chunks
- Supports multiple voices
- Provides HD model option
- Includes metrics collection
- Thread-safe processing
- Handles empty text gracefully
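The chunk size and audio format in these notes determine how much audio each chunk carries. The sketch below works through that arithmetic under two assumptions that this page does not state: samples are 16-bit (2 bytes each), and "8KB" means 8192 bytes. The function names are hypothetical.

```python
from typing import Iterator

CHUNK_SIZE = 8192      # assumption: "8KB" means 8192 bytes
SAMPLE_RATE = 24000    # Hz, from the notes above
BYTES_PER_SAMPLE = 2   # assumption: 16-bit PCM
CHANNELS = 1           # mono


def iter_chunks(pcm: bytes, size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Split a PCM buffer into fixed-size chunks; the last one may be shorter."""
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]


def chunk_duration_seconds(num_bytes: int) -> float:
    """Playback time represented by num_bytes of PCM audio."""
    return num_bytes / (SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)
```

Under these assumptions, one full chunk holds 8192 / 48000 seconds of audio, roughly 171 ms, and an empty buffer yields no chunks at all.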