Overview
OpenAITTSService provides high-quality text-to-speech synthesis using OpenAI’s TTS API with multiple voice models including traditional TTS models and advanced GPT-based models. The service outputs 24kHz PCM audio with streaming capabilities for real-time applications.
- OpenAI TTS API Reference: Pipecat’s API methods for OpenAI TTS integration
- Example Implementation: Complete example with voice customization
- OpenAI Documentation: Official OpenAI TTS API documentation
- Voice Samples: Listen to available voice options
Installation
To use OpenAI services, install the required dependencies, for example by running pip install "pipecat-ai[openai]".
Prerequisites
OpenAI Account Setup
Before using OpenAI TTS services, you need:
- OpenAI Account: Sign up at OpenAI Platform
- API Key: Generate an API key from your API keys page
- Voice Selection: Choose from available voice options (alloy, ash, ballad, cedar, coral, echo, fable, marin, nova, onyx, sage, shimmer, verse)
Required Environment Variables
OPENAI_API_KEY: Your OpenAI API key for authentication
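A minimal sketch of reading the key at startup, so that a missing variable fails early with a clear message instead of surfacing later as an authentication error. The helper name `require_api_key` is hypothetical and is not part of Pipecat or the OpenAI SDK.

```python
import os


def require_api_key(env=None):
    """Return the OpenAI API key from the environment, or raise if it is unset.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before starting the bot.")
    return key
```

Calling `require_api_key()` once during startup keeps the check in one place; the returned value can then be passed to the service explicitly.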
Configuration
OpenAITTSService
- api_key: OpenAI API key for authentication. If None, uses the OPENAI_API_KEY environment variable.
- base_url: Custom base URL for OpenAI API. If None, uses the default OpenAI endpoint.
- voice: Voice ID to use for synthesis. Options: alloy, ash, ballad, cedar, coral, echo, fable, marin, nova, onyx, sage, shimmer, verse.
- model: TTS model to use.
- sample_rate: Output audio sample rate in Hz. If None, uses OpenAI’s default 24kHz. OpenAI TTS only supports 24kHz output.
- params: Runtime-configurable voice and generation settings. See InputParams below.
InputParams
Voice and generation settings that can be set at initialization via the params constructor argument, or changed at runtime via UpdateSettingsFrame.
| Parameter | Type | Default | Description |
|---|---|---|---|
| instructions | str | None | Instructions to guide voice synthesis behavior (e.g. affect, tone, pacing). |
| speed | float | None | Voice speed control (0.25 to 4.0). |
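The documented speed range can be checked before a value reaches the service. The sketch below assumes nothing about how Pipecat or OpenAI handle out-of-range values (this page does not say); `validate_speed` is a hypothetical helper, not a library function.

```python
from typing import Optional

MIN_SPEED = 0.25  # lower bound from the table above
MAX_SPEED = 4.0   # upper bound from the table above


def validate_speed(speed: Optional[float]) -> Optional[float]:
    """Return `speed` unchanged if it is None or inside the documented range."""
    if speed is None:
        return None  # None is the documented default and leaves speed unset
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(
            f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}"
        )
    return speed
```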
Usage
Basic Setup
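A minimal sketch of a basic setup, using only the constructor arguments listed under Configuration. The import path `pipecat.services.openai.tts` is an assumption and may differ between Pipecat versions; `basic_tts_kwargs` and `create_basic_tts` are hypothetical helper names used here for illustration.

```python
import os


def basic_tts_kwargs(voice: str = "alloy") -> dict:
    """Collect the constructor arguments for a basic setup."""
    return {
        # os.getenv returns None when the variable is unset; per the
        # Configuration section, None makes the service read OPENAI_API_KEY itself.
        "api_key": os.getenv("OPENAI_API_KEY"),
        "voice": voice,
    }


def create_basic_tts():
    """Build the service. Requires the pipecat-ai package with the openai extra."""
    # Assumed import path; imported lazily so this module loads without Pipecat.
    from pipecat.services.openai.tts import OpenAITTSService

    return OpenAITTSService(**basic_tts_kwargs())
```

The returned service is then placed in a pipeline like any other Pipecat TTS processor.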
With Voice Customization
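A sketch of voice customization through the params argument, assuming InputParams is exposed as a nested class on OpenAITTSService (as in other Pipecat TTS services) and the same assumed import path as the basic setup. The instruction text, the speed value, and the helper names are illustrative only.

```python
def customized_tts_settings() -> dict:
    """Example settings: a voice, a model, and InputParams values."""
    return {
        "voice": "coral",
        "model": "gpt-4o-mini-tts",  # this model supports `instructions` (see Notes)
        "instructions": "Speak warmly and at a relaxed pace.",
        "speed": 1.1,  # must stay within the documented 0.25 to 4.0 range
    }


def create_customized_tts():
    """Build the service with custom voice settings. Requires pipecat-ai[openai]."""
    # Assumed import path; imported lazily so this module loads without Pipecat.
    from pipecat.services.openai.tts import OpenAITTSService

    s = customized_tts_settings()
    return OpenAITTSService(
        voice=s["voice"],
        model=s["model"],
        # Assumption: InputParams is a nested class of the service.
        params=OpenAITTSService.InputParams(
            instructions=s["instructions"],
            speed=s["speed"],
        ),
    )
```

Leaving api_key out relies on the documented fallback to the OPENAI_API_KEY environment variable.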
Updating Settings at Runtime
Voice settings can be changed mid-conversation using UpdateSettingsFrame.
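A hedged sketch of a runtime update. It assumes the TTS-specific frame is `TTSUpdateSettingsFrame` from `pipecat.frames.frames`, that it accepts a `settings` mapping, and that `task` is a running `PipelineTask` with an async `queue_frame` method; the exact frame name, its fields, and the accepted setting keys may vary by Pipecat version. `voice_update_settings` and `switch_voice` are hypothetical helpers.

```python
def voice_update_settings(voice=None, speed=None, instructions=None) -> dict:
    """Build a settings dict containing only the fields that should change."""
    candidates = {"voice": voice, "speed": speed, "instructions": instructions}
    return {name: value for name, value in candidates.items() if value is not None}


async def switch_voice(task, voice: str) -> None:
    """Queue a settings update on a running pipeline task."""
    # Assumed frame class and import path; imported lazily so this module
    # loads without Pipecat installed.
    from pipecat.frames.frames import TTSUpdateSettingsFrame

    settings = voice_update_settings(voice=voice)
    await task.queue_frame(TTSUpdateSettingsFrame(settings=settings))
```

Sending only the changed fields keeps the remaining settings at their current values, on the assumption that the service merges updates rather than replacing its whole configuration.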
Notes
- Fixed sample rate: OpenAI TTS always outputs audio at 24kHz. Using a different sample rate may cause issues.
- Model selection: The gpt-4o-mini-tts model supports the instructions parameter for controlling voice affect and tone, which traditional TTS models do not support.
- HTTP-based service: OpenAI TTS uses HTTP streaming, so it does not have WebSocket connection events.
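Because the sample rate is fixed at 24kHz, buffer sizes and durations follow from simple arithmetic. The sketch below assumes 16-bit mono PCM, which this page does not state; adjust the two constants if the actual format differs. Both function names are hypothetical.

```python
SAMPLE_RATE_HZ = 24_000  # fixed output rate noted above
BYTES_PER_SAMPLE = 2     # assumption: 16-bit PCM
CHANNELS = 1             # assumption: mono


def pcm_bytes_for_duration(milliseconds: int) -> int:
    """Number of bytes needed to hold `milliseconds` of audio."""
    samples = SAMPLE_RATE_HZ * milliseconds // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def pcm_duration_ms(num_bytes: int) -> float:
    """Playback duration in milliseconds of a PCM buffer of `num_bytes` bytes."""
    bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
    return num_bytes / bytes_per_second * 1000
```

Under these assumptions, a 20 ms chunk is 480 samples, or 960 bytes, and one second of audio is 48,000 bytes.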