Overview

OpenAI’s TTS API provides text-to-speech synthesis through two families of models: the original TTS models (tts-1, tts-1-hd) and the newer GPT-based model (gpt-4o-mini-tts). The service streams 24kHz PCM audio, which makes it suitable for real-time applications.

Installation

To use OpenAI services, install the required dependencies:
pip install "pipecat-ai[openai]"
You’ll also need to set up your OpenAI API key as an environment variable: OPENAI_API_KEY.
Get your API key from the OpenAI Platform.
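A small startup check can catch a missing key before the pipeline runs. The following is a minimal sketch, not part of Pipecat: the helper name load_openai_api_key is hypothetical, and only the OPENAI_API_KEY variable name comes from the text above.

```python
import os


def load_openai_api_key(env=None):
    """Return the OpenAI API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; `env` defaults to os.environ and
    can be replaced with a plain dict in tests.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before starting the pipeline."
        )
    return key
```

Calling load_openai_api_key() once at startup turns a missing key into an immediate, readable error instead of a failure on the first synthesis request.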

Frames

Input

  • TextFrame - Text content to synthesize into speech
  • TTSSpeakFrame - Text that should be spoken immediately
  • TTSUpdateSettingsFrame - Runtime configuration updates
  • LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries

Output

  • TTSStartedFrame - Signals start of synthesis
  • TTSAudioRawFrame - Generated audio data (24kHz PCM, mono)
  • TTSStoppedFrame - Signals completion of synthesis
  • ErrorFrame - API or processing errors
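To inspect synthesized audio offline, the raw PCM carried by the output audio frames can be wrapped in a WAV container. A minimal sketch using only the Python standard library follows. It assumes the audio payloads have already been collected as bytes objects and that samples are 16-bit, as listed under Audio Specifications. The function name pcm_chunks_to_wav is hypothetical.

```python
import io
import wave

SAMPLE_RATE_HZ = 24_000   # 24kHz, per the frame description above
SAMPLE_WIDTH_BYTES = 2    # assumption: 16-bit PCM samples
CHANNELS = 1              # mono


def pcm_chunks_to_wav(chunks):
    """Join raw PCM chunks and return them as WAV-formatted bytes.

    Hypothetical helper: `chunks` is any iterable of bytes objects holding
    24kHz, 16-bit, mono PCM audio.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        for chunk in chunks:
            wav_file.writeframes(chunk)
    return buffer.getvalue()
```

Writing the returned bytes to a file with a .wav extension gives something any audio player can open, which is a quick way to confirm that the sample rate and format assumptions are right.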

Models

Model           | Description                | Best For
gpt-4o-mini-tts | Latest GPT-based TTS model | Faster generation, improved prosody, recommended for most use cases
tts-1           | Original TTS model         | Standard quality speech synthesis
tts-1-hd        | High-definition TTS model  | Premium quality speech with higher fidelity

Voice Options

OpenAI provides multiple voice personalities:
Voice   | Description           | Characteristics
alloy   | Balanced, neutral     | Professional, clear
echo    | Calm, measured        | Thoughtful, deliberate
fable   | Warm, engaging        | Storytelling, expressive
onyx    | Deep, authoritative   | Commanding, confident
nova    | Bright, energetic     | Enthusiastic, friendly
shimmer | Soft, gentle          | Soothing, approachable
ash     | Mature, sophisticated | Experienced, wise
ballad  | Smooth, melodic       | Musical, flowing
coral   | Vibrant, lively       | Dynamic, spirited
sage    | Wise, contemplative   | Reflective, knowledgeable
verse   | Poetic, rhythmic      | Artistic, expressive
Refer to the OpenAI TTS documentation for the latest voice options and their characteristics.
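When the voice name comes from user input or configuration, it can help to validate it before passing it to the service. The sketch below checks a name against the voices listed above. The set is a snapshot of this page and may lag behind OpenAI's current list, and normalize_voice is a hypothetical helper, not a Pipecat function.

```python
# Snapshot of the voices listed on this page; OpenAI may add or remove voices.
KNOWN_VOICES = frozenset({
    "alloy", "echo", "fable", "onyx", "nova", "shimmer",
    "ash", "ballad", "coral", "sage", "verse",
})


def normalize_voice(name):
    """Lower-case and trim a voice name, rejecting names not in KNOWN_VOICES."""
    voice = name.strip().lower()
    if voice not in KNOWN_VOICES:
        options = ", ".join(sorted(KNOWN_VOICES))
        raise ValueError(f"Unknown voice {name!r}; expected one of: {options}")
    return voice
```

For example, normalize_voice(" Nova ") returns "nova", while an unlisted name raises ValueError with the accepted options in the message.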

Usage Example

Basic Configuration

Initialize OpenAITTSService and use it in a pipeline:
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.openai.tts import OpenAITTSService

# Configure service with default settings
tts = OpenAITTSService(
    api_key=os.getenv("OPENAI_API_KEY"),
    voice="nova",
    model="gpt-4o-mini-tts",
)

# Use in pipeline. transport, stt, llm, and context_aggregator are assumed
# to be created elsewhere in your application.
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant(),
])

Dynamic Voice Changes

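from pipecat.frames.frames import TTSUpdateSettingsFrame

# Runtime voice switching via settings update.
# task is assumed to be the running PipelineTask for this pipeline.
await task.queue_frame(TTSUpdateSettingsFrame({
    "voice": "sage"
}))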

Audio Specifications

Sample Rate

  • Fixed Rate: 24kHz (24,000 Hz)
  • Format: 16-bit PCM
  • Channels: Mono (1 channel)
  • Streaming: Chunked delivery for low latency
OpenAI TTS outputs audio only at 24kHz. Set your pipeline sample rate to match; a mismatch can make audio play back at the wrong speed and pitch.
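The figures above fix the data rate of the stream, which is useful when sizing buffers or estimating audio duration from a byte count. A short worked example, with hypothetical helper names:

```python
SAMPLE_RATE_HZ = 24_000   # samples per second
BYTES_PER_SAMPLE = 2      # 16-bit PCM
CHANNELS = 1              # mono

# 24,000 samples/s * 2 bytes/sample * 1 channel = 48,000 bytes/s
BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


def pcm_duration_seconds(num_bytes):
    """Duration of a raw PCM buffer of the given size, in seconds."""
    return num_bytes / BYTES_PER_SECOND


def bytes_for_duration(milliseconds):
    """Number of PCM bytes needed to hold the given duration of audio."""
    return BYTES_PER_SECOND * milliseconds // 1000
```

Under these figures, one second of speech occupies 48,000 bytes and a 20 ms chunk is 960 bytes.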

Advanced Features

Voice Instructions (GPT Models)

# Guide voice behavior with instructions
tts = OpenAITTSService(
    model="gpt-4o-mini-tts",
    voice="nova",
    instructions="Speak enthusiastically about technology topics, but use a calm tone for sensitive subjects"
)

Custom Endpoints

# Use custom OpenAI-compatible endpoints
tts = OpenAITTSService(
    base_url="https://api.custom-provider.com/v1",
    api_key="custom-api-key",
    model="custom-tts-model"
)

Metrics

The service reports the following metrics:
  • Time to First Byte (TTFB) - Latency from text input to first audio
  • Processing Duration - Total synthesis time
  • Character Usage - Text processed for billing
Learn how to enable Metrics in your Pipeline.
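To make the three metrics concrete, the sketch below measures them by hand around a stand-in async generator. It does not use Pipecat's metrics machinery: fake_tts_stream and measure_stream are hypothetical names, and the sleep calls only simulate network and synthesis latency.

```python
import asyncio
import time


async def fake_tts_stream(text, chunk_count=3, delay_s=0.01):
    """Hypothetical stand-in for a streaming TTS call; yields silent PCM chunks."""
    for _ in range(chunk_count):
        await asyncio.sleep(delay_s)   # simulated latency before each chunk
        yield b"\x00\x00" * 480        # 480 silent 16-bit samples (960 bytes)


async def measure_stream(text):
    """Consume the stream and record TTFB, total duration, and character count."""
    start = time.perf_counter()
    ttfb = None
    audio_bytes = 0
    async for chunk in fake_tts_stream(text):
        if ttfb is None:
            # Time to first byte: text submitted -> first audio chunk received
            ttfb = time.perf_counter() - start
        audio_bytes += len(chunk)
    return {
        "ttfb_s": ttfb,
        "processing_s": time.perf_counter() - start,  # total synthesis time
        "characters": len(text),                      # basis for usage tracking
        "audio_bytes": audio_bytes,
    }
```

Running asyncio.run(measure_stream("Hello there")) returns a dictionary in which ttfb_s is never larger than processing_s, since the first chunk always arrives before the stream finishes.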

Additional Notes

  • Sample Rate Constraint: OpenAI TTS always outputs at 24kHz - ensure pipeline compatibility
  • Streaming Optimized: Audio chunks delivered as generated for low-latency playback
  • Voice Quality: GPT-based models offer superior prosody and naturalness
  • Instructions Support: GPT models accept behavioral instructions for voice customization
  • Error Handling: API and processing errors are surfaced as ErrorFrame
  • Thread Safety: Safe for concurrent use in multi-threaded applications
  • Cost Efficiency: Character-based billing with usage metrics tracking