OpenAI

Overview

OpenAI’s TTS API provides high-quality text-to-speech synthesis with multiple voice models including traditional TTS models and advanced GPT-based models. The service outputs 24kHz PCM audio with streaming capabilities for real-time applications.

API Reference

Complete API documentation and method details

OpenAI TTS Docs

Official OpenAI text-to-speech API documentation

Example Code

Working example with voice customization

Installation

To use OpenAI services, install the required dependencies:

pip install "pipecat-ai[openai]"

You’ll also need to set up your OpenAI API key as an environment variable: OPENAI_API_KEY.

Get your API key from the OpenAI Platform.

Frames

Input

TextFrame - Text content to synthesize into speech
TTSSpeakFrame - Text that should be spoken immediately
TTSUpdateSettingsFrame - Runtime configuration updates
LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries

Output

TTSStartedFrame - Signals start of synthesis
TTSAudioRawFrame - Generated audio data (24kHz PCM, mono)
TTSStoppedFrame - Signals completion of synthesis
ErrorFrame - API or processing errors

Models

Model	Description	Best For
`gpt-4o-mini-tts`	Latest GPT-based TTS model	Faster generation, improved prosody, recommended for most use cases
`tts-1`	Original TTS model	Standard quality speech synthesis
`tts-1-hd`	High-definition TTS model	Premium quality speech with higher fidelity

Voice Options

OpenAI provides multiple voice personalities:

Voice	Description	Characteristics
`alloy`	Balanced, neutral	Professional, clear
`echo`	Calm, measured	Thoughtful, deliberate
`fable`	Warm, engaging	Storytelling, expressive
`onyx`	Deep, authoritative	Commanding, confident
`nova`	Bright, energetic	Enthusiastic, friendly
`shimmer`	Soft, gentle	Soothing, approachable
`ash`	Mature, sophisticated	Experienced, wise
`ballad`	Smooth, melodic	Musical, flowing
`coral`	Vibrant, lively	Dynamic, spirited
`sage`	Wise, contemplative	Reflective, knowledgeable
`verse`	Poetic, rhythmic	Artistic, expressive

Refer to the OpenAI TTS documentation for the latest voice options and their characteristics.

Usage Example

Basic Configuration

Initialize OpenAITTSService and use it in a pipeline:

from pipecat.services.openai.tts import OpenAITTSService
import os

# Configure service with default settings
tts = OpenAITTSService(
    api_key=os.getenv("OPENAI_API_KEY"),
    voice="nova",
    model="gpt-4o-mini-tts"
)

# Use in pipeline
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

Dynamic Voice Changes

await task.queue_frame(
    TTSUpdateSettingsFrame(settings={"voice": "your-new-voice-id"})
)

Audio Specifications

Sample Rate

Fixed Rate: 24kHz (24,000 Hz)
Format: 16-bit PCM
Channels: Mono (1 channel)
Streaming: Chunked delivery for low latency

OpenAI TTS only outputs at 24kHz. Ensure your pipeline sample rate matches to avoid audio issues.

Advanced Features

Voice Instructions (GPT Models)

# Guide voice behavior with instructions
tts = OpenAITTSService(
    model="gpt-4o-mini-tts",
    voice="nova",
    instructions="Speak enthusiastically about technology topics, but use a calm tone for sensitive subjects"
)

Custom Endpoints

# Use custom OpenAI-compatible endpoints
tts = OpenAITTSService(
    base_url="https://api.custom-provider.com/v1",
    api_key="custom-api-key",
    model="custom-tts-model"
)

Metrics

The service provides comprehensive metrics:

Time to First Byte (TTFB) - Latency from text input to first audio
Processing Duration - Total synthesis time
Character Usage - Text processed for billing

Learn how to enable Metrics in your Pipeline.

Additional Notes

Sample Rate Constraint: OpenAI TTS always outputs at 24kHz - ensure pipeline compatibility
Streaming Optimized: Audio chunks delivered as generated for low-latency playback
Voice Quality: GPT-based models offer superior prosody and naturalness
Instructions Support: GPT models accept behavioral instructions for voice customization
Error Handling: Robust error handling with detailed error messages
Thread Safety: Safe for concurrent use in multi-threaded applications
Cost Efficiency: Character-based billing with usage metrics tracking

API Reference

Services

Utilities

Frameworks

Pipeline

Overview

API Reference

OpenAI TTS Docs

Example Code

Installation

Frames

Input

Output

Models

Voice Options

Usage Example

Basic Configuration

Dynamic Voice Changes

Audio Specifications

Sample Rate

Advanced Features

Voice Instructions (GPT Models)

Custom Endpoints

Metrics

Additional Notes

API Reference

Services

Utilities

Frameworks

Pipeline

​Overview

API Reference

OpenAI TTS Docs

Example Code

​Installation

​Frames

​Input

​Output

​Models

​Voice Options

​Usage Example

​Basic Configuration

​Dynamic Voice Changes

​Audio Specifications

​Sample Rate

​Advanced Features

​Voice Instructions (GPT Models)

​Custom Endpoints

​Metrics

​Additional Notes

Overview

Installation

Frames

Input

Output

Models

Voice Options

Usage Example

Basic Configuration

Dynamic Voice Changes

Audio Specifications

Sample Rate

Advanced Features

Voice Instructions (GPT Models)

Custom Endpoints

Metrics

Additional Notes