Overview

OpenAITTSService converts text to speech using OpenAI’s TTS API. It supports multiple voices and outputs audio at 24 kHz, using either the original TTS models (tts-1, tts-1-hd) or the GPT-4o-based gpt-4o-mini-tts model.

Installation

To use OpenAITTSService, install the required dependencies:

pip install "pipecat-ai[openai]"

You’ll also need to set your OpenAI API key in the OPENAI_API_KEY environment variable.
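As a minimal sketch of reading that variable before constructing the service, the helper below fails early with a clear message when the key is missing. The function name load_openai_api_key is hypothetical and not part of Pipecat; only the variable name OPENAI_API_KEY comes from this page.

```python
import os


def load_openai_api_key(env=os.environ):
    """Return the OpenAI API key from the environment, or raise if it is unset.

    Hypothetical helper for illustration; not part of Pipecat.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key
```

The returned value could then be passed as the api_key constructor parameter described below.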

Configuration

Constructor Parameters

api_key
str | None

OpenAI API key

voice
str
default:"alloy"

Voice identifier.

Options:

  • "alloy"
  • "echo"
  • "fable"
  • "onyx"
  • "nova"
  • "shimmer"

model
str
default:"gpt-4o-mini-tts"

Model to use.

Options:

  • "gpt-4o-mini-tts"
  • "tts-1"
  • "tts-1-hd"

sample_rate
int
default:"None"

Output audio sample rate in Hz. Supports only 24000 Hz.

text_filter
BaseTextFilter
default:"None"

Modifies text provided to the TTS. Learn more about the available filters.
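The options listed above can be checked before the service is constructed. The following is a sketch only: check_tts_options is a hypothetical helper, not a Pipecat function, and it mirrors just the voices, models, and sample rate named on this page (the underlying API may accept others).

```python
# Values copied from the parameter descriptions on this page.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
MODELS = {"gpt-4o-mini-tts", "tts-1", "tts-1-hd"}
SUPPORTED_SAMPLE_RATE = 24000


def check_tts_options(voice="alloy", model="gpt-4o-mini-tts", sample_rate=None):
    """Validate constructor options and return them as keyword arguments.

    Hypothetical helper for illustration; raises ValueError on an
    option that this page does not list.
    """
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    # None is left as-is so the service can apply its own default.
    if sample_rate is not None and sample_rate != SUPPORTED_SAMPLE_RATE:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    return {"voice": voice, "model": model, "sample_rate": sample_rate}
```

The returned dictionary could be unpacked into the constructor, for example OpenAITTSService(**check_tts_options(voice="nova")).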

Output Frames

Control Frames

TTSStartedFrame
Frame

Signals start of audio generation

TTSStoppedFrame
Frame

Signals completion of audio generation

Audio Frames

TTSAudioRawFrame
Frame

Contains generated audio data:

  • PCM encoded audio
  • 24kHz sample rate
  • Mono channel
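Given these properties, the playback duration of a raw audio buffer follows from its byte length. The sketch below assumes 16-bit (2-byte) samples, which this page does not state; the sample rate and channel count are taken from the list above, and pcm_duration_seconds is a hypothetical name.

```python
SAMPLE_RATE_HZ = 24000  # from this page
CHANNELS = 1            # mono, from this page
BYTES_PER_SAMPLE = 2    # assumption: 16-bit PCM (not stated on this page)


def pcm_duration_seconds(num_bytes, sample_rate=SAMPLE_RATE_HZ,
                         channels=CHANNELS, bytes_per_sample=BYTES_PER_SAMPLE):
    """Return the playback duration in seconds of a raw PCM buffer."""
    # One frame holds one sample per channel.
    frames = num_bytes // (bytes_per_sample * channels)
    return frames / sample_rate
```

Under the 16-bit assumption, one second of audio occupies 48,000 bytes.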

Error Frames

ErrorFrame
Frame

Contains error information if TTS fails

Methods

See the TTS base class methods for additional functionality.

Models

| Model | Description | Best For |
|---|---|---|
| gpt-4o-mini-tts | Latest GPT-based TTS model | Faster generation, improved prosody, recommended for most use cases |
| tts-1 | Original TTS model | Standard quality speech |
| tts-1-hd | High-definition TTS model | Premium quality speech with higher fidelity |

Language Support

OpenAI TTS supports the following languages and regional variants:

| Language Code | Description | Service Codes |
|---|---|---|
| Language.EN | English | en |

Usage Example

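import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.openai import OpenAITTSService

# Configure the service; the API key is read from the environment
tts = OpenAITTSService(
    api_key=os.getenv("OPENAI_API_KEY"),
    voice="nova",
    model="gpt-4o-mini-tts",
)

# Use in a pipeline; `llm` and `transport` are assumed to be created elsewhere
pipeline = Pipeline([
    ...,
    llm,
    tts,
    transport.output(),
])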

Metrics Support

The service supports metrics collection:

  • Time to First Byte (TTFB)
  • TTS usage metrics
  • Processing duration
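To make the first and last of these metrics concrete, the sketch below times a stream of audio chunks. It is not how Pipecat collects metrics internally; measure_ttfb and its clock parameter are hypothetical names used only for illustration.

```python
import time


def measure_ttfb(chunks, clock=time.perf_counter):
    """Consume an iterable of audio chunks and time it.

    Returns (ttfb_seconds, total_seconds, total_bytes). ttfb_seconds is
    None if no non-empty chunk arrives. Illustrative only.
    """
    start = clock()
    ttfb = None
    total_bytes = 0
    for chunk in chunks:
        # Record the delay until the first non-empty chunk.
        if ttfb is None and chunk:
            ttfb = clock() - start
        total_bytes += len(chunk)
    return ttfb, clock() - start, total_bytes
```

Passing the clock as a parameter keeps the function easy to test with a fake time source.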

Notes

  • Outputs PCM audio at 24kHz
  • Streams audio in 1KB chunks
  • Supports multiple voices
  • Uses GPT-4o Mini TTS by default for improved quality
  • Includes metrics collection
  • Thread-safe processing
  • Handles empty text gracefully
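The chunked streaming noted above can be pictured with a small generator. This is a sketch, assuming that 1KB means 1024 bytes; iter_chunks is a hypothetical name and not the service's actual implementation.

```python
CHUNK_SIZE = 1024  # assumption: "1KB" means 1024 bytes


def iter_chunks(audio, chunk_size=CHUNK_SIZE):
    """Yield successive slices of `audio`, each at most `chunk_size` bytes."""
    for offset in range(0, len(audio), chunk_size):
        yield audio[offset:offset + chunk_size]
```

An empty buffer yields no chunks at all, so a consumer loop over iter_chunks(b"") simply does nothing.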