Overview

OpenAITTSService converts text to speech using OpenAI’s TTS API. It supports multiple voices and provides high-quality audio output at 24kHz.

Installation

To use OpenAITTSService, install the required dependencies:

pip install pipecat-ai[openai]

You’ll also need to set your OpenAI API key in the OPENAI_API_KEY environment variable.
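The key lookup can be made explicit before the service is constructed, so a missing key fails early with a clear message. The following is a minimal sketch using only the standard library; `resolve_openai_api_key` is a hypothetical helper, not part of pipecat.

```python
import os
from typing import Mapping, Optional


def resolve_openai_api_key(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the explicit key if given, otherwise OPENAI_API_KEY from env.

    Hypothetical helper for illustration; raises if neither source has a key.
    """
    key = explicit or env.get("OPENAI_API_KEY", "")
    if not key.strip():
        raise RuntimeError("OPENAI_API_KEY is not set and no api_key was passed")
    return key.strip()
```

The returned value could then be passed as the `api_key` constructor parameter described under Configuration.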

Configuration

Constructor Parameters

api_key
str | None

OpenAI API key (defaults to the OPENAI_API_KEY environment variable)

voice
str
default:
"alloy"

Voice identifier. Options:

  • "alloy"
  • "echo"
  • "fable"
  • "onyx"
  • "nova"
  • "shimmer"

model
str
default:
"tts-1"

Model to use. Options:

  • "tts-1"
  • "tts-1-hd"

sample_rate
int
default:
24000

Output audio sample rate in Hz

text_filter
BaseTextFilter
default:
None

Filter applied to the text before it is sent to the TTS service. See the text filter documentation for the available filters.
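The option lists above can be checked before the service is built, which catches a mistyped voice or model name without a network call. This is a sketch under the assumption that the listed voices and models are the complete set; `OpenAITTSSettings` is a hypothetical helper, not a pipecat class.

```python
from dataclasses import asdict, dataclass

# Option sets copied from the parameter descriptions above.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
MODELS = {"tts-1", "tts-1-hd"}


@dataclass(frozen=True)
class OpenAITTSSettings:
    """Hypothetical settings holder that validates the documented options."""

    voice: str = "alloy"
    model: str = "tts-1"
    sample_rate: int = 24000

    def __post_init__(self) -> None:
        if self.voice not in VOICES:
            raise ValueError(f"unknown voice {self.voice!r}; expected one of {sorted(VOICES)}")
        if self.model not in MODELS:
            raise ValueError(f"unknown model {self.model!r}; expected one of {sorted(MODELS)}")
        if self.sample_rate <= 0:
            raise ValueError("sample_rate must be a positive number of Hz")

    def as_kwargs(self) -> dict:
        """Return the settings as keyword arguments for the constructor."""
        return asdict(self)
```

With pipecat installed, the validated settings could be applied as `OpenAITTSService(**OpenAITTSSettings(voice="nova").as_kwargs())`.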

Output Frames

Control Frames

TTSStartedFrame
Frame

Signals start of audio generation

TTSStoppedFrame
Frame

Signals completion of audio generation

Audio Frames

TTSAudioRawFrame
Frame

Contains generated audio data:

  • PCM encoded audio
  • 24kHz sample rate
  • Mono channel
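To inspect the generated audio, the raw PCM payloads can be wrapped in a WAV container. The sketch below uses only the standard library and assumes 16-bit samples (the text above states PCM, 24kHz, and mono, but not the sample width) and that each frame exposes its bytes, for example as `frame.audio`; `pcm_chunks_to_wav` is a hypothetical helper.

```python
import io
import wave
from typing import Iterable


def pcm_chunks_to_wav(
    chunks: Iterable[bytes],
    sample_rate: int = 24000,  # from the frame description above
    channels: int = 1,         # mono
    sample_width: int = 2,     # assumption: 16-bit samples
) -> bytes:
    """Concatenate raw PCM chunks and return them as WAV file bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)
    return buf.getvalue()
```

The result can be written to disk with `open("out.wav", "wb").write(...)` and played in any audio tool.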

Error Frames

ErrorFrame
Frame

Contains error information if TTS fails

Methods

See the TTS base class methods for additional functionality.

Language Support

OpenAI TTS supports the following languages and regional variants:

Language Code    Description    Service Codes
Language.EN      English        en

Usage Example

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.openai import OpenAITTSService

# Configure service
tts = OpenAITTSService(
    voice="nova",
    model="tts-1-hd",
    sample_rate=24000,
)

# Use in pipeline (llm and transport are created elsewhere)
pipeline = Pipeline([
    ...,
    llm,
    tts,
    transport.output(),
])

Transport Configuration

When using DailyTransport, set the output sample rate to match the service:

DailyParams(
    audio_out_enabled=True,
    audio_out_sample_rate=24_000,
)

Metrics Support

The service supports metrics collection:

  • Time to First Byte (TTFB)
  • TTS usage metrics
  • Processing duration
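As an illustration of what the first and last metrics measure, the sketch below times an asynchronous chunk stream by hand. It does not use pipecat's metrics machinery; `measure_ttfb` and `fake_tts_stream` are hypothetical names, and the fake stream merely stands in for a real TTS response.

```python
import asyncio
import time
from typing import AsyncIterator, Tuple


async def measure_ttfb(stream: AsyncIterator[bytes]) -> Tuple[float, float, int]:
    """Return (time to first chunk, total duration, total bytes) for a stream."""
    start = time.perf_counter()
    ttfb = None
    total_bytes = 0
    async for chunk in stream:
        if ttfb is None:
            # The first chunk has arrived: this is the time to first byte.
            ttfb = time.perf_counter() - start
        total_bytes += len(chunk)
    duration = time.perf_counter() - start
    # An empty stream has no first byte; fall back to the total duration.
    return (ttfb if ttfb is not None else duration), duration, total_bytes


async def fake_tts_stream(n_chunks: int = 3, delay: float = 0.01) -> AsyncIterator[bytes]:
    """Stand-in for a TTS response: yields silent chunks after a short delay."""
    for _ in range(n_chunks):
        await asyncio.sleep(delay)
        yield b"\x00" * 8192


if __name__ == "__main__":
    ttfb, duration, total = asyncio.run(measure_ttfb(fake_tts_stream()))
    print(f"TTFB {ttfb:.3f}s, duration {duration:.3f}s, {total} bytes")
```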

Error Handling

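from loguru import logger
from pipecat.frames.frames import ErrorFrame

async def speak(service, text):
    try:
        async for frame in service.run_tts(text):
            # The service reports failures as ErrorFrame instances
            if isinstance(frame, ErrorFrame):
                logger.error(f"TTS error: {frame.error}")
                # Handle error
    except Exception as e:
        # Anything raised outside the frame stream, e.g. network failures
        logger.error(f"TTS raised an exception: {e}")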

Notes

  • Outputs PCM audio at 24kHz
  • Streams audio in 8KB chunks
  • Supports multiple voices
  • Provides HD model option
  • Includes metrics collection
  • Thread-safe processing
  • Handles empty text gracefully
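For a sense of scale, the chunk size and sample rate in the notes above translate into playback time as follows. The calculation assumes 16-bit mono samples and that "8KB" means 8192 bytes, neither of which is stated in the notes; `pcm_duration_seconds` is a hypothetical helper.

```python
def pcm_duration_seconds(
    num_bytes: int,
    sample_rate: int = 24000,
    channels: int = 1,
    sample_width: int = 2,  # assumption: 16-bit samples
) -> float:
    """Playback time of a raw PCM buffer of the given size."""
    return num_bytes / (sample_rate * channels * sample_width)


chunk_seconds = pcm_duration_seconds(8192)          # one streamed chunk
bytes_per_second = 24000 * 1 * 2                    # 48000 bytes per second
```

Under these assumptions each chunk carries roughly 171 ms of audio, and one second of speech is 48000 bytes.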