
Overview

OpenAITTSService provides text-to-speech synthesis through OpenAI's TTS API. It works with both the classic TTS models (tts-1, tts-1-hd) and the GPT-based gpt-4o-mini-tts model, and it streams 24kHz PCM audio as it is generated, which suits real-time applications.
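Because the service emits raw PCM at a fixed rate, you can work out how much audio a buffer holds directly from its byte count. The sketch below assumes 16-bit signed mono samples, which is the raw PCM layout OpenAI describes for its TTS API; the helper name pcm_duration_seconds is hypothetical and not part of Pipecat.

```python
SAMPLE_RATE_HZ = 24_000  # OpenAI TTS output rate
BYTES_PER_SAMPLE = 2     # assumption: 16-bit signed PCM
CHANNELS = 1             # assumption: mono output


def pcm_duration_seconds(num_bytes: int) -> float:
    """Return the playback duration, in seconds, of a raw 24 kHz PCM buffer."""
    bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS  # 48,000
    return num_bytes / bytes_per_second


print(pcm_duration_seconds(96_000))  # 2.0
```

At these settings one second of speech is 48,000 bytes, which is handy when sizing buffers or estimating latency.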

Installation

To use OpenAI services, install the required dependencies:
pip install "pipecat-ai[openai]"

Prerequisites

OpenAI Account Setup

Before using OpenAI TTS services, you need:
  1. OpenAI Account: Sign up on the OpenAI Platform
  2. API Key: Generate an API key on the API keys page of your account
  3. Voice Selection: Choose from available voice options (alloy, ash, ballad, cedar, coral, echo, fable, marin, nova, onyx, sage, shimmer, verse)
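A typo in the voice ID is an easy mistake, so checking it before the pipeline starts can save a failed request later. This is a minimal sketch; the voice set simply copies the list on this page (OpenAI may add or retire voices), and check_voice is a hypothetical helper, not a Pipecat API.

```python
# Voice IDs as listed on this page; update if OpenAI changes its voices.
OPENAI_TTS_VOICES = frozenset({
    "alloy", "ash", "ballad", "cedar", "coral", "echo", "fable",
    "marin", "nova", "onyx", "sage", "shimmer", "verse",
})


def check_voice(voice: str) -> str:
    """Return the voice ID unchanged, or raise ValueError if it is not listed."""
    if voice not in OPENAI_TTS_VOICES:
        raise ValueError(
            f"Unknown voice {voice!r}; expected one of {sorted(OPENAI_TTS_VOICES)}"
        )
    return voice


print(check_voice("nova"))  # nova
```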

Required Environment Variables

  • OPENAI_API_KEY: Your OpenAI API key for authentication
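If the variable is missing, the failure otherwise shows up only when the first synthesis request is rejected. A minimal sketch of failing early at startup follows; require_openai_key is a hypothetical helper, not part of Pipecat.

```python
import os


def require_openai_key() -> str:
    """Return OPENAI_API_KEY, raising a clear error if it is unset or blank."""
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before starting the bot.")
    return key
```

You could then pass api_key=require_openai_key() when constructing OpenAITTSService, so a misconfigured environment stops the bot before any audio is requested.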

Configuration

OpenAITTSService

api_key (str, default: None)
OpenAI API key for authentication. If None, the OPENAI_API_KEY environment variable is used.

base_url (str, default: None)
Custom base URL for the OpenAI API. If None, the default OpenAI endpoint is used.

voice (str, default: "alloy")
Voice ID to use for synthesis. Options: alloy, ash, ballad, cedar, coral, echo, fable, marin, nova, onyx, sage, shimmer, verse.

model (str, default: "gpt-4o-mini-tts")
TTS model to use, for example gpt-4o-mini-tts, tts-1, or tts-1-hd.

sample_rate (int, default: None)
Output audio sample rate in Hz. If None, OpenAI's default of 24kHz is used. OpenAI TTS only supports 24kHz output, so leave this unset or set it to 24000.

params (InputParams, default: None)
Runtime-configurable voice and generation settings. See InputParams below.
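The sample_rate rule above can be expressed as a small check. This is a sketch under our own assumptions: resolve_tts_sample_rate is hypothetical, and choosing to raise on a mismatch (rather than warn) is a design choice of this example, not necessarily what Pipecat does.

```python
from typing import Optional

OPENAI_TTS_SAMPLE_RATE = 24_000


def resolve_tts_sample_rate(requested: Optional[int]) -> int:
    """Map the sample_rate argument to the rate OpenAI TTS actually produces."""
    if requested is None:
        # None means "use OpenAI's default", which is always 24 kHz.
        return OPENAI_TTS_SAMPLE_RATE
    if requested != OPENAI_TTS_SAMPLE_RATE:
        raise ValueError(
            f"OpenAI TTS only outputs {OPENAI_TTS_SAMPLE_RATE} Hz audio, got {requested} Hz"
        )
    return requested
```

Raising makes a mismatched configuration visible at startup instead of surfacing as oddly pitched audio later.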

InputParams

Voice and generation settings that can be set at initialization via the params constructor argument, or changed at runtime via UpdateSettingsFrame.
Parameter    | Type  | Default | Description
instructions | str   | None    | Instructions that guide delivery, such as affect, tone, and pacing.
speed        | float | None    | Speech speed multiplier from 0.25 to 4.0, where 1.0 is normal speed.
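To catch an out-of-range speed before it reaches the API, you can validate settings eagerly. The dataclass below is a hypothetical stand-in that mirrors the two InputParams fields in the table; it is not Pipecat's InputParams class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TTSVoiceSettings:
    """Hypothetical mirror of InputParams that checks values when created."""

    instructions: Optional[str] = None
    speed: Optional[float] = None

    def __post_init__(self) -> None:
        # Documented range is 0.25 to 4.0; None leaves the setting unset.
        if self.speed is not None and not 0.25 <= self.speed <= 4.0:
            raise ValueError(f"speed must be between 0.25 and 4.0, got {self.speed}")

    def to_settings(self) -> dict:
        """Return only the fields that were set."""
        return {k: v for k, v in vars(self).items() if v is not None}


print(TTSVoiceSettings(speed=1.1).to_settings())  # {'speed': 1.1}
```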

Usage

Basic Setup

import os

from pipecat.services.openai import OpenAITTSService

tts = OpenAITTSService(
    api_key=os.getenv("OPENAI_API_KEY"),
    voice="nova",
)

With Voice Customization

import os

from pipecat.services.openai import OpenAITTSService

tts = OpenAITTSService(
    api_key=os.getenv("OPENAI_API_KEY"),
    voice="coral",
    model="gpt-4o-mini-tts",
    params=OpenAITTSService.InputParams(
        instructions="Speak in a warm, friendly tone with moderate pacing.",
        speed=1.1,
    ),
)

Updating Settings at Runtime

Voice settings can be changed mid-conversation using UpdateSettingsFrame:
from pipecat.frames.frames import UpdateSettingsFrame

# Run inside async code; `task` is the PipelineTask running your pipeline.
await task.queue_frame(
    UpdateSettingsFrame(
        settings={
            "tts": {
                "instructions": "Now speak more formally.",
                "speed": 0.9,
            }
        }
    )
)
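When only some settings change, it can help to build the nested mapping programmatically so unset values are left out. This sketch reproduces the {"tts": {...}} shape from the example above; tts_settings_update is a hypothetical helper, and it assumes the service applies only the keys it receives.

```python
from typing import Any, Dict, Optional


def tts_settings_update(
    instructions: Optional[str] = None, speed: Optional[float] = None
) -> Dict[str, Dict[str, Any]]:
    """Build the nested settings mapping; only the keys passed in are included."""
    changes: Dict[str, Any] = {}
    if instructions is not None:
        changes["instructions"] = instructions
    if speed is not None:
        if not 0.25 <= speed <= 4.0:
            raise ValueError(f"speed must be between 0.25 and 4.0, got {speed}")
        changes["speed"] = speed
    return {"tts": changes}


print(tts_settings_update(speed=0.9))  # {'tts': {'speed': 0.9}}
```

The result can be passed as the settings argument of UpdateSettingsFrame in place of a hand-written dictionary.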

Notes

  • Fixed sample rate: OpenAI TTS always outputs audio at 24kHz. Leave sample_rate unset or set it to 24000; any other value may cause audio to play back at the wrong speed or pitch.
  • Model selection: Only gpt-4o-mini-tts supports the instructions parameter for controlling voice affect and tone; the older tts-1 and tts-1-hd models do not.
  • HTTP-based service: OpenAI TTS uses HTTP streaming, so it does not have WebSocket connection events.
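With HTTP streaming, audio arrives in chunks of arbitrary size, so a single 16-bit sample can be split across two chunks. If you consume the raw stream yourself, a small buffer keeps each chunk aligned to whole samples. This is an illustrative sketch assuming 16-bit PCM; align_pcm_chunks is hypothetical and not a description of Pipecat's internals.

```python
from typing import Iterable, Iterator

BYTES_PER_SAMPLE = 2  # assumption: 16-bit PCM


def align_pcm_chunks(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Re-slice a byte stream so every yielded chunk holds whole samples.

    Leftover bytes from one chunk are carried into the next one.
    """
    remainder = b""
    for chunk in chunks:
        data = remainder + chunk
        usable = len(data) - (len(data) % BYTES_PER_SAMPLE)
        remainder = data[usable:]
        if usable:
            yield data[:usable]
    # A trailing odd byte cannot form a full sample and is dropped.


print(list(align_pcm_chunks([b"\x01\x02\x03", b"\x04\x05", b"\x06"])))
# [b'\x01\x02', b'\x03\x04', b'\x05\x06']
```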