Overview

ElevenLabs provides high-quality text-to-speech synthesis with two service implementations:
  • ElevenLabsTTSService (WebSocket) — Real-time streaming with word-level timestamps, audio context management, and interruption handling. Recommended for interactive applications.
  • ElevenLabsHttpTTSService (HTTP) — Simpler batch-style synthesis. Suitable for non-interactive use cases or when WebSocket connections are not possible.

Installation

pip install "pipecat-ai[elevenlabs]"

Prerequisites

  1. ElevenLabs Account: Sign up at ElevenLabs
  2. API Key: Generate an API key from your account dashboard
  3. Voice Selection: Choose voice IDs from the voice library
Set the following environment variable:
export ELEVENLABS_API_KEY=your_api_key
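In Python, the key can then be read with os.getenv. A minimal sketch that fails fast when the variable is unset (require_api_key is a hypothetical helper, not part of pipecat):

```python
import os

def require_api_key(name: str = "ELEVENLABS_API_KEY") -> str:
    """Read an API key from the environment, raising early if it is unset."""
    key = os.getenv(name)
    if not key:
        raise RuntimeError(f"Missing environment variable: {name}")
    return key
```

Failing at startup with a clear message is usually preferable to passing None as api_key and getting an authentication error mid-call.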

Configuration

ElevenLabsTTSService

api_key
str
required
ElevenLabs API key.
voice_id
str
required
Voice ID from the voice library.
model
str
default:"eleven_turbo_v2_5"
ElevenLabs model ID. Use a multilingual model variant (e.g. eleven_multilingual_v2) if you need non-English language support.
url
str
default:"wss://api.elevenlabs.io"
WebSocket endpoint URL. Override for custom or proxied deployments.
sample_rate
int
default:"None"
Output audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
aggregate_sentences
bool
default:"True"
Buffer text until sentence boundaries before sending to ElevenLabs. Produces more natural-sounding speech at the cost of a small latency increase (~15ms) for the first word of each sentence.
params
InputParams
default:"None"
Runtime-configurable voice and generation settings. See InputParams below.

ElevenLabsHttpTTSService

The HTTP service accepts the same parameters as the WebSocket service, with these differences:
aiohttp_session
aiohttp.ClientSession
required
An aiohttp session for HTTP requests. You must create and manage this yourself.
base_url
str
default:"https://api.elevenlabs.io"
HTTP API base URL. The HTTP service uses base_url where the WebSocket service uses url.
The HTTP InputParams also includes:
optimize_streaming_latency
int
default:"None"
Latency optimization level (0–4). Higher values reduce latency at the cost of quality.

InputParams

Voice and generation settings that can be set at initialization via the params constructor argument, or changed at runtime via UpdateSettingsFrame.
language
Language
default:"None"
Language code. Only effective with multilingual models.
stability
float
default:"None"
Voice consistency (0.0–1.0). Lower values are more expressive; higher values are more consistent.
similarity_boost
float
default:"None"
Voice clarity and similarity to the original voice (0.0–1.0).
style
float
default:"None"
Style exaggeration (0.0–1.0). Higher values amplify the voice's style.
use_speaker_boost
bool
default:"None"
Enhance clarity and similarity to the target speaker.
speed
float
default:"None"
Speech rate. WebSocket: 0.7–1.2. HTTP: 0.25–4.0.
auto_mode
bool
default:"True"
Automatic optimization mode. WebSocket only.
enable_ssml_parsing
bool
default:"None"
Parse SSML tags in input text. WebSocket only.
apply_text_normalization
Literal
default:"None"
Text normalization mode: "auto", "on", or "off".
None values use the ElevenLabs API defaults. See ElevenLabs voice settings for details on how these parameters interact.
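The documented ranges can be sketched as a small check. This is a hypothetical validator for illustration only, not part of pipecat; the service itself relies on the ElevenLabs API to reject out-of-range values:

```python
# Hypothetical validation of the documented InputParams ranges.
UNIT_RANGE_PARAMS = {"stability", "similarity_boost", "style"}
SPEED_RANGES = {"websocket": (0.7, 1.2), "http": (0.25, 4.0)}

def in_documented_range(name, value, transport="websocket"):
    """Return True if value falls inside the documented range.

    None always passes, since None means "use the ElevenLabs API default".
    """
    if value is None:
        return True
    if name in UNIT_RANGE_PARAMS:
        return 0.0 <= value <= 1.0
    if name == "speed":
        lo, hi = SPEED_RANGES[transport]
        return lo <= value <= hi
    return True
```

Note that a speed of 2.0 is valid for the HTTP service but out of range for the WebSocket service.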

Usage

Basic Setup

import os

from pipecat.services.elevenlabs import ElevenLabsTTSService

tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel
)

With Voice Customization

import os

from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transcriptions.language import Language

tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id="21m00Tcm4TlvDq8ikWAM",
    model="eleven_multilingual_v2",
    params=ElevenLabsTTSService.InputParams(
        language=Language.ES,
        stability=0.7,
        similarity_boost=0.8,
        speed=1.1,
    ),
)

Updating Settings at Runtime

Voice settings can be changed mid-conversation using UpdateSettingsFrame:
from pipecat.frames.frames import UpdateSettingsFrame

await task.queue_frame(
    UpdateSettingsFrame(
        settings={
            "tts": {
                "stability": 0.3,
                "speed": 1.1,
            }
        }
    )
)

HTTP Service

import os

import aiohttp
from pipecat.services.elevenlabs import ElevenLabsHttpTTSService

async with aiohttp.ClientSession() as session:
    tts = ElevenLabsHttpTTSService(
        api_key=os.getenv("ELEVENLABS_API_KEY"),
        voice_id="21m00Tcm4TlvDq8ikWAM",
        aiohttp_session=session,
    )

Notes

  • Multilingual models required for language: Setting language with a non-multilingual model (e.g. eleven_turbo_v2_5) has no effect. Use eleven_multilingual_v2 or similar.
  • WebSocket vs HTTP: The WebSocket service supports word-level timestamps and interruption handling, making it significantly better for interactive conversations. The HTTP service is simpler but lacks these features.
  • Sentence aggregation: Enabled by default. Buffering until sentence boundaries produces more natural speech with minimal latency impact. Disable with aggregate_sentences=False if you need word-by-word streaming.
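The aggregation behavior can be illustrated with a small buffer that flushes on sentence-ending punctuation. This is a simplified sketch of the idea, not pipecat's actual aggregation logic (which also handles abbreviations, numbers, and streaming edge cases):

```python
SENTENCE_ENDINGS = (".", "!", "?")

class SentenceAggregator:
    """Buffer incoming text chunks and emit complete sentences (illustrative sketch)."""

    def __init__(self):
        self._buffer = ""

    def push(self, chunk):
        """Append a chunk; return any complete sentences ready to send to TTS."""
        self._buffer += chunk
        sentences = []
        while True:
            idx = next(
                (i for i, ch in enumerate(self._buffer) if ch in SENTENCE_ENDINGS),
                None,
            )
            if idx is None:
                break
            sentences.append(self._buffer[: idx + 1].strip())
            self._buffer = self._buffer[idx + 1 :]
        return sentences
```

Chunks arriving mid-sentence produce no output; once a terminator arrives, the whole sentence is emitted at once, which is what gives the synthesized audio natural prosody.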

Event Handlers

ElevenLabs TTS supports the standard service connection events:
  • on_connected: Connected to the ElevenLabs WebSocket
  • on_disconnected: Disconnected from the ElevenLabs WebSocket
  • on_connection_error: A WebSocket connection error occurred
@tts.event_handler("on_connected")
async def on_connected(service):
    print("Connected to ElevenLabs")
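The decorator-based registration above follows a common async event-emitter pattern, which can be sketched in a few lines. This is an illustration of the pattern only, not pipecat's implementation:

```python
import asyncio

class EventEmitter:
    """Minimal sketch of decorator-based async event registration."""

    def __init__(self):
        self._handlers = {}

    def event_handler(self, name):
        """Return a decorator that registers fn as a handler for the named event."""
        def decorator(fn):
            self._handlers.setdefault(name, []).append(fn)
            return fn
        return decorator

    async def emit(self, name, *args):
        """Invoke every handler registered for the named event, in order."""
        for fn in self._handlers.get(name, []):
            await fn(*args)

emitter = EventEmitter()
events = []

@emitter.event_handler("on_connected")
async def on_connected(service):
    events.append("connected")

asyncio.run(emitter.emit("on_connected", emitter))
```

Because handlers are coroutines, they can await other work (logging, reconnection, metrics) without blocking the pipeline.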