Overview

ElevenLabs provides high-quality text-to-speech synthesis with two implementations:
  • ElevenLabsTTSService: WebSocket-based with word timestamps and audio context management
  • ElevenLabsHttpTTSService: HTTP-based for simpler integration
ElevenLabsTTSService is recommended for real-time applications requiring precise timing.

Installation

To use ElevenLabs services, install the required dependencies:
pip install "pipecat-ai[elevenlabs]"
You’ll also need to set up your ElevenLabs API key as an environment variable: ELEVENLABS_API_KEY.
Get your API key by signing up at ElevenLabs.
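For example, in a POSIX shell (the key value below is a placeholder):

```shell
# Store the key in the environment so os.getenv("ELEVENLABS_API_KEY") can find it
export ELEVENLABS_API_KEY="your-api-key-here"
```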

Frames

Input

  • TextFrame - Text content to synthesize into speech
  • TTSSpeakFrame - Text that should be spoken immediately
  • TTSUpdateSettingsFrame - Runtime configuration updates
  • LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries

Output

  • TTSStartedFrame - Signals start of synthesis
  • TTSAudioRawFrame - Generated audio data chunks with word timing
  • TTSStoppedFrame - Signals completion of synthesis
  • ErrorFrame - API or processing errors

Service Comparison

Feature            ElevenLabsTTSService (WebSocket)   ElevenLabsHttpTTSService (HTTP)
Word Timestamps    ✅ Real-time precision             ✅ Batch processing
Streaming          ✅ Low-latency chunks              ✅ Response streaming
Audio Context      ✅ Advanced management             ❌ Basic
Interruption       ✅ Context-aware                   ⚠️ Limited
Connection         WebSocket, persistent              HTTP, per-request

Language Support

Common languages supported include:
  • Language.EN - English
  • Language.ES - Spanish
  • Language.FR - French
  • Language.DE - German
  • Language.IT - Italian
  • Language.JA - Japanese
Language support varies by model. To specify a language, use a multilingual model (eleven_flash_v2_5, eleven_turbo_v2_5, or eleven_multilingual_v2).

Supported Sample Rates

ElevenLabs supports specific sample rates with automatic format selection:
  • 8000 Hz - pcm_8000
  • 16000 Hz - pcm_16000
  • 22050 Hz - pcm_22050
  • 24000 Hz - pcm_24000 (default)
  • 44100 Hz - pcm_44100
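The service selects the PCM output format from the sample rate automatically. A hypothetical helper (not part of Pipecat) that mirrors this mapping, purely for illustration:

```python
# Sample rates ElevenLabs accepts for raw PCM output
SUPPORTED_RATES = {8000, 16000, 22050, 24000, 44100}


def pcm_format_for(sample_rate: int) -> str:
    """Return the ElevenLabs output format name for a supported sample rate."""
    if sample_rate not in SUPPORTED_RATES:
        raise ValueError(f"Unsupported sample rate: {sample_rate}")
    return f"pcm_{sample_rate}"


print(pcm_format_for(24000))  # pcm_24000
```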

Model Selection

Choose the right model for your use case:
Model                   Quality  Latency    Multilingual  Best For
eleven_flash_v2_5       High     Ultra-low  ✅            Real-time conversations
eleven_turbo_v2_5       High     Ultra-low  ✅            Real-time conversations
eleven_multilingual_v2  High     Medium     ✅            Quality + languages
eleven_flash_v2         High     Low        ❌            English-only apps
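The guidance above can be encoded as a small decision helper. This is an illustrative sketch (not a Pipecat API), assuming the two axes that matter are language coverage and latency:

```python
def pick_model(multilingual: bool, low_latency: bool) -> str:
    """Suggest an ElevenLabs model ID for the given requirements."""
    if low_latency:
        # Flash v2.5 covers multiple languages; Flash v2 is English-only
        return "eleven_flash_v2_5" if multilingual else "eleven_flash_v2"
    # When latency is less critical, favor quality across languages
    return "eleven_multilingual_v2"


print(pick_model(multilingual=True, low_latency=True))  # eleven_flash_v2_5
```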

Usage Example

Initialize the ElevenLabsTTSService with your API key and use it in your pipeline:
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.transcriptions.language import Language

# Configure WebSocket service with voice customization
tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
    model="eleven_flash_v2_5",
    params=ElevenLabsTTSService.InputParams(
        language=Language.EN,
        stability=0.7,
        similarity_boost=0.8,
        style=0.5,
        use_speaker_boost=True,
        speed=1.1
    )
)

# Use in pipeline
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,  # Word timestamps enable precise context updates
    transport.output(),
    context_aggregator.assistant()
])

HTTP Service

Initialize the ElevenLabsHttpTTSService and use it in a pipeline:
import os

import aiohttp

from pipecat.services.elevenlabs.tts import ElevenLabsHttpTTSService
from pipecat.transcriptions.language import Language

# For simpler, non-persistent connections
async with aiohttp.ClientSession() as session:
    http_tts = ElevenLabsHttpTTSService(
        api_key=os.getenv("ELEVENLABS_API_KEY"),
        voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
        aiohttp_session=session,
        params=ElevenLabsHttpTTSService.InputParams(
            language=Language.EN,
            optimize_streaming_latency=3
        )
    )

Dynamic Configuration

Update settings at runtime by pushing a TTSUpdateSettingsFrame to either service:
from pipecat.frames.frames import TTSUpdateSettingsFrame

await task.queue_frame(
    TTSUpdateSettingsFrame(voice_id="new-voice-id")
)

Voice Customization

ElevenLabs offers extensive voice control parameters:
# Advanced voice settings
params = ElevenLabsTTSService.InputParams(
    stability=0.7,              # Voice consistency (0.0-1.0)
    similarity_boost=0.8,       # Voice similarity (0.0-1.0)
    style=0.5,                  # Expression style (0.0-1.0, V2+ models)
    use_speaker_boost=True,     # Enhanced clarity (V2+ models)
    speed=1.2,                  # Speech rate (0.25-4.0)
    auto_mode=True,             # Latency optimization
    enable_ssml_parsing=True    # SSML support
)

tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id="your-voice-id",
    model="eleven_flash_v2_5",
    params=params
)

Metrics

Both services provide comprehensive metrics:
  • Time to First Byte (TTFB) - Latency from text input to first audio
  • Processing Duration - Total synthesis time
  • Character Usage - Text processed for billing
Learn how to enable Metrics in your Pipeline.
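As a sketch, metrics are typically switched on through the PipelineParams passed to the PipelineTask (here `pipeline` stands for a Pipeline like the one in the usage example above):

```python
from pipecat.pipeline.task import PipelineParams, PipelineTask

task = PipelineTask(
    pipeline,
    params=PipelineParams(
        enable_metrics=True,        # TTFB and processing duration
        enable_usage_metrics=True,  # character usage
    ),
)
```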

Additional Notes

  • WebSocket Recommended: Use ElevenLabsTTSService for real-time applications with word timestamps and audio context management
  • Connection Management: WebSocket maintains persistent connection with automatic keepalive (10-second intervals)
  • Audio Context: WebSocket service manages multiple audio contexts for handling interruptions and overlapping requests
  • Voice Settings: Both stability and similarity_boost must be set together for voice customization
  • Language Specification: Only works with multilingual models (eleven_flash_v2_5, eleven_turbo_v2_5, eleven_multilingual_v2)
  • Sample Rate Constraints: Must use supported sample rates (8000, 16000, 22050, 24000, or 44100 Hz)
  • SSML Support: Enable SSML parsing for advanced speech markup control
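With enable_ssml_parsing=True, inline SSML tags in the text are interpreted rather than read aloud. A minimal illustration using ElevenLabs' break tag (the pause duration is an example value):

```python
# With SSML parsing enabled, the service pauses for one second mid-sentence
# instead of speaking the tag text.
ssml_text = 'Let me check that for you. <break time="1.0s" /> Here is the answer.'
print(ssml_text)
```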