Overview

ElevenLabs provides high-quality text-to-speech synthesis with two implementations:
  • ElevenLabsTTSService: WebSocket-based with word timestamps and audio context management
  • ElevenLabsHttpTTSService: HTTP-based for simpler integration
ElevenLabsTTSService is recommended for real-time applications requiring precise timing.

Installation

To use ElevenLabs services, install the required dependencies:
pip install "pipecat-ai[elevenlabs]"
You’ll also need to set up your ElevenLabs API key as an environment variable: ELEVENLABS_API_KEY.
Get your API key by signing up at ElevenLabs.
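For example, in a POSIX shell (the key value below is a placeholder):

```shell
# Store the key in the environment so os.getenv("ELEVENLABS_API_KEY") can find it
export ELEVENLABS_API_KEY="your-api-key-here"
```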

Frames

Input

  • TextFrame - Text content to synthesize into speech
  • TTSSpeakFrame - Text that should be spoken immediately
  • TTSUpdateSettingsFrame - Runtime configuration updates
  • LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries

Output

  • TTSStartedFrame - Signals start of synthesis
  • TTSAudioRawFrame - Generated audio data chunks with word timing
  • TTSStoppedFrame - Signals completion of synthesis
  • ErrorFrame - API or processing errors

Service Comparison

Feature            ElevenLabsTTSService (WebSocket)   ElevenLabsHttpTTSService (HTTP)
Word Timestamps    ✅ Real-time precision             ✅ Batch processing
Streaming          ✅ Low-latency chunks              ✅ Response streaming
Audio Context      ✅ Advanced management             ❌ Basic
Interruption       ✅ Context-aware                   ⚠️ Limited
Connection         WebSocket, persistent              HTTP, per-request

Language Support

Common languages supported include:
  • Language.EN - English
  • Language.ES - Spanish
  • Language.FR - French
  • Language.DE - German
  • Language.IT - Italian
  • Language.JA - Japanese
Language support varies by model. To specify a language, use a multilingual model (eleven_flash_v2_5, eleven_turbo_v2_5, or eleven_multilingual_v2).

Supported Sample Rates

ElevenLabs supports specific sample rates with automatic format selection:
  • 8000 Hz - pcm_8000
  • 16000 Hz - pcm_16000
  • 22050 Hz - pcm_22050
  • 24000 Hz - pcm_24000 (default)
  • 44100 Hz - pcm_44100
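The service selects the PCM output format from the sample rate automatically. A hypothetical helper (not part of Pipecat) that mirrors this mapping, purely for illustration:

```python
# Sample rates ElevenLabs accepts for raw PCM output
SUPPORTED_RATES = {8000, 16000, 22050, 24000, 44100}


def pcm_format_for(sample_rate: int) -> str:
    """Return the ElevenLabs output format name for a supported sample rate."""
    if sample_rate not in SUPPORTED_RATES:
        raise ValueError(f"Unsupported sample rate: {sample_rate}")
    return f"pcm_{sample_rate}"


print(pcm_format_for(24000))  # pcm_24000
```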

Model Selection

Choose the right model for your use case:
Model                   Quality  Latency    Multilingual  Best For
eleven_flash_v2_5       High     Ultra-low  ✅            Real-time conversations
eleven_turbo_v2_5       High     Ultra-low  ✅            Real-time conversations
eleven_multilingual_v2  High     Medium     ✅            Quality + languages
eleven_flash_v2         High     Low        ❌            English-only apps
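The guidance above can be encoded as a small decision helper. This is an illustrative sketch (not a Pipecat API), assuming the two axes that matter are language coverage and latency:

```python
def pick_model(multilingual: bool, low_latency: bool) -> str:
    """Suggest an ElevenLabs model ID for the given requirements."""
    if low_latency:
        # Flash v2.5 covers multiple languages; Flash v2 is English-only
        return "eleven_flash_v2_5" if multilingual else "eleven_flash_v2"
    # When latency is less critical, favor quality across languages
    return "eleven_multilingual_v2"


print(pick_model(multilingual=True, low_latency=True))  # eleven_flash_v2_5
```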

Usage Example

Initialize the ElevenLabsTTSService with your API key and use it in your pipeline:
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.transcriptions.language import Language

# Configure WebSocket service with voice customization
tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
    model="eleven_flash_v2_5",
    params=ElevenLabsTTSService.InputParams(
        language=Language.EN,
        stability=0.7,
        similarity_boost=0.8,
        style=0.5,
        use_speaker_boost=True,
        speed=1.1
    )
)

# Use in pipeline
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,  # Word timestamps enable precise context updates
    transport.output(),
    context_aggregator.assistant()
])

HTTP Service

Initialize the ElevenLabsHttpTTSService and use it in a pipeline:
import os

import aiohttp

from pipecat.services.elevenlabs.tts import ElevenLabsHttpTTSService
from pipecat.transcriptions.language import Language

# For simpler, non-persistent connections
async with aiohttp.ClientSession() as session:
    http_tts = ElevenLabsHttpTTSService(
        api_key=os.getenv("ELEVENLABS_API_KEY"),
        voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
        aiohttp_session=session,
        params=ElevenLabsHttpTTSService.InputParams(
            language=Language.EN,
            optimize_streaming_latency=3
        )
    )

Dynamic Configuration

Update settings at runtime by pushing a TTSUpdateSettingsFrame to either service:
from pipecat.frames.frames import TTSUpdateSettingsFrame

await task.queue_frame(
    TTSUpdateSettingsFrame(voice_id="new-voice-id")
)

Voice Customization

ElevenLabs offers extensive voice control parameters:
# Advanced voice settings
params = ElevenLabsTTSService.InputParams(
    stability=0.7,              # Voice consistency (0.0-1.0)
    similarity_boost=0.8,       # Voice similarity (0.0-1.0)
    style=0.5,                  # Expression style (0.0-1.0, V2+ models)
    use_speaker_boost=True,     # Enhanced clarity (V2+ models)
    speed=1.2,                  # Speech rate (0.25-4.0)
    auto_mode=True,             # Latency optimization
    enable_ssml_parsing=True    # SSML support
)

tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id="your-voice-id",
    model="eleven_flash_v2_5",
    params=params
)

Metrics

Both services provide comprehensive metrics:
  • Time to First Byte (TTFB) - Latency from text input to first audio
  • Processing Duration - Total synthesis time
  • Character Usage - Text processed for billing
Learn how to enable Metrics in your Pipeline.
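As a sketch, metrics are typically switched on through the PipelineParams passed to the PipelineTask (here `pipeline` stands for a Pipeline like the one in the usage example above):

```python
from pipecat.pipeline.task import PipelineParams, PipelineTask

task = PipelineTask(
    pipeline,
    params=PipelineParams(
        enable_metrics=True,        # TTFB and processing duration
        enable_usage_metrics=True,  # character usage
    ),
)
```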

Additional Notes

  • WebSocket Recommended: Use ElevenLabsTTSService for real-time applications with word timestamps and audio context management
  • Connection Management: WebSocket maintains persistent connection with automatic keepalive (10-second intervals)
  • Audio Context: WebSocket service manages multiple audio contexts for handling interruptions and overlapping requests
  • Voice Settings: Both stability and similarity_boost must be set together for voice customization
  • Language Specification: Only works with multilingual models (eleven_flash_v2_5, eleven_turbo_v2_5, eleven_multilingual_v2)
  • Sample Rate Constraints: Must use supported sample rates (8000, 16000, 22050, 24000, or 44100 Hz)
  • SSML Support: Enable SSML parsing for advanced speech markup control
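With enable_ssml_parsing=True, inline SSML tags in the text are interpreted rather than read aloud. A minimal illustration using ElevenLabs' break tag (the pause duration is an example value):

```python
# With SSML parsing enabled, the service pauses for one second mid-sentence
# instead of speaking the tag text.
ssml_text = 'Let me check that for you. <break time="1.0s" /> Here is the answer.'
print(ssml_text)
```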