Overview

ElevenLabsTTSService provides high-quality text-to-speech synthesis using ElevenLabs’ WebSocket API. It supports real-time streaming, word-level timing, and various voice customization options.

Installation

To use ElevenLabsTTSService, install the required dependencies:

pip install pipecat-ai[elevenlabs]

You’ll also need to set up your ElevenLabs API key as an environment variable: ELEVENLABS_API_KEY.

You can obtain an ElevenLabs API key by signing up at ElevenLabs.
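
As a minimal sketch of reading the key from the environment variable named above, the helper below fails early with a clear message when the variable is missing. The function name load_elevenlabs_api_key is hypothetical and not part of Pipecat.

```python
import os


def load_elevenlabs_api_key(env=os.environ):
    """Return the ElevenLabs API key, or raise if the variable is unset or blank."""
    # Hypothetical helper for illustration; Pipecat does not provide it.
    api_key = env.get("ELEVENLABS_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("ELEVENLABS_API_KEY is not set")
    return api_key
```

The returned value can then be passed as the api_key constructor parameter described below.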

Configuration

Constructor Parameters

api_key
str
required

ElevenLabs API key

voice_id
str
required

Voice identifier

model
str
default:
"eleven_flash_v2_5"

Model identifier

url
str
default:
"wss://api.elevenlabs.io"

API endpoint URL

output_format
ElevenLabsOutputFormat
default:
"pcm_24000"

Audio output format. One of:

  • "pcm_16000"
  • "pcm_22050"
  • "pcm_24000"
  • "pcm_44100"
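
Each listed format name encodes its sample rate after the "pcm_" prefix. The sketch below shows one way to recover that rate, for example to configure a downstream audio sink. The helper name is hypothetical and this is not necessarily how the service derives the rate internally.

```python
def sample_rate_from_output_format(output_format: str) -> int:
    """Extract the sample rate in Hz from a name such as "pcm_24000"."""
    # Hypothetical helper: handles only the "pcm_<rate>" names listed above.
    codec, _, rate = output_format.partition("_")
    if codec != "pcm" or not rate.isdigit():
        raise ValueError(f"unsupported output format: {output_format!r}")
    return int(rate)
```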

text_filter
BaseTextFilter
default:
"None"

Modifies text provided to the TTS. Learn more about the available filters.

Input Parameters

language
Language
default:
"Language.EN"

The language of the text to be synthesized.

optimize_streaming_latency
str

Optimization level for streaming latency.

stability
float

Defines the stability for voice settings.

similarity_boost
float

Defines the similarity boost for voice settings.

style
float

Defines the style for voice settings. Available on V2+ models.

use_speaker_boost
bool

Defines whether to use speaker boost for voice settings. Available on V2+ models.

auto_mode
bool
default:
"true"

Reduces latency by disabling the chunk schedule and all buffers. Recommended only when sending full sentences or phrases; sending partial phrases significantly reduces quality. The ElevenLabs API itself defaults this option to false, while this service enables it by default.
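
The voice settings above (stability, similarity_boost, style, use_speaker_boost) are all optional. The sketch below shows one way such optional values could be collected into a settings dictionary that leaves out anything not set. The function name and the omit-when-unset policy are assumptions for illustration, not the service's confirmed behavior.

```python
from typing import Any, Dict, Optional


def build_voice_settings(
    stability: Optional[float] = None,
    similarity_boost: Optional[float] = None,
    style: Optional[float] = None,
    use_speaker_boost: Optional[bool] = None,
) -> Dict[str, Any]:
    """Collect only the voice settings that were explicitly provided."""
    # Hypothetical helper: unset (None) values are dropped so that the
    # voice's own defaults would apply on the ElevenLabs side.
    candidates = {
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
        "use_speaker_boost": use_speaker_boost,
    }
    return {name: value for name, value in candidates.items() if value is not None}
```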

Output Frames

Control Frames

TTSStartedFrame
Frame

Signals start of synthesis

TTSStoppedFrame
Frame

Signals completion of synthesis

Audio Frames

TTSAudioRawFrame
Frame

Contains generated audio data:

  • PCM encoded audio
  • Configured sample rate
  • Mono channel
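
Because the audio is raw mono PCM at a known sample rate, the playback duration of a chunk follows directly from its byte length. The sketch below assumes 16-bit (2-byte) samples, which the text above does not state. The helper name is hypothetical.

```python
def pcm_duration_seconds(
    audio: bytes,
    sample_rate: int,
    num_channels: int = 1,
    bytes_per_sample: int = 2,  # assumption: 16-bit PCM samples
) -> float:
    """Return the playback duration of a raw PCM buffer in seconds."""
    bytes_per_frame = num_channels * bytes_per_sample
    return len(audio) / (bytes_per_frame * sample_rate)
```

For example, 48,000 bytes of mono 16-bit audio at 24,000 Hz correspond to one second of speech.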

Usage Examples

Basic Usage

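import os

# Import paths may differ between Pipecat versions
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transcriptions.language import Language

# Configure service, reading the API key from the environment
tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id="voice-id",
    output_format="pcm_24000",
    params=ElevenLabsTTSService.InputParams(
        language=Language.EN
    )
)

# Use in pipeline (llm and transport are created elsewhere)
pipeline = Pipeline([
    ...,
    llm,
    tts,
    transport.output()
])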

With Voice Settings

# Configure with voice customization
tts = ElevenLabsTTSService(
    api_key="your-api-key",
    voice_id="voice-id",
    params=ElevenLabsTTSService.InputParams(
        stability=0.7,
        similarity_boost=0.8,
        style=0.5,
        use_speaker_boost=True
    )
)

Methods

See the TTS base class methods for additional functionality.

Language Support

ElevenLabs supports the following languages and their variants:

Language Code    Description    Service Code
Language.AR      Arabic         ar
Language.BG      Bulgarian      bg
Language.CS      Czech          cs
Language.DA      Danish         da
Language.DE      German         de
Language.EL      Greek          el
Language.EN      English        en
Language.ES      Spanish        es
Language.FI      Finnish        fi
Language.FIL     Filipino       fil
Language.FR      French         fr
Language.HI      Hindi          hi
Language.HR      Croatian       hr
Language.HU      Hungarian      hu
Language.ID      Indonesian     id
Language.IT      Italian        it
Language.JA      Japanese       ja
Language.KO      Korean         ko
Language.MS      Malay          ms
Language.NL      Dutch          nl
Language.NO      Norwegian      no
Language.PL      Polish         pl
Language.PT      Portuguese     pt
Language.RO      Romanian       ro
Language.RU      Russian        ru
Language.SK      Slovak         sk
Language.SV      Swedish        sv
Language.TA      Tamil          ta
Language.TR      Turkish        tr
Language.UK      Ukrainian      uk
Language.VI      Vietnamese     vi
Language.ZH      Chinese        zh

Note: Language support may vary based on the selected model. See the ElevenLabs docs for more details.
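
To illustrate how language variants could resolve to the service codes in the table, the sketch below falls back from a regional tag such as "fr-CA" to its base code. The function name and the fallback rule are assumptions for illustration; the service's actual mapping may differ.

```python
from typing import Optional

# Service codes copied from the table above.
SERVICE_CODES = {
    "ar", "bg", "cs", "da", "de", "el", "en", "es", "fi", "fil", "fr",
    "hi", "hr", "hu", "id", "it", "ja", "ko", "ms", "nl", "no", "pl",
    "pt", "ro", "ru", "sk", "sv", "ta", "tr", "uk", "vi", "zh",
}


def to_service_code(language_tag: str) -> Optional[str]:
    """Map a tag like "en", "EN-us" or "fr-CA" to a service code, or None."""
    # Hypothetical helper: try the tag as given, then its base language.
    tag = language_tag.lower()
    if tag in SERVICE_CODES:
        return tag
    base = tag.split("-")[0]
    return base if base in SERVICE_CODES else None
```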

Usage Example

# Configure service with specific language
service = ElevenLabsTTSService(
    api_key="your-api-key",
    voice_id="voice-id",
    params=ElevenLabsTTSService.InputParams(
        language=Language.FR  # French
    )
)

Frame Flow

Features

Sentence Aggregation

  • Aggregates sentences for better audio quality
  • Maintains natural speech flow
  • Reduces artifacts
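
The idea behind sentence aggregation can be sketched as follows: streamed text fragments are buffered, and only complete sentences are released for synthesis. The class name and the simple punctuation rule are assumptions for illustration, not Pipecat's actual implementation.

```python
import re
from typing import List, Optional

# Naive boundary rule (an assumption): ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"[.!?]+\s")


class SentenceAggregator:
    """Buffers streamed text and emits complete sentences."""

    def __init__(self) -> None:
        self._buffer = ""

    def push(self, text: str) -> List[str]:
        """Add a fragment and return any sentences it completed."""
        self._buffer += text
        sentences = []
        while True:
            match = _SENTENCE_END.search(self._buffer)
            if match is None:
                break
            sentences.append(self._buffer[: match.end()].strip())
            self._buffer = self._buffer[match.end():]
        return sentences

    def flush(self) -> Optional[str]:
        """Return whatever text remains, for example at the end of a response."""
        remaining = self._buffer.strip()
        self._buffer = ""
        return remaining or None
```

A rule this simple splits on abbreviations such as "Dr. Smith", so a production aggregator would need a more careful boundary check.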

Word Timing

  • Provides word-level timestamps
  • Enables text-audio synchronization
  • Supports interruption handling
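
Word-level timestamps can be derived from character-level timing data. The sketch below assumes the input is a list of characters with a parallel list of start times in milliseconds. That input shape and the function name are assumptions for illustration, not the service's documented interface.

```python
from typing import List, Sequence, Tuple


def words_with_start_times(
    chars: Sequence[str], char_start_times_ms: Sequence[int]
) -> List[Tuple[str, int]]:
    """Group characters into words and pair each word with its start time (ms)."""
    words: List[Tuple[str, int]] = []
    current = ""
    start_ms = 0
    for ch, t_ms in zip(chars, char_start_times_ms):
        if ch.isspace():
            # Whitespace closes the word in progress, if any.
            if current:
                words.append((current, start_ms))
                current = ""
        else:
            if not current:
                start_ms = t_ms  # the first character sets the word's start
            current += ch
    if current:
        words.append((current, start_ms))
    return words
```

Timestamps in this form let a pipeline align displayed text with audio playback, or work out which words had been spoken when an interruption arrived.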

Connection Management

  • WebSocket-based streaming
  • Automatic reconnection
  • Keepalive handling
  • Clean disconnection
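
Automatic reconnection is commonly built on retries with exponential backoff. The sketch below shows that general pattern with asyncio. The function name, the parameters, and the choice to retry on OSError are assumptions for illustration and do not describe Pipecat's internal reconnection logic.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def connect_with_retry(
    connect: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Call connect() until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return await connect()
        except OSError:  # includes ConnectionError and its subclasses
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            await sleep(min(max_delay, base_delay * 2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

Here connect would be any coroutine function that opens the WebSocket. The sleep parameter exists only so the delay can be replaced in tests.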

Notes

  • Supports real-time streaming
  • Provides word-level timing
  • Handles interruptions gracefully
  • Maintains WebSocket connection
  • Includes metrics collection
  • Supports voice customization
  • Thread-safe processing
  • Automatic language mapping