Overview

ElevenLabs TTS provides high-quality text-to-speech synthesis through two service implementations:

  • ElevenLabsTTSService: WebSocket-based implementation with word-level timing and interruption support
  • ElevenLabsHttpTTSService: HTTP-based implementation for simpler use cases

Installation

To use either ElevenLabs service, install the required dependencies:

pip install "pipecat-ai[elevenlabs]"

You’ll also need to set up your ElevenLabs API key as an environment variable: ELEVENLABS_API_KEY.

You can obtain an ElevenLabs API key by signing up at ElevenLabs.
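As a minimal sketch of reading the key at startup, the helper below looks up ELEVENLABS_API_KEY and fails early with a readable message when it is missing. The function name load_elevenlabs_api_key is hypothetical and is not part of pipecat; it uses only the Python standard library.

```python
import os


def load_elevenlabs_api_key(env_var: str = "ELEVENLABS_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of pipecat.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"{env_var} is not set; export it before starting the pipeline."
        )
    return api_key
```

The returned string can then be passed as the api_key argument of either service instead of hard-coding the key.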

ElevenLabsTTSService (WebSocket)

Configuration

api_key
str
required

ElevenLabs API key

voice_id
str
required

Voice identifier

model
str
default:"eleven_flash_v2_5"

Model identifier

url
str
default:"wss://api.elevenlabs.io"

API endpoint URL

sample_rate
int
default:"None"

Output audio sample rate in Hz

params
InputParams
default:"InputParams()"

Additional configuration parameters

text_filter
BaseTextFilter
default:"None"

Filter that modifies text before it is sent to the TTS service. See the text filter documentation for the available filters.

InputParams

language
Language
default:"None"

The language of the text to be synthesized

optimize_streaming_latency
str
default:"None"

Optimization level for streaming latency

stability
float
default:"None"

Defines the stability for voice settings

similarity_boost
float
default:"None"

Defines the similarity boost for voice settings

style
float
default:"None"

Defines the style for voice settings. Available on V2+ models

use_speaker_boost
bool
default:"None"

Defines whether to use speaker boost for voice settings. Available on V2+ models

speed
float
default:"None"

Speech rate multiplier. Higher values increase speech speed

auto_mode
bool
default:"True"

Reduces latency by disabling the chunk schedule and buffers. Recommended when sending full sentences or phrases

ElevenLabsHttpTTSService (HTTP)

Configuration

api_key
str
required

ElevenLabs API key

voice_id
str
required

Voice identifier

aiohttp_session
aiohttp.ClientSession
required

aiohttp ClientSession for HTTP requests

model
str
default:"eleven_flash_v2_5"

Model identifier

base_url
str
default:"https://api.elevenlabs.io"

API base URL

sample_rate
int
default:"None"

Output audio sample rate in Hz

params
InputParams
default:"InputParams()"

Additional configuration parameters (similar to those of the WebSocket implementation)

Output Frames

TTSStartedFrame

Signals the start of audio generation.

TTSAudioRawFrame

Contains generated audio data:

audio
bytes

Raw audio data chunk

sample_rate
int

Audio sample rate

num_channels
int

Number of audio channels (1 for mono)
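To illustrate how the three fields relate, the sketch below estimates the playback duration of one audio chunk. AudioChunk is a hypothetical stand-in that only mirrors the fields listed above; it is not the pipecat frame class. The calculation also assumes 16-bit signed PCM samples (2 bytes per sample), which this page does not state.

```python
from dataclasses import dataclass


@dataclass
class AudioChunk:
    """Hypothetical stand-in mirroring the TTSAudioRawFrame fields."""

    audio: bytes
    sample_rate: int
    num_channels: int


BYTES_PER_SAMPLE = 2  # assumption: 16-bit signed PCM


def chunk_duration_seconds(chunk: AudioChunk) -> float:
    """Playback time of one chunk, derived from its byte length."""
    bytes_per_second = chunk.sample_rate * chunk.num_channels * BYTES_PER_SAMPLE
    return len(chunk.audio) / bytes_per_second


# 48,000 bytes of mono 16-bit audio at 24 kHz is one second of speech.
one_second = AudioChunk(audio=bytes(48_000), sample_rate=24_000, num_channels=1)
print(chunk_duration_seconds(one_second))  # 1.0
```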

TTSStoppedFrame

Signals the completion of audio generation.

ErrorFrame (HTTP implementation)

Sent when an error occurs during HTTP TTS generation:

error
str

Error message describing what went wrong

Usage Examples

Basic Usage

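# Import paths may vary by pipecat version; older releases expose
# ElevenLabsTTSService from pipecat.services.elevenlabs
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.transcriptions.language import Language

# Configure service
tts = ElevenLabsTTSService(
    api_key="your-api-key",
    voice_id="voice-id",
    sample_rate=24000,
    params=ElevenLabsTTSService.InputParams(
        language=Language.EN
    )
)

# Use in pipeline (llm and transport are created elsewhere)
pipeline = Pipeline([
    ...,
    llm,
    tts,
    transport.output()
])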

With Voice Settings

# Configure with voice customization
tts = ElevenLabsTTSService(
    api_key="your-api-key",
    voice_id="voice-id",
    params=ElevenLabsTTSService.InputParams(
        stability=0.7,
        similarity_boost=0.8,
        style=0.5,
        use_speaker_boost=True
    )
)
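The Notes section states that voice settings require both stability and similarity_boost. The sketch below shows one way to express that rule before constructing the service: it returns a settings dictionary only when both required values are present. The helper build_voice_settings and the dictionary layout are assumptions made for illustration and may not match how the library assembles its request internally.

```python
from typing import Any, Dict, Optional


def build_voice_settings(
    stability: Optional[float] = None,
    similarity_boost: Optional[float] = None,
    style: Optional[float] = None,
    use_speaker_boost: Optional[bool] = None,
) -> Optional[Dict[str, Any]]:
    """Return a voice-settings dict, or None if the required pair is incomplete.

    Hypothetical helper for illustration; not part of pipecat.
    """
    if stability is None or similarity_boost is None:
        return None
    settings: Dict[str, Any] = {
        "stability": stability,
        "similarity_boost": similarity_boost,
    }
    # Optional fields are included only when explicitly provided.
    if style is not None:
        settings["style"] = style
    if use_speaker_boost is not None:
        settings["use_speaker_boost"] = use_speaker_boost
    return settings
```

A check like this makes it obvious when a configuration sets style or use_speaker_boost without the two required values.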

Methods

See the TTS base class methods for additional functionality.

Language Support

ElevenLabs supports the following languages and their variants:

Language Code  Description  Service Code
Language.AR    Arabic       ar
Language.BG    Bulgarian    bg
Language.CS    Czech        cs
Language.DA    Danish       da
Language.DE    German       de
Language.EL    Greek        el
Language.EN    English      en
Language.ES    Spanish      es
Language.FI    Finnish      fi
Language.FIL   Filipino     fil
Language.FR    French       fr
Language.HI    Hindi        hi
Language.HR    Croatian     hr
Language.HU    Hungarian    hu
Language.ID    Indonesian   id
Language.IT    Italian      it
Language.JA    Japanese     ja
Language.KO    Korean       ko
Language.MS    Malay        ms
Language.NL    Dutch        nl
Language.NO    Norwegian    no
Language.PL    Polish       pl
Language.PT    Portuguese   pt
Language.RO    Romanian     ro
Language.RU    Russian      ru
Language.SK    Slovak       sk
Language.SV    Swedish      sv
Language.TA    Tamil        ta
Language.TR    Turkish      tr
Language.UK    Ukrainian    uk
Language.VI    Vietnamese   vi
Language.ZH    Chinese      zh

Note: Language support may vary based on the selected model. See the ElevenLabs docs for more details.
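As a rough sketch of how a language tag could be resolved to one of the service codes in the table above, the lookup below tries an exact match first and then falls back to the base language of a regional tag such as "fr-CA". The fallback rule and the name to_service_code are assumptions for illustration; the library's actual resolution logic may differ.

```python
from typing import Optional

# Service codes copied from the table above.
SERVICE_CODES = {
    "ar", "bg", "cs", "da", "de", "el", "en", "es", "fi", "fil", "fr",
    "hi", "hr", "hu", "id", "it", "ja", "ko", "ms", "nl", "no", "pl",
    "pt", "ro", "ru", "sk", "sv", "ta", "tr", "uk", "vi", "zh",
}


def to_service_code(language_tag: str) -> Optional[str]:
    """Map a tag such as "fr" or "fr-CA" to a service code, or None.

    Hypothetical helper for illustration; not part of pipecat.
    """
    tag = language_tag.strip().lower()
    if tag in SERVICE_CODES:
        return tag
    # Assumed fallback: drop the region suffix and retry with the base language.
    base = tag.split("-")[0]
    return base if base in SERVICE_CODES else None
```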

Usage Example

# Configure service with specific language
service = ElevenLabsTTSService(
    api_key="your-api-key",
    voice_id="voice-id",
    params=ElevenLabsTTSService.InputParams(
        language=Language.FR  # French
    )
)

Notes

  • WebSocket implementation includes a 10-second keepalive mechanism
  • Sample rate must be one of: 16000, 22050, 24000, or 44100 Hz
  • Voice settings require both stability and similarity_boost to be set
  • The language parameter only works with multilingual models
  • WebSocket implementation pauses frame processing during speech generation
  • HTTP implementation requires an external aiohttp ClientSession
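The sample rate constraint in the notes above can be checked before the service is constructed. The sketch below is one way to do that; validate_sample_rate is a hypothetical helper, not a pipecat function, and the list of rates is taken directly from the note.

```python
SUPPORTED_SAMPLE_RATES = (16000, 22050, 24000, 44100)


def validate_sample_rate(sample_rate: int) -> int:
    """Return the rate unchanged if supported, otherwise raise ValueError.

    Hypothetical helper for illustration; not part of pipecat.
    """
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        allowed = ", ".join(str(rate) for rate in SUPPORTED_SAMPLE_RATES)
        raise ValueError(
            f"Unsupported sample rate {sample_rate} Hz; expected one of: {allowed}"
        )
    return sample_rate
```

Calling validate_sample_rate(24000) returns 24000, while a value such as 48000 raises ValueError before any request is sent.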