Overview
ElevenLabs provides high-quality text-to-speech synthesis with two service implementations:
- ElevenLabsTTSService (WebSocket): Real-time streaming with word-level timestamps, audio context management, and interruption handling. Recommended for interactive applications.
- ElevenLabsHttpTTSService (HTTP): Simpler batch-style synthesis. Suitable for non-interactive use cases or when WebSocket connections are not possible.
- ElevenLabs TTS API Reference: Complete API reference for all parameters and methods
- Example Implementation: Complete example with WebSocket streaming
- ElevenLabs Documentation: Official ElevenLabs TTS API documentation
- Voice Library: Browse and clone voices from the community
Installation
Prerequisites
- ElevenLabs Account: Sign up at ElevenLabs
- API Key: Generate an API key from your account dashboard
- Voice Selection: Choose voice IDs from the voice library
Configuration
ElevenLabsTTSService
- ElevenLabs API key.
- Voice ID from the voice library. Deprecated in v0.0.105. Use settings=ElevenLabsTTSService.Settings(voice=...) instead.
- ElevenLabs model ID. Use a multilingual model variant (e.g. eleven_multilingual_v2) if you need non-English language support. Deprecated in v0.0.105. Use settings=ElevenLabsTTSService.Settings(model=...) instead.
- WebSocket endpoint URL. Override for custom or proxied deployments.
- Output audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
- Controls how incoming text is aggregated before synthesis. SENTENCE (default) buffers text until sentence boundaries, producing more natural speech. TOKEN streams tokens directly for lower latency. Import from pipecat.services.tts_service.
- Deprecated in v0.0.104. Use text_aggregation_mode instead.
- Deprecated in v0.0.105. Use settings=ElevenLabsTTSService.Settings(...) instead.
- Runtime-configurable settings. See Settings below.
ElevenLabsHttpTTSService
The HTTP service accepts the same parameters as the WebSocket service, with these differences:
- An aiohttp session for HTTP requests. You must create and manage this yourself.
- HTTP API base URL (instead of url for WebSocket).
- Settings use ElevenLabsHttpTTSSettings, which also includes a latency optimization level (0–4). Higher values reduce latency at the cost of quality.
Settings
Runtime-configurable settings passed via the settings constructor argument using ElevenLabsTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | None | ElevenLabs model identifier. (Inherited from base settings.) |
| voice | str | None | Voice identifier. (Inherited from base settings.) |
| language | Language \| str | None | Language code. Only effective with multilingual models. (Inherited from base settings.) |
| stability | float | NOT_GIVEN | Voice consistency (0.0–1.0). Lower values are more expressive, higher values are more consistent. |
| similarity_boost | float | NOT_GIVEN | Voice clarity and similarity to the original (0.0–1.0). |
| style | float | NOT_GIVEN | Style exaggeration (0.0–1.0). Higher values amplify the voice’s style. |
| use_speaker_boost | bool | NOT_GIVEN | Enhance clarity and target speaker similarity. |
| speed | float | NOT_GIVEN | Speech rate. WebSocket: 0.7–1.2. HTTP: 0.25–4.0. |
| apply_text_normalization | Literal | NOT_GIVEN | Text normalization: "auto", "on", or "off". |
NOT_GIVEN values use the ElevenLabs API defaults. See ElevenLabs voice settings for details on how these parameters interact.
Usage
Basic Setup
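A minimal sketch of constructing the WebSocket service, based on the settings=ElevenLabsTTSService.Settings(...) form described under Configuration. The import path pipecat.services.elevenlabs.tts, the api_key keyword, the ELEVENLABS_API_KEY environment variable, and the placeholder voice ID are assumptions for illustration; check them against the API reference for your Pipecat version.

```python
import os


def build_tts_service():
    """Create an ElevenLabs WebSocket TTS service (needs Pipecat with ElevenLabs support)."""
    # Imported lazily so this module loads even where Pipecat is not installed.
    # Assumed import path; verify against your installed Pipecat version.
    from pipecat.services.elevenlabs.tts import ElevenLabsTTSService

    return ElevenLabsTTSService(
        # Assumed keyword and environment variable name for the API key.
        api_key=os.environ["ELEVENLABS_API_KEY"],
        settings=ElevenLabsTTSService.Settings(
            voice="YOUR_VOICE_ID",           # placeholder: pick an ID from the voice library
            model="eleven_multilingual_v2",  # multilingual model mentioned on this page
        ),
    )
```

The returned service would then be placed in a pipeline like any other TTS processor.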
With Voice Customization
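The sketch below passes voice parameters from the Settings table through ElevenLabsTTSService.Settings(...). The check_voice_settings helper is hypothetical (not part of Pipecat); it only enforces the ranges listed in the table before the values are handed to the service. The import path and the api_key keyword are assumptions.

```python
# Ranges copied from the Settings table on this page.
UNIT_RANGE_FIELDS = ("stability", "similarity_boost", "style")
SPEED_RANGES = {"websocket": (0.7, 1.2), "http": (0.25, 4.0)}


def check_voice_settings(values: dict, transport: str = "websocket") -> dict:
    """Hypothetical helper: raise ValueError if a value is outside its documented range."""
    for name in UNIT_RANGE_FIELDS:
        if name in values and not 0.0 <= values[name] <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
    if "speed" in values:
        low, high = SPEED_RANGES[transport]
        if not low <= values["speed"] <= high:
            raise ValueError(f"speed must be between {low} and {high} for {transport}")
    return values


def build_custom_tts(api_key: str, voice_id: str):
    """Create the WebSocket service with customized voice settings."""
    # Assumed import path; imported lazily so the helper above works without Pipecat.
    from pipecat.services.elevenlabs.tts import ElevenLabsTTSService

    voice_settings = check_voice_settings(
        {
            "stability": 0.4,          # lower values are more expressive
            "similarity_boost": 0.8,
            "style": 0.2,
            "use_speaker_boost": True,
            "speed": 1.1,              # within the WebSocket range 0.7 to 1.2
        }
    )
    return ElevenLabsTTSService(
        api_key=api_key,  # assumed keyword name
        settings=ElevenLabsTTSService.Settings(voice=voice_id, **voice_settings),
    )
```

Fields left out of Settings stay at NOT_GIVEN, so the ElevenLabs API defaults apply to them.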
Updating Settings at Runtime
Voice settings can be changed mid-conversation using TTSUpdateSettingsFrame:
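A rough sketch of queuing such an update on a running pipeline task. The delta keyword used to carry the changed fields, the import paths, and the queue_frame call on the task are assumptions here; the Service Settings page is the authoritative reference for the exact frame signature.

```python
async def adjust_voice(task, stability: float = 0.3, speed: float = 1.1) -> None:
    """Queue a settings update so later utterances use the new voice values."""
    # Assumed import paths; imported lazily so this module loads without Pipecat.
    from pipecat.frames.frames import TTSUpdateSettingsFrame
    from pipecat.services.elevenlabs.tts import ElevenLabsTTSService

    frame = TTSUpdateSettingsFrame(
        # Assumption: only the fields being changed are passed, via `delta`.
        delta=ElevenLabsTTSService.Settings(stability=stability, speed=speed)
    )
    # `task` is expected to be the running pipeline task.
    await task.queue_frame(frame)
```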
HTTP Service
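A sketch of the HTTP variant, which needs an aiohttp session that you create and close yourself. The aiohttp_session keyword, the ElevenLabsHttpTTSService.Settings accessor (assumed by analogy with the WebSocket service), the import path, and the run_pipeline callback are all assumptions for illustration.

```python
async def with_http_tts(api_key: str, voice_id: str, run_pipeline) -> None:
    """Create the HTTP TTS service and keep its session open while the pipeline runs."""
    # Imported lazily so this module loads without aiohttp or Pipecat installed.
    import aiohttp
    from pipecat.services.elevenlabs.tts import ElevenLabsHttpTTSService  # assumed path

    # The session must outlive the service, so the pipeline runs inside this block.
    async with aiohttp.ClientSession() as session:
        tts = ElevenLabsHttpTTSService(
            api_key=api_key,          # assumed keyword name
            aiohttp_session=session,  # assumed keyword name
            settings=ElevenLabsHttpTTSService.Settings(voice=voice_id),  # assumed accessor
        )
        # Hypothetical callback that builds and runs a pipeline using `tts`.
        await run_pipeline(tts)
```

Running the pipeline inside the async with block avoids using the service after its session has been closed.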
Notes
- Multilingual models required for language: Setting language with a non-multilingual model (e.g. eleven_turbo_v2_5) has no effect. Use eleven_multilingual_v2 or similar.
- WebSocket vs HTTP: The WebSocket service supports word-level timestamps and interruption handling, making it significantly better for interactive conversations. The HTTP service is simpler but lacks these features.
- Text aggregation: Sentence aggregation is enabled by default (text_aggregation_mode=TextAggregationMode.SENTENCE). Buffering until sentence boundaries produces more natural speech. Set text_aggregation_mode=TextAggregationMode.TOKEN to stream tokens directly for lower latency, but you must also set auto_mode=False in settings when using TOKEN mode.
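To illustrate the difference between the two aggregation modes, here is a simplified, standalone sketch of sentence buffering. It is not Pipecat's implementation: aggregate_sentences is a hypothetical function, and the rule "a sentence ends at '.', '!' or '?'" is an assumption chosen for brevity.

```python
import re
from typing import Iterable, Iterator

# Simplified boundary rule: the buffer ends with ., ! or ? (optionally followed by spaces).
_SENTENCE_END = re.compile(r"[.!?]\s*$")


def aggregate_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer incoming tokens and yield text only at sentence boundaries."""
    buffer = ""
    for token in tokens:
        buffer += token
        if _SENTENCE_END.search(buffer):
            yield buffer.strip()
            buffer = ""
    # Flush whatever is left when the token stream ends mid-sentence.
    if buffer.strip():
        yield buffer.strip()


llm_tokens = ["Hello", " there", ".", " How", " are", " you", "?"]

# Sentence-style aggregation: two synthesis requests, one per sentence.
sentence_chunks = list(aggregate_sentences(llm_tokens))

# Token-style streaming: every token is forwarded as soon as it arrives.
token_chunks = list(llm_tokens)
```

Here sentence_chunks holds two complete sentences while token_chunks holds seven fragments, which is the trade-off between more natural prosody and lower latency.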
Event Handlers
ElevenLabs TTS supports the standard service connection events:
| Event | Description |
|---|---|
| on_connected | Connected to ElevenLabs WebSocket |
| on_disconnected | Disconnected from ElevenLabs WebSocket |
| on_connection_error | WebSocket connection error occurred |
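A sketch of registering handlers for these three events. It assumes the service exposes an event_handler(name) decorator and that handlers receive the service instance (plus an error value for on_connection_error); both the decorator and the handler signatures should be checked against the API reference.

```python
import logging

logger = logging.getLogger("elevenlabs_tts")


def register_connection_handlers(tts) -> None:
    """Attach logging handlers for the three connection events to a TTS service."""

    # Assumed decorator API: tts.event_handler("<event name>").
    @tts.event_handler("on_connected")
    async def on_connected(service):
        logger.info("Connected to ElevenLabs WebSocket")

    @tts.event_handler("on_disconnected")
    async def on_disconnected(service):
        logger.info("Disconnected from ElevenLabs WebSocket")

    # Assumed signature: the error details arrive as a second argument.
    @tts.event_handler("on_connection_error")
    async def on_connection_error(service, error):
        logger.error("ElevenLabs WebSocket connection error: %s", error)
```

Calling register_connection_handlers(tts) once, right after constructing the service, is enough; the handlers only log, so they are safe to extend with reconnect or alerting logic.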