Overview
ElevenLabs provides high-quality text-to-speech synthesis with two service implementations:

- `ElevenLabsTTSService` (WebSocket): Real-time streaming with word-level timestamps, audio context management, and interruption handling. Recommended for interactive applications.
- `ElevenLabsHttpTTSService` (HTTP): Simpler batch-style synthesis. Suitable for non-interactive use cases or when WebSocket connections are not possible.
Related resources:

- ElevenLabs TTS API Reference: complete API reference for all parameters and methods.
- Example Implementation: complete example with WebSocket streaming.
- ElevenLabs Documentation: official ElevenLabs TTS API documentation.
- Voice Library: browse and clone voices from the community.
Installation
Prerequisites
- ElevenLabs Account: Sign up at ElevenLabs
- API Key: Generate an API key from your account dashboard
- Voice Selection: Choose voice IDs from the voice library
Configuration
ElevenLabsTTSService
- `api_key`: ElevenLabs API key.
- `voice_id`: Voice ID from the voice library.
- `model`: ElevenLabs model ID. Use a multilingual model variant (e.g. `eleven_multilingual_v2`) if you need non-English language support.
- `url`: WebSocket endpoint URL. Override for custom or proxied deployments.
- `sample_rate`: Output audio sample rate in Hz. When `None`, uses the pipeline's configured sample rate.
- `aggregate_sentences`: Buffer text until sentence boundaries before sending to ElevenLabs. Produces more natural-sounding speech at the cost of a small latency increase (~15ms) for the first word of each sentence.
- `params`: Runtime-configurable voice and generation settings. See InputParams below.
ElevenLabsHttpTTSService
The HTTP service accepts the same parameters as the WebSocket service, with these differences:

- `aiohttp_session`: An aiohttp session for HTTP requests. You must create and manage this yourself.
- `base_url`: HTTP API base URL (replaces the WebSocket service's `url`).
- Latency optimization level (0–4): higher values reduce latency at the cost of quality.
InputParams
Voice and generation settings that can be set at initialization via the `params` constructor argument, or changed at runtime via `UpdateSettingsFrame`.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `language` | `Language` | `None` | Language code. Only effective with multilingual models. |
| `stability` | `float` | `None` | Voice consistency (0.0–1.0). Lower values are more expressive, higher values are more consistent. |
| `similarity_boost` | `float` | `None` | Voice clarity and similarity to the original (0.0–1.0). |
| `style` | `float` | `None` | Style exaggeration (0.0–1.0). Higher values amplify the voice's style. |
| `use_speaker_boost` | `bool` | `None` | Enhance clarity and target speaker similarity. |
| `speed` | `float` | `None` | Speech rate. WebSocket: 0.7–1.2. HTTP: 0.25–4.0. |
| `auto_mode` | `bool` | `True` | Automatic optimization mode. WebSocket only. |
| `enable_ssml_parsing` | `bool` | `None` | Parse SSML tags in input text. WebSocket only. |
| `apply_text_normalization` | `Literal` | `None` | Text normalization: `"auto"`, `"on"`, or `"off"`. |
`None` values use the ElevenLabs API defaults. See ElevenLabs voice settings for details on how these parameters interact.

Usage
Basic Setup
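A minimal sketch of constructing the WebSocket service, assuming the API key lives in the `ELEVENLABS_API_KEY` environment variable; the import path can vary between Pipecat versions, and the voice ID is a placeholder.

```python
import os

from pipecat.services.elevenlabs.tts import ElevenLabsTTSService  # import path may differ by Pipecat version

# Minimal WebSocket TTS service; add it to your pipeline between the LLM and transport output.
tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id="YOUR_VOICE_ID",  # placeholder: pick an ID from the voice library
)
```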
With Voice Customization
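A sketch of passing `InputParams` at construction time, assuming `InputParams` is exposed as a nested class on the service and that a multilingual model is selected so `language` takes effect.

```python
import os

from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.transcriptions.language import Language

tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id="YOUR_VOICE_ID",
    model="eleven_multilingual_v2",  # multilingual model so `language` is honored
    params=ElevenLabsTTSService.InputParams(
        language=Language.ES,
        stability=0.5,          # lower = more expressive
        similarity_boost=0.75,  # closer to the reference voice
        style=0.3,
        use_speaker_boost=True,
    ),
)
```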
Updating Settings at Runtime
Voice settings can be changed mid-conversation using `UpdateSettingsFrame`:
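A sketch using Pipecat's `TTSUpdateSettingsFrame` (the settings-update frame referred to above), assuming a running `PipelineTask` named `task`.

```python
from pipecat.frames.frames import TTSUpdateSettingsFrame

# Queue a settings update mid-conversation from wherever you have access to the task.
await task.queue_frame(
    TTSUpdateSettingsFrame(
        settings={
            "stability": 0.8,        # more consistent delivery
            "similarity_boost": 0.9,
            "speed": 1.1,            # WebSocket speed range is 0.7-1.2
        }
    )
)
```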
HTTP Service
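A sketch of the HTTP variant; the `aiohttp_session` argument name is an assumption based on the description above, and you are responsible for closing the session when the pipeline shuts down.

```python
import os

import aiohttp

from pipecat.services.elevenlabs.tts import ElevenLabsHttpTTSService  # import path may differ by Pipecat version


async def make_http_tts() -> ElevenLabsHttpTTSService:
    # You create and own the aiohttp session; close it yourself on shutdown.
    session = aiohttp.ClientSession()
    return ElevenLabsHttpTTSService(
        api_key=os.getenv("ELEVENLABS_API_KEY"),
        voice_id="YOUR_VOICE_ID",
        aiohttp_session=session,
    )
```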
Notes
- Multilingual models required for `language`: Setting `language` with a non-multilingual model (e.g. `eleven_turbo_v2_5`) has no effect. Use `eleven_multilingual_v2` or similar.
- WebSocket vs HTTP: The WebSocket service supports word-level timestamps and interruption handling, making it significantly better for interactive conversations. The HTTP service is simpler but lacks these features.
- Sentence aggregation: Enabled by default. Buffering until sentence boundaries produces more natural speech with minimal latency impact. Disable with `aggregate_sentences=False` if you need word-by-word streaming.
Event Handlers
ElevenLabs TTS supports the standard service connection events:

| Event | Description |
|---|---|
| `on_connected` | Connected to ElevenLabs WebSocket |
| `on_disconnected` | Disconnected from ElevenLabs WebSocket |
| `on_connection_error` | WebSocket connection error occurred |
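A sketch of registering these events with the service's `event_handler` decorator, assuming the decorator pattern Pipecat uses for service events; handler signatures may vary slightly between versions.

```python
@tts.event_handler("on_connected")
async def on_connected(service):
    print("Connected to ElevenLabs")


@tts.event_handler("on_disconnected")
async def on_disconnected(service):
    print("Disconnected from ElevenLabs")


@tts.event_handler("on_connection_error")
async def on_connection_error(service, error):
    # The error argument is an assumption; check your Pipecat version's handler signature.
    print(f"ElevenLabs connection error: {error}")
```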