ElevenLabs
Text-to-speech service using ElevenLabs' streaming API with word-level timing
Overview
ElevenLabs TTS provides high-quality text-to-speech synthesis through two service implementations:
- ElevenLabsTTSService: WebSocket-based implementation with word-level timing and interruption support
- ElevenLabsHttpTTSService: HTTP-based implementation for simpler use cases
Installation
To use ElevenLabsTTSService, install the required dependencies:
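For example (package name and extra assumed from Pipecat's packaging convention):

```shell
pip install "pipecat-ai[elevenlabs]"
```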
You'll also need to set your ElevenLabs API key in the ELEVENLABS_API_KEY environment variable.
You can obtain an ElevenLabs API key by signing up at ElevenLabs.
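For example, in your shell or environment configuration:

```shell
export ELEVENLABS_API_KEY=your-api-key-here
```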
ElevenLabsTTSService (WebSocket)
Configuration
- ElevenLabs API key
- Voice identifier
- Model identifier
- API endpoint URL
- Output audio sample rate in Hz
- Additional configuration parameters
- Modifies text provided to the TTS. Learn more about the available filters.
InputParams
- The language of the text to be synthesized
- Optimization level for streaming latency
- Defines the stability for voice settings
- Defines the similarity boost for voice settings
- Defines the style for voice settings (available on V2+ models)
- Defines whether to use speaker boost for voice settings (available on V2+ models)
- Speech rate multiplier; higher values increase speech speed
- Reduces latency by disabling the chunk schedule and buffers; recommended when sending full sentences or phrases
ElevenLabsHttpTTSService (HTTP)
Configuration
- ElevenLabs API key
- Voice identifier
- aiohttp ClientSession for HTTP requests
- Model identifier
- API base URL
- Output audio sample rate in Hz
- Additional configuration parameters (similar to the WebSocket implementation)
Output Frames
TTSStartedFrame
Signals the start of audio generation.
TTSAudioRawFrame
Contains generated audio data:
- Raw audio data chunk
- Audio sample rate
- Number of audio channels (1 for mono)
TTSStoppedFrame
Signals the completion of audio generation.
ErrorFrame (HTTP implementation)
Sent when an error occurs during HTTP TTS generation:
Error message describing what went wrong
Usage Examples
Basic Usage
With Voice Settings
Methods
See the TTS base class methods for additional functionality.
Language Support
ElevenLabs supports the following languages and their variants:
| Language Code | Description | Service Code |
|---|---|---|
| Language.AR | Arabic | ar |
| Language.BG | Bulgarian | bg |
| Language.CS | Czech | cs |
| Language.DA | Danish | da |
| Language.DE | German | de |
| Language.EL | Greek | el |
| Language.EN | English | en |
| Language.ES | Spanish | es |
| Language.FI | Finnish | fi |
| Language.FIL | Filipino | fil |
| Language.FR | French | fr |
| Language.HI | Hindi | hi |
| Language.HR | Croatian | hr |
| Language.HU | Hungarian | hu |
| Language.ID | Indonesian | id |
| Language.IT | Italian | it |
| Language.JA | Japanese | ja |
| Language.KO | Korean | ko |
| Language.MS | Malay | ms |
| Language.NL | Dutch | nl |
| Language.NO | Norwegian | no |
| Language.PL | Polish | pl |
| Language.PT | Portuguese | pt |
| Language.RO | Romanian | ro |
| Language.RU | Russian | ru |
| Language.SK | Slovak | sk |
| Language.SV | Swedish | sv |
| Language.TA | Tamil | ta |
| Language.TR | Turkish | tr |
| Language.UK | Ukrainian | uk |
| Language.VI | Vietnamese | vi |
| Language.ZH | Chinese | zh |
Note: Language support may vary based on the selected model. See the ElevenLabs docs for more details.
Usage Example
Frame Flow
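In normal operation the service emits frames in this order (a simplified sketch; ErrorFrame applies to the HTTP implementation):

```
TextFrame → TTSStartedFrame → TTSAudioRawFrame (one per audio chunk) → TTSStoppedFrame
                                      │
                                      └─ ErrorFrame on failure (HTTP implementation)
```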
Notes
- WebSocket implementation includes a 10-second keepalive mechanism
- Sample rate must be one of: 16000, 22050, 24000, or 44100 Hz
- Voice settings require both `stability` and `similarity_boost` to be set
- The `language` parameter only works with multilingual models
- WebSocket implementation pauses frame processing during speech generation
- HTTP implementation requires an external aiohttp ClientSession
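The sample-rate constraint above can be checked before constructing the service. A small helper that mirrors the documented constraint (hypothetical, not part of Pipecat):

```python
# Hypothetical helper; the set of rates comes from the note above
SUPPORTED_SAMPLE_RATES = {16000, 22050, 24000, 44100}


def validate_sample_rate(rate: int) -> int:
    """Return rate unchanged if ElevenLabs supports it, else raise ValueError."""
    if rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(
            f"Unsupported sample rate {rate}; "
            f"choose one of {sorted(SUPPORTED_SAMPLE_RATES)}"
        )
    return rate


print(validate_sample_rate(24000))  # → 24000
```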