ElevenLabs
Text-to-speech service using ElevenLabs’ streaming API with word-level timing
Overview
ElevenLabsTTSService provides high-quality text-to-speech synthesis using ElevenLabs’ WebSocket API. It supports real-time streaming, word-level timing, and various voice customization options.
Installation
To use ElevenLabsTTSService, install the required dependencies:
You’ll also need to set your ElevenLabs API key as an environment variable: ELEVENLABS_API_KEY.
You can obtain an ElevenLabs API key by signing up at ElevenLabs.
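A small sketch of reading the key at startup and failing early when it is missing. The helper name load_elevenlabs_api_key is hypothetical and not part of the service; only the ELEVENLABS_API_KEY variable name comes from this page.

```python
import os


def load_elevenlabs_api_key(env=None):
    """Return the API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    This helper is illustrative and not part of ElevenLabsTTSService.
    """
    env = os.environ if env is None else env
    key = env.get("ELEVENLABS_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ELEVENLABS_API_KEY is not set")
    return key
```

Checking the key once at startup gives a clearer error than a failed WebSocket handshake later on.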
Configuration
Constructor Parameters
ElevenLabs API key
Voice identifier
Model identifier
API endpoint URL
Audio output format:
- "pcm_16000"
- "pcm_22050"
- "pcm_24000"
- "pcm_44100"
Modifies text provided to the TTS. Learn more about the available filters.
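Each output format value listed above encodes its sample rate in the name. A minimal sketch of recovering that rate, for example to configure a downstream audio sink; sample_rate_from_format is a hypothetical helper, not a method of the service.

```python
# The four format strings listed on this page.
SUPPORTED_FORMATS = ("pcm_16000", "pcm_22050", "pcm_24000", "pcm_44100")


def sample_rate_from_format(output_format):
    """Return the sample rate in Hz encoded in a PCM output format string."""
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {output_format!r}")
    # "pcm_24000" -> "24000" -> 24000
    return int(output_format.split("_", 1)[1])
```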
Input Parameters
Voice Settings
Voice characteristics can be configured using:
Voice stability (requires similarity_boost)
Voice similarity boost (requires stability)
Style intensity (requires stability and similarity_boost)
Enable speaker boost (requires stability and similarity_boost)
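The dependencies between these settings can be checked before a request is sent. The sketch below is one way to do that under stated assumptions: stability and similarity_boost are named on this page, while the keyword names style and use_speaker_boost and the helper build_voice_settings are assumptions made for illustration.

```python
def build_voice_settings(stability=None, similarity_boost=None,
                         style=None, use_speaker_boost=None):
    """Return a settings dict, enforcing the dependencies described above.

    Illustrative only: `style` and `use_speaker_boost` are assumed names.
    """
    # stability and similarity_boost require each other.
    if (stability is None) != (similarity_boost is None):
        raise ValueError("stability and similarity_boost must be set together")

    has_base = stability is not None
    # The optional settings require both base settings.
    if (style is not None or use_speaker_boost is not None) and not has_base:
        raise ValueError(
            "style and use_speaker_boost require stability and similarity_boost"
        )

    settings = {}
    if has_base:
        settings["stability"] = stability
        settings["similarity_boost"] = similarity_boost
    if style is not None:
        settings["style"] = style
    if use_speaker_boost is not None:
        settings["use_speaker_boost"] = use_speaker_boost
    return settings
```

Returning an empty dict when nothing is set lets the caller omit voice settings entirely and fall back to the voice's defaults.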
Output Frames
Control Frames
Signals start of synthesis
Signals completion of synthesis
Audio Frames
Contains generated audio data:
- PCM encoded audio
- Configured sample rate
- Mono channel
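Because the audio is mono PCM at a known sample rate, the playback duration of a chunk follows directly from its byte length. The sketch below assumes 16-bit samples (2 bytes each), which this page does not state explicitly; pcm_duration_seconds is a hypothetical helper.

```python
def pcm_duration_seconds(audio, sample_rate, bytes_per_sample=2, channels=1):
    """Return the duration in seconds of a raw PCM byte buffer.

    Assumes 16-bit mono audio by default; adjust the arguments if the
    actual sample width differs.
    """
    bytes_per_frame = bytes_per_sample * channels
    return len(audio) / (bytes_per_frame * sample_rate)
```

For example, 32,000 bytes at 16,000 Hz corresponds to one second of audio under these assumptions.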
Usage Examples
Basic Usage
With Voice Settings
Methods
See the TTS base class methods for additional functionality.
Language Support
ElevenLabs supports the following languages and their variants:
Language Code | Description | Service Code |
---|---|---|
Language.BG | Bulgarian | bg |
Language.ZH | Chinese | zh |
Language.CS | Czech | cs |
Language.DA | Danish | da |
Language.NL | Dutch | nl |
Language.EN | English | en |
Language.EN_US | English (US) | en |
Language.EN_AU | English (Australia) | en |
Language.EN_GB | English (UK) | en |
Language.EN_NZ | English (New Zealand) | en |
Language.EN_IN | English (India) | en |
Language.FI | Finnish | fi |
Language.FR | French | fr |
Language.FR_CA | French (Canada) | fr |
Language.DE | German | de |
Language.DE_CH | German (Swiss) | de |
Language.EL | Greek | el |
Language.HI | Hindi | hi |
Language.HU | Hungarian | hu |
Language.ID | Indonesian | id |
Language.IT | Italian | it |
Language.JA | Japanese | ja |
Language.KO | Korean | ko |
Language.MS | Malay | ms |
Language.NO | Norwegian | no |
Language.PL | Polish | pl |
Language.PT | Portuguese | pt-PT |
Language.PT_BR | Portuguese (Brazil) | pt-BR |
Language.RO | Romanian | ro |
Language.RU | Russian | ru |
Language.SK | Slovak | sk |
Language.ES | Spanish | es |
Language.SV | Swedish | sv |
Language.TR | Turkish | tr |
Language.UK | Ukrainian | uk |
Language.VI | Vietnamese | vi |
Note: Language support may vary based on the selected model.
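The table above can be read as a lookup: regional variants collapse to their base code, and Portuguese keeps a region suffix. A minimal sketch of that mapping using plain language tags rather than the Language enum; the function to_service_code and its fallback for variants not listed in the table are assumptions, not the service's actual implementation.

```python
# Base codes taken from the "Service Code" column of the table.
SUPPORTED_BASE_CODES = {
    "bg", "zh", "cs", "da", "nl", "en", "fi", "fr", "de", "el", "hi", "hu",
    "id", "it", "ja", "ko", "ms", "no", "pl", "ro", "ru", "sk", "es", "sv",
    "tr", "uk", "vi",
}
# Portuguese is the only language whose service code keeps a region.
SPECIAL_CASES = {"pt": "pt-PT", "pt-br": "pt-BR"}


def to_service_code(language_tag):
    """Map a tag such as "en-US" or "PT_BR" to a service code, or None."""
    tag = language_tag.strip().lower().replace("_", "-")
    if tag in SPECIAL_CASES:
        return SPECIAL_CASES[tag]
    base = tag.split("-", 1)[0]
    return base if base in SUPPORTED_BASE_CODES else None
```

Returning None for unknown tags leaves the caller free to decide whether to fall back to a default language or report an error.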
Usage Example
Word Timing
The service provides word-level timing information:
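How the timestamps are produced is not shown on this page. As an illustration only, the sketch below assumes the upstream API reports a start time per character and groups those characters into words; the input shape and the helper words_with_start_times are assumptions, not the service's documented interface.

```python
def words_with_start_times(chars, char_start_times):
    """Group per-character start times into (word, start_time) pairs.

    `chars` is a sequence of single characters and `char_start_times` the
    matching start times in seconds. Whitespace separates words.
    """
    if len(chars) != len(char_start_times):
        raise ValueError("chars and char_start_times must have equal length")

    words = []
    current = ""
    current_start = None
    for ch, start in zip(chars, char_start_times):
        if ch.isspace():
            # A space closes the word in progress, if any.
            if current:
                words.append((current, current_start))
                current = ""
                current_start = None
        else:
            if not current:
                current_start = start  # first character sets the word start
            current += ch
    if current:
        words.append((current, current_start))
    return words
```

With word start times in hand, a client can highlight text in step with playback or work out which words had been spoken when an interruption arrived.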
Frame Flow
Features
Sentence Aggregation
- Aggregates sentences for better audio quality
- Maintains natural speech flow
- Reduces artifacts
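The idea behind sentence aggregation can be sketched as a buffer that collects streamed text fragments and releases only complete sentences. This is a simplified illustration with a naive end-of-sentence rule, not the service's actual aggregator; the class name SentenceAggregator is hypothetical.

```python
import re

# Naive rule: a sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"[.!?]\s")


class SentenceAggregator:
    """Buffer streamed text and emit complete sentences."""

    def __init__(self):
        self._buffer = ""

    def feed(self, text):
        """Add a fragment and return any sentences completed so far."""
        self._buffer += text
        sentences = []
        while True:
            match = _SENTENCE_END.search(self._buffer)
            if not match:
                break
            end = match.start() + 1  # keep the punctuation mark
            sentences.append(self._buffer[:end].strip())
            self._buffer = self._buffer[match.end():]
        return sentences

    def flush(self):
        """Return whatever text remains, e.g. at the end of a response."""
        rest = self._buffer.strip()
        self._buffer = ""
        return [rest] if rest else []
```

This rule splits on abbreviations such as "Dr. Smith", so a production aggregator would need a more careful boundary check.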
Word Timing
- Provides word-level timestamps
- Enables text-audio synchronization
- Supports interruption handling
Connection Management
- WebSocket-based streaming
- Automatic reconnection
- Keepalive handling
- Clean disconnection
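Automatic reconnection is commonly built on retries with exponential backoff. The sketch below shows that general pattern, not the service's internal logic; connect_with_retry, its parameters, and the choice to retry on OSError are all assumptions made for illustration.

```python
import asyncio


async def connect_with_retry(connect, max_attempts=5, base_delay=0.5,
                             max_delay=8.0, sleep=asyncio.sleep):
    """Call the async `connect` callable until it succeeds.

    Retries on OSError (which includes ConnectionError), doubling the wait
    after each failure up to `max_delay`. Re-raises after the last attempt.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return await connect()
        except OSError:
            if attempt == max_attempts:
                raise
            await sleep(delay)
            delay = min(delay * 2, max_delay)
```

The sleep parameter is injectable so that the backoff schedule can be tested without real waiting.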
Notes
- Supports real-time streaming
- Provides word-level timing
- Handles interruptions gracefully
- Maintains WebSocket connection
- Includes metrics collection
- Supports voice customization
- Thread-safe processing
- Automatic language mapping