ElevenLabs
Text-to-speech service using ElevenLabs’ streaming API with word-level timing
Overview
ElevenLabsTTSService provides high-quality text-to-speech synthesis using ElevenLabs’ WebSocket API. It supports real-time streaming, word-level timing, and various voice customization options.
Installation
To use ElevenLabsTTSService, install the required dependencies:
You’ll also need to set your ElevenLabs API key as an environment variable: ELEVENLABS_API_KEY.
You can obtain an ElevenLabs API key by signing up at ElevenLabs.
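As a minimal sketch of reading that environment variable before constructing the service, the helper below fails early with a clear message when the key is missing. The function name `load_elevenlabs_api_key` is hypothetical and not part of the service's API.

```python
import os


def load_elevenlabs_api_key(env=os.environ):
    """Return the key stored in ELEVENLABS_API_KEY, or raise a clear error.

    Hypothetical helper for illustration; `env` is injectable for testing.
    """
    key = env.get("ELEVENLABS_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ELEVENLABS_API_KEY is not set")
    return key
```

The returned string would then be passed as the API key constructor parameter described under Configuration.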
Configuration
Constructor Parameters
ElevenLabs API key
Voice identifier
Model identifier
API endpoint URL
Audio output format:
- “pcm_16000”
- “pcm_22050”
- “pcm_24000”
- “pcm_44100”
Modifies text provided to the TTS. Learn more about the available filters.
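The output format names listed above each end in a sample rate in Hz. A small sketch of validating a format name and extracting that rate follows; the helper name `sample_rate_from_format` is an assumption made for illustration, not a function the service is known to expose.

```python
# The four PCM formats listed in the constructor parameters.
SUPPORTED_FORMATS = ("pcm_16000", "pcm_22050", "pcm_24000", "pcm_44100")


def sample_rate_from_format(output_format: str) -> int:
    """Return the sample rate in Hz encoded in a name such as 'pcm_24000'.

    Hypothetical helper; raises ValueError for formats not listed above.
    """
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {output_format!r}")
    # Everything after the first underscore is the rate, e.g. "24000".
    return int(output_format.split("_", 1)[1])
```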
Input Parameters
The language of the text to be synthesized.
Optimization level for streaming latency.
Defines the stability for voice settings.
Defines the similarity boost for voice settings.
Defines the style for voice settings. Available on V2+ models.
Defines whether to use speaker boost for voice settings. Available on V2+ models.
Reduces latency by disabling the chunk schedule and all buffers. Recommended only when sending full sentences or phrases; sending partial phrases will significantly reduce quality. Defaults to false.
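The voice-setting inputs above are all optional, so one plausible way to handle them is to forward only the values the caller actually set. The sketch below assumes the field names `stability`, `similarity_boost`, `style`, and `use_speaker_boost`, and assumes the numeric settings fall in the range 0.0 to 1.0; the function itself is hypothetical and is not the service's own implementation.

```python
from typing import Optional


def build_voice_settings(
    stability: Optional[float] = None,
    similarity_boost: Optional[float] = None,
    style: Optional[float] = None,
    use_speaker_boost: Optional[bool] = None,
) -> dict:
    """Collect only the voice settings that were explicitly provided."""
    candidates = {
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
        "use_speaker_boost": use_speaker_boost,
    }
    settings = {name: value for name, value in candidates.items() if value is not None}
    # Assumption: the numeric settings are fractions between 0.0 and 1.0.
    for name in ("stability", "similarity_boost", "style"):
        if name in settings and not 0.0 <= settings[name] <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
    return settings
```

Omitting unset values, rather than sending explicit defaults, leaves the voice's stored settings in effect for anything the caller did not override.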
Output Frames
Control Frames
Signals start of synthesis
Signals completion of synthesis
Audio Frames
Contains generated audio data:
- PCM encoded audio
- Configured sample rate
- Mono channel
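Because the audio is mono PCM at a known sample rate, the playback duration of a chunk follows directly from its byte length. The sketch below assumes 16-bit samples (2 bytes each), which the text above does not state; the function name is hypothetical.

```python
def pcm_duration_seconds(
    num_bytes: int,
    sample_rate: int,
    bytes_per_sample: int = 2,  # assumption: 16-bit PCM
    channels: int = 1,          # mono, as described above
) -> float:
    """Return how many seconds of audio a raw PCM buffer represents."""
    frame_size = bytes_per_sample * channels
    if num_bytes % frame_size != 0:
        raise ValueError("buffer does not contain a whole number of samples")
    return num_bytes / (frame_size * sample_rate)
```

For example, 48,000 bytes at a 24,000 Hz sample rate works out to 48000 / (2 × 24000) = 1 second of audio.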
Usage Examples
Basic Usage
With Voice Settings
Methods
See the TTS base class methods for additional functionality.
Language Support
ElevenLabs supports the following languages and their variants:
Language Code | Description | Service Code |
---|---|---|
Language.AR | Arabic | ar |
Language.BG | Bulgarian | bg |
Language.CS | Czech | cs |
Language.DA | Danish | da |
Language.DE | German | de |
Language.EL | Greek | el |
Language.EN | English | en |
Language.ES | Spanish | es |
Language.FI | Finnish | fi |
Language.FIL | Filipino | fil |
Language.FR | French | fr |
Language.HI | Hindi | hi |
Language.HR | Croatian | hr |
Language.HU | Hungarian | hu |
Language.ID | Indonesian | id |
Language.IT | Italian | it |
Language.JA | Japanese | ja |
Language.KO | Korean | ko |
Language.MS | Malay | ms |
Language.NL | Dutch | nl |
Language.NO | Norwegian | no |
Language.PL | Polish | pl |
Language.PT | Portuguese | pt |
Language.RO | Romanian | ro |
Language.RU | Russian | ru |
Language.SK | Slovak | sk |
Language.SV | Swedish | sv |
Language.TA | Tamil | ta |
Language.TR | Turkish | tr |
Language.UK | Ukrainian | uk |
Language.VI | Vietnamese | vi |
Language.ZH | Chinese | zh |
Note: Language support may vary based on the selected model. See the ElevenLabs docs for more details.
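The table above, together with the mention of variants, suggests a lookup in which a regional tag falls back to its base language. A minimal sketch under that assumption follows; it works on plain strings rather than the `Language` enum, and `to_service_code` is a hypothetical name.

```python
# Service codes taken from the table above.
SERVICE_CODES = {
    "ar", "bg", "cs", "da", "de", "el", "en", "es", "fi", "fil", "fr",
    "hi", "hr", "hu", "id", "it", "ja", "ko", "ms", "nl", "no", "pl",
    "pt", "ro", "ru", "sk", "sv", "ta", "tr", "uk", "vi", "zh",
}


def to_service_code(language: str):
    """Map a tag such as 'en-US' or 'pt_BR' to a service code, or None."""
    tag = language.strip().lower().replace("_", "-")
    if tag in SERVICE_CODES:
        return tag
    # Assumed fallback: drop the region and retry with the base language.
    base = tag.split("-", 1)[0]
    return base if base in SERVICE_CODES else None
```

Returning `None` for unknown languages lets the caller decide whether to omit the language parameter or report an error.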
Usage Example
Frame Flow
Features
Sentence Aggregation
- Aggregates sentences for better audio quality
- Maintains natural speech flow
- Reduces artifacts
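To illustrate the idea of sentence aggregation, the sketch below buffers streamed text fragments and releases a sentence only once terminal punctuation followed by whitespace has arrived. This is a simplified stand-in written for illustration, not the service's actual aggregation logic, and the class name is hypothetical.

```python
import re
from typing import List

# A sentence ends at ., ! or ? followed by whitespace. Requiring the
# whitespace avoids splitting "3." from "14" when they arrive separately.
_SENTENCE_END = re.compile(r"[.!?]+\s+")


class SentenceAggregator:
    """Hypothetical buffer that turns text fragments into whole sentences."""

    def __init__(self) -> None:
        self._buffer = ""

    def push(self, text: str) -> List[str]:
        """Add a fragment and return any sentences that are now complete."""
        self._buffer += text
        sentences = []
        match = _SENTENCE_END.search(self._buffer)
        while match is not None:
            sentences.append(self._buffer[: match.end()].strip())
            self._buffer = self._buffer[match.end():]
            match = _SENTENCE_END.search(self._buffer)
        return sentences

    def flush(self) -> str:
        """Return whatever text remains, for use at the end of a response."""
        rest, self._buffer = self._buffer.strip(), ""
        return rest
```

Sending complete sentences gives the synthesizer enough context for natural prosody, at the cost of waiting for each sentence to finish before it is sent.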
Word Timing
- Provides word-level timestamps
- Enables text-audio synchronization
- Supports interruption handling
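Word-level timestamps make it possible to work out how much of the text had actually been spoken when playback was interrupted. The sketch below assumes timings arrive as `(word, start_seconds)` pairs sorted by start time; that shape and the function name are illustrative assumptions.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def words_spoken_by(
    word_times: Sequence[Tuple[str, float]], elapsed_seconds: float
) -> List[str]:
    """Return the words whose start time is at or before `elapsed_seconds`.

    `word_times` must be sorted by start time (assumed input format).
    """
    starts = [start for _, start in word_times]
    # bisect_right counts every start time <= elapsed_seconds.
    count = bisect_right(starts, elapsed_seconds)
    return [word for word, _ in word_times[:count]]
```

After an interruption, the returned words can be recorded as what the user actually heard, instead of the full text that was sent for synthesis.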
Connection Management
- WebSocket-based streaming
- Automatic reconnection
- Keepalive handling
- Clean disconnection
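Automatic reconnection is commonly implemented as a retry loop with exponential backoff. The sketch below shows that general pattern using only `asyncio`; the `connect` callable, the parameter names, and the delay values are all assumptions, and the service's real reconnection strategy may differ.

```python
import asyncio


async def connect_with_retry(connect, max_attempts=5, base_delay=0.5, max_delay=8.0):
    """Await the hypothetical `connect()` coroutine until it succeeds.

    Waits base_delay, then 2x, 4x, ... (capped at max_delay) between
    attempts, and re-raises the last error once attempts are exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await connect()
        except OSError:  # ConnectionError is a subclass of OSError
            if attempt == max_attempts:
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(delay)
```

Capping the delay keeps recovery time bounded, while the doubling avoids hammering the endpoint during an outage.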
Notes
- Supports real-time streaming
- Provides word-level timing
- Handles interruptions gracefully
- Maintains WebSocket connection
- Includes metrics collection
- Supports voice customization
- Thread-safe processing
- Automatic language mapping