
# ElevenLabs

> Text-to-speech service using ElevenLabs' streaming API with word-level timing

## Overview

ElevenLabs provides high-quality text-to-speech synthesis with two service implementations:

* **`ElevenLabsTTSService`** (WebSocket) — Real-time streaming with word-level timestamps, audio context management, and interruption handling. Recommended for interactive applications.
* **`ElevenLabsHttpTTSService`** (HTTP) — Simpler batch-style synthesis. Suitable for non-interactive use cases or when WebSocket connections are not possible.

<CardGroup cols={2}>
  <Card title="ElevenLabs TTS API Reference" icon="code" href="https://reference-server.pipecat.ai/en/latest/api/pipecat.services.elevenlabs.tts.html">
    Complete API reference for all parameters and methods
  </Card>

  <Card title="Example Implementation" icon="play" href="https://github.com/pipecat-ai/pipecat/blob/main/examples/voice/voice-elevenlabs.py">
    Complete example with WebSocket streaming
  </Card>

  <Card title="ElevenLabs Documentation" icon="book" href="https://elevenlabs.io/docs/api-reference/text-to-speech/v-1-text-to-speech-voice-id-multi-stream-input">
    Official ElevenLabs TTS API documentation
  </Card>

  <Card title="Voice Library" icon="microphone" href="https://elevenlabs.io/voice-library">
    Browse and clone voices from the community
  </Card>
</CardGroup>

## Installation

```bash theme={null}
uv add "pipecat-ai[elevenlabs]"
```

## Prerequisites

1. **ElevenLabs Account**: Sign up at [ElevenLabs](https://elevenlabs.io/app/sign-up)
2. **API Key**: Generate an API key from your account dashboard
3. **Voice Selection**: Choose voice IDs from the [voice library](https://elevenlabs.io/voice-library)

Set the following environment variable:

```bash theme={null}
export ELEVENLABS_API_KEY=your_api_key
```
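
A missing key usually surfaces later as an authentication failure, so it can help to check for it at startup. The following is a minimal sketch; `require_api_key` is a hypothetical helper, not part of Pipecat.

```python
import os


def require_api_key(env_var: str = "ELEVENLABS_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"{env_var} is not set; export it before starting the bot.")
    return api_key
```

The returned value can then be passed as `api_key=` when constructing either service.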

## Configuration

### ElevenLabsTTSService

<ParamField path="api_key" type="str" required>
  ElevenLabs API key.
</ParamField>

<ParamField path="voice_id" type="str" required deprecated>
  Voice ID from the [voice library](https://elevenlabs.io/voice-library).
  *Deprecated in v0.0.105. Use
  `settings=ElevenLabsTTSService.Settings(voice=...)` instead.*
</ParamField>

<ParamField path="model" type="str" default="eleven_turbo_v2_5" deprecated>
  ElevenLabs model ID. Use a `multilingual` model variant (e.g.
  `eleven_multilingual_v2`) if you need non-English language support.
  *Deprecated in v0.0.105. Use
  `settings=ElevenLabsTTSService.Settings(model=...)` instead.*
</ParamField>

<ParamField path="url" type="str" default="wss://api.elevenlabs.io">
  WebSocket endpoint URL. Override for custom or proxied deployments.
</ParamField>

<ParamField path="sample_rate" type="int" default="None">
  Output audio sample rate in Hz. When `None`, uses the pipeline's configured
  sample rate.
</ParamField>

<ParamField path="auto_mode" type="bool" default="None">
  Whether to enable ElevenLabs' auto mode, which reduces latency by disabling
  server-side chunk scheduling and buffering. Recommended when sending complete
  sentences or phrases. When `None` (default), auto mode is enabled for
  `SENTENCE` aggregation and disabled for `TOKEN` aggregation, because token
  streaming relies on the server-side chunk scheduler to accumulate enough text
  for natural-sounding synthesis.
</ParamField>
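
The `auto_mode` default described above can be expressed as a small decision function. This is an illustrative sketch only: `AggregationMode` is a local stand-in for Pipecat's `TextAggregationMode`, `resolve_auto_mode` is a hypothetical name, and it assumes an explicit `True` or `False` is used as given.

```python
from enum import Enum
from typing import Optional


class AggregationMode(Enum):
    """Local stand-in for Pipecat's TextAggregationMode (illustration only)."""

    SENTENCE = "sentence"
    TOKEN = "token"


def resolve_auto_mode(auto_mode: Optional[bool], mode: AggregationMode) -> bool:
    """Mirror the documented default: None defers to the aggregation mode."""
    if auto_mode is not None:
        return auto_mode  # assumption: an explicit value is used as given
    # Complete sentences do not need the server-side chunk scheduler.
    return mode is AggregationMode.SENTENCE
```

In practice this means leaving `auto_mode` unset is usually enough; set it explicitly only when you want to override the pairing.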

<ParamField path="text_aggregation_mode" type="TextAggregationMode" default="TextAggregationMode.SENTENCE">
  Controls how incoming text is aggregated before synthesis. `SENTENCE`
  (default) buffers text until sentence boundaries, producing more natural
  speech. `TOKEN` streams tokens directly for lower latency. Import from
  `pipecat.services.tts_service`.
</ParamField>
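
To make the difference between the two modes concrete, here is a simplified sketch of what sentence aggregation does to a token stream. It is not Pipecat's aggregator: `aggregate_sentences` is a hypothetical function, and the boundary rule (terminal punctuation followed by whitespace) is deliberately naive.

```python
import re
from typing import Iterable, Iterator

# Simplified boundary: ".", "!" or "?" followed by whitespace. A production
# aggregator also has to handle abbreviations and other languages.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def aggregate_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed tokens and yield text one complete sentence at a time."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last one ends at a sentence boundary.
        for sentence in parts[:-1]:
            yield sentence
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # flush the remainder when the stream ends
```

With `TOKEN` mode, by contrast, each token would be forwarded as it arrives, which lowers latency but gives the synthesizer less context per request.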

<ParamField path="aggregate_sentences" type="bool" default="None" deprecated>
  *Deprecated in v0.0.104.* Use `text_aggregation_mode` instead.
</ParamField>

<ParamField path="params" type="InputParams" default="None" deprecated>
  *Deprecated in v0.0.105. Use `settings=ElevenLabsTTSService.Settings(...)`
  instead.*
</ParamField>

<ParamField path="settings" type="ElevenLabsTTSService.Settings" default="None">
  Runtime-configurable settings. See [Settings](#settings) below.
</ParamField>

### ElevenLabsHttpTTSService

The HTTP service accepts the same parameters as the WebSocket service, with these differences:

<ParamField path="aiohttp_session" type="aiohttp.ClientSession" required>
  An aiohttp session for HTTP requests. You must create and manage this
  yourself.
</ParamField>

<ParamField path="base_url" type="str" default="https://api.elevenlabs.io">
  HTTP API base URL (instead of `url` for WebSocket).
</ParamField>

<ParamField path="enable_logging" type="bool" default="None">
  Whether to enable ElevenLabs server-side logging. Set to `False` for zero
  retention mode (enterprise only).
</ParamField>

The HTTP service uses `ElevenLabsHttpTTSSettings`, which adds the following setting:

<ParamField path="optimize_streaming_latency" type="int" default="None">
  Latency optimization level (0–4). Higher values reduce latency at the cost of
  quality.
</ParamField>

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `ElevenLabsTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter                  | Type              | Default     | Description                                                                                       |
| -------------------------- | ----------------- | ----------- | ------------------------------------------------------------------------------------------------- |
| `model`                    | `str`             | `None`      | ElevenLabs model identifier. *(Inherited from base settings.)*                                    |
| `voice`                    | `str`             | `None`      | Voice identifier. *(Inherited from base settings.)*                                               |
| `language`                 | `Language \| str` | `None`      | Language code. Only effective with multilingual models. *(Inherited from base settings.)*         |
| `stability`                | `float`           | `NOT_GIVEN` | Voice consistency (0.0–1.0). Lower values are more expressive, higher values are more consistent. |
| `similarity_boost`         | `float`           | `NOT_GIVEN` | Voice clarity and similarity to the original (0.0–1.0).                                           |
| `style`                    | `float`           | `NOT_GIVEN` | Style exaggeration (0.0–1.0). Higher values amplify the voice's style.                            |
| `use_speaker_boost`        | `bool`            | `NOT_GIVEN` | Enhance clarity and target speaker similarity.                                                    |
| `speed`                    | `float`           | `NOT_GIVEN` | Speech rate. WebSocket: 0.7–1.2. HTTP: 0.25–4.0.                                                  |
| `apply_text_normalization` | `Literal`         | `NOT_GIVEN` | Text normalization: `"auto"`, `"on"`, or `"off"`.                                                 |

<Note>
  `NOT_GIVEN` values use the ElevenLabs API defaults. See [ElevenLabs voice
  settings](https://elevenlabs.io/docs/api-reference/text-to-speech/v-1-text-to-speech-voice-id-multi-stream-input)
  for details on how these parameters interact.
</Note>
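
Because the valid `speed` range differs between the two services, it can be useful to check values before sending them. The sketch below uses the ranges from the table above; `check_voice_settings` and the `"websocket"` / `"http"` transport labels are hypothetical names for illustration, and `None` here simply means "not provided".

```python
from typing import Optional

# Ranges copied from the settings table.
UNIT_RANGE = (0.0, 1.0)
SPEED_RANGES = {"websocket": (0.7, 1.2), "http": (0.25, 4.0)}


def check_voice_settings(
    transport: str,
    stability: Optional[float] = None,
    similarity_boost: Optional[float] = None,
    style: Optional[float] = None,
    speed: Optional[float] = None,
) -> list[str]:
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    unit_values = {
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
    }
    for name, value in unit_values.items():
        if value is not None and not UNIT_RANGE[0] <= value <= UNIT_RANGE[1]:
            problems.append(f"{name}={value} is outside {UNIT_RANGE[0]}-{UNIT_RANGE[1]}")
    low, high = SPEED_RANGES[transport]
    if speed is not None and not low <= speed <= high:
        problems.append(f"speed={speed} is outside {low}-{high} for {transport}")
    return problems
```

For example, `speed=1.5` is reported as a problem for `"websocket"` but passes for `"http"`.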

## Usage

### Basic Setup

```python theme={null}
import os

from pipecat.services.elevenlabs import ElevenLabsTTSService

tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    settings=ElevenLabsTTSService.Settings(
        voice="21m00Tcm4TlvDq8ikWAM",  # Rachel
    ),
)
```

### With Voice Customization

```python theme={null}
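import os

from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transcriptions.language import Language

tts = ElevenLabsTTSService(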
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    settings=ElevenLabsTTSService.Settings(
        voice="21m00Tcm4TlvDq8ikWAM",
        model="eleven_multilingual_v2",
        language=Language.ES,
        stability=0.7,
        similarity_boost=0.8,
        speed=1.1,
    ),
)
```

### Updating Settings at Runtime

Voice settings can be changed mid-conversation using `TTSUpdateSettingsFrame`:

```python theme={null}
from pipecat.frames.frames import TTSUpdateSettingsFrame
from pipecat.services.elevenlabs.tts import ElevenLabsTTSSettings

# Assumes `task` is your running PipelineTask.
await task.queue_frame(
    TTSUpdateSettingsFrame(
        delta=ElevenLabsTTSSettings(
            stability=0.3,
            speed=1.1,
        )
    )
)
```
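
The `delta` argument suggests that only the fields you set are changed and everything else keeps its current value. The following self-contained sketch illustrates that idea with a sentinel; `VoiceSettings`, `apply_delta`, and the local `NOT_GIVEN` are stand-ins written for this example, and Pipecat's actual merge logic may differ.

```python
from dataclasses import dataclass, fields, replace
from typing import Any


class _NotGiven:
    """Local stand-in for a NOT_GIVEN sentinel meaning 'leave this field alone'."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN: Any = _NotGiven()


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical settings container, not Pipecat's ElevenLabsTTSSettings."""

    stability: Any = NOT_GIVEN
    speed: Any = NOT_GIVEN
    style: Any = NOT_GIVEN


def apply_delta(current: VoiceSettings, delta: VoiceSettings) -> VoiceSettings:
    """Return a copy of `current` with only the fields set in `delta` overridden."""
    changes = {
        f.name: getattr(delta, f.name)
        for f in fields(delta)
        if getattr(delta, f.name) is not NOT_GIVEN
    }
    return replace(current, **changes)
```

Using a sentinel rather than `None` keeps "not specified" distinct from a value that was deliberately set to `None`.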

### HTTP Service

```python theme={null}
import os

import aiohttp
from pipecat.services.elevenlabs import ElevenLabsHttpTTSService

# Run inside an async function; the session must stay open while the service is in use.
async with aiohttp.ClientSession() as session:
    tts = ElevenLabsHttpTTSService(
        api_key=os.getenv("ELEVENLABS_API_KEY"),
        settings=ElevenLabsHttpTTSService.Settings(
            voice="21m00Tcm4TlvDq8ikWAM",
        ),
        aiohttp_session=session,
    )
```

<Tip>
  The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
  `Settings` / `settings=` instead. See the [Service Settings
  guide](/pipecat/fundamentals/service-settings) for migration details.
</Tip>

## Notes

* **Multilingual models required for `language`**: Setting `language` with a non-multilingual model (e.g. `eleven_turbo_v2_5`) has no effect. Use `eleven_multilingual_v2` or similar.
* **WebSocket vs HTTP**: The WebSocket service supports word-level timestamps and interruption handling, which makes it the better fit for interactive conversations. The HTTP service is simpler but lacks these features.
* **Text aggregation**: Sentence aggregation is the default (`text_aggregation_mode=TextAggregationMode.SENTENCE`); buffering until sentence boundaries produces more natural speech. Set `text_aggregation_mode=TextAggregationMode.TOKEN` to stream tokens directly for lower latency. When `auto_mode` is left as `None`, it follows the aggregation mode: enabled for `SENTENCE`, disabled for `TOKEN`.
* **Word timestamp accuracy**: Word timestamps accurately reflect the spoken audio, not just the input text. When using pronunciation dictionaries or text normalization (`apply_text_normalization`), the service consumes ElevenLabs' normalized alignment data to ensure downstream consumers (captions, transcripts, context aggregation) match what the listener actually hears.
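
To illustrate the word timestamp note, the sketch below groups character-level alignment into word-level timestamps. The input shape (a list of characters plus per-character start times in milliseconds) is an assumption made for this example, and `words_from_alignment` is a hypothetical function rather than the service's implementation.

```python
def words_from_alignment(
    chars: list[str], char_start_times_ms: list[int]
) -> list[tuple[str, float]]:
    """Group per-character timings into (word, start_seconds) pairs."""
    words = []
    current = ""
    start_ms = 0
    for char, char_start in zip(chars, char_start_times_ms):
        if char.isspace():
            # Whitespace closes the word being built, if any.
            if current:
                words.append((current, start_ms / 1000.0))
                current = ""
            continue
        if not current:
            start_ms = char_start  # first character of a new word
        current += char
    if current:
        words.append((current, start_ms / 1000.0))
    return words
```

Feeding this kind of grouping with normalized alignment rather than the raw input text is what keeps a caption such as "twenty five dollars" in step with the audio when the input was "$25".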

## Event Handlers

ElevenLabs TTS supports the standard [service connection events](/api-reference/server/events/service-events):

| Event                 | Description                            |
| --------------------- | -------------------------------------- |
| `on_connected`        | Connected to ElevenLabs WebSocket      |
| `on_disconnected`     | Disconnected from ElevenLabs WebSocket |
| `on_connection_error` | WebSocket connection error occurred    |

```python theme={null}
@tts.event_handler("on_connected")
async def on_connected(service):
    print("Connected to ElevenLabs")
```
