
# XTTS

> Text-to-speech service implementation using Coqui's XTTS streaming server

<Warning>
  Coqui, the XTTS maintainer, has shut down. XTTS may not receive future updates
  or support.
</Warning>

## Overview

`XTTSService` provides multilingual speech synthesis with voice cloning through a locally hosted XTTS streaming server. The service supports real-time streaming and custom voices, using Coqui's XTTS-v2 model for cross-lingual text-to-speech.

<CardGroup cols={2}>
  <Card title="XTTS API Reference" icon="code" href="https://reference-server.pipecat.ai/en/latest/api/pipecat.services.xtts.tts.html">
    Pipecat's API methods for XTTS integration
  </Card>

  <Card title="Example Implementation" icon="play" href="https://github.com/pipecat-ai/pipecat/blob/main/examples/voice/voice-xtts.py">
    Complete example with voice cloning
  </Card>

  <Card title="XTTS Repository" icon="book" href="https://github.com/coqui-ai/xtts-streaming-server">
    Official XTTS streaming server repository
  </Card>

  <Card title="Voice Cloning" icon="microphone" href="https://github.com/coqui-ai/xtts-streaming-server#voice-cloning">
    Learn about custom voice training
  </Card>
</CardGroup>

## Installation

XTTS requires a running streaming server. Start the server using Docker:

```bash theme={null}
docker run --gpus=all -e COQUI_TOS_AGREED=1 --rm -p 8000:80 \
  ghcr.io/coqui-ai/xtts-streaming-server:latest-cuda121
```
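
The container can take a while to load the model before it accepts requests. A minimal sketch of a readiness check follows, assuming the server is reachable on the host port mapped above and that a `200` response from the `/studio_speakers` endpoint (the one the service queries on startup) means the server is ready. The `wait_for_xtts` helper and its `probe` hook are hypothetical names used for illustration and are not part of Pipecat or the XTTS server.

```python
import time
import urllib.error
import urllib.request


def _http_probe(url):
    # Any 200 response counts as "ready"; connection errors mean "not yet".
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_for_xtts(base_url="http://localhost:8000", attempts=30, delay=2.0, probe=None):
    """Poll the server until it answers, or give up after `attempts` tries.

    `probe` is a hypothetical hook that takes a URL and returns True when the
    server responds. It defaults to a real HTTP GET and exists mainly so the
    retry loop can be exercised without a running server.
    """
    url = base_url.rstrip("/") + "/studio_speakers"
    probe = probe or _http_probe
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # back off before the next try
    return False


# Example: block for up to about a minute before building the pipeline.
# if not wait_for_xtts("http://localhost:8000"):
#     raise RuntimeError("XTTS server did not become ready")
```

The sketch uses only the standard library so it can run as a standalone script before any Pipecat code is imported.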

## Prerequisites

### XTTS Server Setup

Before using `XTTSService`, you need:

1. **Docker Environment**: Set up Docker with GPU support for optimal performance
2. **XTTS Server**: Run the XTTS streaming server container
3. **Voice Models**: Configure voice models and cloning samples as needed

### Required Configuration

* **Server URL**: Configure the XTTS server endpoint (e.g. `http://localhost:8000`, the host port mapped by the Docker command above). `base_url` is required and has no default.
* **Voice Selection**: Set up voice models or voice cloning samples

<Note>
  GPU acceleration is recommended. The Docker image shown above is built for
  CUDA, and CPU inference is significantly slower.
</Note>

## Configuration

### XTTSService

<ParamField path="voice_id" type="str" required deprecated>
  ID of the studio speaker to use for synthesis. *Deprecated in v0.0.105. Use
  `settings=XTTSService.Settings(voice=...)` instead.*
</ParamField>

<ParamField path="base_url" type="str" required>
  Base URL of the XTTS streaming server (e.g. `http://localhost:8000`).
</ParamField>

<ParamField path="aiohttp_session" type="aiohttp.ClientSession" required>
  An aiohttp session for HTTP requests to the XTTS server.
</ParamField>

<ParamField path="language" type="Language" default="Language.EN" deprecated>
  Language for synthesis. Supports Czech, German, English, Spanish, French,
  Hindi, Hungarian, Italian, Japanese, Korean, Dutch, Polish, Portuguese,
  Russian, Turkish, and Chinese. *Deprecated in v0.0.105. Use
  `settings=XTTSService.Settings(language=...)` instead.*
</ParamField>

<ParamField path="settings" type="XTTSService.Settings" default="None">
  Runtime-configurable settings. See [Settings](#settings) below.
</ParamField>

<ParamField path="sample_rate" type="int" default="None">
  Output audio sample rate in Hz. When `None`, uses the pipeline's configured
  sample rate. Audio is automatically resampled from XTTS's native 24 kHz output.
</ParamField>
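
To make the effect of `sample_rate` concrete, the sketch below works out how many samples a chunk contains after resampling from the native 24 kHz rate. It is only the arithmetic, not Pipecat's resampler, and `resampled_length` is a hypothetical helper name.

```python
from fractions import Fraction

XTTS_NATIVE_RATE = 24_000  # Hz, the native XTTS output rate described above


def resampled_length(num_samples, target_rate, source_rate=XTTS_NATIVE_RATE):
    """Return how many samples a chunk has after resampling (rounded)."""
    ratio = Fraction(target_rate, source_rate)  # exact ratio, no float drift
    return round(num_samples * ratio)


# A 20 ms chunk at 24 kHz is 480 samples.
print(resampled_length(480, 16_000))  # 320
print(resampled_length(480, 48_000))  # 960
print(resampled_length(480, 24_000))  # 480
```

A pipeline running at 16 kHz therefore receives two thirds as many samples per chunk as the server produced, with the duration unchanged.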

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `XTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter  | Type              | Default | Description                            |
| ---------- | ----------------- | ------- | -------------------------------------- |
| `model`    | `str`             | `None`  | Model identifier. *(Inherited.)*       |
| `voice`    | `str`             | `None`  | Voice identifier. *(Inherited.)*       |
| `language` | `Language \| str` | `None`  | Language for synthesis. *(Inherited.)* |

## Usage

### Basic Setup

```python theme={null}
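import asyncio

import aiohttp
from pipecat.services.xtts.tts import XTTSService


async def main():
    # The session must stay open for as long as the service is in use.
    async with aiohttp.ClientSession() as session:
        tts = XTTSService(
            settings=XTTSService.Settings(
                voice="Ana Florence",  # must be one of the server's studio speakers
            ),
            base_url="http://localhost:8000",  # host port mapped by the Docker command
            aiohttp_session=session,
        )
        # Build and run your pipeline here.


asyncio.run(main())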
```

### With Language Configuration

```python theme={null}
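import asyncio

import aiohttp
from pipecat.services.xtts.tts import XTTSService
from pipecat.transcriptions.language import Language


async def main():
    # The session must stay open for as long as the service is in use.
    async with aiohttp.ClientSession() as session:
        tts = XTTSService(
            settings=XTTSService.Settings(
                voice="Ana Florence",
                language=Language.ES,  # synthesize Spanish with the same voice
            ),
            base_url="http://localhost:8000",
            aiohttp_session=session,
        )
        # Build and run your pipeline here.


asyncio.run(main())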
```

<Tip>
  The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
  `Settings` / `settings=` instead. See the [Service Settings
  guide](/pipecat/fundamentals/service-settings) for migration details.
</Tip>

## Notes

* **Local server required**: XTTS requires a locally running streaming server (via Docker). The service connects to this server over HTTP.
* **Studio speakers**: On startup, the service fetches available "studio speakers" from the server's `/studio_speakers` endpoint. The configured voice (the `voice` setting, or the deprecated `voice_id` argument) must match one of these speakers.
* **Audio resampling**: XTTS natively outputs audio at 24 kHz. The service automatically resamples to match the pipeline's configured sample rate.
* **GPU recommended**: The XTTS server performs best with CUDA-enabled GPU acceleration. CPU inference is significantly slower.
* **No API key required**: XTTS runs locally, so no external API credentials are needed.
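
Because the configured voice has to match a studio speaker, it can help to check the name before constructing the service. The following is a minimal sketch, assuming `/studio_speakers` returns a JSON object keyed by speaker name. `fetch_studio_speakers` and `require_voice` are hypothetical helpers, not part of Pipecat.

```python
import json
import urllib.request


def fetch_studio_speakers(base_url="http://localhost:8000"):
    """GET /studio_speakers and decode the JSON body.

    Assumption: the response is a JSON object whose keys are speaker names.
    """
    url = base_url.rstrip("/") + "/studio_speakers"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def require_voice(speakers, voice):
    """Return `voice` if the server offers it, else raise listing valid names."""
    if voice not in speakers:
        available = ", ".join(sorted(speakers)) or "(none)"
        raise ValueError(f"Unknown XTTS voice {voice!r}. Available: {available}")
    return voice


# Example (requires a running server):
# voice = require_voice(fetch_studio_speakers(), "Ana Florence")
```

Failing early with the list of valid names tends to be easier to diagnose than an error raised during the first synthesis request.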
