XTTS-vLLM

Overview

XTTSVLLMTTSService streams audio from a self-hosted XTTSv2-vLLM streaming server, which serves Coqui XTTSv2 with vLLM for real-time, low-latency synthesis (about 0.45 s time-to-first-byte on the maintainer’s test hardware). The service is a thin HTTP client: the model server runs separately (as a Docker image, typically on a GPU host), and the service talks to it over an OpenAI-compatible streaming endpoint and emits TTSAudioRawFrame audio into your Pipecat pipeline. Voice-cloning conditioning is computed once from a short reference sample and cached for the lifetime of the service, so per-utterance requests stay fast.

Source Repository

Source code, examples, and issues for the XTTS-vLLM integration

PyPI Package

The pipecat-xtts-vllm package on PyPI

Model Server

The XTTSv2-vLLM streaming server this client connects to

Installation

This is a community-maintained package distributed separately from pipecat-ai:

uv add pipecat-xtts-vllm

Prerequisites

This service is a client for a self-hosted model server; there is no third-party account or API key.

Run the model server. Deploy the XTTSv2-vLLM streaming server (Docker image, GPU recommended) and note its URL for base_url. See the server repository for deployment instructions.
Provide a reference voice. A ~6-second reference audio clip (as bytes) is used for voice cloning. Alternatively, supply precomputed conditioning.
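
Before handing a clip to the service, it can help to confirm that it is roughly the expected length. The sketch below assumes the reference clip is a WAV file (as in the usage example) and uses only the standard library. The helper names wav_duration_seconds and make_silent_wav are hypothetical and not part of pipecat-xtts-vllm; the silent clip is stand-in data so the example runs without a real recording.

```python
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of an in-memory WAV clip in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def make_silent_wav(seconds: float, sample_rate: int = 24000) -> bytes:
    """Build a silent 16-bit mono WAV clip (stand-in data for this sketch)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()


# In a real bot this would be Path("reference.wav").read_bytes().
reference_audio = make_silent_wav(6.0)
duration = wav_duration_seconds(reference_audio)
print(f"reference clip: {duration:.1f} s")
```

The same bytes object is what you would pass as reference_audio when constructing the service.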

The integration code is MIT-licensed, but the underlying XTTSv2 model weights are distributed under the Coqui Public Model License (non-commercial use only). Review the server repository for licensing details before production use.

Configuration

base_url: str, required

Base URL of the running XTTSv2-vLLM streaming server, e.g. http://localhost:8000.

reference_audio: bytes, default: None

Reference voice sample (~6 seconds) used to compute voice-cloning conditioning. Required unless conditioning is provided.

conditioning: XTTSVLLMConditioning, default: None

Optional precomputed conditioning (gpt_cond_latent_b64 + speaker_embeddings_b64). If set, it takes precedence over reference_audio and skips the conditioning request.
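
The precedence rule between the two voice inputs can be illustrated with a small, self-contained sketch. It is not the package's implementation: ConditioningSketch is a hypothetical stand-in for XTTSVLLMConditioning that carries only the two fields named above, and choose_voice_source is an illustrative helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConditioningSketch:
    """Hypothetical stand-in for XTTSVLLMConditioning (not the real class)."""

    gpt_cond_latent_b64: str
    speaker_embeddings_b64: str


def choose_voice_source(
    reference_audio: Optional[bytes],
    conditioning: Optional[ConditioningSketch],
) -> str:
    """Mirror the documented precedence between the two voice inputs."""
    if conditioning is not None:
        # Precomputed conditioning wins, so no conditioning request is needed.
        return "conditioning"
    if reference_audio is not None:
        # Otherwise conditioning has to be computed from the reference clip.
        return "reference_audio"
    raise ValueError("provide reference_audio or conditioning")


cond = ConditioningSketch(gpt_cond_latent_b64="...", speaker_embeddings_b64="...")
print(choose_voice_source(b"clip-bytes", cond))  # both given
print(choose_voice_source(b"clip-bytes", None))  # only a reference clip
```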

language: str, default: "en"

Language code for synthesis. XTTSv2 supports 17 languages: en (English), es (Spanish), fr (French), de (German), it (Italian), pt (Portuguese), pl (Polish), tr (Turkish), ru (Russian), nl (Dutch), cs (Czech), ar (Arabic), zh-cn (Chinese, Simplified), hu (Hungarian), ko (Korean), ja (Japanese), and hi (Hindi). Pass auto to let the server auto-detect the language.
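
If the language code comes from user input or configuration, a quick check against the list above can catch typos before a request is made. This is a minimal sketch: check_language and SUPPORTED_LANGUAGES are hypothetical names, and whether the service or server performs its own validation is not assumed here.

```python
# The 17 XTTSv2 language codes listed above.
SUPPORTED_LANGUAGES = frozenset(
    {
        "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
        "nl", "cs", "ar", "zh-cn", "hu", "ko", "ja", "hi",
    }
)


def check_language(code: str) -> str:
    """Return the code unchanged if it is listed or is 'auto'; otherwise raise."""
    if code == "auto" or code in SUPPORTED_LANGUAGES:
        return code
    raise ValueError(f"unsupported language code: {code!r}")


print(check_language("zh-cn"))
print(check_language("auto"))
```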

int, default: 20

Token-delta streaming chunk size sent to the server.

float, default: 1.0

Speech speed multiplier.

int, default: 24000

Output audio sample rate in Hz (XTTSv2 native is 24 kHz, 16-bit mono PCM).
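
As a worked example of what this format implies, 24 kHz, 16-bit, mono PCM amounts to 24000 × 2 × 1 = 48000 bytes per second of audio. The sketch below turns a raw byte count into a duration; pcm_duration_seconds is a hypothetical helper, not part of the package.

```python
SAMPLE_RATE_HZ = 24000   # XTTSv2 native rate
BYTES_PER_SAMPLE = 2     # 16-bit PCM
CHANNELS = 1             # mono


def pcm_duration_seconds(
    num_bytes: int,
    sample_rate: int = SAMPLE_RATE_HZ,
    bytes_per_sample: int = BYTES_PER_SAMPLE,
    channels: int = CHANNELS,
) -> float:
    """Convert a raw PCM byte count into seconds of audio."""
    return num_bytes / (sample_rate * bytes_per_sample * channels)


bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
print(bytes_per_second)              # 48000
print(pcm_duration_seconds(96000))   # 2.0
```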

aiohttp.ClientSession, default: None

Optional shared aiohttp session used for requests. If not provided, the service creates and manages its own session.

Usage

from pathlib import Path

from pipecat.pipeline.pipeline import Pipeline
from pipecat_xtts_vllm import XTTSVLLMTTSService

tts = XTTSVLLMTTSService(
    base_url="http://localhost:8000",
    reference_audio=Path("reference.wav").read_bytes(),
    language="en",
)

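# transport, stt, llm, and context_aggregator are assumed to be
# created elsewhere in your bot; they are not defined in this snippet.
pipeline = Pipeline(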
    [
        transport.input(),               # audio/user input
        stt,                             # speech to text
        context_aggregator.user(),       # add user text to context
        llm,                             # LLM generates response
        tts,                             # XTTS-vLLM synthesis
        transport.output(),              # stream audio back to user
        context_aggregator.assistant(),  # store assistant response
    ]
)

To reuse precomputed conditioning instead of a reference clip, import XTTSVLLMConditioning alongside the service (from pipecat_xtts_vllm import XTTSVLLMConditioning, XTTSVLLMTTSService) and pass it via the conditioning= argument. See the foundational example in the source repository for a complete, runnable script.

Compatibility

Tested with pipecat-ai v1.4.0. Check the source repository for the latest tested version and changelog.
