# Cartesia

> Text-to-speech services using Cartesia's WebSocket and HTTP APIs

## Overview

Cartesia provides high-quality text-to-speech synthesis with two service implementations: `CartesiaTTSService` (WebSocket-based) for real-time streaming with word timestamps, and `CartesiaHttpTTSService` (HTTP-based) for simpler batch synthesis. `CartesiaTTSService` is recommended for interactive applications requiring low latency and interruption handling.

<CardGroup cols={2}>
  <Card title="Cartesia TTS API Reference" icon="code" href="https://reference-server.pipecat.ai/en/latest/api/pipecat.services.cartesia.tts.html">
    Pipecat's API methods for Cartesia TTS integration
  </Card>

  <Card title="Example Implementation" icon="play" href="https://github.com/pipecat-ai/pipecat/blob/main/examples/voice/voice-cartesia.py">
    Complete example with interruption handling
  </Card>

  <Card title="Cartesia Documentation" icon="book" href="https://docs.cartesia.ai/get-started/overview">
    Official Cartesia API documentation and features
  </Card>

  <Card title="Voice Library" icon="microphone" href="https://play.cartesia.ai/">
    Browse and test available voices
  </Card>
</CardGroup>

## Installation

To use Cartesia services, install the required dependencies:

```bash theme={null}
uv add "pipecat-ai[cartesia]"
```

## Prerequisites

### Cartesia Account Setup

Before using Cartesia TTS services, you need:

1. **Cartesia Account**: Sign up at [Cartesia](https://play.cartesia.ai/sign-up)
2. **API Key**: Generate an API key from your account dashboard
3. **Voice Selection**: Choose voice IDs from the [voice library](https://play.cartesia.ai/)

### Required Environment Variables

* `CARTESIA_API_KEY`: Your Cartesia API key for authentication
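
The examples on this page read the key with `os.getenv`, which returns `None` when the variable is unset, so a missing key may only surface once the service tries to authenticate. A minimal sketch of a fail-fast lookup follows; `require_env` is a hypothetical helper for illustration and is not part of Pipecat.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if unset or empty."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With this in place, `api_key=require_env("CARTESIA_API_KEY")` could replace `api_key=os.getenv("CARTESIA_API_KEY")` in the constructor calls shown later.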

## Configuration

### CartesiaTTSService

<ParamField path="api_key" type="str" required>
  Cartesia API key for authentication.
</ParamField>

<ParamField path="voice_id" type="str" required deprecated>
  ID of the voice to use for synthesis. *Deprecated in v0.0.105. Use
  `settings=CartesiaTTSService.Settings(voice=...)` instead.*
</ParamField>

<ParamField path="model" type="str" default="sonic-3" deprecated>
  TTS model to use. *Deprecated in v0.0.105. Use
  `settings=CartesiaTTSService.Settings(model=...)` instead.*
</ParamField>

<ParamField path="cartesia_version" type="str" default="2025-04-16">
  API version string for Cartesia service.
</ParamField>

<ParamField path="url" type="str" default="wss://api.cartesia.ai/tts/websocket">
  WebSocket endpoint URL.
</ParamField>

<ParamField path="sample_rate" type="int" default="None">
  Output audio sample rate in Hz. When `None`, uses the pipeline's configured
  sample rate.
</ParamField>

<ParamField path="encoding" type="str" default="pcm_s16le">
  Audio encoding format.
</ParamField>

<ParamField path="container" type="str" default="raw">
  Audio container format.
</ParamField>

<ParamField path="text_aggregation_mode" type="TextAggregationMode" default="TextAggregationMode.SENTENCE">
  Controls how incoming text is aggregated before synthesis. `SENTENCE`
  (default) buffers text until sentence boundaries, producing more natural
  speech. `TOKEN` streams tokens directly for lower latency. Import from
  `pipecat.services.tts_service`.
</ParamField>

<ParamField path="aggregate_sentences" type="bool" default="None" deprecated>
  *Deprecated in v0.0.104.* Use `text_aggregation_mode` instead.
</ParamField>

<ParamField path="params" type="InputParams" default="None" deprecated>
  *Deprecated in v0.0.105. Use `settings=CartesiaTTSService.Settings(...)`
  instead.*
</ParamField>

<ParamField path="settings" type="CartesiaTTSService.Settings" default="None">
  Runtime-configurable settings. See [Settings](#settings) below.
</ParamField>

### CartesiaHttpTTSService

The HTTP service accepts parameters similar to those of the WebSocket service, with these differences:

<ParamField path="base_url" type="str" default="https://api.cartesia.ai">
  HTTP API base URL (instead of `url` for WebSocket).
</ParamField>

<ParamField path="cartesia_version" type="str" default="2024-11-13">
  API version for HTTP service.
</ParamField>

<ParamField path="aiohttp_session" type="aiohttp.ClientSession" default="None">
  Optional aiohttp ClientSession for HTTP requests. If not provided, a session
  will be created and managed internally.
</ParamField>

The HTTP service does not accept `text_aggregation_mode` or `aggregate_sentences`.

### Settings

Runtime-configurable settings are passed via the `settings` constructor argument as a `CartesiaTTSService.Settings(...)` instance. They can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter               | Type               | Default     | Description                                                   |
| ----------------------- | ------------------ | ----------- | ------------------------------------------------------------- |
| `model`                 | `str`              | `None`      | TTS model identifier. *(Inherited from base settings.)*       |
| `voice`                 | `str`              | `None`      | Voice identifier. *(Inherited from base settings.)*           |
| `language`              | `Language \| str`  | `None`      | Language for synthesis. *(Inherited from base settings.)*     |
| `generation_config`     | `GenerationConfig` | `NOT_GIVEN` | Generation configuration for Sonic-3 models. See below.       |
| `pronunciation_dict_id` | `str`              | `NOT_GIVEN` | ID of the pronunciation dictionary for custom pronunciations. |

#### GenerationConfig (Sonic-3)

Configuration for Sonic-3 generation parameters:

| Parameter | Type    | Default | Description                                                                                           |
| --------- | ------- | ------- | ----------------------------------------------------------------------------------------------------- |
| `volume`  | `float` | `None`  | Volume multiplier. Valid range: \[0.5, 2.0].                                                          |
| `speed`   | `float` | `None`  | Speed multiplier. Valid range: \[0.6, 1.5].                                                           |
| `emotion` | `str`   | `None`  | Emotion string to guide tone (e.g., `"neutral"`, `"angry"`, `"excited"`). Over 60 emotions supported. |

## Usage

### Basic Setup

```python theme={null}
import os

from pipecat.services.cartesia.tts import CartesiaTTSService

tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    settings=CartesiaTTSService.Settings(
        voice="your-voice-id",
    ),
)
```

### With Sonic-3 Generation Config

```python theme={null}
import os

from pipecat.services.cartesia.tts import CartesiaTTSService, GenerationConfig

tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    settings=CartesiaTTSService.Settings(
        voice="your-voice-id",
        model="sonic-3",
        generation_config=GenerationConfig(
            speed=1.1,
            emotion="excited",
        ),
    ),
)
```

### HTTP Service

```python theme={null}
import os

from pipecat.services.cartesia.tts import CartesiaHttpTTSService

tts = CartesiaHttpTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    settings=CartesiaHttpTTSService.Settings(
        voice="your-voice-id",
    ),
)
```

## Customizing Speech

`CartesiaTTSService` provides a set of helper methods for Cartesia-specific customizations, meant to be used inside text transformers. They cover spelling out text, expressing emotion, inserting pauses, and adjusting volume and speech rate. See the [Text Transformers for TTS](/pipecat/learn/text-to-speech#text-transformers-for-tts) section in the Text-to-Speech guide for usage examples.

### SPELL(text: str) -> str:

A convenience method to wrap text in [Cartesia's spell tag](https://docs.cartesia.ai/build-with-cartesia/sonic-3/ssml-tags#spelling-out-numbers-and-letters) for spelling out text character by character.

```python theme={null}
# Text transformers for TTS
# This will insert Cartesia's spell tags around the provided text.
async def spell_out_text(text: str, type: str) -> str:
    return CartesiaTTSService.SPELL(text)

tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    text_transforms=[
        ("phone_number", spell_out_text),
    ],
)
```
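
A transformer is an async function from `(text, type)` to a string, so it can be exercised without a running TTS service. The sketch below spells out only long digit runs inside a sentence rather than the whole aggregation. `make_digit_speller`, `DIGIT_RUN`, and `fake_spell` are hypothetical names for illustration, and the bracket markup produced by `fake_spell` is invented for the demo; it is not Cartesia's tag format.

```python
import asyncio
import re
from typing import Awaitable, Callable

# Matches runs of 4 or more digits, such as order numbers or PIN codes.
DIGIT_RUN = re.compile(r"\d{4,}")


def make_digit_speller(
    spell: Callable[[str], str],
) -> Callable[[str, str], Awaitable[str]]:
    """Build a text transformer that passes each long digit run through `spell`."""

    async def spell_digit_runs(text: str, type: str) -> str:
        return DIGIT_RUN.sub(lambda m: spell(m.group(0)), text)

    return spell_digit_runs


# Stand-in for CartesiaTTSService.SPELL so the sketch runs without Pipecat.
def fake_spell(text: str) -> str:
    return f"[spell:{text}]"


transform = make_digit_speller(fake_spell)
result = asyncio.run(transform("Your code is 482913.", "sentence"))
print(result)  # Your code is [spell:482913].
```

In a pipeline, the same factory would presumably be registered as `("sentence", make_digit_speller(CartesiaTTSService.SPELL))` in `text_transforms`.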

### EMOTION\_TAG(emotion: CartesiaEmotion) -> str:

A convenience method to create an [emotion tag](https://docs.cartesia.ai/build-with-cartesia/sonic-3/volume-speed-emotion#emotion-controls-beta) for expressing emotions in speech.

```python theme={null}
# Text transformers for TTS
# This wraps any sentence that is just "whatever" in Cartesia's sarcasm tag,
# then resets the emotion to neutral.
async def maybe_insert_sarcasm(text: str, type: str) -> str:
    if text.strip(" .!").lower() == "whatever":
        return CartesiaTTSService.EMOTION_TAG(CartesiaEmotion.SARCASM) + text + CartesiaTTSService.EMOTION_TAG(CartesiaEmotion.NEUTRAL)
    return text

tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    text_transforms=[
        ("sentence", maybe_insert_sarcasm),
    ],
)
```

### PAUSE\_TAG(seconds: float) -> str:

A convenience method to create Cartesia's [SSML tag for inserting pauses](https://docs.cartesia.ai/build-with-cartesia/sonic-3/ssml-tags#pauses-and-breaks) in speech.

```python theme={null}
# Text transformers for TTS
# This will insert a one second pause after questions.
async def pause_after_questions(text: str, type: str) -> str:
    if text.endswith("?"):
        return f"{text}{CartesiaTTSService.PAUSE_TAG(1.0)}"
    return text

tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    text_transforms=[
        ("sentence", pause_after_questions), # Only apply to sentence aggregations
    ],
)
```

### VOLUME\_TAG(volume: float) -> str:

A convenience method to create Cartesia's [SSML volume tag](https://docs.cartesia.ai/build-with-cartesia/sonic-3/ssml-tags#volume) for dynamically adjusting speech volume in situ.

```python theme={null}
# Text transformers for TTS
# This will increase the volume for any text aggregation that is in all caps.
async def maybe_say_it_loud(text: str, type: str) -> str:
    # isupper() requires at least one letter, so "123" or "" is not treated as shouting.
    if text.isupper():
        return f"{CartesiaTTSService.VOLUME_TAG(2.0)}{text}{CartesiaTTSService.VOLUME_TAG(1.0)}"
    return text

tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    text_transforms=[
        ("*", maybe_say_it_loud), # Apply to all text
    ],
)
```

### SPEED\_TAG(speed: float) -> str:

A convenience method to create Cartesia's [SSML speed tag](https://docs.cartesia.ai/build-with-cartesia/sonic-3/ssml-tags#speed) for dynamically adjusting the speech rate in situ.

```python theme={null}
# Text transformers for TTS
# This will make the word "slow" always be spoken more slowly.
async def slow_down_slow_words(text: str, type: str) -> str:
    return text.replace(
        "slow",
        f"{CartesiaTTSService.SPEED_TAG(0.6)}slow{CartesiaTTSService.SPEED_TAG(1.0)}"
    )

tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    text_transforms=[
        ("*", slow_down_slow_words), # Apply to all text
    ],
)
```

<Tip>
  The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
  `Settings` / `settings=` instead. See the [Service Settings
  guide](/pipecat/fundamentals/service-settings) for migration details.
</Tip>

## Notes

* **WebSocket vs HTTP**: The WebSocket service supports word-level timestamps, audio context management, and interruption handling, making it better for interactive conversations. The HTTP service is simpler but lacks these features.
* **Text aggregation**: Sentence aggregation is enabled by default (`text_aggregation_mode=TextAggregationMode.SENTENCE`). Buffering until sentence boundaries produces more natural speech. Set `text_aggregation_mode=TextAggregationMode.TOKEN` to stream tokens directly for lower latency. Cartesia handles token streaming well.
* **Connection timeout**: Cartesia WebSocket connections time out after 5 minutes of inactivity (no keepalive mechanism is available). The service automatically reconnects when needed.
* **CJK language support**: For Chinese, Japanese, and Korean, the service combines individual characters from timestamp messages into meaningful word units.
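
The difference between the two aggregation modes can be illustrated with a toy buffer. This is a simplified sketch of the idea only, not Pipecat's aggregator; `aggregate_sentences` and its naive punctuation rule are assumptions made for the example.

```python
import re
from typing import Iterable, Iterator

# Naive boundary: ., !, or ? at the end of the buffered text (ignoring trailing spaces).
SENTENCE_END = re.compile(r"[.!?]\s*$")


def aggregate_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer tokens and yield one chunk per completed sentence."""
    buffer = ""
    for token in tokens:
        buffer += token
        if SENTENCE_END.search(buffer):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left when the stream ends


tokens = ["Hel", "lo", " there", ".", " How", " are", " you", "?"]
print(list(aggregate_sentences(tokens)))  # ['Hello there.', 'How are you?']
```

In sentence mode the TTS service would receive two chunks here, while token mode would forward each of the eight pieces as it arrives, which is where the latency difference comes from.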

## Event Handlers

Cartesia TTS supports the standard [service connection events](/api-reference/server/events/service-events):

| Event                 | Description                          |
| --------------------- | ------------------------------------ |
| `on_connected`        | Connected to Cartesia WebSocket      |
| `on_disconnected`     | Disconnected from Cartesia WebSocket |
| `on_connection_error` | WebSocket connection error occurred  |

```python theme={null}
@tts.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Cartesia")
```
