# Cartesia

> Speech-to-text service implementations using Cartesia's real-time transcription APIs

## Overview

Cartesia provides two STT service implementations:

* `CartesiaSTTService` for real-time speech recognition using Cartesia's WebSocket API with the `ink-whisper` model, supporting streaming transcription with both interim and final results for low-latency applications
* `CartesiaTurnsSTTService` for turn-based speech recognition using Cartesia's v2 WebSocket API with the `ink-2` model. The server drives turn boundaries and pushes structured events for each stage of the turn lifecycle: start, updates, eager end predictions, resume, and final turn completion

<CardGroup cols={2}>
  <Card title="Cartesia STT API Reference" icon="code" href="https://reference-server.pipecat.ai/en/latest/api/pipecat.services.cartesia.stt.html">
    Pipecat's API methods for Cartesia STT integration
  </Card>

  <Card title="Cartesia Turns STT API Reference" icon="code" href="https://reference-server.pipecat.ai/en/latest/api/pipecat.services.cartesia.turns.stt.html">
    Pipecat's API methods for Cartesia Turns STT integration
  </Card>

  <Card title="Standard STT Example" icon="play" href="https://github.com/pipecat-ai/pipecat/blob/main/examples/transcription/transcription-cartesia.py">
    Complete example with transcription logging
  </Card>

  <Card title="Turns STT Example" icon="play" href="https://github.com/pipecat-ai/pipecat/blob/main/examples/transcription/transcription-cartesia-turns.py">
    Complete example with turn-based transcription
  </Card>

  <Card title="Cartesia Documentation" icon="book" href="https://docs.cartesia.ai/api-reference/stt/stt">
    Official Cartesia STT documentation and features
  </Card>

  <Card title="Cartesia Platform" icon="microphone" href="https://cartesia.ai/">
    Access API keys and transcription models
  </Card>
</CardGroup>

## Installation

To use Cartesia services, install the required dependency:

```bash theme={null}
uv add "pipecat-ai[cartesia]"
```

## Prerequisites

### Cartesia Account Setup

Before using Cartesia STT services, you need:

1. **Cartesia Account**: Sign up at [Cartesia](https://cartesia.ai/)
2. **API Key**: Generate an API key from your account dashboard
3. **Model Access**: Ensure access to the transcription model you plan to use (`ink-whisper` for `CartesiaSTTService`, `ink-2` for `CartesiaTurnsSTTService`)

### Required Environment Variables

* `CARTESIA_API_KEY`: Your Cartesia API key for authentication
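
The usage examples on this page read the key with `os.getenv`, which returns `None` when the variable is unset. A minimal sketch for failing fast instead is shown below; `require_env` is a hypothetical helper for illustration and is not part of Pipecat.

```python
import os


def require_env(name: str, env=None) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper: `env` defaults to os.environ and can be replaced
    with a plain dict in tests.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the bot")
    return value


# Usage (assumes the variable is exported in your shell or loaded from a .env file):
# api_key = require_env("CARTESIA_API_KEY")
```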

## CartesiaSTTService

<ParamField path="api_key" type="str" required>
  Cartesia API key for authentication.
</ParamField>

<ParamField path="base_url" type="str" default="">
  Custom API endpoint URL. If empty, defaults to `"api.cartesia.ai"`. Override
  for proxied deployments.
</ParamField>

<ParamField path="encoding" type="str" default="pcm_s16le">
  Audio encoding format.
</ParamField>

<ParamField path="sample_rate" type="int" default="None">
  Audio sample rate in Hz.
</ParamField>

<ParamField path="live_options" type="CartesiaLiveOptions | None" default="None" deprecated>
  Configuration options for the transcription service. *Deprecated in v0.0.105.
  Use `settings=CartesiaSTTService.Settings(...)` for model and language, and the
  direct init parameters `encoding` and `sample_rate`, instead.*
</ParamField>

<ParamField path="settings" type="CartesiaSTTService.Settings" default="None">
  Runtime-configurable settings for the STT service. See [Settings](#settings)
  below.
</ParamField>

<ParamField path="ttfs_p99_latency" type="float" default="CARTESIA_TTFS_P99">
  P99 latency from speech end to final transcript in seconds. Override for your
  deployment.
</ParamField>

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `CartesiaSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`, which triggers an automatic reconnection with the new parameters. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter  | Type              | Default         | Description                                                              |
| ---------- | ----------------- | --------------- | ------------------------------------------------------------------------ |
| `model`    | `str`             | `"ink-whisper"` | The transcription model to use. *(Inherited from base STT settings.)*    |
| `language` | `Language \| str` | `"en"`          | Target language for transcription. *(Inherited from base STT settings.)* |

### Usage

#### Basic Setup

```python theme={null}
import os

from pipecat.services.cartesia.stt import CartesiaSTTService

stt = CartesiaSTTService(
    api_key=os.getenv("CARTESIA_API_KEY"),
)
```

#### With Custom Options

```python theme={null}
import os

from pipecat.services.cartesia.stt import CartesiaSTTService

stt = CartesiaSTTService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    settings=CartesiaSTTService.Settings(
        model="ink-whisper",
        language="es",
    ),
    sample_rate=16000,
)
```

### Notes

* **Inactivity timeout**: Cartesia disconnects WebSocket connections after 3 minutes of inactivity. The timeout resets with each message sent. Silence-based keepalive is enabled by default to prevent disconnections.
* **Auto-reconnect on send**: If the connection is closed (e.g., due to timeout), the service automatically reconnects when the next audio data is sent.
* **Runtime settings updates**: Changing settings (e.g., `language` or `model`) via `STTUpdateSettingsFrame` triggers a reconnection with the new parameters. To avoid audio loss, reconnection is deferred until the current user turn ends (i.e., until `UserStoppedSpeakingFrame` is received). Audio frames arriving during the reconnect are buffered and replayed once the new connection is ready. This enables safe dynamic language switching mid-conversation.
* **Finalize on VAD stop**: When the pipeline's VAD detects the user has stopped speaking, the service sends a `"finalize"` command to flush the transcription session and produce a final result.
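
The buffer-and-replay behavior described for runtime settings updates can be pictured with the sketch below. It is an illustration of the idea only: `ReconnectBuffer`, its methods, and the `transport_send` callable are hypothetical names and do not mirror Pipecat's internal implementation.

```python
from collections import deque


class ReconnectBuffer:
    """Hold audio chunks while a connection is being re-established.

    Illustrative only: class and method names are hypothetical.
    """

    def __init__(self):
        self._pending = deque()
        self.reconnecting = False

    def send(self, chunk: bytes, transport_send) -> None:
        # While reconnecting, queue audio instead of dropping it.
        if self.reconnecting:
            self._pending.append(chunk)
        else:
            transport_send(chunk)

    def finish_reconnect(self, transport_send) -> None:
        # Replay queued audio in arrival order, then resume direct sends.
        while self._pending:
            transport_send(self._pending.popleft())
        self.reconnecting = False


sent = []
buf = ReconnectBuffer()
buf.send(b"a", sent.append)        # delivered immediately
buf.reconnecting = True            # e.g. settings changed and the turn has ended
buf.send(b"b", sent.append)        # buffered
buf.send(b"c", sent.append)        # buffered
buf.finish_reconnect(sent.append)  # replays b"b" then b"c"
buf.send(b"d", sent.append)        # delivered immediately again
```

Because queued chunks are replayed in arrival order before direct sends resume, the new connection sees the same audio sequence it would have seen without the reconnect.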

<Tip>
  The `InputParams` / `params=` / `live_options=` pattern is deprecated as of
  v0.0.105. Use `Settings` / `settings=` instead. See the [Service Settings
  guide](/pipecat/fundamentals/service-settings) for migration details.
</Tip>

### Event Handlers

Cartesia STT supports the standard [service connection events](/api-reference/server/events/service-events):

| Event             | Description                          |
| ----------------- | ------------------------------------ |
| `on_connected`    | Connected to Cartesia WebSocket      |
| `on_disconnected` | Disconnected from Cartesia WebSocket |

```python theme={null}
@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Cartesia STT")
```

## CartesiaTurnsSTTService

`CartesiaTurnsSTTService` uses the `ink-2` model, and the Cartesia server drives turn boundaries. The server pushes structured events for each stage of the turn lifecycle: start, updates, eager end predictions, resume, and final turn completion.

<ParamField path="api_key" type="str" required>
  Cartesia API key for authentication.
</ParamField>

<ParamField path="url" type="str" default="wss://api.cartesia.ai/stt/turns/websocket">
  WebSocket URL for the Cartesia Streaming ASR v2 endpoint.
</ParamField>

<ParamField path="sample_rate" type="int | None" default="None">
  Audio sample rate in Hz. If `None`, uses the pipeline sample rate.
</ParamField>

<ParamField path="should_interrupt" type="bool" default="True">
  Whether to broadcast an interruption when the server signals the start of a new turn.
</ParamField>

<ParamField path="watchdog_min_timeout" type="float" default="0.5">
  Minimum idle timeout (in seconds) before sending silence to prevent dangling turns. The actual threshold is `max(chunk_duration * 2, watchdog_min_timeout)`.
</ParamField>

<ParamField path="extra_headers" type="dict[str, str] | None" default="None">
  Optional additional HTTP headers to send with the WebSocket handshake.
</ParamField>

<ParamField path="settings" type="CartesiaTurnsSTTService.Settings" default="None">
  Runtime-updatable settings. See [Settings](#settings-2) below.
</ParamField>

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `CartesiaTurnsSTTService.Settings(...)`. The `ink-2` model family is English-only and does not support runtime model or language switching, so attempts to update these fields are reported as unhandled.

| Parameter  | Type              | Default   | Description                                                               |
| ---------- | ----------------- | --------- | ------------------------------------------------------------------------- |
| `model`    | `str`             | `"ink-2"` | The transcription model to use. *(Inherited from base STT settings.)*     |
| `language` | `Language \| str` | `None`    | Target language (fixed to English). *(Inherited from base STT settings.)* |

### Usage

#### Basic Setup

```python theme={null}
import os

from pipecat.services.cartesia.turns.stt import CartesiaTurnsSTTService

stt = CartesiaTurnsSTTService(
    api_key=os.getenv("CARTESIA_API_KEY"),
)
```

#### With Custom Configuration

```python theme={null}
import os

from pipecat.services.cartesia.turns.stt import CartesiaTurnsSTTService

stt = CartesiaTurnsSTTService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    sample_rate=16000,
    should_interrupt=True,
    watchdog_min_timeout=1.0,
)
```

#### With Event Handlers

```python theme={null}
import os

from pipecat.services.cartesia.turns.stt import CartesiaTurnsSTTService

stt = CartesiaTurnsSTTService(
    api_key=os.getenv("CARTESIA_API_KEY"),
)

@stt.event_handler("on_turn_start")
async def on_turn_start(service, transcript):
    print(f"User started speaking: {transcript}")

@stt.event_handler("on_turn_end")
async def on_turn_end(service, transcript):
    print(f"Final transcript: {transcript}")
```

### Turn-Based Protocol

The service speaks the v2 turn-based wire protocol:

```
connected → turn.start → turn.update* → (turn.eager_end → turn.resume?)* → turn.end → ...
```

* **`turn.start`**: Server detected the start of a turn. Pushes `UserStartedSpeakingFrame` and optionally broadcasts an interruption.
* **`turn.update`**: Incremental transcript update. Pushes `InterimTranscriptionFrame`.
* **`turn.eager_end`**: Server eagerly predicted the end of turn. Available via event handler for speculative downstream processing.
* **`turn.resume`**: User resumed speaking after an eager end. Available via event handler.
* **`turn.end`**: Final transcript for the completed turn. Pushes `TranscriptionFrame` and `UserStoppedSpeakingFrame`.

Transcripts are cumulative per turn: each event carries the text of the turn so far rather than a delta. There is no `is_final` flag and no `finalize` command; closing the socket ends the session.
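
Since transcripts are cumulative, a consumer should replace its working text on each event rather than append to it. The sketch below shows one way to track that state. The `(event, transcript)` tuple shape and the `TurnTracker` class are simplifications for illustration, not the actual wire format or a Pipecat API.

```python
from dataclasses import dataclass, field


@dataclass
class TurnTracker:
    """Track turns from (event, transcript) pairs.

    Hypothetical helper: the tuple shape is a simplification of the protocol.
    """

    current: str = ""
    speculative: bool = False
    completed: list = field(default_factory=list)

    def handle(self, event: str, transcript: str = "") -> None:
        if event == "turn.start":
            self.current, self.speculative = transcript, False
        elif event == "turn.update":
            # Cumulative: replace the text instead of appending to it.
            self.current = transcript
        elif event == "turn.eager_end":
            self.current, self.speculative = transcript, True
        elif event == "turn.resume":
            # The end-of-turn prediction was wrong; the turn is still open.
            self.speculative = False
        elif event == "turn.end":
            self.completed.append(transcript)
            self.current, self.speculative = "", False


tracker = TurnTracker()
for event, text in [
    ("turn.start", ""),
    ("turn.update", "book a"),
    ("turn.update", "book a table"),
    ("turn.eager_end", "book a table"),
    ("turn.resume", ""),
    ("turn.end", "book a table for two"),
]:
    tracker.handle(event, text)
```

After this sequence, `tracker.completed` holds a single entry with the final transcript, and no text from the intermediate updates has been duplicated.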

### Notes

* **English-only**: At launch, the `ink-2` model family supports English transcription only.
* **No runtime model switching**: Unlike the v1 API, the `ink-2` model does not support runtime model or language switching.
* **Watchdog for dangling turns**: If audio stops flowing after a `turn.start`, the service sends silence to prevent the turn from hanging indefinitely. Configure the threshold with `watchdog_min_timeout`.
* **Server-driven turns**: The server controls turn boundaries. There is no client-side `finalize` command.
* **Interruption support**: `should_interrupt` defaults to `True`, which broadcasts an interruption when the server signals the start of a new turn and enables natural turn-taking. Set it to `False` to disable this.
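
The watchdog threshold is documented as `max(chunk_duration * 2, watchdog_min_timeout)`. The sketch below works through that arithmetic for a few values; `watchdog_threshold` is a hypothetical helper that mirrors the formula, not a function exposed by the service.

```python
def watchdog_threshold(chunk_duration: float, watchdog_min_timeout: float = 0.5) -> float:
    """Idle time in seconds before silence is sent, per the documented formula."""
    return max(chunk_duration * 2, watchdog_min_timeout)


# 20 ms audio chunks: twice the chunk duration is 0.04 s, so the 0.5 s floor wins.
short_chunks = watchdog_threshold(0.02)

# 400 ms audio chunks: twice the chunk duration is 0.8 s, which exceeds the floor.
long_chunks = watchdog_threshold(0.4)

# Raising the floor to 1.0 s, as in the custom configuration example above.
raised_floor = watchdog_threshold(0.02, watchdog_min_timeout=1.0)
```

With typical small chunk sizes, `watchdog_min_timeout` is the value that determines the threshold; the chunk-based term only matters for unusually large chunks.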

### Event Handlers

Cartesia Turns STT supports the following event handlers:

| Event                 | Handler Signature                     | Description                              |
| --------------------- | ------------------------------------- | ---------------------------------------- |
| `on_connected`        | `async def(service)`                  | Connected to Cartesia WebSocket          |
| `on_disconnected`     | `async def(service)`                  | Disconnected from Cartesia WebSocket     |
| `on_connection_error` | `async def(service, error_msg)`       | Connection error occurred                |
| `on_turn_start`       | `async def(service, transcript: str)` | Server detected start of a turn          |
| `on_turn_update`      | `async def(service, transcript: str)` | Incremental transcript update            |
| `on_turn_eager_end`   | `async def(service, transcript: str)` | Server eagerly predicted end of turn     |
| `on_turn_resume`      | `async def(service)`                  | User resumed speaking after an eager end |
| `on_turn_end`         | `async def(service, transcript: str)` | Final transcript for the completed turn  |

Example:

```python theme={null}
@stt.event_handler("on_turn_eager_end")
async def on_turn_eager_end(service, transcript):
    print(f"Eager end prediction: {transcript}")
    # Optionally start processing speculatively

@stt.event_handler("on_turn_resume")
async def on_turn_resume(service):
    print("User resumed speaking, discard speculative processing")
```
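
One way to act on these two events is to start work on an eager end, cancel it on resume, and reuse the result if the final transcript matches. The sketch below uses only `asyncio`; `SpeculativeRunner` and `work` are hypothetical names, and the method signatures simply follow the handler signatures in the table above.

```python
import asyncio


class SpeculativeRunner:
    """Start work on an eager end, cancel it on resume, reuse it on turn end.

    Hypothetical helper: `work` is any coroutine function that takes a transcript.
    """

    def __init__(self, work):
        self._work = work
        self._task = None
        self._transcript = None
        self.result = None

    async def on_turn_eager_end(self, service, transcript):
        await self._cancel()
        self._transcript = transcript
        self._task = asyncio.create_task(self._work(transcript))

    async def on_turn_resume(self, service):
        await self._cancel()

    async def on_turn_end(self, service, transcript):
        # Reuse the speculative result only if the final text is unchanged.
        if self._task is not None and transcript == self._transcript:
            task, self._task = self._task, None
            self.result = await task
        else:
            await self._cancel()
            self.result = await self._work(transcript)

    async def _cancel(self):
        task, self._task = self._task, None
        if task is not None:
            task.cancel()
            # Wait for the cancellation to settle without raising here.
            await asyncio.gather(task, return_exceptions=True)


async def demo():
    calls = []

    async def work(transcript):
        calls.append(transcript)
        await asyncio.sleep(0.01)
        return transcript.upper()

    runner = SpeculativeRunner(work)
    await runner.on_turn_eager_end(None, "hello")
    await runner.on_turn_resume(None)               # speculative task cancelled
    await runner.on_turn_eager_end(None, "hello there")
    await runner.on_turn_end(None, "hello there")   # speculative result reused
    return runner.result, calls
```

Against a real service, the methods could be registered with the same decorator shown above, for example `stt.event_handler("on_turn_eager_end")(runner.on_turn_eager_end)`.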
