# Cartesia

> Speech-to-text service implementations using Cartesia's real-time transcription APIs

## Overview

Cartesia provides two STT service implementations:

* `CartesiaSTTService` for real-time speech recognition using Cartesia's WebSocket API with the `ink-whisper` model. It supports streaming transcription with both interim and final results for low-latency applications.
* `CartesiaTurnsSTTService` for turn-based speech recognition using Cartesia's v2 WebSocket API with the `ink-2` model. The server drives turn boundaries and pushes structured turn lifecycle events: start, updates, eager end predictions, resume, and final turn completion.

## Installation

To use Cartesia services, install the required dependency:

```bash
uv add "pipecat-ai[cartesia]"
```

## Prerequisites

### Cartesia Account Setup

Before using Cartesia STT services, you need:

1. **Cartesia Account**: Sign up at [Cartesia](https://cartesia.ai/)
2. **API Key**: Generate an API key from your account dashboard
3. **Model Access**: Ensure access to the transcription model you plan to use (`ink-whisper` for `CartesiaSTTService`, `ink-2` for `CartesiaTurnsSTTService`)

### Required Environment Variables

* `CARTESIA_API_KEY`: Your Cartesia API key for authentication

## CartesiaSTTService

Constructor parameters:

* `api_key`: Cartesia API key for authentication.
* Custom API endpoint URL. If empty, defaults to `"api.cartesia.ai"`. Override for proxied deployments.
* `encoding`: Audio encoding format.
* `sample_rate`: Audio sample rate in Hz.
* Configuration options for the transcription service. *Deprecated in v0.0.105. Use `settings=CartesiaSTTService.Settings(...)` for model and language, and the direct init parameters `encoding` and `sample_rate` instead.*
* `settings`: Runtime-configurable settings for the STT service. See [Settings](#settings) below.
* P99 latency from speech end to final transcript in seconds. Override for your deployment.

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `CartesiaSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`, which triggers an automatic reconnection with the new parameters. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter  | Type              | Default         | Description                                                              |
| ---------- | ----------------- | --------------- | ------------------------------------------------------------------------ |
| `model`    | `str`             | `"ink-whisper"` | The transcription model to use. *(Inherited from base STT settings.)*    |
| `language` | `Language \| str` | `"en"`          | Target language for transcription. *(Inherited from base STT settings.)* |

### Usage

#### Basic Setup

```python
import os

from pipecat.services.cartesia.stt import CartesiaSTTService

stt = CartesiaSTTService(
    api_key=os.getenv("CARTESIA_API_KEY"),
)
```

#### With Custom Options

```python
import os

from pipecat.services.cartesia.stt import CartesiaSTTService

stt = CartesiaSTTService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    settings=CartesiaSTTService.Settings(
        model="ink-whisper",
        language="es",
    ),
    sample_rate=16000,
)
```

### Notes

* **Inactivity timeout**: Cartesia disconnects WebSocket connections after 3 minutes of inactivity. The timeout resets with each message sent. Silence-based keepalive is enabled by default to prevent disconnections.
* **Auto-reconnect on send**: If the connection is closed (e.g., due to timeout), the service automatically reconnects when the next audio data is sent.
* **Runtime settings updates**: Changing settings (e.g., `language` or `model`) via `STTUpdateSettingsFrame` triggers a reconnection with the new parameters. To avoid audio loss, reconnection is deferred until the current user turn ends (i.e., until `UserStoppedSpeakingFrame` is received). Audio frames arriving during the reconnect are buffered and replayed once the new connection is ready. This enables safe dynamic language switching mid-conversation.
* **Finalize on VAD stop**: When the pipeline's VAD detects the user has stopped speaking, the service sends a `"finalize"` command to flush the transcription session and produce a final result.

The `InputParams` / `params=` / `live_options=` pattern is deprecated as of v0.0.105. Use `Settings` / `settings=` instead. See the [Service Settings guide](/pipecat/fundamentals/service-settings) for migration details.

### Event Handlers

Cartesia STT supports the standard [service connection events](/api-reference/server/events/service-events):

| Event             | Description                          |
| ----------------- | ------------------------------------ |
| `on_connected`    | Connected to Cartesia WebSocket      |
| `on_disconnected` | Disconnected from Cartesia WebSocket |

```python
@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Cartesia STT")
```

## CartesiaTurnsSTTService

With the `ink-2` model, the server drives turn boundaries and pushes structured turn lifecycle events: start, updates, eager end predictions, resume, and final turn completion.

Constructor parameters:

* `api_key`: Cartesia API key for authentication.
* WebSocket URL for the Cartesia Streaming ASR v2 endpoint.
* `sample_rate`: Audio sample rate in Hz. If `None`, uses the pipeline sample rate.
* `should_interrupt`: Whether to broadcast an interruption when the server signals the start of a new turn.
* `watchdog_min_timeout`: Minimum idle timeout (in seconds) before sending silence to prevent dangling turns. The actual threshold is `max(chunk_duration * 2, watchdog_min_timeout)`.
* Optional additional HTTP headers to send with the WebSocket handshake.
* `settings`: Runtime-updatable settings. See [Settings](#settings-2) below.

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `CartesiaTurnsSTTService.Settings(...)`.

The ink-2 model family is English-only and does not support runtime model or language switching. Attempts to update these fields will be reported as unhandled.

| Parameter  | Type              | Default   | Description                                                               |
| ---------- | ----------------- | --------- | ------------------------------------------------------------------------- |
| `model`    | `str`             | `"ink-2"` | The transcription model to use. *(Inherited from base STT settings.)*     |
| `language` | `Language \| str` | `None`    | Target language (fixed to English). *(Inherited from base STT settings.)* |

### Usage

#### Basic Setup

```python
import os

from pipecat.services.cartesia.turns.stt import CartesiaTurnsSTTService

stt = CartesiaTurnsSTTService(
    api_key=os.getenv("CARTESIA_API_KEY"),
)
```

#### With Custom Configuration

```python
import os

from pipecat.services.cartesia.turns.stt import CartesiaTurnsSTTService

stt = CartesiaTurnsSTTService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    sample_rate=16000,
    should_interrupt=True,
    watchdog_min_timeout=1.0,
)
```

#### With Event Handlers

```python
import os

from pipecat.services.cartesia.turns.stt import CartesiaTurnsSTTService

stt = CartesiaTurnsSTTService(
    api_key=os.getenv("CARTESIA_API_KEY"),
)


@stt.event_handler("on_turn_start")
async def on_turn_start(service, transcript):
    print(f"User started speaking: {transcript}")


@stt.event_handler("on_turn_end")
async def on_turn_end(service, transcript):
    print(f"Final transcript: {transcript}")
```

### Turn-Based Protocol

The service speaks the v2 turn-based wire protocol:

```
connected → turn.start → turn.update* → (turn.eager_end → turn.resume?)* → turn.end → ...
```

* **`turn.start`**: Server detected the start of a turn.
  Pushes `UserStartedSpeakingFrame` and optionally broadcasts an interruption.
* **`turn.update`**: Incremental transcript update. Pushes `InterimTranscriptionFrame`.
* **`turn.eager_end`**: Server eagerly predicted the end of turn. Available via event handler for speculative downstream processing.
* **`turn.resume`**: User resumed speaking after an eager end. Available via event handler.
* **`turn.end`**: Final transcript for the completed turn. Pushes `TranscriptionFrame` and `UserStoppedSpeakingFrame`.

Transcripts are cumulative per turn. There is no `is_final` flag and no `finalize` command; closing the socket ends the session.

### Notes

* **English-only**: At launch, the ink-2 model family supports English transcription only.
* **No runtime model switching**: Unlike the v1 API, the ink-2 model does not support runtime model or language switching.
* **Watchdog for dangling turns**: If audio stops flowing after a `turn.start`, the service sends silence to prevent the turn from hanging indefinitely. Configure the threshold with `watchdog_min_timeout`.
* **Server-driven turns**: The server controls turn boundaries. There is no client-side `finalize` command.
* **Interruption support**: Set `should_interrupt=True` to broadcast interruptions when the user starts speaking, enabling natural turn-taking.
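
To make the watchdog threshold concrete, here is a minimal sketch of the documented formula `max(chunk_duration * 2, watchdog_min_timeout)`. The function name `watchdog_threshold` is hypothetical and not part of Pipecat, and the sketch assumes `chunk_duration` is expressed in seconds, like `watchdog_min_timeout`.

```python
def watchdog_threshold(chunk_duration: float, watchdog_min_timeout: float) -> float:
    """Idle time (seconds) before silence is sent, per the documented formula.

    Hypothetical helper for illustration only; assumes both arguments are in seconds.
    """
    return max(chunk_duration * 2, watchdog_min_timeout)


# 20 ms audio chunks: the configured minimum dominates.
short_chunks = watchdog_threshold(0.02, 1.0)

# 750 ms audio chunks: twice the chunk duration dominates.
long_chunks = watchdog_threshold(0.75, 1.0)
```

With small audio chunks, `watchdog_min_timeout` effectively is the threshold; it only stops being the deciding value once a chunk is longer than half of it.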
### Event Handlers

Cartesia Turns STT supports the following event handlers:

| Event                 | Handler Signature                     | Description                              |
| --------------------- | ------------------------------------- | ---------------------------------------- |
| `on_connected`        | `async def(service)`                  | Connected to Cartesia WebSocket          |
| `on_disconnected`     | `async def(service)`                  | Disconnected from Cartesia WebSocket     |
| `on_connection_error` | `async def(service, error_msg)`       | Connection error occurred                |
| `on_turn_start`       | `async def(service, transcript: str)` | Server detected start of a turn          |
| `on_turn_update`      | `async def(service, transcript: str)` | Incremental transcript update            |
| `on_turn_eager_end`   | `async def(service, transcript: str)` | Server eagerly predicted end of turn     |
| `on_turn_resume`      | `async def(service)`                  | User resumed speaking after an eager end |
| `on_turn_end`         | `async def(service, transcript: str)` | Final transcript for the completed turn  |

Example:

```python
@stt.event_handler("on_turn_eager_end")
async def on_turn_eager_end(service, transcript):
    print(f"Eager end prediction: {transcript}")
    # Optionally start processing speculatively


@stt.event_handler("on_turn_resume")
async def on_turn_resume(service):
    print("User resumed speaking, discard speculative processing")
```
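
One way to act on eager end predictions is to start work early and cancel it if the user resumes. The sketch below shows that pattern with plain `asyncio` and no Pipecat imports. `SpeculativeTurnProcessor` and its methods are hypothetical names, not part of the library; the assumption is that `eager_end`, `resume`, and `end` would be called from the `on_turn_eager_end`, `on_turn_resume`, and `on_turn_end` handlers respectively.

```python
import asyncio


class SpeculativeTurnProcessor:
    """Hypothetical helper: starts work on an eager end, cancels it on resume."""

    def __init__(self, process):
        self._process = process  # async callable that takes a transcript
        self._task = None
        self._transcript = None

    def eager_end(self, transcript):
        # Drop any earlier speculation, then start work on the predicted transcript.
        self.resume()
        self._transcript = transcript
        self._task = asyncio.create_task(self._process(transcript))

    def resume(self):
        # The user kept talking, so any speculative result is stale.
        if self._task is not None:
            self._task.cancel()
        self._task = None
        self._transcript = None

    async def end(self, transcript):
        # Reuse the speculative result only if the final transcript matches it.
        if self._task is not None and self._transcript == transcript:
            task, self._task = self._task, None
            return await task
        self.resume()
        return await self._process(transcript)


async def demo():
    calls = []

    async def process(text):
        calls.append(text)
        await asyncio.sleep(0)  # stand-in for real downstream work
        return text.upper()

    proc = SpeculativeTurnProcessor(process)
    proc.eager_end("book a table")
    await asyncio.sleep(0)  # let the speculative task start
    proc.resume()  # user resumed: the first speculation is cancelled
    proc.eager_end("book a table for two")
    result = await proc.end("book a table for two")
    return result, calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Comparing the final transcript against the speculated one guards against reusing work computed from a prediction that later changed; whether that check is needed depends on how your downstream processing uses the transcript.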