# Soniox

> Speech-to-text service implementation using Soniox's WebSocket API

## Overview

`SonioxSTTService` provides real-time speech-to-text transcription using Soniox's WebSocket API with support for over 60 languages, custom context, multiple languages in the same conversation, and advanced features for accurate multilingual transcription.

By default, `SonioxSTTService` uses the `stt-rt-v4` model with `vad_force_turn_endpoint=True`. This setting disables Soniox's native turn detection and relies on Pipecat's local VAD to finalize transcripts, which significantly reduces the time to final segment (\~250ms median). Pipecat enables smart-turn detection by default using `LocalSmartTurnAnalyzerV3`. To use Soniox's native turn detection instead, set `vad_force_turn_endpoint=False`.

<CardGroup cols={2}>
  <Card title="Soniox STT API Reference" icon="code" href="https://pipecat-docs.readthedocs.io/en/latest/api/pipecat.services.soniox.stt.html">
    Pipecat's API methods for Soniox STT integration
  </Card>

  <Card title="Example Implementation" icon="play" href="https://github.com/pipecat-ai/pipecat/blob/main/examples/voice/voice-soniox.py">
    Complete example with interruption handling
  </Card>

  <Card title="Soniox Documentation" icon="book" href="https://soniox.com/docs/speech-to-text/get-started">
    Official Soniox documentation and features
  </Card>

  <Card title="Soniox Console" icon="microphone" href="https://console.soniox.com/">
    Access multilingual models and API keys
  </Card>
</CardGroup>

## Installation

To use Soniox services, install the required dependencies:

```bash theme={null}
uv add "pipecat-ai[soniox]"
```

## Prerequisites

### Soniox Account Setup

Before using Soniox STT services, you need:

1. **Soniox Account**: Sign up at [Soniox Console](https://console.soniox.com/)
2. **API Key**: Generate an API key from your console dashboard
3. **Language Selection**: Choose from 60+ supported languages and models

### Required Environment Variables

* `SONIOX_API_KEY`: Your Soniox API key for authentication
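
The usage examples on this page read the key with `os.getenv`, which returns `None` when the variable is missing. A minimal sketch of failing fast with a readable error instead; the helper name `require_soniox_api_key` is hypothetical and not part of Pipecat:

```python
import os


def require_soniox_api_key(env=os.environ) -> str:
    """Return SONIOX_API_KEY, or raise a descriptive error (hypothetical helper)."""
    api_key = env.get("SONIOX_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "SONIOX_API_KEY is not set; create a key in the Soniox Console "
            "and export it before starting the pipeline."
        )
    return api_key
```

The returned string can then be passed as `api_key=` when constructing the service.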

## Configuration

### SonioxSTTService

<ParamField path="api_key" type="str" required>
  Soniox API key for authentication.
</ParamField>

<ParamField path="url" type="str" default="wss://stt-rt.soniox.com/transcribe-websocket">
  Soniox WebSocket API URL.
</ParamField>

<ParamField path="sample_rate" type="int" default="None">
  Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
  rate.
</ParamField>

<ParamField path="model" type="str" default="None" deprecated>
  Soniox model to use for transcription. *Deprecated in v0.0.105. Use
  `settings=SonioxSTTService.Settings(model=...)` instead.*
</ParamField>

<ParamField path="audio_format" type="str" default="pcm_s16le">
  Audio format for transcription. Init-only -- not part of runtime-updatable
  settings.
</ParamField>

<ParamField path="num_channels" type="int" default="1">
  Number of audio channels. Init-only -- not part of runtime-updatable settings.
</ParamField>

<ParamField path="params" type="SonioxInputParams" default="None" deprecated>
  Additional configuration parameters. *Deprecated in v0.0.105. Use
  `settings=SonioxSTTService.Settings(...)` instead.*
</ParamField>

<ParamField path="settings" type="SonioxSTTService.Settings" default="None">
  Runtime-configurable settings for the STT service. See [Settings](#settings)
  below.
</ParamField>

<ParamField path="ttfs_p99_latency" type="float" default="0.35">
  P99 latency from speech end to final transcript in seconds. Override for your
  deployment. See [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark).
</ParamField>

<ParamField path="vad_force_turn_endpoint" type="bool" default="True">
  When enabled, the service listens for `VADUserStoppedSpeakingFrame` and sends
  a finalize message to Soniox, so Pipecat's local VAD triggers transcript
  finalization. When disabled, Soniox detects the end of speech natively.
</ParamField>
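
One way to choose a deployment-specific value for `ttfs_p99_latency` is to record the time from speech end to final transcript over many turns and take the 99th percentile. A minimal sketch using only the standard library; the function name `p99_latency` is hypothetical, and collecting the measurements is left to your own tooling (for example, the stt-benchmark project linked above):

```python
import statistics


def p99_latency(samples_s: list[float]) -> float:
    """Return the 99th percentile of measured latencies, in seconds."""
    if len(samples_s) < 2:
        raise ValueError("need at least two latency samples")
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    return statistics.quantiles(samples_s, n=100, method="inclusive")[98]
```

The result can be passed as `ttfs_p99_latency=` when constructing `SonioxSTTService`.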

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `SonioxSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter                        | Type                         | Default       | Description                                                                                                                                              |
| -------------------------------- | ---------------------------- | ------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `model`                          | `str`                        | `"stt-rt-v4"` | Model to use for transcription. *(Inherited from base STT settings.)*                                                                                    |
| `language`                       | `Language \| str`            | `None`        | Language for speech recognition. *(Inherited from base STT settings.)*                                                                                   |
| `language_hints`                 | `list[Language]`             | `None`        | Language hints for transcription. Helps the model prioritize expected languages.                                                                         |
| `language_hints_strict`          | `bool`                       | `None`        | If true, strictly enforce language hints (only transcribe in provided languages).                                                                        |
| `context`                        | `SonioxContextObject \| str` | `None`        | Customization for transcription. String for models with context\_version 1, `SonioxContextObject` for context\_version 2 (stt-rt-v3-preview and higher). |
| `enable_speaker_diarization`     | `bool`                       | `False`       | Enable speaker diarization. Tokens are annotated with speaker IDs.                                                                                       |
| `enable_language_identification` | `bool`                       | `False`       | Enable language identification. Tokens are annotated with language IDs.                                                                                  |
| `client_reference_id`            | `str`                        | `None`        | Client reference ID for transcription tracking.                                                                                                          |
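
To make the settings more concrete, the sketch below shows roughly how they could be serialized into the JSON configuration message that opens a Soniox WebSocket session. This is an illustration, not Pipecat's implementation: the function name `build_soniox_config` is hypothetical, the field names are assumed to mirror the setting names above, and the 16000 Hz default sample rate is an assumption for the example.

```python
import json
from typing import Any, Optional


def build_soniox_config(
    api_key: str,
    *,
    model: str = "stt-rt-v4",
    sample_rate: int = 16000,  # assumed value for illustration
    language_hints: Optional[list[str]] = None,
    language_hints_strict: Optional[bool] = None,
    enable_speaker_diarization: bool = False,
    enable_language_identification: bool = False,
    client_reference_id: Optional[str] = None,
) -> str:
    """Serialize settings into a JSON config message (hypothetical helper)."""
    config: dict[str, Any] = {
        "api_key": api_key,
        "model": model,
        "audio_format": "pcm_s16le",
        "sample_rate": sample_rate,
        "num_channels": 1,
        "language_hints": language_hints,
        "language_hints_strict": language_hints_strict,
        "enable_speaker_diarization": enable_speaker_diarization,
        "enable_language_identification": enable_language_identification,
        "client_reference_id": client_reference_id,
    }
    # Omit options that were left unset (None) so the server applies its defaults.
    return json.dumps({k: v for k, v in config.items() if v is not None})
```

Settings left at `None` are dropped from the message, which matches the `None` defaults in the table above.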

## Usage

### Basic Setup

```python theme={null}
import os

from pipecat.services.soniox.stt import SonioxSTTService

stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
)
```

### With Language Hints and Language Identification

```python theme={null}
import os

from pipecat.services.soniox.stt import SonioxSTTService
from pipecat.transcriptions.language import Language

stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    settings=SonioxSTTService.Settings(
        model="stt-rt-v4",
        language_hints=[Language.EN, Language.ES],
        language_hints_strict=True,
        enable_language_identification=True,
    ),
)
```

### With Context Object (v3+ models)

```python theme={null}
import os

from pipecat.services.soniox.stt import (
    SonioxSTTService,
    SonioxContextObject,
    SonioxContextGeneralItem,
)

stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    settings=SonioxSTTService.Settings(
        model="stt-rt-v4",
        context=SonioxContextObject(
            general=[
                SonioxContextGeneralItem(key="domain", value="medical"),
            ],
            terms=["Pipecat", "transcription"],
        ),
    ),
)
```

### With Soniox Native Turn Detection

```python theme={null}
import os

from pipecat.services.soniox.stt import SonioxSTTService

stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    vad_force_turn_endpoint=False,
)
```

## Notes

* **Turn finalization**: By default (`vad_force_turn_endpoint=True`), when Pipecat's VAD detects that the user has stopped speaking, the service sends a finalize message to Soniox so the final transcript is returned immediately instead of waiting for Soniox's native turn detection.
* **Keepalive**: The service automatically sends protocol-level keepalive messages to maintain the WebSocket connection.
* **Context versions**: Use a string for `context` with older models (context\_version 1) and `SonioxContextObject` for newer models (stt-rt-v3-preview and higher, context\_version 2). See the [Soniox context documentation](https://soniox.com/docs/stt/concepts/context) for details.
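
The turn finalization and keepalive behavior described above can be sketched as two small control messages sent over the WebSocket. This is not Pipecat's implementation: the message shapes (`{"type": "finalize"}` and `{"type": "keepalive"}`) are assumed from Soniox's WebSocket API, and `RecordingSocket`, `on_vad_user_stopped_speaking`, and `keepalive_loop` are hypothetical names used only for illustration.

```python
import asyncio
import json

FINALIZE_MESSAGE = json.dumps({"type": "finalize"})
KEEPALIVE_MESSAGE = json.dumps({"type": "keepalive"})


class RecordingSocket:
    """Stand-in for a WebSocket connection; records what would be sent."""

    def __init__(self) -> None:
        self.sent: list[str] = []

    async def send(self, message: str) -> None:
        self.sent.append(message)


async def on_vad_user_stopped_speaking(socket) -> None:
    # Local VAD reported end of speech: ask the server to finalize now.
    await socket.send(FINALIZE_MESSAGE)


async def keepalive_loop(socket, interval_s: float, stop: asyncio.Event) -> None:
    # Send a keepalive, then wait for the interval or until asked to stop.
    while not stop.is_set():
        await socket.send(KEEPALIVE_MESSAGE)
        try:
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass
```

In a real pipeline `SonioxSTTService` handles both messages itself; the sketch only shows the shape of the exchange.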

<Tip>
  The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
  `Settings` / `settings=` instead. See the [Service Settings
  guide](/pipecat/fundamentals/service-settings) for migration details.
</Tip>

## Event Handlers

Soniox STT supports the standard [service connection events](/api-reference/server/events/service-events):

| Event             | Description                        |
| ----------------- | ---------------------------------- |
| `on_connected`    | Connected to Soniox WebSocket      |
| `on_disconnected` | Disconnected from Soniox WebSocket |

```python theme={null}
@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Soniox")
```
