# Soniox

> Speech-to-text service implementation using Soniox's WebSocket API

## Overview

`SonioxSTTService` provides real-time speech-to-text transcription using Soniox's WebSocket API. It supports over 60 languages, custom context, multiple languages in the same conversation, and advanced features for accurate multilingual transcription.

By default, the service uses the `stt-rt-v5` model with `vad_force_turn_endpoint=True`, which disables Soniox's native turn detection and relies on Pipecat's local VAD to finalize transcripts. This configuration significantly reduces the time to final segment (~250 ms median). Pipecat enables smart-turn detection by default using `LocalSmartTurnAnalyzerV3`. To use Soniox's native turn detection instead, set `vad_force_turn_endpoint=False`.

## Installation

To use Soniox services, install the required dependencies:

```bash
uv add "pipecat-ai[soniox]"
```

## Prerequisites

### Soniox Account Setup

Before using Soniox STT services, you need:

1. **Soniox Account**: Sign up at [Soniox Console](https://console.soniox.com/)
2. **API Key**: Generate an API key from your console dashboard
3. **Language Selection**: Choose from 60+ supported languages and models

### Required Environment Variables

* `SONIOX_API_KEY`: Your Soniox API key for authentication

## Configuration

### SonioxSTTService

The constructor accepts the following parameters:

* **API key** (`api_key`): Soniox API key for authentication.
* **WebSocket URL**: Soniox WebSocket API URL.
* **Sample rate**: Audio sample rate in Hz. When `None`, uses the pipeline's configured sample rate.
* **Model**: Soniox model to use for transcription. *Deprecated in v0.0.105. Use `settings=SonioxSTTService.Settings(model=...)` instead.*
* **Audio format**: Audio format for transcription. Init-only; not part of runtime-updatable settings.
* **Number of channels**: Number of audio channels. Init-only; not part of runtime-updatable settings.
* **Additional parameters** (`params`): Additional configuration parameters. *Deprecated in v0.0.105. Use `settings=SonioxSTTService.Settings(...)` instead.*
* **Settings** (`settings`): Runtime-configurable settings for the STT service. See [Settings](#settings) below.
* **P99 latency**: P99 latency from speech end to final transcript, in seconds. Override for your deployment. See [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark).
* **VAD-forced turn endpoint** (`vad_force_turn_endpoint`): Listen to `VADUserStoppedSpeakingFrame` to send a finalize message to Soniox. When enabled, Pipecat's local VAD triggers transcript finalization. When disabled, Soniox detects the end of speech natively.

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `SonioxSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | `str` | `"stt-rt-v5"` | Model to use for transcription. *(Inherited from base STT settings.)* |
| `language` | `Language \| str` | `None` | Language for speech recognition. *(Inherited from base STT settings.)* |
| `language_hints` | `list[Language]` | `None` | Language hints for transcription. Helps the model prioritize expected languages. |
| `language_hints_strict` | `bool` | `None` | If true, strictly enforce language hints (only transcribe in provided languages). |
| `context` | `SonioxContextObject \| str` | `None` | Customization for transcription. String for models with `context_version` 1, `SonioxContextObject` for `context_version` 2 (stt-rt-v3-preview and higher). |
| `enable_speaker_diarization` | `bool` | `False` | Enable speaker diarization. Tokens are annotated with speaker IDs. |
| `enable_language_identification` | `bool` | `False` | Enable language identification. Tokens are annotated with language IDs, and the detected language is included in the final `TranscriptionFrame`. |
| `max_endpoint_delay_ms` | `int` | `None` | Maximum delay in milliseconds before endpoint detection finalizes the turn. Valid range: 500-3000. |
| `endpoint_sensitivity` | `float` | `None` | Endpoint detection sensitivity (-1.0 to 1.0); higher values finalize turns sooner, lower values delay finalization. Introduced in the v5 model. |
| `client_reference_id` | `str` | `None` | Client reference ID for transcription tracking. |

## Usage

### Basic Setup

```python
import os

from pipecat.services.soniox.stt import SonioxSTTService

# Read the API key from the SONIOX_API_KEY environment variable
stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
)
```

### With Language Hints and Context

```python
import os

from pipecat.services.soniox.stt import SonioxSTTService
from pipecat.transcriptions.language import Language

stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    settings=SonioxSTTService.Settings(
        model="stt-rt-v5",
        # Only transcribe English and Spanish, and report the detected language
        language_hints=[Language.EN, Language.ES],
        language_hints_strict=True,
        enable_language_identification=True,
    ),
)
```

### With Context Object (v3+ models)

```python
import os

from pipecat.services.soniox.stt import (
    SonioxSTTService,
    SonioxContextObject,
    SonioxContextGeneralItem,
)

stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    settings=SonioxSTTService.Settings(
        model="stt-rt-v5",
        # Structured context (context_version 2)
        context=SonioxContextObject(
            general=[
                SonioxContextGeneralItem(key="domain", value="medical"),
            ],
            terms=["Pipecat", "transcription"],
        ),
    ),
)
```

### With Soniox Native Turn Detection

```python
import os

from pipecat.services.soniox.stt import SonioxSTTService

stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    # Let Soniox detect the end of speech instead of Pipecat's local VAD
    vad_force_turn_endpoint=False,
)
```

## Notes

* **Turn finalization**: By default (`vad_force_turn_endpoint=True`), when Pipecat's VAD detects that the user has stopped speaking, a finalize message is sent to Soniox to get the final transcript immediately. This significantly reduces latency.
* **Keepalive**: The service automatically sends protocol-level keepalive messages to maintain the WebSocket connection.
* **Context versions**: Use a string for `context` with older models (`context_version` 1) and `SonioxContextObject` for newer models (stt-rt-v3-preview and higher, `context_version` 2). See the [Soniox context documentation](https://soniox.com/docs/stt/concepts/context) for details.

The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use `Settings` / `settings=` instead. See the [Service Settings guide](/pipecat/fundamentals/service-settings) for migration details.

## Event Handlers

Soniox STT supports the standard [service connection events](/api-reference/server/events/service-events):

| Event | Description |
| --- | --- |
| `on_connected` | Connected to Soniox WebSocket |
| `on_disconnected` | Disconnected from Soniox WebSocket |

```python
@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Soniox")
```
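
Returning to the endpoint settings in the [Settings](#settings) table: `max_endpoint_delay_ms` and `endpoint_sensitivity` both have documented value ranges, so it can help to check them before building the service. The sketch below is a hypothetical helper, not part of Pipecat or Soniox. The function name `check_endpoint_settings` is an assumption made for illustration, and it assumes the documented ranges are inclusive at both ends.

```python
from typing import Optional


def check_endpoint_settings(
    max_endpoint_delay_ms: Optional[int] = None,
    endpoint_sensitivity: Optional[float] = None,
) -> dict:
    """Validate endpoint values against the ranges in the Settings table.

    Hypothetical helper, not a Pipecat or Soniox API. Bounds are assumed
    to be inclusive. Returns only the values that were explicitly set.
    """
    checked = {}

    if max_endpoint_delay_ms is not None:
        # Documented valid range: 500-3000 milliseconds
        if not 500 <= max_endpoint_delay_ms <= 3000:
            raise ValueError(
                f"max_endpoint_delay_ms must be in 500-3000, got {max_endpoint_delay_ms}"
            )
        checked["max_endpoint_delay_ms"] = max_endpoint_delay_ms

    if endpoint_sensitivity is not None:
        # Documented range: -1.0 to 1.0 (higher finalizes turns sooner)
        if not -1.0 <= endpoint_sensitivity <= 1.0:
            raise ValueError(
                f"endpoint_sensitivity must be in -1.0 to 1.0, got {endpoint_sensitivity}"
            )
        checked["endpoint_sensitivity"] = endpoint_sensitivity

    return checked


kwargs = check_endpoint_settings(max_endpoint_delay_ms=1500, endpoint_sensitivity=0.5)
print(kwargs)  # {'max_endpoint_delay_ms': 1500, 'endpoint_sensitivity': 0.5}
```

Since the returned keys match the parameter names in the Settings table, the dictionary could then be unpacked into `SonioxSTTService.Settings(**kwargs)`, in the same keyword-argument style as the usage examples. Values left as `None` are omitted so the service defaults still apply.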