# Sarvam

> Speech-to-text service implementation using Sarvam AI's WebSocket-based streaming API

## Overview

`SarvamSTTService` provides real-time speech recognition over Sarvam AI's WebSocket API. It transcribes Indian languages, supports Voice Activity Detection (VAD), and accepts multiple audio formats.

## Installation

To use Sarvam services, install the required dependency:

```bash
uv add "pipecat-ai[sarvam]"
```

## Prerequisites

### Sarvam AI Account Setup

Before using Sarvam STT services, you need:

1. **Sarvam AI Account**: Sign up at [Sarvam AI](https://dashboard.sarvam.ai/)
2. **API Key**: Generate an API key from your account dashboard
3. **Model Access**: Access to Saarika (STT) or Saaras (STT-Translate) models, including the `saaras:v3` model with support for multiple modes (transcribe, translate, verbatim, translit, codemix)

### Required Environment Variables

* `SARVAM_API_KEY`: Your Sarvam AI API key for authentication

## Configuration

### SarvamSTTService

* Sarvam API key for authentication.
* Sarvam model to use. Allowed values: `"saarika:v2.5"` (standard STT), `"saaras:v2.5"` (STT-Translate, auto-detects language), `"saaras:v3"` (advanced, supports mode and fine-grained VAD). *Deprecated in v0.0.105. Use `settings=SarvamSTTService.Settings(...)` instead.*
* Audio sample rate in Hz. Defaults to 16000 if not specified.
* Mode of operation. Only applicable to models that support it (e.g., `saaras:v3`). Defaults to the model's default mode.
* Audio codec/format of the input file.
* Configuration parameters for Sarvam STT service. *Deprecated in v0.0.105.
  Use `settings=SarvamSTTService.Settings(...)` instead.*
* Runtime-configurable settings for the STT service. See [Settings](#settings) below.
* Seconds of no audio before sending silence to keep the connection alive. `None` disables keepalive.
* P99 latency from speech end to final transcript in seconds. Override for your deployment. See [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark).
* Seconds between idle checks when keepalive is enabled.

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `SarvamSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | `str` | `None` | STT model identifier. *(Inherited from base STT settings.)* |
| `language` | `Language \| str` | `None` | Target language for transcription. *(Inherited from base STT settings.)* Behavior varies by model: `saarika:v2.5` defaults to "unknown" (auto-detect), `saaras:v2.5` ignores this (auto-detects), `saaras:v3` defaults to "en-IN". |
| `prompt` | `str` | `None` | Optional prompt to guide transcription/translation style. Only applicable to `saaras:v2.5`. |
| `vad_signals` | `bool` | `None` | Enable VAD signals in responses. When enabled, the service broadcasts `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` from the server. |
| `high_vad_sensitivity` | `bool` | `None` | Enable high VAD sensitivity for more responsive speech detection. |
| `positive_speech_threshold` | `float` | `None` | VAD probability threshold (0.0-1.0) above which a frame is considered speech. Only for `saaras:v3`. |
| `negative_speech_threshold` | `float` | `None` | VAD probability threshold (0.0-1.0) below which a frame is considered silence. Only for `saaras:v3`. |
| `min_speech_frames` | `int` | `None` | Minimum consecutive speech frames to start a speech segment. Only for `saaras:v3`. |
| `first_turn_min_speech_frames` | `int` | `None` | Minimum speech frames for the first user turn. Only for `saaras:v3`. |
| `negative_frames_count` | `int` | `None` | Number of silence frames within the window to end a speech segment. Only for `saaras:v3`. |
| `negative_frames_window` | `int` | `None` | Sliding window size (in frames) for counting negative frames. Only for `saaras:v3`. |
| `start_speech_volume_threshold` | `float` | `None` | Volume level (dB) below which audio is too quiet to be speech. Only for `saaras:v3`. |
| `interrupt_min_speech_frames` | `int` | `None` | Minimum speech frames to register a barge-in/interruption. Only for `saaras:v3`. |
| `pre_speech_pad_frames` | `int` | `None` | Number of audio frames to prepend before detected speech onset. Only for `saaras:v3`. |
| `num_initial_ignored_frames` | `int` | `None` | Number of leading audio frames to skip at connection start. Only for `saaras:v3`. |

## Usage

### Basic Setup

```python
import os

from pipecat.services.sarvam.stt import SarvamSTTService

stt = SarvamSTTService(
    api_key=os.getenv("SARVAM_API_KEY"),
)
```

### With Language and Model Configuration

```python
import os

from pipecat.services.sarvam.stt import SarvamSTTService
from pipecat.transcriptions.language import Language

stt = SarvamSTTService(
    api_key=os.getenv("SARVAM_API_KEY"),
    mode="transcribe",
    settings=SarvamSTTService.Settings(
        model="saaras:v3",
        language=Language.HI_IN,
        # `prompt` is omitted here: per the Settings table it applies only to `saaras:v2.5`.
    ),
)
```

### With Server-Side VAD

```python
import os

from pipecat.services.sarvam.stt import SarvamSTTService

stt = SarvamSTTService(
    api_key=os.getenv("SARVAM_API_KEY"),
    settings=SarvamSTTService.Settings(
        vad_signals=True,
        high_vad_sensitivity=True,
    ),
)
```

## Notes

* **Default model changed**: As of this update, the default model is `saaras:v3` (previously `saarika:v2.5`). Applications that relied on the previous default should set `settings=SarvamSTTService.Settings(model="saarika:v2.5")` explicitly.
* **Supported languages**: Bengali (bn-IN), Gujarati (gu-IN), Hindi (hi-IN), Kannada (kn-IN), Malayalam (ml-IN), Marathi (mr-IN), Tamil (ta-IN), Telugu (te-IN), Punjabi (pa-IN), Odia (od-IN), English (en-IN), and Assamese (as-IN).
* **Model-specific parameter validation**: The service validates that parameters are compatible with the selected model. For example, `prompt` is only supported with `saaras:v2.5`, `language` is not supported with `saaras:v2.5` (which auto-detects language), and the fine-grained VAD parameters are only supported with `saaras:v3`.
* **Fine-grained VAD tuning (saaras:v3 only)**: The `saaras:v3` model supports server-side VAD with 10 tuning parameters covering speech detection thresholds, frame-count controls, pre-speech padding, interruption sensitivity, and initial-frame skipping.
* **VAD modes**: When `vad_signals=False` (default), the service relies on Pipecat's local VAD and flushes the server buffer on `VADUserStoppedSpeakingFrame`. When `vad_signals=True`, the service uses Sarvam's server-side VAD and broadcasts speaking frames from the server.

The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use `Settings` / `settings=` instead. See the [Service Settings guide](/pipecat/fundamentals/service-settings) for migration details.

## Event Handlers

In addition to the standard [service connection events](/api-reference/server/events/service-events) (`on_connected`, `on_disconnected`, `on_connection_error`), Sarvam STT provides:

| Event | Description |
| --- | --- |
| `on_speech_started` | Speech detected in the audio stream |
| `on_speech_stopped` | Speech stopped |
| `on_utterance_end` | End of utterance detected |

```python
@stt.event_handler("on_speech_started")
async def on_speech_started(service):
    print("User started speaking")


@stt.event_handler("on_utterance_end")
async def on_utterance_end(service):
    print("Utterance ended")
```
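The model-specific parameter rules listed under Notes can be sketched as a standalone pre-flight check. This is a minimal illustration, not Pipecat's own validation code: `find_setting_conflicts` and `V3_ONLY_VAD_SETTINGS` are hypothetical names introduced here, and the rules encode only what this page states about `prompt`, `language`, and the fine-grained VAD settings.

```python
from typing import Any

# Fine-grained VAD settings that the Settings table marks "Only for saaras:v3".
V3_ONLY_VAD_SETTINGS = frozenset({
    "positive_speech_threshold",
    "negative_speech_threshold",
    "min_speech_frames",
    "first_turn_min_speech_frames",
    "negative_frames_count",
    "negative_frames_window",
    "start_speech_volume_threshold",
    "interrupt_min_speech_frames",
    "pre_speech_pad_frames",
    "num_initial_ignored_frames",
})


def find_setting_conflicts(model: str, **settings: Any) -> list[str]:
    """Hypothetical helper: describe settings that conflict with `model`.

    Settings whose value is None are treated as unset and never reported.
    """
    provided = {name for name, value in settings.items() if value is not None}
    conflicts = []
    # `prompt` is documented as applicable to saaras:v2.5 only.
    if "prompt" in provided and model != "saaras:v2.5":
        conflicts.append("prompt is only supported with saaras:v2.5")
    # saaras:v2.5 auto-detects the language, so an explicit one is rejected.
    if "language" in provided and model == "saaras:v2.5":
        conflicts.append("language is not supported with saaras:v2.5")
    # The ten fine-grained VAD settings are documented for saaras:v3 only.
    if model != "saaras:v3":
        for name in sorted(provided & V3_ONLY_VAD_SETTINGS):
            conflicts.append(f"{name} is only supported with saaras:v3")
    return conflicts


if __name__ == "__main__":
    # Compatible combination: no conflicts are reported.
    print(find_setting_conflicts("saaras:v3", language="hi-IN", min_speech_frames=3))
    # Incompatible combination: prompt and a v3-only VAD setting on saarika:v2.5.
    print(find_setting_conflicts("saarika:v2.5", prompt="tech talk", min_speech_frames=3))
```

Running a check like this before constructing the service surfaces incompatible combinations early. The service's own validation remains the source of truth.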