# Sarvam AI

> Text-to-speech service implementation using Sarvam AI's TTS API

## Overview

`SarvamTTSService` provides text-to-speech synthesis specialized for Indian languages and voices. It offers voice customization, including pace control on all models and pitch and loudness control on `bulbul:v2`, plus optional text preprocessing for mixed-language content. The `bulbul:v3-beta` model adds temperature control and 25 new speaker voices.

<CardGroup cols={2}>
  <Card title="Sarvam TTS API Reference" icon="code" href="https://reference-server.pipecat.ai/en/latest/api/pipecat.services.sarvam.tts.html">
    Pipecat's API methods for Sarvam AI TTS integration
  </Card>

  <Card title="Example Implementation" icon="play" href="https://github.com/pipecat-ai/pipecat/blob/main/examples/voice/voice-sarvam.py">
    Complete example with Indian language support
  </Card>

  <Card title="Sarvam Documentation" icon="book" href="https://docs.sarvam.ai/api-reference-docs/text-to-speech/convert">
    Official Sarvam AI text-to-speech API documentation
  </Card>

  <Card title="Sarvam Console" icon="microphone" href="https://www.sarvam.ai/">
    Access Indian language voices and API keys
  </Card>
</CardGroup>

## Installation

Sarvam AI services require no additional dependencies beyond the base installation:

```bash theme={null}
uv add "pipecat-ai"
```

## Prerequisites

### Sarvam AI Account Setup

Before using Sarvam AI TTS services, you need:

1. **Sarvam AI Account**: Sign up at [Sarvam AI Console](https://www.sarvam.ai/)
2. **API Key**: Generate an API key from your account dashboard
3. **Language Selection**: Choose from available Indian language voices

### Required Environment Variables

* `SARVAM_API_KEY`: Your Sarvam AI API key for authentication
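The usage examples read the key with `os.getenv("SARVAM_API_KEY")`, which returns `None` when the variable is unset. A minimal sketch of failing fast at startup instead; `require_sarvam_api_key` is a hypothetical helper, not part of Pipecat.

```python
import os


def require_sarvam_api_key() -> str:
    """Return SARVAM_API_KEY from the environment or raise a clear error.

    Hypothetical helper for illustration; Pipecat does not ship this function.
    """
    key = os.getenv("SARVAM_API_KEY")
    if not key:
        # Fail at startup rather than on the first synthesis request.
        raise RuntimeError("SARVAM_API_KEY is not set; export it before starting the bot.")
    return key
```

Pass the returned value as `api_key=` to either Sarvam service.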

## Configuration

Sarvam offers two service implementations: `SarvamTTSService` (WebSocket) for real-time streaming and `SarvamHttpTTSService` (HTTP) for simpler batch synthesis.

### SarvamTTSService

<ParamField path="api_key" type="str" required>
  Sarvam AI API subscription key.
</ParamField>

<ParamField path="model" type="str" default="bulbul:v2" deprecated>
  TTS model to use. Options: `bulbul:v2`, `bulbul:v3-beta`, `bulbul:v3`.
  *Deprecated in v0.0.105. Use `settings=SarvamTTSService.Settings(model=...)`
  instead.*
</ParamField>

<ParamField path="voice_id" type="str" default="None" deprecated>
  Speaker voice ID. If `None`, uses the model-appropriate default (`anushka` for
  v2, `shubh` for v3). *Deprecated in v0.0.105. Use
  `settings=SarvamTTSService.Settings(voice=...)` instead.*
</ParamField>

<ParamField path="url" type="str" default="wss://api.sarvam.ai/text-to-speech/ws">
  WebSocket URL for the TTS backend.
</ParamField>

<ParamField path="text_aggregation_mode" type="TextAggregationMode" default="TextAggregationMode.SENTENCE">
  Controls how incoming text is aggregated before synthesis. `SENTENCE`
  (default) buffers text until sentence boundaries, producing more natural
  speech. `TOKEN` streams tokens directly for lower latency. Import from
  `pipecat.services.tts_service`.
</ParamField>
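To make the difference between the two modes concrete, here is a toy sketch of sentence-boundary buffering versus token pass-through. It is not Pipecat's aggregator: the function names are hypothetical, and the boundary rule (`.`, `!`, `?`, or the Devanagari danda `।` at the end of the buffer) is a simplifying assumption.

```python
import re
from typing import Iterable, Iterator

# Naive sentence-end rule: punctuation (including the Devanagari danda used in
# Hindi) at the end of the buffer. Decimals such as "3.5" split early here.
_SENTENCE_END = re.compile(r"[.!?।]\s*$")


def aggregate_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Toy model of SENTENCE mode: buffer tokens, yield whole sentences."""
    buffer = ""
    for token in tokens:
        buffer += token
        if _SENTENCE_END.search(buffer):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        # Flush trailing text when the stream ends without punctuation.
        yield buffer.strip()


def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Toy model of TOKEN mode: forward each token as soon as it arrives."""
    yield from tokens
```

Sentence mode waits for a complete phrase before synthesis, which is why it sounds more natural; token mode starts sending text immediately, which is why it has lower latency.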

<ParamField path="aggregate_sentences" type="bool" default="None" deprecated>
  *Deprecated in v0.0.104.* Use `text_aggregation_mode` instead.
</ParamField>

<ParamField path="sample_rate" type="int" default="None">
  Audio sample rate in Hz (8000, 16000, 22050, 24000). If `None`, uses
  model-specific default (22050 for v2, 24000 for v3).
</ParamField>
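The fallback rule above can be sketched as a small function. `resolve_sample_rate` is hypothetical, and rejecting values outside the listed rates with `ValueError` is an assumption of this sketch, not documented service behavior.

```python
from typing import Optional

# Rates listed in the parameter description above.
SUPPORTED_SAMPLE_RATES = {8000, 16000, 22050, 24000}


def resolve_sample_rate(model: str, sample_rate: Optional[int] = None) -> int:
    """Return an explicit supported rate, or the model default when None.

    Hypothetical helper: bulbul:v3-beta and bulbul:v3 default to 24000 Hz,
    bulbul:v2 defaults to 22050 Hz.
    """
    if sample_rate is not None:
        if sample_rate not in SUPPORTED_SAMPLE_RATES:
            # Assumption: treat unlisted rates as an error in this sketch.
            raise ValueError(f"Unsupported sample rate: {sample_rate}")
        return sample_rate
    return 24000 if model.startswith("bulbul:v3") else 22050
```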

<ParamField path="params" type="InputParams" default="None" deprecated>
  *Deprecated in v0.0.105. Use `settings=SarvamTTSService.Settings(...)`
  instead.*
</ParamField>

<ParamField path="settings" type="SarvamTTSService.Settings" default="None">
  Runtime-configurable settings. See [SarvamTTSService
  Settings](#sarvamttsservice-settings) below.
</ParamField>

### SarvamHttpTTSService

<ParamField path="api_key" type="str" required>
  Sarvam AI API subscription key.
</ParamField>

<ParamField path="aiohttp_session" type="aiohttp.ClientSession" required>
  An aiohttp session for HTTP requests.
</ParamField>

<ParamField path="model" type="str" default="bulbul:v2" deprecated>
  TTS model to use. Options: `bulbul:v2`, `bulbul:v3-beta`, `bulbul:v3`.
  *Deprecated in v0.0.105. Use
  `settings=SarvamHttpTTSService.Settings(model=...)` instead.*
</ParamField>

<ParamField path="voice_id" type="str" default="None" deprecated>
  Speaker voice ID. If `None`, uses the model-appropriate default. *Deprecated
  in v0.0.105. Use `settings=SarvamHttpTTSService.Settings(voice=...)` instead.*
</ParamField>

<ParamField path="base_url" type="str" default="https://api.sarvam.ai">
  Sarvam AI API base URL.
</ParamField>

<ParamField path="sample_rate" type="int" default="None">
  Audio sample rate in Hz (8000, 16000, 22050, 24000). If `None`, uses
  model-specific default (22050 for v2, 24000 for v3).
</ParamField>

<ParamField path="params" type="InputParams" default="None" deprecated>
  *Deprecated in v0.0.105. Use `settings=SarvamHttpTTSService.Settings(...)`
  instead.*
</ParamField>

<ParamField path="settings" type="SarvamHttpTTSService.Settings" default="None">
  Runtime-configurable settings. See [SarvamHttpTTSService
  Settings](#sarvamhttpttsservice-settings) below.
</ParamField>

#### SarvamTTSService Settings

Runtime-configurable settings passed via the `settings` constructor argument using `SarvamTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter              | Type              | Default     | Description                                         |
| ---------------------- | ----------------- | ----------- | --------------------------------------------------- |
| `model`                | `str`             | `None`      | Model identifier. *(Inherited.)*                    |
| `voice`                | `str`             | `None`      | Voice identifier. *(Inherited.)*                    |
| `language`             | `Language \| str` | `None`      | Language for synthesis. *(Inherited.)*              |
| `enable_preprocessing` | `bool`            | `NOT_GIVEN` | Enable text preprocessing.                          |
| `pace`                 | `float`           | `NOT_GIVEN` | Pace of speech; 0.3 to 3.0 on v2, 0.5 to 2.0 on v3. |
| `pitch`                | `float`           | `NOT_GIVEN` | Pitch of speech. `bulbul:v2` only.                  |
| `loudness`             | `float`           | `NOT_GIVEN` | Loudness of speech. `bulbul:v2` only.               |
| `temperature`          | `float`           | `NOT_GIVEN` | Temperature for speech synthesis. v3 models only.   |
| `min_buffer_size`      | `int`             | `NOT_GIVEN` | Minimum buffer size for WebSocket.                  |
| `max_chunk_length`     | `int`             | `NOT_GIVEN` | Maximum chunk length for WebSocket.                 |

#### SarvamHttpTTSService Settings

Runtime-configurable settings passed via the `settings` constructor argument using `SarvamHttpTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter              | Type              | Default     | Description                                         |
| ---------------------- | ----------------- | ----------- | --------------------------------------------------- |
| `model`                | `str`             | `None`      | Model identifier. *(Inherited.)*                    |
| `voice`                | `str`             | `None`      | Voice identifier. *(Inherited.)*                    |
| `language`             | `Language \| str` | `None`      | Language for synthesis. *(Inherited.)*              |
| `enable_preprocessing` | `bool`            | `NOT_GIVEN` | Enable text preprocessing.                          |
| `pace`                 | `float`           | `NOT_GIVEN` | Pace of speech; 0.3 to 3.0 on v2, 0.5 to 2.0 on v3. |
| `pitch`                | `float`           | `NOT_GIVEN` | Pitch of speech. `bulbul:v2` only.                  |
| `loudness`             | `float`           | `NOT_GIVEN` | Loudness of speech. `bulbul:v2` only.               |
| `temperature`          | `float`           | `NOT_GIVEN` | Temperature for speech synthesis. v3 models only.   |
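
The `NOT_GIVEN` defaults in both tables mark a field as not supplied. Assuming an update sent with `TTSUpdateSettingsFrame` changes only the fields it actually sets, the merge can be modeled with standard-library dataclasses. `SarvamSettingsSketch`, `apply_update`, and the local `NOT_GIVEN` sentinel are hypothetical stand-ins, not Pipecat's classes; see the Service Settings guide for the real update semantics.

```python
from dataclasses import dataclass, fields, replace
from typing import Any


class _NotGiven:
    """Sentinel meaning "this field was not supplied"."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN: Any = _NotGiven()


@dataclass(frozen=True)
class SarvamSettingsSketch:
    # Hypothetical stand-in for a subset of SarvamTTSService.Settings.
    pace: Any = NOT_GIVEN
    pitch: Any = NOT_GIVEN
    loudness: Any = NOT_GIVEN
    temperature: Any = NOT_GIVEN


def apply_update(current: SarvamSettingsSketch, delta: SarvamSettingsSketch) -> SarvamSettingsSketch:
    """Return a copy of `current` with only the fields that `delta` sets."""
    changes = {
        f.name: getattr(delta, f.name)
        for f in fields(delta)
        if getattr(delta, f.name) is not NOT_GIVEN
    }
    return replace(current, **changes)
```

For example, applying `SarvamSettingsSketch(pace=1.2)` to settings with `pace=1.0, pitch=0.1` yields `pace=1.2` while `pitch` stays `0.1`.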

## Usage

### Basic Setup (WebSocket)

```python theme={null}
import os

from pipecat.services.sarvam import SarvamTTSService
from pipecat.transcriptions.language import Language

tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    settings=SarvamTTSService.Settings(
        voice="anushka",
        model="bulbul:v2",
        language=Language.HI,
    ),
)
```

### With v3 Model and Temperature Control

```python theme={null}
import os

from pipecat.services.sarvam import SarvamTTSService
from pipecat.transcriptions.language import Language

tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    settings=SarvamTTSService.Settings(
        voice="aditya",
        model="bulbul:v3-beta",
        language=Language.HI,
        pace=1.2,
        temperature=0.8,
    ),
)
```

### HTTP Service

```python theme={null}
import os

import aiohttp

from pipecat.services.sarvam import SarvamHttpTTSService
from pipecat.transcriptions.language import Language


async def main():
    # The aiohttp session must stay open for as long as the TTS service is used.
    async with aiohttp.ClientSession() as session:
        tts = SarvamHttpTTSService(
            api_key=os.getenv("SARVAM_API_KEY"),
            aiohttp_session=session,
            settings=SarvamHttpTTSService.Settings(
                voice="anushka",
                model="bulbul:v2",
                language=Language.HI,
                pitch=0.1,
                pace=1.2,
                loudness=1.5,
            ),
        )
        # Build and run your pipeline here, inside the session context.
```

<Tip>
  The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
  `Settings` / `settings=` instead. See the [Service Settings
  guide](/pipecat/fundamentals/service-settings) for migration details.
</Tip>

## Notes

* **Model differences**: `bulbul:v2` supports pitch and loudness control; `bulbul:v3-beta` and `bulbul:v3` add temperature control but do not support pitch or loudness. Setting unsupported parameters for a model will log a warning.
* **Default speakers vary by model**: v2 defaults to `anushka`; v3 models default to `shubh`.
* **Default sample rates vary by model**: v2 defaults to 22050 Hz; v3 models default to 24000 Hz.
* **Indian language focus**: Sarvam AI specializes in Indian languages, supporting Bengali, English (India), Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, and Telugu.
* **Pace ranges differ**: `bulbul:v2` supports pace from 0.3 to 3.0, while v3 models support 0.5 to 2.0. Values outside the range are clamped automatically.
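
The model rules in these notes can be summarized in a small, self-contained sketch. It is illustrative only: the function names are hypothetical, treating `temperature` as unsupported on `bulbul:v2` is an assumption based on v3 "adding" it, and dropping unsupported parameters after the warning is this sketch's choice, not necessarily what the service does.

```python
import logging

logger = logging.getLogger("sarvam_sketch")

# Capabilities, pace ranges, and default voices as listed in the notes above.
PACE_RANGES = {"v2": (0.3, 3.0), "v3": (0.5, 2.0)}
UNSUPPORTED = {"v2": {"temperature"}, "v3": {"pitch", "loudness"}}
DEFAULT_SPEAKER = {"v2": "anushka", "v3": "shubh"}


def model_family(model: str) -> str:
    """Group bulbul:v3-beta and bulbul:v3 as "v3"; treat everything else as "v2"."""
    return "v3" if model.startswith("bulbul:v3") else "v2"


def default_speaker(model: str) -> str:
    """Return the documented default voice for a model."""
    return DEFAULT_SPEAKER[model_family(model)]


def check_params(model: str, **params: float) -> dict:
    """Warn about and drop unsupported parameters, then clamp pace into range."""
    family = model_family(model)
    accepted = {}
    for name, value in params.items():
        if name in UNSUPPORTED[family]:
            logger.warning("%s is not supported by %s; ignoring it", name, model)
            continue
        accepted[name] = value
    if "pace" in accepted:
        low, high = PACE_RANGES[family]
        accepted["pace"] = min(max(accepted["pace"], low), high)
    return accepted
```

For instance, `check_params("bulbul:v3", pace=2.5, pitch=0.1)` logs a warning about `pitch` and returns `{"pace": 2.0}`.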

## Event Handlers

Sarvam WebSocket TTS supports the standard [service connection events](/api-reference/server/events/service-events):

| Event                 | Description                         |
| --------------------- | ----------------------------------- |
| `on_connected`        | Connected to Sarvam WebSocket       |
| `on_disconnected`     | Disconnected from Sarvam WebSocket  |
| `on_connection_error` | WebSocket connection error occurred |

```python theme={null}
@tts.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Sarvam")
```
