Sarvam AI

Overview

Sarvam AI provides text-to-speech synthesis specialized for Indian languages and voices. The service offers extensive voice customization options including pitch, pace, and loudness control, with support for multiple Indian languages and preprocessing for mixed-language content.

API Reference - Complete API documentation and method details
Sarvam AI Docs - Official Sarvam AI text-to-speech API documentation
Example Code - Working example with Indian language support

Installation

To use Sarvam AI services, no additional dependencies are required beyond the base installation:

pip install "pipecat-ai"

You’ll also need to set your Sarvam AI API key in the SARVAM_API_KEY environment variable.

Get your API key from the Sarvam AI Console.
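
A missing key is easier to debug when it fails at startup rather than on the first synthesis request. The sketch below shows one way to do that; the helper name load_sarvam_api_key is hypothetical and not part of Pipecat or the Sarvam AI SDK.

```python
import os


def load_sarvam_api_key(env=os.environ):
    """Return the Sarvam AI API key, or raise a clear error if it is missing.

    Hypothetical helper for illustration; `env` is injectable so the
    function can be exercised without touching the real environment.
    """
    key = env.get("SARVAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SARVAM_API_KEY is not set; export it before starting the bot"
        )
    return key
```

The returned value can then be passed as api_key when constructing the service, in place of a bare os.getenv call that would silently return None.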

Frames

Input

TextFrame - Text content to synthesize into speech
TTSSpeakFrame - Text that should be spoken immediately
TTSUpdateSettingsFrame - Runtime configuration updates
LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries

Output

TTSStartedFrame - Signals start of synthesis
TTSAudioRawFrame - Generated audio data (PCM, WAV header stripped)
TTSStoppedFrame - Signals completion of synthesis
ErrorFrame - API or processing errors
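
To make "PCM, WAV header stripped" concrete, here is a minimal sketch of separating raw PCM samples from a WAV container using Python's standard wave module. It illustrates the idea only and is not necessarily how the service does it internally; the function names wav_to_pcm and make_test_wav are hypothetical.

```python
import io
import wave


def wav_to_pcm(wav_bytes: bytes):
    """Return (pcm_bytes, sample_rate, channels) from an in-memory WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        pcm = wav.readframes(wav.getnframes())
        return pcm, wav.getframerate(), wav.getnchannels()


def make_test_wav(pcm: bytes, sample_rate: int = 24000) -> bytes:
    """Wrap 16-bit mono PCM in a WAV container (used only to build test input)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Round-tripping a buffer through make_test_wav and wav_to_pcm returns the original samples without the RIFF header, which is the shape of payload a raw audio frame carries.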

Supported Sample Rates

8000 Hz - Phone quality
16000 Hz - Standard quality
22050 Hz - High quality
24000 Hz - Premium quality (default)
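
The rates above can be checked before a service is configured, and they determine how much audio a given number of PCM bytes represents. The following sketch assumes 16-bit mono PCM, which the source does not state explicitly; both helper names are hypothetical.

```python
SUPPORTED_SAMPLE_RATES = (8000, 16000, 22050, 24000)
DEFAULT_SAMPLE_RATE = 24000


def check_sample_rate(rate=None):
    """Return a supported sample rate, falling back to the default when None."""
    if rate is None:
        return DEFAULT_SAMPLE_RATE
    if rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(
            f"unsupported sample rate {rate}; choose one of {SUPPORTED_SAMPLE_RATES}"
        )
    return rate


def pcm_duration_seconds(pcm: bytes, sample_rate: int,
                         channels: int = 1, sample_width: int = 2) -> float:
    """Duration of raw PCM audio; assumes 16-bit mono unless told otherwise."""
    return len(pcm) / (sample_rate * channels * sample_width)
```

Under that assumption, 48000 bytes at 24000 Hz correspond to one second of audio, and the same byte count at 8000 Hz corresponds to three seconds.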

Language Support

Sarvam AI specializes in Indian languages. The service maps the following Language values to Sarvam AI service codes:

Language Code	Description	Service Code
`Language.BN`	Bengali	`bn-IN`
`Language.EN`	English (India)	`en-IN`
`Language.GU`	Gujarati	`gu-IN`
`Language.HI`	Hindi	`hi-IN`
`Language.KN`	Kannada	`kn-IN`
`Language.ML`	Malayalam	`ml-IN`
`Language.MR`	Marathi	`mr-IN`
`Language.OR`	Odia	`od-IN`
`Language.PA`	Punjabi	`pa-IN`
`Language.TA`	Tamil	`ta-IN`
`Language.TE`	Telugu	`te-IN`
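
The table can be expressed as a plain lookup. The sketch below uses lowercase two-letter strings as keys instead of the Language enum so that it runs without Pipecat installed; that keying, and the function name to_service_code, are assumptions made for illustration. Note that Odia is the one row where the service code prefix (od) differs from the language code (OR).

```python
from typing import Optional

# Language code (lowercased) -> Sarvam AI service code, copied from the table above.
SERVICE_CODES = {
    "bn": "bn-IN",  # Bengali
    "en": "en-IN",  # English (India)
    "gu": "gu-IN",  # Gujarati
    "hi": "hi-IN",  # Hindi
    "kn": "kn-IN",  # Kannada
    "ml": "ml-IN",  # Malayalam
    "mr": "mr-IN",  # Marathi
    "or": "od-IN",  # Odia: service code uses "od", not "or"
    "pa": "pa-IN",  # Punjabi
    "ta": "ta-IN",  # Tamil
    "te": "te-IN",  # Telugu
}


def to_service_code(language: str) -> Optional[str]:
    """Map a language code such as "hi" or "HI" to its service code, or None."""
    base = language.split("-")[0].lower()
    return SERVICE_CODES.get(base)
```

Returning None for an unlisted language lets the caller decide whether to fall back to a default or report an error.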

TTS Models

bulbul:v1 - First generation model
bulbul:v2 - Enhanced model with improved quality (recommended)

Usage Example

Basic Configuration

Initialize the Sarvam TTS service with your API key and desired voice:

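import os

import aiohttp

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.sarvam.tts import SarvamTTSService
from pipecat.transcriptions.language import Language

# transport, stt, llm, and context_aggregator are assumed to be created
# elsewhere in your bot. Run this inside an async function.
# Configure service with HTTP session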
async with aiohttp.ClientSession() as session:
    tts = SarvamTTSService(
        api_key=os.getenv("SARVAM_API_KEY"),
        voice_id="anushka",
        model="bulbul:v2",
        aiohttp_session=session,
        params=SarvamTTSService.InputParams(
            language=Language.HI,
            pitch=0.1,
            pace=1.2,
            loudness=1.0
        )
    )

    # Use in pipeline
    pipeline = Pipeline([
        transport.input(),
        stt,
        context_aggregator.user(),
        llm,
        tts,
        transport.output(),
        context_aggregator.assistant()
    ])

Dynamic Configuration

To change settings at runtime, push a TTSUpdateSettingsFrame to the SarvamTTSService:

from pipecat.frames.frames import TTSUpdateSettingsFrame

# task is the PipelineTask that runs your pipeline
await task.queue_frame(
    TTSUpdateSettingsFrame(settings={"voice": "your-new-voice-id"})
)

Metrics

The service reports the following metrics:

Time to First Byte (TTFB) - Latency from text input to first audio
Processing Duration - Total synthesis time
Character Usage - Text processed for billing
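
To show what these three numbers measure, here is a self-contained sketch that times a stand-in synthesis stream. It does not use Pipecat's metrics machinery; fake_synthesize and measure are hypothetical names, and the dummy generator only imitates a service that yields audio in chunks.

```python
import asyncio
import time


async def fake_synthesize(text: str, chunk_delay: float = 0.01):
    """Stand-in for a TTS call: yields three dummy audio chunks."""
    for _ in range(3):
        await asyncio.sleep(chunk_delay)
        yield b"\x00" * 320


async def measure(text: str) -> dict:
    """Collect TTFB, total processing time, and character usage for one request."""
    start = time.perf_counter()
    ttfb = None
    audio_bytes = 0
    async for chunk in fake_synthesize(text):
        if ttfb is None:
            # Time from submitting the text to receiving the first audio chunk.
            ttfb = time.perf_counter() - start
        audio_bytes += len(chunk)
    return {
        "ttfb": ttfb,
        "processing": time.perf_counter() - start,
        "characters": len(text),
        "audio_bytes": audio_bytes,
    }
```

Calling asyncio.run(measure("hello")) returns a dictionary in which ttfb is never larger than processing, since the first chunk always arrives before the stream ends.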

Learn how to enable Metrics in your Pipeline.

Additional Notes

Language Specialization: Optimized for Indian languages with native voice quality
Voice Quality: High-quality synthesis with natural prosody for Indian languages
