Overview

Sarvam AI provides text-to-speech synthesis specialized for Indian languages and voices. The service offers extensive voice customization options including pitch, pace, and loudness control, with support for multiple Indian languages and preprocessing for mixed-language content.

Installation

Sarvam AI services require no additional dependencies beyond the base Pipecat installation:
pip install "pipecat-ai"
You’ll also need to set up your Sarvam AI API key as an environment variable: SARVAM_API_KEY.
Get your API key from the Sarvam AI Console.
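
A minimal sketch of reading the key at startup and failing early if it is missing. The helper name `load_sarvam_api_key` is hypothetical and not part of Pipecat; only the `SARVAM_API_KEY` variable name comes from the text above.

```python
import os


def load_sarvam_api_key(env=None):
    """Return the Sarvam API key from the environment, or raise if it is unset.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("SARVAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError("SARVAM_API_KEY is not set; export it before starting the bot")
    return key
```

Checking once at startup surfaces a missing key as a clear error rather than as a failed API request later in the pipeline.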

Frames

Input

  • TextFrame - Text content to synthesize into speech
  • TTSSpeakFrame - Text that should be spoken immediately
  • TTSUpdateSettingsFrame - Runtime configuration updates
  • LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries

Output

  • TTSStartedFrame - Signals start of synthesis
  • TTSAudioRawFrame - Generated audio data (PCM, WAV header stripped)
  • TTSStoppedFrame - Signals completion of synthesis
  • ErrorFrame - API or processing errors
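
The "WAV header stripped" note on TTSAudioRawFrame means downstream processors receive bare PCM samples. As an illustration only, and assuming the synthesized audio arrives as a complete WAV file in memory (the text above does not describe the wire format), the standard-library `wave` module can separate the header from the samples. `wav_to_pcm` and `make_test_wav` are hypothetical helpers, not Pipecat functions.

```python
import io
import wave


def wav_to_pcm(wav_bytes):
    """Return (pcm_bytes, sample_rate, num_channels) from an in-memory WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        pcm = wav.readframes(wav.getnframes())
        return pcm, wav.getframerate(), wav.getnchannels()


def make_test_wav(pcm, sample_rate=24000):
    """Wrap raw 16-bit mono PCM in a WAV container (used only to build sample input)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# Round trip: the PCM recovered from the WAV matches what was written.
samples = b"\x01\x00\x02\x00" * 100
pcm, rate, channels = wav_to_pcm(make_test_wav(samples))
```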

Supported Sample Rates

  • 8000 Hz - Phone quality
  • 16000 Hz - Standard quality
  • 22050 Hz - High quality
  • 24000 Hz - Premium quality (default)
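
To make the trade-off between these rates concrete, the sketch below estimates the raw audio data rate at each one. It assumes 16-bit mono PCM, which the text above does not state, and `pcm_bytes_per_second` is a hypothetical helper.

```python
SUPPORTED_SAMPLE_RATES = (8000, 16000, 22050, 24000)  # Hz, from the list above


def pcm_bytes_per_second(sample_rate, bytes_per_sample=2, channels=1):
    """Raw PCM data rate; the defaults assume 16-bit mono audio."""
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    return sample_rate * bytes_per_sample * channels


for rate in SUPPORTED_SAMPLE_RATES:
    print(f"{rate} Hz -> {pcm_bytes_per_second(rate)} bytes/s")
```

Under these assumptions the default rate carries three times the data of the 8000 Hz phone-quality rate, which matters mostly on constrained telephony links.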

Language Support

Sarvam AI supports the following Indian languages:
Language Code    Description        Service Code
Language.BN      Bengali            bn-IN
Language.EN      English (India)    en-IN
Language.GU      Gujarati           gu-IN
Language.HI      Hindi              hi-IN
Language.KN      Kannada            kn-IN
Language.ML      Malayalam          ml-IN
Language.MR      Marathi            mr-IN
Language.OR      Odia               od-IN
Language.PA      Punjabi            pa-IN
Language.TA      Tamil              ta-IN
Language.TE      Telugu             te-IN
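
The language-to-service-code mapping listed above can be sketched as a plain dictionary. This is an illustration keyed by two-letter codes, not Pipecat's internal conversion routine; `SARVAM_LANGUAGE_CODES` and `to_sarvam_code` are hypothetical names.

```python
SARVAM_LANGUAGE_CODES = {
    "bn": "bn-IN",  # Bengali
    "en": "en-IN",  # English (India)
    "gu": "gu-IN",  # Gujarati
    "hi": "hi-IN",  # Hindi
    "kn": "kn-IN",  # Kannada
    "ml": "ml-IN",  # Malayalam
    "mr": "mr-IN",  # Marathi
    "or": "od-IN",  # Odia: the service code is od-IN, not or-IN
    "pa": "pa-IN",  # Punjabi
    "ta": "ta-IN",  # Tamil
    "te": "te-IN",  # Telugu
}


def to_sarvam_code(language):
    """Return the service code for a two-letter code such as 'hi' or 'HI', or None if unsupported."""
    return SARVAM_LANGUAGE_CODES.get(language.strip().lower())
```

Returning None for unsupported codes lets the caller decide whether to fall back to a default language or raise an error.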

TTS Models

  • bulbul:v1 - First generation model
  • bulbul:v2 - Enhanced model with improved quality (recommended)

Usage Example

Basic Configuration

Initialize the Sarvam TTS service with your API key and desired voice:
import os

import aiohttp
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.sarvam.tts import SarvamTTSService
from pipecat.transcriptions.language import Language

# Run this inside an async function; the HTTP session is passed to the service.
async with aiohttp.ClientSession() as session:
    tts = SarvamTTSService(
        api_key=os.getenv("SARVAM_API_KEY"),
        voice_id="anushka",
        model="bulbul:v2",
        aiohttp_session=session,
        params=SarvamTTSService.InputParams(
            language=Language.HI,
            pitch=0.1,
            pace=1.2,
            loudness=1.0,
        ),
    )

    # Use in pipeline. transport, stt, llm, and context_aggregator are
    # assumed to be created elsewhere in your application.
    pipeline = Pipeline([
        transport.input(),
        stt,
        context_aggregator.user(),
        llm,
        tts,
        transport.output(),
        context_aggregator.assistant(),
    ])

Dynamic Configuration

Update settings at runtime by pushing a TTSUpdateSettingsFrame to the SarvamTTSService:
from pipecat.frames.frames import TTSUpdateSettingsFrame

# The frame carries a mapping of setting names to their new values.
await task.queue_frame(
    TTSUpdateSettingsFrame(settings={"voice": "meera"})
)

Metrics

The service reports the following metrics:
  • Time to First Byte (TTFB) - Latency from text input to first audio
  • Processing Duration - Total synthesis time
  • Character Usage - Text processed for billing
Learn how to enable Metrics in your Pipeline.
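
To illustrate what the first two metrics measure, the sketch below times a stand-in synthesizer. `fake_synthesize` is a hypothetical placeholder that yields dummy audio chunks after a short delay; it does not call Sarvam AI, and the numbers it produces say nothing about real service latency.

```python
import asyncio
import time


async def fake_synthesize(text, chunks=3, delay=0.01):
    """Hypothetical stand-in for a TTS call: yields silent audio chunks with a delay."""
    for _ in range(chunks):
        await asyncio.sleep(delay)
        yield b"\x00" * 320


async def measure(text):
    """Return (ttfb_seconds, total_seconds, character_count) for one synthesis."""
    start = time.monotonic()
    ttfb = None
    async for _chunk in fake_synthesize(text):
        if ttfb is None:
            ttfb = time.monotonic() - start  # time until the first audio chunk
    total = time.monotonic() - start  # time until the last chunk
    return ttfb, total, len(text)


ttfb, total, chars = asyncio.run(measure("Namaste"))
print(f"TTFB={ttfb:.3f}s total={total:.3f}s characters={chars}")
```

TTFB is bounded above by the processing duration, so a large gap between the two suggests that most of the time is spent streaming audio rather than waiting for synthesis to begin.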

Additional Notes

  • Language Specialization: Optimized for Indian languages with native voice quality
  • Voice Quality: High-quality synthesis with natural prosody for Indian languages