Overview

SarvamTTSService converts text to speech using Sarvam AI’s TTS API. It specializes in Indian languages and offers voice customization, including control over pitch, pace, and loudness.

Installation

To use SarvamTTSService, no additional dependencies are required.

You’ll also need to set up your Sarvam AI API key as an environment variable: SARVAM_API_KEY
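A minimal sketch of reading the key from that environment variable and failing early if it is missing. The helper name get_sarvam_api_key is hypothetical and not part of Pipecat; its return value is what you would pass as api_key.

```python
import os


def get_sarvam_api_key() -> str:
    """Return the Sarvam AI key from SARVAM_API_KEY (hypothetical helper)."""
    key = os.environ.get("SARVAM_API_KEY", "").strip()
    if not key:
        # Fail at startup rather than on the first synthesis request
        raise RuntimeError("SARVAM_API_KEY is not set; export it before starting the service")
    return key
```

With the key exported, you could then construct the service with api_key=get_sarvam_api_key().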

Configuration

Constructor Parameters

api_key
str
required

Your Sarvam AI API subscription key

voice_id
str
default:"anushka"

Speaker voice identifier (e.g., “anushka”, “meera”, “abhilash”)

model
str
default:"bulbul:v2"

TTS model to use (“bulbul:v1” or “bulbul:v2”)

aiohttp_session
aiohttp.ClientSession
required

Shared aiohttp session for making HTTP requests

base_url
str
default:"https://api.sarvam.ai"

Sarvam AI API base URL

sample_rate
int
default:"None"

Audio sample rate in Hz (8000, 16000, 22050, 24000)

params
InputParams
default:"None"

Additional voice and preprocessing parameters

InputParams Configuration

language
Language
default:"Language.HI"

Target language for synthesis

pitch
float
default:"0.0"

Voice pitch adjustment (-0.75 to 0.75)

pace
float
default:"1.0"

Speech speed (0.3 to 3.0)

loudness
float
default:"1.0"

Audio volume (0.1 to 3.0)

enable_preprocessing
bool
default:"False"

Enable text normalization for mixed-language content
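The numeric ranges above can be checked before a request is ever made. The sketch below uses a hypothetical VoiceSettings dataclass as a stand-in for InputParams (the language field is omitted to avoid depending on Pipecat's Language enum); whether the real InputParams enforces these bounds at construction time is not stated here, so treat this as an illustration of the documented limits only.

```python
from dataclasses import dataclass

# Documented inclusive ranges for the InputParams voice controls
RANGES = {
    "pitch": (-0.75, 0.75),
    "pace": (0.3, 3.0),
    "loudness": (0.1, 3.0),
}


@dataclass
class VoiceSettings:
    """Hypothetical stand-in for InputParams, used only to illustrate the ranges."""

    pitch: float = 0.0
    pace: float = 1.0
    loudness: float = 1.0
    enable_preprocessing: bool = False

    def __post_init__(self) -> None:
        # Reject any value outside its documented range
        for name, (low, high) in RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
```

The defaults match the documented defaults, so VoiceSettings() is always valid, while VoiceSettings(pace=3.5) raises ValueError.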

Input

The service accepts text input through the TTS pipeline. It automatically strips the WAV header from the audio returned by the API, so downstream processors receive clean PCM.
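Stripping a WAV header amounts to separating the RIFF container metadata from the raw sample data. A minimal sketch using Python's standard wave module follows; the function name wav_to_pcm is hypothetical, and this is not necessarily how SarvamTTSService does it internally.

```python
import io
import wave


def wav_to_pcm(wav_bytes: bytes) -> tuple[bytes, int, int]:
    """Return (pcm, sample_rate, num_channels) from an in-memory WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        # readframes() returns only the sample data, without any header bytes
        pcm = wav.readframes(wav.getnframes())
        return pcm, wav.getframerate(), wav.getnchannels()
```

Parsing the header with wave, rather than slicing off a fixed 44 bytes, also copes with WAV files that carry extra chunks before the audio data.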

Output Frames

TTSStartedFrame

Signals the start of audio generation.

TTSAudioRawFrame

Contains generated audio data:

audio
bytes

Raw PCM audio data (WAV header stripped)

sample_rate
int

Audio sample rate in Hz (22050 Hz by default)

num_channels
int

Number of audio channels (1 for mono)

TTSStoppedFrame

Signals the completion of audio generation.
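The audio, sample_rate, and num_channels fields of a TTSAudioRawFrame are enough to work out how much speech a frame holds. This sketch assumes 16-bit samples (2 bytes per sample), which the fields above do not state; pcm_duration_seconds is a hypothetical helper.

```python
def pcm_duration_seconds(
    audio: bytes, sample_rate: int, num_channels: int, sample_width: int = 2
) -> float:
    """Duration of raw PCM audio; sample_width=2 assumes 16-bit samples."""
    bytes_per_second = sample_rate * num_channels * sample_width
    if bytes_per_second <= 0:
        raise ValueError("sample_rate, num_channels and sample_width must be positive")
    return len(audio) / bytes_per_second
```

For example, 44100 bytes of mono audio at 22050 Hz is one second of speech.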

Methods

See the TTS base class methods for additional functionality.

Language Support

Sarvam AI TTS supports the following Indian languages:

Language Code    Description       Service Code
Language.BN      Bengali           bn-IN
Language.EN      English (India)   en-IN
Language.GU      Gujarati          gu-IN
Language.HI      Hindi             hi-IN
Language.KN      Kannada           kn-IN
Language.ML      Malayalam         ml-IN
Language.MR      Marathi           mr-IN
Language.OR      Odia              od-IN
Language.PA      Punjabi           pa-IN
Language.TA      Tamil             ta-IN
Language.TE      Telugu            te-IN
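The table above is a simple lookup from language code to service code. The sketch below encodes it with plain strings (so it does not import Pipecat's Language enum); the names SARVAM_LANGUAGE_CODES and to_service_code are hypothetical. Note that Odia maps to od-IN, not or-IN.

```python
# Language code -> Sarvam service code, copied from the table above
SARVAM_LANGUAGE_CODES = {
    "BN": "bn-IN",
    "EN": "en-IN",
    "GU": "gu-IN",
    "HI": "hi-IN",
    "KN": "kn-IN",
    "ML": "ml-IN",
    "MR": "mr-IN",
    "OR": "od-IN",  # Odia uses "od" here, not "or"
    "PA": "pa-IN",
    "TA": "ta-IN",
    "TE": "te-IN",
}


def to_service_code(code: str) -> str:
    """Map "HI" or "Language.HI" to its service code, e.g. "hi-IN"."""
    key = code.upper().removeprefix("LANGUAGE.")
    try:
        return SARVAM_LANGUAGE_CODES[key]
    except KeyError:
        raise ValueError(f"{code!r} is not supported by Sarvam AI TTS") from None
```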

Voice Models

See the Sarvam docs for the latest information on available voices and models.

Usage Example

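import os

import aiohttp

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.sarvam.tts import SarvamTTSService
from pipecat.transcriptions.language import Language


async def run_bot(llm, transport):
    # llm and transport are created elsewhere in your application
    # The shared aiohttp session must stay open while the TTS service runs
    async with aiohttp.ClientSession() as session:
        # Configure service
        tts = SarvamTTSService(
            api_key=os.getenv("SARVAM_API_KEY"),
            voice_id="anushka",
            model="bulbul:v2",
            aiohttp_session=session,
            params=SarvamTTSService.InputParams(
                language=Language.HI,
            ),
        )

        # Use in pipeline: TTS sits between the LLM and the transport output
        pipeline = Pipeline([
            ...,  # upstream processors
            llm,
            tts,
            transport.output(),
        ])

        # Run the pipeline inside the session context so the session stays open
        await PipelineRunner().run(PipelineTask(pipeline))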

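Frame Flow

For each synthesis request, the service pushes a TTSStartedFrame, then the generated audio as TTSAudioRawFrame data, and finally a TTSStoppedFrame.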

Metrics Support

The service supports metrics collection:

  • Time to First Byte (TTFB)
  • TTS usage metrics
  • Processing duration

Audio Processing

  • Returns base64-encoded WAV audio from API
  • Supports multiple sample rates (8000, 16000, 22050, 24000 Hz)
  • Generates mono audio output
  • Handles HTTP-based synthesis
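The points above can be checked on a single response: decode the base64 string, then confirm the WAV is mono at one of the supported rates. This sketch assumes you already have the base64 audio string; how it is extracted from the JSON response is not shown here, so check the Sarvam API reference for the exact field. The name inspect_api_audio is hypothetical.

```python
import base64
import binascii
import io
import wave

SUPPORTED_SAMPLE_RATES = {8000, 16000, 22050, 24000}


def inspect_api_audio(b64_audio: str) -> dict[str, int]:
    """Decode base64 WAV audio and check it matches the documented format."""
    try:
        wav_bytes = base64.b64decode(b64_audio, validate=True)
    except binascii.Error as exc:
        raise ValueError("audio is not valid base64") from exc
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        info = {
            "sample_rate": wav.getframerate(),
            "num_channels": wav.getnchannels(),
            "sample_width": wav.getsampwidth(),
            "num_frames": wav.getnframes(),
        }
    if info["sample_rate"] not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unexpected sample rate {info['sample_rate']} Hz")
    if info["num_channels"] != 1:
        raise ValueError(f"expected mono audio, got {info['num_channels']} channels")
    return info
```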

Notes

  • Requires valid Sarvam AI API subscription key
  • Specializes in Indian languages and voices
  • Uses HTTP POST requests for synthesis
  • Requires a shared aiohttp.ClientSession, managed by the caller, that stays open while the service is in use
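Because synthesis is a plain HTTP POST, the request can be sketched without Pipecat. The builder below is hypothetical, and the endpoint path (/text-to-speech), the api-subscription-key header, and the JSON field names are assumptions to verify against Sarvam's API reference. It only assembles the request and sends nothing.

```python
from typing import Any, Optional


def build_tts_request(
    text: str,
    api_key: str,
    *,
    base_url: str = "https://api.sarvam.ai",
    speaker: str = "anushka",
    model: str = "bulbul:v2",
    language_code: str = "hi-IN",
    pitch: float = 0.0,
    pace: float = 1.0,
    loudness: float = 1.0,
    sample_rate: Optional[int] = None,
    enable_preprocessing: bool = False,
) -> tuple[str, dict[str, str], dict[str, Any]]:
    """Return (url, headers, json_payload) for one synthesis request."""
    # ASSUMPTION: path, header name and field names; check Sarvam's API reference
    url = f"{base_url.rstrip('/')}/text-to-speech"
    headers = {"api-subscription-key": api_key, "Content-Type": "application/json"}
    payload: dict[str, Any] = {
        "text": text,
        "target_language_code": language_code,
        "speaker": speaker,
        "model": model,
        "pitch": pitch,
        "pace": pace,
        "loudness": loudness,
        "enable_preprocessing": enable_preprocessing,
    }
    # Only send a sample rate when one was explicitly chosen
    if sample_rate is not None:
        payload["speech_sample_rate"] = sample_rate
    return url, headers, payload
```

You would then send it on the shared session, for example with await session.post(url, json=payload, headers=headers).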