Sarvam AI
Text-to-speech service implementation using Sarvam AI’s TTS API
Overview
SarvamTTSService converts text to speech using Sarvam AI’s TTS API. It specializes in Indian languages and provides extensive voice customization options, including pitch, pace, and loudness control.
Installation
SarvamTTSService requires no additional dependencies.
You’ll also need to set up your Sarvam AI API key as an environment variable: SARVAM_API_KEY
Configuration
Constructor Parameters
Your Sarvam AI API subscription key
Speaker voice identifier (e.g., “anushka”, “meera”, “abhilash”)
TTS model to use (“bulbul:v1” or “bulbul:v2”)
Shared aiohttp session for making HTTP requests
Sarvam AI API base URL
Audio sample rate in Hz (8000, 16000, 22050, 24000)
Additional voice and preprocessing parameters
InputParams Configuration
Target language for synthesis
Voice pitch adjustment (-0.75 to 0.75)
Speech speed (0.3 to 3.0)
Audio volume (0.1 to 3.0)
Enable text normalization for mixed-language content
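The documented ranges for pitch, pace, and loudness can be checked before a request is sent. The following is a minimal sketch, not the service's own InputParams class: `VoiceSettings`, `_check_range`, and the default values (0.0, 1.0, 1.0) are hypothetical names and assumptions introduced for illustration. Only the numeric ranges come from the descriptions above.

```python
from __future__ import annotations

from dataclasses import dataclass

# Ranges copied from the parameter descriptions above.
PITCH_RANGE = (-0.75, 0.75)
PACE_RANGE = (0.3, 3.0)
LOUDNESS_RANGE = (0.1, 3.0)


def _check_range(name: str, value: float, bounds: tuple[float, float]) -> None:
    """Raise ValueError if value falls outside the inclusive bounds."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical stand-in for the voice fields of InputParams."""

    pitch: float = 0.0      # assumed default
    pace: float = 1.0       # assumed default
    loudness: float = 1.0   # assumed default

    def __post_init__(self) -> None:
        _check_range("pitch", self.pitch, PITCH_RANGE)
        _check_range("pace", self.pace, PACE_RANGE)
        _check_range("loudness", self.loudness, LOUDNESS_RANGE)
```

Validating at construction time surfaces an out-of-range value as a local `ValueError` instead of an API error after the HTTP round trip.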
Input
The service accepts text input through the TTS pipeline. The WAV header is stripped automatically from the returned audio, so downstream processors receive clean PCM.
Output Frames
TTSStartedFrame
Signals the start of audio generation.
TTSAudioRawFrame
Contains generated audio data:
Raw PCM audio data (WAV header stripped)
Audio sample rate (22050 Hz default)
Number of audio channels (1 for mono)
TTSStoppedFrame
Signals the completion of audio generation.
Methods
See the TTS base class methods for additional functionality.
Language Support
Sarvam AI TTS supports the following Indian languages:
Language Code | Description | Service Code |
---|---|---|
Language.BN | Bengali | bn-IN |
Language.EN | English (India) | en-IN |
Language.GU | Gujarati | gu-IN |
Language.HI | Hindi | hi-IN |
Language.KN | Kannada | kn-IN |
Language.ML | Malayalam | ml-IN |
Language.MR | Marathi | mr-IN |
Language.OR | Odia | od-IN |
Language.PA | Punjabi | pa-IN |
Language.TA | Tamil | ta-IN |
Language.TE | Telugu | te-IN |
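The table above amounts to a lookup from a language enum member to the code sent to the service. A minimal sketch of that lookup follows. The `Language` class here is a local stand-in limited to the rows of the table, and its string values as well as the `to_service_code` helper are assumptions made for illustration, not the framework's own definitions.

```python
from enum import Enum
from typing import Optional


class Language(str, Enum):
    """Stand-in enum covering only the languages in the table above."""

    BN = "bn"
    EN = "en"
    GU = "gu"
    HI = "hi"
    KN = "kn"
    ML = "ml"
    MR = "mr"
    OR = "or"
    PA = "pa"
    TA = "ta"
    TE = "te"


# Service codes copied from the "Service Code" column.
SERVICE_CODES = {
    Language.BN: "bn-IN",
    Language.EN: "en-IN",
    Language.GU: "gu-IN",
    Language.HI: "hi-IN",
    Language.KN: "kn-IN",
    Language.ML: "ml-IN",
    Language.MR: "mr-IN",
    Language.OR: "od-IN",
    Language.PA: "pa-IN",
    Language.TA: "ta-IN",
    Language.TE: "te-IN",
}


def to_service_code(language: Language) -> Optional[str]:
    """Return the service code for a language, or None if unsupported."""
    return SERVICE_CODES.get(language)
```

Note that Odia maps to `od-IN` in the table, so the code cannot be derived mechanically from the enum name.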
Voice Models
See the Sarvam docs for the latest information on available voices and models.
Usage Example
Frame Flow
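The output frames listed earlier arrive in a fixed order: one TTSStartedFrame, the audio in TTSAudioRawFrame form, then one TTSStoppedFrame. The sketch below illustrates that ordering only. The three dataclasses are local stand-ins for the framework's frame types, the field names are inferred from the descriptions in Output Frames, and splitting the audio into fixed-size chunks is an illustrative assumption, not a statement about how the service emits audio.

```python
from dataclasses import dataclass
from typing import Iterator, Union


@dataclass
class TTSStartedFrame:
    """Stand-in: signals the start of audio generation."""


@dataclass
class TTSAudioRawFrame:
    """Stand-in: carries raw PCM audio (WAV header already stripped)."""

    audio: bytes
    sample_rate: int
    num_channels: int


@dataclass
class TTSStoppedFrame:
    """Stand-in: signals the completion of audio generation."""


Frame = Union[TTSStartedFrame, TTSAudioRawFrame, TTSStoppedFrame]


def frames_for(
    pcm: bytes, sample_rate: int = 22050, chunk_size: int = 4096
) -> Iterator[Frame]:
    """Yield the start frame, the audio frames, then the stop frame."""
    yield TTSStartedFrame()
    for offset in range(0, len(pcm), chunk_size):
        chunk = pcm[offset : offset + chunk_size]
        yield TTSAudioRawFrame(audio=chunk, sample_rate=sample_rate, num_channels=1)
    yield TTSStoppedFrame()
```

A downstream consumer can rely on the start and stop frames to bracket every utterance, for example to start and stop a playback indicator.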
Metrics Support
The service supports metrics collection:
- Time to First Byte (TTFB)
- TTS usage metrics
- Processing duration
Audio Processing
- Receives base64-encoded WAV audio from the API and strips the WAV header to produce raw PCM
- Supports multiple sample rates (8000, 16000, 22050, 24000 Hz)
- Generates mono audio output
- Handles HTTP-based synthesis
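The decoding step described in this list can be illustrated with the Python standard library alone. This is a sketch under stated assumptions, not the service's internal code: `wav_base64_to_pcm` and `make_wav_base64` are hypothetical helpers, and parsing with the `wave` module is one possible approach (an implementation could equally skip a fixed-size header).

```python
import base64
import io
import wave
from typing import Tuple


def wav_base64_to_pcm(encoded: str) -> Tuple[bytes, int, int]:
    """Decode a base64 WAV string into (pcm_bytes, sample_rate, channels)."""
    wav_bytes = base64.b64decode(encoded)
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        # readframes returns only the sample data, without the RIFF header.
        pcm = wav.readframes(wav.getnframes())
        return pcm, wav.getframerate(), wav.getnchannels()


def make_wav_base64(pcm: bytes, sample_rate: int = 22050) -> str:
    """Build a mono 16-bit WAV in memory, for trying the decoder locally."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return base64.b64encode(buffer.getvalue()).decode("ascii")
```

Reading the sample rate and channel count from the header, as done here, also lets a caller confirm that the returned audio matches the requested rate before wrapping it in a TTSAudioRawFrame.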
Notes
- Requires valid Sarvam AI API subscription key
- Specializes in Indian languages and voices
- Uses HTTP POST requests for synthesis
- Requires thread-safe management of the shared HTTP session
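To make the HTTP POST step concrete, the sketch below assembles the headers and JSON body for one synthesis request without sending it. Everything wire-level here is an assumption to be checked against the Sarvam AI API reference: the header name `api-subscription-key`, every JSON field name, and the default values. `build_tts_request` is a hypothetical helper, not part of SarvamTTSService.

```python
from typing import Any, Dict, Tuple


def build_tts_request(
    text: str,
    api_key: str,
    *,
    language_code: str = "hi-IN",
    speaker: str = "anushka",
    model: str = "bulbul:v2",
    sample_rate: int = 22050,
    pitch: float = 0.0,
    pace: float = 1.0,
    loudness: float = 1.0,
    enable_preprocessing: bool = False,
) -> Tuple[Dict[str, str], Dict[str, Any]]:
    """Return (headers, json_body) for one synthesis POST request."""
    headers = {
        "api-subscription-key": api_key,  # assumed header name
        "Content-Type": "application/json",
    }
    body = {
        # All field names below are assumptions, not confirmed by this page.
        "text": text,
        "target_language_code": language_code,
        "speaker": speaker,
        "model": model,
        "speech_sample_rate": sample_rate,
        "pitch": pitch,
        "pace": pace,
        "loudness": loudness,
        "enable_preprocessing": enable_preprocessing,
    }
    return headers, body
```

Keeping request construction separate from the network call makes the payload easy to inspect in tests, and leaves the choice of HTTP client (the service itself takes a shared aiohttp session) to the caller.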