NVIDIA FastPitch
Text-to-speech service implementation using NVIDIA’s FastPitch model
Overview
FastPitchTTSService converts text to speech using NVIDIA’s Riva FastPitch TTS model. It provides high-quality text-to-speech synthesis with configurable voice options.
Installation
To use FastPitchTTSService, install the required dependencies:
You’ll also need to set your NVIDIA API key in the NVIDIA_API_KEY environment variable.
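As a minimal sketch of reading that variable before constructing the service, the helper below fails early when the key is missing. The function name `load_nvidia_api_key` is hypothetical and not part of the service; only the `NVIDIA_API_KEY` variable name comes from this page.

```python
import os


def load_nvidia_api_key(env=os.environ):
    """Return the NVIDIA API key from the environment, or raise if it is missing.

    Hypothetical helper for illustration; `env` can be any mapping, which
    makes the function easy to exercise without touching the real environment.
    """
    key = env.get("NVIDIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError("NVIDIA_API_KEY is not set")
    return key
```

Failing at startup is usually easier to diagnose than an authentication error surfacing later, during the first synthesis request.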
Configuration
Constructor Parameters
Your NVIDIA API key
NVIDIA Riva server address
Voice identifier to use for synthesis
Output audio sample rate in Hz
NVIDIA function identifier for the TTS service
Additional configuration parameters (language and quality)
InputParams
The language for TTS generation
Quality level for the generated audio
Input
The service accepts text input through its TTS pipeline.
Output Frames
TTSStartedFrame
Signals the start of audio generation.
TTSAudioRawFrame
Contains generated audio data:
Raw audio data chunk
Audio sample rate
Number of audio channels (1 for mono)
TTSStoppedFrame
Signals the completion of audio generation.
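The ordering of these frames can be sketched with stand-in classes. The dataclasses below only mirror the frame names listed above and are not the library's own classes. The field names `audio`, `sample_rate`, and `num_channels` are assumptions based on the descriptions, and `frames_for_utterance` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Union


@dataclass
class TTSStartedFrame:
    """Stand-in for the frame that signals the start of audio generation."""


@dataclass
class TTSAudioRawFrame:
    """Stand-in for a frame carrying one chunk of generated audio."""
    audio: bytes           # raw audio data chunk
    sample_rate: int       # audio sample rate in Hz
    num_channels: int = 1  # 1 for mono


@dataclass
class TTSStoppedFrame:
    """Stand-in for the frame that signals the completion of audio generation."""


Frame = Union[TTSStartedFrame, TTSAudioRawFrame, TTSStoppedFrame]


def frames_for_utterance(chunks: Iterable[bytes], sample_rate: int) -> Iterator[Frame]:
    """Yield a started frame, one audio frame per chunk, then a stopped frame."""
    yield TTSStartedFrame()
    for chunk in chunks:
        yield TTSAudioRawFrame(audio=chunk, sample_rate=sample_rate)
    yield TTSStoppedFrame()
```

A downstream consumer can rely on this bracketing to know when an utterance begins and ends, regardless of how many audio chunks arrive in between.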
Methods
See the TTS base class methods for additional functionality.
Language Support
FastPitch TTS primarily supports English with various regional accents:
Language Code | Description | Service Codes |
---|---|---|
Language.EN_US | English (US) | en-US |
Usage Example
Frame Flow
Metrics Support
The service supports metrics collection:
- Time to First Byte (TTFB)
- TTS usage metrics
- Processing duration
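To make the first and last of these metrics concrete, here is a rough sketch of how time to first byte and processing duration could be measured around any stream of audio chunks. It does not show how the service records its metrics internally. The function `measure_stream` and its injectable `clock` parameter are hypothetical.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_stream(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float, int]:
    """Consume `chunks` and return (ttfb, total_duration, total_bytes).

    ttfb is the delay until the first chunk arrives, or None if the
    stream produced no chunks at all.
    """
    start = clock()
    ttfb: Optional[float] = None
    total_bytes = 0
    for chunk in chunks:
        if ttfb is None:
            # The first chunk has just arrived: record time to first byte.
            ttfb = clock() - start
        total_bytes += len(chunk)
    return ttfb, clock() - start, total_bytes
```

Passing the clock in as a parameter keeps the measurement deterministic in tests, while the default uses a monotonic high-resolution timer.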
Audio Processing
- Processes audio through the Riva API
- Generates mono audio output
- Handles asynchronous audio streaming
- Supports a configurable sample rate
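Because the output is mono at a known sample rate, the playback length of a chunk follows directly from its size. The sketch below assumes 16-bit PCM samples (2 bytes each), which this page does not state, and `chunk_duration_seconds` is a hypothetical helper.

```python
def chunk_duration_seconds(
    num_bytes: int,
    sample_rate: int,
    num_channels: int = 1,
    bytes_per_sample: int = 2,  # assumption: 16-bit PCM
) -> float:
    """Return the playback duration of a raw PCM chunk in seconds."""
    frame_size = num_channels * bytes_per_sample
    if num_bytes % frame_size:
        raise ValueError("chunk does not contain a whole number of sample frames")
    return num_bytes / frame_size / sample_rate
```

For example, under these assumptions a 32,000-byte mono chunk at 16,000 Hz holds 16,000 samples, which is one second of audio.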
Notes
- Uses NVIDIA’s Riva AI Services platform
- Streams audio in chunks
- Requires valid NVIDIA API key
- Thread-safe processing with asyncio
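One common way to combine chunked streaming with asyncio is to run a blocking chunk producer in a worker thread and hand its chunks to the event loop through a queue. The sketch below illustrates that general pattern only and is not taken from the service's implementation. `stream_in_thread` and `make_chunks` are hypothetical names, and `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
from typing import AsyncIterator, Callable, Iterable


async def stream_in_thread(
    make_chunks: Callable[[], Iterable[bytes]],
) -> AsyncIterator[bytes]:
    """Run a blocking chunk producer in a thread and yield its chunks asynchronously."""
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()
    done = object()  # sentinel marking the end of the stream

    def worker() -> None:
        try:
            for chunk in make_chunks():
                # asyncio.Queue is not thread-safe, so schedule the put on the loop.
                loop.call_soon_threadsafe(queue.put_nowait, chunk)
        finally:
            loop.call_soon_threadsafe(queue.put_nowait, done)

    task = asyncio.create_task(asyncio.to_thread(worker))
    while True:
        item = await queue.get()
        if item is done:
            break
        yield item
    await task  # re-raises any exception from the producer
```

Routing every queue operation through `call_soon_threadsafe` keeps all access to the queue on the event loop thread, so the consumer never blocks the loop while the producer waits on the network.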