Overview
NVIDIA Riva provides two STT service implementations: NvidiaSTTService for real-time streaming transcription using Parakeet models, and NvidiaSegmentedSTTService for segmented transcription using Canary models with advanced language support and enterprise-grade accuracy.
NVIDIA Riva STT API Reference
Pipecat’s API methods for NVIDIA Riva STT integration
Example Implementation
Complete example with NVIDIA services integration
NVIDIA Riva Documentation
Official NVIDIA Riva ASR documentation
NVIDIA Developer Portal
Access API keys and Riva services
Installation
To use NVIDIA Riva services, install the required dependency.
Prerequisites
NVIDIA Riva Setup
Before using NVIDIA Riva STT services, you need:
- NVIDIA Developer Account: Sign up at NVIDIA Developer Portal
- API Key: Generate an NVIDIA API key for Riva services
- Model Selection: Choose between Parakeet (streaming) and Canary (segmented) models
Required Environment Variables
NVIDIA_API_KEY: Your NVIDIA API key for authentication
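As a small illustration of the environment variable above, the following sketch reads NVIDIA_API_KEY and fails early with a clear message when it is missing. The helper name `load_nvidia_api_key` is hypothetical and not part of Pipecat or Riva.

```python
import os
from typing import Mapping


def load_nvidia_api_key(environ: Mapping[str, str] = os.environ) -> str:
    """Return the NVIDIA API key from the environment (hypothetical helper)."""
    # Treat an unset variable and a blank value the same way.
    api_key = environ.get("NVIDIA_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "NVIDIA_API_KEY is not set; export it before starting the pipeline."
        )
    return api_key
```

Passing the mapping as an argument keeps the helper easy to exercise with a plain dictionary instead of the real process environment.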
Configuration
NvidiaSTTService
Real-time streaming transcription using NVIDIA Riva’s Parakeet models. Supports interim results and continuous audio processing.
Constructor parameters:
- NVIDIA API key for authentication.
- NVIDIA Riva server address.
- Mapping (model_function_map) containing function_id and model_name for the ASR model.
- Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
- Configuration parameters. See NvidiaSTTService InputParams below.
- Whether to use SSL for the gRPC connection.
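The model mapping described above needs both a function_id and a model_name entry. A minimal sketch of checking a mapping before handing it to either service follows. The helper `check_model_function_map` is hypothetical, and the values in the usage line are placeholders rather than real Riva identifiers.

```python
from typing import Mapping

# Keys this page says the mapping must contain.
REQUIRED_KEYS = ("function_id", "model_name")


def check_model_function_map(mapping: Mapping[str, str]) -> dict[str, str]:
    """Return a plain dict copy of the mapping, or raise if a key is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not mapping.get(key)]
    if missing:
        raise ValueError(
            "model_function_map is missing: " + ", ".join(missing)
        )
    return dict(mapping)


# Placeholder values only; substitute the IDs for the model you deploy.
example_map = check_model_function_map(
    {"function_id": "<function-id>", "model_name": "<model-name>"}
)
```

Because the model cannot be changed after initialization, catching a malformed mapping before constructing the service avoids a rebuild later.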
NvidiaSTTService InputParams
| Parameter | Type | Default | Description |
|---|---|---|---|
| language | Language | Language.EN_US | Target language for transcription. |
NvidiaSegmentedSTTService
Batch/segmented transcription using NVIDIA Riva’s Canary models. Processes complete audio segments after VAD detects speech boundaries.
Constructor parameters:
- NVIDIA API key for authentication.
- NVIDIA Riva server address.
- Mapping (model_function_map) containing function_id and model_name for the ASR model.
- Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
- Configuration parameters. See NvidiaSegmentedSTTService InputParams below.
- Whether to use SSL for the gRPC connection.
NvidiaSegmentedSTTService InputParams
| Parameter | Type | Default | Description |
|---|---|---|---|
| language | Language | Language.EN_US | Target language for transcription. |
| profanity_filter | bool | False | Whether to filter profanity from results. |
| automatic_punctuation | bool | True | Whether to add automatic punctuation. |
| verbatim_transcripts | bool | False | Whether to return verbatim transcripts. |
| boosted_lm_words | list[str] | None | List of words to boost in the language model. |
| boosted_lm_score | float | 4.0 | Score boost for specified words. |
Usage
Streaming with Parakeet
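A minimal sketch of creating the streaming service. It assumes the import path `pipecat.services.nvidia.stt` and the keyword name `api_key`; of the constructor keywords, only `model_function_map` is named on this page, so check the API reference for the exact signature. The function ID and model name are supplied by the caller rather than guessed here, and both helper functions are hypothetical.

```python
from typing import Any


def streaming_stt_kwargs(api_key: str, function_id: str, model_name: str) -> dict[str, Any]:
    """Collect constructor arguments for the streaming (Parakeet) service."""
    return {
        "api_key": api_key,  # assumed keyword name
        "model_function_map": {
            "function_id": function_id,
            "model_name": model_name,
        },
    }


def create_streaming_stt(api_key: str, function_id: str, model_name: str):
    """Build the service; requires Pipecat with NVIDIA support installed."""
    # Assumed import path, deferred so the helper above works without Pipecat.
    from pipecat.services.nvidia.stt import NvidiaSTTService

    return NvidiaSTTService(**streaming_stt_kwargs(api_key, function_id, model_name))
```

The returned service would then be placed in a pipeline after the audio input, where it emits interim and final transcriptions as audio arrives.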
Segmented with Canary
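A minimal sketch of creating the segmented service with word boosting, using the InputParams fields from the table above. It assumes the import path `pipecat.services.nvidia.stt`, that `InputParams` is a nested class of the service, and the keyword names `api_key` and `params`; none of these are confirmed on this page. Both helper functions are hypothetical.

```python
from typing import Any


def segmented_param_values(domain_terms: list[str]) -> dict[str, Any]:
    """Field values for the segmented service's InputParams."""
    return {
        "profanity_filter": False,
        "automatic_punctuation": True,
        "verbatim_transcripts": False,
        "boosted_lm_words": list(domain_terms),  # terms to favor during decoding
        "boosted_lm_score": 4.0,  # documented default boost
    }


def create_segmented_stt(api_key: str, domain_terms: list[str]):
    """Build the service; requires Pipecat with NVIDIA support installed."""
    # Assumed import path, deferred so the helper above works without Pipecat.
    from pipecat.services.nvidia.stt import NvidiaSegmentedSTTService

    params = NvidiaSegmentedSTTService.InputParams(
        **segmented_param_values(domain_terms)
    )
    return NvidiaSegmentedSTTService(api_key=api_key, params=params)
```

Since this service transcribes whole segments after VAD detects speech boundaries, the pipeline needs voice activity detection configured upstream of it.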
Notes
- Model cannot be changed after initialization: Use the model_function_map parameter in the constructor to specify the model and function ID.
- Streaming vs segmented: NvidiaSTTService provides real-time interim and final results through continuous streaming. NvidiaSegmentedSTTService processes complete audio segments for higher accuracy.
- Language support: Supports Arabic, English (US/GB), French, German, Hindi, Italian, Japanese, Korean, Portuguese (BR), Russian, and Spanish (ES/US).
- Word boosting: Use boosted_lm_words and boosted_lm_score in the segmented service to improve recognition of domain-specific terms.