Overview

NVIDIA Riva provides two STT service implementations. NvidiaSTTService performs real-time streaming transcription using Parakeet models. NvidiaSegmentedSTTService performs segmented transcription using Canary models and exposes additional options such as profanity filtering, automatic punctuation, and word boosting.

Installation

To use NVIDIA Riva services, install the required dependency:
pip install "pipecat-ai[nvidia]"

Prerequisites

NVIDIA Riva Setup

Before using NVIDIA Riva STT services, you need:
  1. NVIDIA Developer Account: Sign up at NVIDIA Developer Portal
  2. API Key: Generate an NVIDIA API key for Riva services
  3. Model Selection: Choose between Parakeet (streaming) and Canary (segmented) models

Required Environment Variables

  • NVIDIA_API_KEY: Your NVIDIA API key for authentication
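
A minimal sketch of reading the key at startup and failing early when it is missing. The helper name require_nvidia_api_key is hypothetical and not part of Pipecat; only the NVIDIA_API_KEY variable name comes from this page.

```python
import os


def require_nvidia_api_key(env=None):
    """Return the NVIDIA API key, or raise if it is unset or blank.

    Hypothetical helper for illustration; not part of Pipecat.
    """
    # Allow a custom mapping to be passed in, which makes the helper easy to test.
    env = os.environ if env is None else env
    key = env.get("NVIDIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError("NVIDIA_API_KEY is not set")
    return key
```

Passing the result as api_key surfaces a missing key with a clear error at startup rather than handing None to the service constructor.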

Configuration

NvidiaSTTService

Real-time streaming transcription using NVIDIA Riva’s Parakeet models. Supports interim results and continuous audio processing.
  • api_key (str, required): NVIDIA API key for authentication.
  • server (str, default: "grpc.nvcf.nvidia.com:443"): NVIDIA Riva server address.
  • model_function_map (Mapping[str, str]): Mapping containing function_id and model_name for the ASR model.
  • sample_rate (int, default: None): Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
  • params (InputParams, default: None): Configuration parameters. See NvidiaSTTService InputParams below.
  • use_ssl (bool, default: True): Whether to use SSL for the gRPC connection.

NvidiaSTTService InputParams

Parameter   Type       Default          Description
language    Language   Language.EN_US   Target language for transcription.
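
The constructor arguments above can be assembled as in the sketch below. It assumes that model_function_map uses the literal keys function_id and model_name, as the parameter description suggests. The helper build_stt_kwargs is hypothetical, and the function ID and model name must come from your own NVIDIA account; none are supplied here.

```python
from typing import Any, Dict, Optional

# Default server address, taken from the parameter list above.
DEFAULT_SERVER = "grpc.nvcf.nvidia.com:443"


def build_stt_kwargs(
    api_key: str,
    function_id: str,
    model_name: str,
    server: str = DEFAULT_SERVER,
    use_ssl: bool = True,
    sample_rate: Optional[int] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for the STT constructor (hypothetical helper)."""
    if not function_id or not model_name:
        raise ValueError("function_id and model_name are both required")
    return {
        "api_key": api_key,
        "server": server,
        # Assumed key names, based on the parameter description.
        "model_function_map": {
            "function_id": function_id,
            "model_name": model_name,
        },
        # None defers to the pipeline's configured sample rate.
        "sample_rate": sample_rate,
        "use_ssl": use_ssl,
    }
```

With Pipecat installed, the result could be unpacked into the constructor, for example NvidiaSTTService(**build_stt_kwargs(api_key, "<your-function-id>", "<your-model-name>")).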

NvidiaSegmentedSTTService

Batch/segmented transcription using NVIDIA Riva’s Canary models. Processes complete audio segments after VAD detects speech boundaries.
  • api_key (str, required): NVIDIA API key for authentication.
  • server (str, default: "grpc.nvcf.nvidia.com:443"): NVIDIA Riva server address.
  • model_function_map (Mapping[str, str]): Mapping containing function_id and model_name for the ASR model.
  • sample_rate (int, default: None): Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
  • params (InputParams, default: None): Configuration parameters. See NvidiaSegmentedSTTService InputParams below.
  • use_ssl (bool, default: True): Whether to use SSL for the gRPC connection.

NvidiaSegmentedSTTService InputParams

Parameter               Type        Default          Description
language                Language    Language.EN_US   Target language for transcription.
profanity_filter        bool        False            Whether to filter profanity from results.
automatic_punctuation   bool        True             Whether to add automatic punctuation.
verbatim_transcripts    bool        False            Whether to return verbatim transcripts.
boosted_lm_words        list[str]   None             List of words to boost in the language model.
boosted_lm_score        float       4.0              Score boost for specified words.
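
Boost lists often come from configuration files or user input, so it can help to clean them before passing them on. The sketch below is one way to do that. The helper boost_params is hypothetical; only the boosted_lm_words and boosted_lm_score names and the 4.0 default come from the table above.

```python
from typing import Any, Dict, Iterable


def boost_params(words: Iterable[str], score: float = 4.0) -> Dict[str, Any]:
    """Return cleaned word-boosting parameters (hypothetical helper).

    Strips whitespace, drops empty entries, and removes duplicates
    while preserving the original order.
    """
    seen = set()
    cleaned = []
    for word in words:
        term = word.strip()
        if term and term not in seen:
            seen.add(term)
            cleaned.append(term)
    # Fall back to None, the documented default, when nothing is left to boost.
    return {"boosted_lm_words": cleaned or None, "boosted_lm_score": score}
```

The returned dictionary could then be unpacked into NvidiaSegmentedSTTService.InputParams alongside language and the other options.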

Usage

Streaming with Parakeet

import os

from pipecat.services.nvidia import NvidiaSTTService

# Read the API key from the environment rather than hard-coding it.
stt = NvidiaSTTService(
    api_key=os.getenv("NVIDIA_API_KEY"),
)

Segmented with Canary

import os

from pipecat.services.nvidia import NvidiaSegmentedSTTService
from pipecat.transcriptions.language import Language

stt = NvidiaSegmentedSTTService(
    api_key=os.getenv("NVIDIA_API_KEY"),
    params=NvidiaSegmentedSTTService.InputParams(
        language=Language.ES,
        automatic_punctuation=True,
        # Boost domain-specific terms above the default score of 4.0.
        boosted_lm_words=["Pipecat", "NVIDIA"],
        boosted_lm_score=6.0,
    ),
)

Notes

  • Model cannot be changed after initialization: Use the model_function_map parameter in the constructor to specify the model and function ID.
  • Streaming vs segmented: NvidiaSTTService provides real-time interim and final results through continuous streaming. NvidiaSegmentedSTTService processes complete audio segments for higher accuracy.
  • Language support: Arabic, English (US/GB), French, German, Hindi, Italian, Japanese, Korean, Portuguese (BR), Russian, and Spanish (ES/US).
  • Word boosting: Use boosted_lm_words and boosted_lm_score in the segmented service to improve recognition of domain-specific terms.
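
The trade-offs in these notes can be summarised as a small decision helper. This is only a sketch: choose_stt_service is a hypothetical function, and it assumes, based on the parameter tables on this page, that word boosting is available only in the segmented service.

```python
def choose_stt_service(
    needs_interim_results: bool,
    needs_word_boosting: bool = False,
) -> str:
    """Return the name of the service class that fits the requirements.

    Hypothetical helper; it encodes only what this page documents.
    """
    if needs_interim_results and needs_word_boosting:
        # Per this page, interim results come from the streaming service,
        # while boosted_lm_words is listed only for the segmented one.
        raise ValueError(
            "interim results and word boosting are documented "
            "for different services"
        )
    if needs_interim_results:
        return "NvidiaSTTService"
    return "NvidiaSegmentedSTTService"
```

For example, a live captioning pipeline that needs interim results would pick NvidiaSTTService, while a pipeline that transcribes complete utterances with domain-specific vocabulary would pick NvidiaSegmentedSTTService.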