NVIDIA Riva

Overview

NVIDIA Riva provides two STT services:

RivaSTTService for real-time streaming transcription using Parakeet models
RivaSegmentedSTTService for segmented transcription using Canary models with advanced language support

API Reference

Complete API documentation and method details

NVIDIA Riva Docs

Official NVIDIA Riva ASR documentation

Example Code

Working example with NVIDIA services integration

Installation

To use NVIDIA Riva services, install the required dependency:

pip install "pipecat-ai[riva]"

You’ll also need to set up your NVIDIA API key as an environment variable: NVIDIA_API_KEY.

Get your API key from NVIDIA’s developer portal.

Frames

Input

InputAudioRawFrame - Raw PCM audio data (16-bit, mono)
STTUpdateSettingsFrame - Runtime transcription configuration updates
STTMuteFrame - Mute audio input for transcription

Output

InterimTranscriptionFrame - Real-time transcription updates (streaming only)
TranscriptionFrame - Final transcription results
ErrorFrame - Connection or processing errors

Service Comparison

Feature	RivaSTTService	RivaSegmentedSTTService
Processing	Real-time streaming	Segmented (VAD-based)
Model	Parakeet CTC 1.1B	Canary 1B
Latency	Ultra-low	Higher (batch processing)
Languages	English-focused	Multi-language
Interim Results	✅ Yes	❌ No
Best For	Real-time conversation	Multi-language accuracy

Models

Model	Service Class	Description	Languages
`parakeet-ctc-1.1b-asr`	RivaSTTService	Streaming ASR optimized for low latency	English (various accents)
`canary-1b-asr`	RivaSegmentedSTTService	Multilingual ASR with high accuracy	15+ languages

See NVIDIA’s model cards for detailed performance metrics.

Language Support

RivaSTTService (Parakeet)

Primarily supports English with various regional accents:

Language.EN_US - English (US) - en-US

RivaSegmentedSTTService (Canary)

Supports multiple languages:

Language Code	Description	Service Codes
`Language.EN_US`	English (US)	`en-US`
`Language.EN_GB`	English (UK)	`en-GB`
`Language.ES`	Spanish	`es-ES`
`Language.ES_US`	Spanish (US)	`es-US`
`Language.FR`	French	`fr-FR`
`Language.DE`	German	`de-DE`
`Language.IT`	Italian	`it-IT`
`Language.PT_BR`	Portuguese (Brazil)	`pt-BR`
`Language.JA`	Japanese	`ja-JP`
`Language.KO`	Korean	`ko-KR`
`Language.RU`	Russian	`ru-RU`
`Language.HI`	Hindi	`hi-IN`
`Language.AR`	Arabic	`ar-AR`

Usage Example

Real-time Streaming (RivaSTTService)

Initialize the RivaSTTService and use it in a pipeline:

from pipecat.services.riva.stt import RivaSTTService
from pipecat.transcriptions.language import Language

# Basic streaming configuration
stt = RivaSTTService(
    api_key=os.getenv("NVIDIA_API_KEY"),
    params=RivaSTTService.InputParams(
        language=Language.EN_US
    )
)

# Use in pipeline for real-time conversation
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

Segmented Multi-language (RivaSegmentedSTTService)

Initialize the RivaSegmentedSTTService for segmented transcription:

from pipecat.services.riva.stt import RivaSegmentedSTTService
from pipecat.audio.vad.silero import SileroVADAnalyzer

# Multi-language segmented transcription
stt = RivaSegmentedSTTService(
    api_key=os.getenv("NVIDIA_API_KEY"),
    params=RivaSegmentedSTTService.InputParams(
        language=Language.ES,  # Spanish
        profanity_filter=False,
        automatic_punctuation=True,
        boosted_lm_words=["inteligencia", "artificial"],
        boosted_lm_score=5.0
    )
)



# Use in pipeline
pipeline = Pipeline([
    transport.input(),
    stt,  # Processes audio segments
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

Dynamic Configuration

Make settings updates by pushing an STTUpdateSettingsFrame for either service:

from pipecat.frames.frames import STTUpdateSettingsFrame

await task.queue_frame(STTUpdateSettingsFrame(
    language=Language.FR,  # Change to French
  )
)

Advanced Configuration

Both services support advanced ASR parameters:

Word Boosting

boosted_lm_words: List of domain-specific terms to emphasize
boosted_lm_score: Boost intensity (default: 4.0, recommended: 4.0-8.0)

Audio Processing

profanity_filter: Filter inappropriate content
automatic_punctuation: Add punctuation automatically
verbatim_transcripts: Control transcript formatting

Voice Activity Detection (Streaming only)

start_history: History frames for speech start detection
start_threshold: Confidence threshold for speech start
stop_threshold: Confidence threshold for speech end

Metrics

Time to First Byte (TTFB) - Latency from audio segment to transcription
Processing Duration - Time spent processing each segment

Learn how to enable Metrics in your Pipeline.

Additional Notes

Authentication: Uses NVIDIA Cloud Functions with Bearer token authentication
Real-time vs Batch: Choose streaming for conversation, segmented for accuracy and multi-language
VAD Requirement: Segmented service requires Voice Activity Detection in the pipeline
Custom Endpoints: Supports custom Riva server endpoints for on-premise deployments

API Reference

Services

Utilities

Frameworks

Pipeline

Overview

API Reference

NVIDIA Riva Docs

Example Code

Installation

Frames

Input

Output

Service Comparison

Models

Language Support

RivaSTTService (Parakeet)

RivaSegmentedSTTService (Canary)

Usage Example

Real-time Streaming (RivaSTTService)

Segmented Multi-language (RivaSegmentedSTTService)

Dynamic Configuration

Advanced Configuration

Word Boosting

Audio Processing

Voice Activity Detection (Streaming only)

Metrics

Additional Notes

API Reference

Services

Utilities

Frameworks

Pipeline

​Overview

API Reference

NVIDIA Riva Docs

Example Code

​Installation

​Frames

​Input

​Output

​Service Comparison

​Models

​Language Support

​RivaSTTService (Parakeet)

​RivaSegmentedSTTService (Canary)

​Usage Example

​Real-time Streaming (RivaSTTService)

​Segmented Multi-language (RivaSegmentedSTTService)

​Dynamic Configuration

​Advanced Configuration

​Word Boosting

​Audio Processing

​Voice Activity Detection (Streaming only)

​Metrics

​Additional Notes

Overview

Installation

Frames

Input

Output

Service Comparison

Models

Language Support

RivaSTTService (Parakeet)

RivaSegmentedSTTService (Canary)

Usage Example

Real-time Streaming (RivaSTTService)

Segmented Multi-language (RivaSegmentedSTTService)

Dynamic Configuration

Advanced Configuration

Word Boosting

Audio Processing

Voice Activity Detection (Streaming only)

Metrics

Additional Notes