Speech-to-text service implementation using NVIDIA Riva. Two services are available:

- `RivaSTTService` for real-time streaming transcription using Parakeet models
- `RivaSegmentedSTTService` for segmented transcription using Canary models with multi-language support

Set the `NVIDIA_API_KEY` environment variable before using either service.
Input frames:

- `InputAudioRawFrame`: raw PCM audio data (16-bit, mono)
- `STTUpdateSettingsFrame`: runtime transcription configuration updates
- `STTMuteFrame`: mutes audio input for transcription

Output frames:

- `InterimTranscriptionFrame`: real-time transcription updates (`RivaSTTService` only)
- `TranscriptionFrame`: final transcription results
- `ErrorFrame`: connection or processing errors

Feature | RivaSTTService | RivaSegmentedSTTService |
---|---|---|
Processing | Real-time streaming | Segmented (VAD-based) |
Model | Parakeet CTC 1.1B | Canary 1B |
Latency | Low (results arrive while the user is speaking) | Higher (each segment is transcribed after speech ends) |
Languages | English-focused | Multi-language |
Interim Results | ✅ Yes | ❌ No |
Best For | Real-time conversation | Multi-language accuracy |
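
Only `RivaSTTService` produces interim results. One common way to consume them is this: treat each `InterimTranscriptionFrame` as a provisional hypothesis that replaces the previous one, and commit text only when a `TranscriptionFrame` arrives. The sketch below models that pattern with plain Python stand-ins. `Interim`, `Final`, and `TranscriptAccumulator` are hypothetical names, not Pipecat classes.

```python
from dataclasses import dataclass, field
from typing import List, Union


# Hypothetical stand-ins for InterimTranscriptionFrame and TranscriptionFrame;
# they are not Pipecat classes.
@dataclass
class Interim:
    text: str


@dataclass
class Final:
    text: str


@dataclass
class TranscriptAccumulator:
    """Commits final results and keeps only the latest interim hypothesis."""

    committed: List[str] = field(default_factory=list)
    pending: str = ""

    def handle(self, frame: Union[Interim, Final]) -> None:
        if isinstance(frame, Interim):
            # Each interim result replaces the previous one; it is never appended.
            self.pending = frame.text
        else:
            # A final result supersedes whatever interim text was pending.
            self.committed.append(frame.text)
            self.pending = ""

    def display(self) -> str:
        # Committed text followed by the current provisional hypothesis, if any.
        parts = self.committed + ([self.pending] if self.pending else [])
        return " ".join(parts)


acc = TranscriptAccumulator()
for f in [Interim("hel"), Interim("hello wor"), Final("Hello world."), Interim("how")]:
    acc.handle(f)
print(acc.display())  # Hello world. how
```

With `RivaSegmentedSTTService`, only final results arrive, so `pending` stays empty.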

Available models:

Model | Service Class | Description | Languages |
---|---|---|---|
parakeet-ctc-1.1b-asr | RivaSTTService | Streaming ASR optimized for low latency | English (various accents) |
canary-1b-asr | RivaSegmentedSTTService | Multilingual ASR with high accuracy | 15+ languages |
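
The comparison and model tables can be condensed into one possible selection rule. This sketch is a hypothetical helper, not part of Pipecat, and it encodes only what the tables state:

- The streaming Parakeet service is English-focused and is the only option with interim results.
- The segmented Canary service covers the other languages.

It does not query Riva for the models actually deployed.

```python
from typing import Tuple

# Hypothetical helper data copied from the model table; not a Pipecat API.
STREAMING = ("RivaSTTService", "parakeet-ctc-1.1b-asr")
SEGMENTED = ("RivaSegmentedSTTService", "canary-1b-asr")


def choose_riva_stt(language_code: str, need_interim: bool = False) -> Tuple[str, str]:
    """Return (service class, model) for a Riva language code such as "fr-FR"."""
    if language_code == "en-US":
        # English (US) can use the low-latency streaming model.
        return STREAMING
    if need_interim:
        # Interim results come only from the streaming service, which is English-focused.
        raise ValueError(f"interim results are not available for {language_code}")
    # Every other language goes to the multilingual segmented model.
    return SEGMENTED


print(choose_riva_stt("en-US"))  # ('RivaSTTService', 'parakeet-ctc-1.1b-asr')
print(choose_riva_stt("ja-JP"))  # ('RivaSegmentedSTTService', 'canary-1b-asr')
```

Routing `en-US` to the streaming service favors latency. If accuracy matters more than interim results, the segmented service could be chosen for English as well.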
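
Supported languages for `RivaSTTService` (Parakeet):

- `Language.EN_US`: English (US), Riva code `en-US`

Supported languages for `RivaSegmentedSTTService` (Canary):

Pipecat Language | Description | Riva Code |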
---|---|---|
Language.EN_US | English (US) | en-US |
Language.EN_GB | English (UK) | en-GB |
Language.ES | Spanish | es-ES |
Language.ES_US | Spanish (US) | es-US |
Language.FR | French | fr-FR |
Language.DE | German | de-DE |
Language.IT | Italian | it-IT |
Language.PT_BR | Portuguese (Brazil) | pt-BR |
Language.JA | Japanese | ja-JP |
Language.KO | Korean | ko-KR |
Language.RU | Russian | ru-RU |
Language.HI | Hindi | hi-IN |
Language.AR | Arabic | ar-AR |
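
The language lists amount to a lookup from a language value to a Riva language code, with a different table per service. The sketch below reproduces that lookup. `Lang` is a hypothetical stand-in for Pipecat's `Language` enum and includes only the table members plus `ZH`, which is added purely to show the fallback. `riva_language_code` is likewise hypothetical.

```python
from enum import Enum, auto
from typing import Dict, Optional


class Lang(Enum):
    """Hypothetical stand-in for Pipecat's Language enum (table members plus ZH)."""

    EN_US = auto()
    EN_GB = auto()
    ES = auto()
    ES_US = auto()
    FR = auto()
    DE = auto()
    IT = auto()
    PT_BR = auto()
    JA = auto()
    KO = auto()
    RU = auto()
    HI = auto()
    AR = auto()
    ZH = auto()  # not in the table; included only to show the fallback


# Codes for RivaSegmentedSTTService (Canary), copied from the table.
CANARY_CODES: Dict[Lang, str] = {
    Lang.EN_US: "en-US",
    Lang.EN_GB: "en-GB",
    Lang.ES: "es-ES",
    Lang.ES_US: "es-US",
    Lang.FR: "fr-FR",
    Lang.DE: "de-DE",
    Lang.IT: "it-IT",
    Lang.PT_BR: "pt-BR",
    Lang.JA: "ja-JP",
    Lang.KO: "ko-KR",
    Lang.RU: "ru-RU",
    Lang.HI: "hi-IN",
    Lang.AR: "ar-AR",
}

# RivaSTTService (Parakeet) lists English (US) only.
PARAKEET_CODES: Dict[Lang, str] = {Lang.EN_US: "en-US"}


def riva_language_code(lang: Lang, streaming: bool) -> Optional[str]:
    """Return the Riva code for `lang`, or None if the chosen service does not list it."""
    table = PARAKEET_CODES if streaming else CANARY_CODES
    return table.get(lang)


print(riva_language_code(Lang.HI, streaming=False))  # hi-IN
print(riva_language_code(Lang.HI, streaming=True))  # None
```

Returning `None` rather than raising leaves the caller free to fall back to the other service or to report an unsupported language.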

To transcribe in real time, create a `RivaSTTService` instance and add it to a pipeline.
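Inside a pipeline, audio reaches the service as `InputAudioRawFrame`, which carries raw 16-bit mono PCM. If an audio source produces floating-point samples instead, they need converting first. The sketch below is a minimal conversion and makes two assumptions: samples are floats in [-1.0, 1.0], and the byte order is little-endian (common for raw PCM, but not stated above). `float_to_pcm16` is a hypothetical helper, not a Pipecat function.

```python
import struct
from typing import Sequence


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Convert mono float samples in [-1.0, 1.0] to 16-bit little-endian PCM bytes."""
    ints = []
    for s in samples:
        # Clamp out-of-range input so the int16 conversion cannot overflow.
        s = max(-1.0, min(1.0, s))
        ints.append(int(round(s * 32767)))
    # "<" = little-endian, "h" = signed 16-bit integer, one per sample.
    return struct.pack(f"<{len(ints)}h", *ints)


pcm = float_to_pcm16([0.0, 0.5, -1.0, 2.0])
print(len(pcm))  # 8 (4 samples x 2 bytes)
```

In a real pipeline, the transport normally produces the audio frames. This sketch stops at the byte conversion and does not construct a Pipecat frame.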
For segmented transcription, including the non-English languages listed above, use `RivaSegmentedSTTService` instead.
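Segmented transcription works on whole utterances. Audio is buffered while voice activity detection reports speech, and each complete segment is transcribed once speech stops. This is why the segmented service has higher latency and no interim results.

The sketch below shows the buffering step only. It assumes hypothetical `on_speech_start`/`on_audio`/`on_speech_stop` hooks and a caller-supplied `transcribe` callable standing in for the Canary request. It is not Pipecat's implementation, and it omits any pre-speech buffering that a real implementation may keep.

```python
from typing import Callable, List, Optional


class SegmentBuffer:
    """Collects PCM chunks between VAD start/stop events, then transcribes once."""

    def __init__(self, transcribe: Callable[[bytes], str]) -> None:
        self._transcribe = transcribe  # hypothetical stand-in for a Canary request
        self._chunks: List[bytes] = []
        self._speaking = False

    def on_speech_start(self) -> None:
        self._speaking = True
        self._chunks.clear()

    def on_audio(self, chunk: bytes) -> None:
        # Audio outside a speech segment is dropped in this sketch.
        if self._speaking:
            self._chunks.append(chunk)

    def on_speech_stop(self) -> Optional[str]:
        if not self._speaking:
            return None
        self._speaking = False
        segment = b"".join(self._chunks)
        self._chunks.clear()
        # One request per completed segment: the source of the extra latency.
        return self._transcribe(segment) if segment else None


calls: List[int] = []


def fake_transcribe(audio: bytes) -> str:
    calls.append(len(audio))
    return f"<{len(audio)} bytes>"


buf = SegmentBuffer(fake_transcribe)
buf.on_audio(b"\x00\x00")  # before speech starts: ignored
buf.on_speech_start()
buf.on_audio(b"\x01\x00" * 3)
buf.on_audio(b"\x02\x00" * 2)
print(buf.on_speech_stop())  # <10 bytes>
```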
Either service can be reconfigured at runtime by pushing an `STTUpdateSettingsFrame` into the pipeline.
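Conceptually, a settings update changes only the settings it names and leaves the rest untouched. The sketch below models that merge with plain dictionaries. `apply_settings_update`, `KNOWN_KEYS`, and the validation rules are hypothetical; they are not Pipecat's handling of `STTUpdateSettingsFrame`. Key names follow this section's parameter list, and the warning uses the recommended `boosted_lm_score` range of 4.0-8.0.

```python
import warnings
from typing import Any, Dict, Mapping

# Hypothetical whitelist; names follow the parameter list in this section.
KNOWN_KEYS = {
    "language",
    "boosted_lm_words",
    "boosted_lm_score",
    "profanity_filter",
    "automatic_punctuation",
    "verbatim_transcripts",
    "start_history",
    "start_threshold",
    "stop_threshold",
}


def apply_settings_update(current: Mapping[str, Any], update: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a new dict with `update` merged over `current`; `current` is not modified."""
    unknown = set(update) - KNOWN_KEYS
    if unknown:
        raise KeyError(f"unknown settings: {sorted(unknown)}")
    score = update.get("boosted_lm_score")
    if score is not None and not 4.0 <= score <= 8.0:
        # Allowed, but outside the recommended range, so flag it.
        warnings.warn(f"boosted_lm_score={score} is outside the recommended 4.0-8.0 range")
    merged = dict(current)
    merged.update(update)  # keys not named in `update` keep their current values
    return merged


current = {"language": "es-ES", "boosted_lm_score": 4.0, "automatic_punctuation": True}
updated = apply_settings_update(current, {"language": "fr-FR", "boosted_lm_words": ["Riva"]})
print(updated["language"], updated["boosted_lm_score"])  # fr-FR 4.0
```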

Configuration parameters:

- `boosted_lm_words`: list of domain-specific terms to emphasize
- `boosted_lm_score`: boost intensity (default: 4.0, recommended: 4.0-8.0)
- `profanity_filter`: filter profanity from transcripts
- `automatic_punctuation`: add punctuation automatically
- `verbatim_transcripts`: control transcript formatting (whether text is returned verbatim or normalized)
- `start_history`: history window used for speech start detection
- `start_threshold`: confidence threshold for speech start
- `stop_threshold`: confidence threshold for speech end
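One way to read the start and stop parameters is as hysteresis. Speech must look likely across a short history before a start is declared, and confidence must then fall below a separate, lower threshold before speech ends. As a result, brief dips do not split an utterance.

The sketch below illustrates that idea on a sequence of per-frame speech confidences. It is a generic illustration, not Riva's endpointing algorithm. Its assumptions:

- `start_history` counts frames and is compared against the mean confidence over that window.
- The default values are illustrative only.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional, Tuple


def detect_segments(
    confidences: Iterable[float],
    start_history: int = 3,
    start_threshold: float = 0.7,
    stop_threshold: float = 0.3,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of detected speech, with `end` exclusive.

    Speech starts once the mean confidence over the last `start_history` frames
    reaches `start_threshold`, and stops at the first frame below `stop_threshold`.
    Default values are illustrative, not Riva defaults.
    """
    window: Deque[float] = deque(maxlen=start_history)
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    i = -1
    for i, c in enumerate(confidences):
        window.append(c)
        if start is None:
            if len(window) == start_history and sum(window) / start_history >= start_threshold:
                # Date the start back to the first frame of the qualifying window.
                start = i - start_history + 1
        # While speaking, only a frame below stop_threshold ends the segment;
        # frames between the two thresholds keep it open (hysteresis).
        elif c < stop_threshold:
            segments.append((start, i))
            start = None
            window.clear()
    if start is not None:
        # Speech still active when the input ends.
        segments.append((start, i + 1))
    return segments


conf = [0.1, 0.9, 0.9, 0.9, 0.5, 0.4, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
print(detect_segments(conf))  # [(1, 6), (9, 13)]
```

Frames 4 and 5 (0.5 and 0.4) sit between the two thresholds, so they stay inside the first segment instead of ending it.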