Overview

ParakeetSTTService provides real-time speech-to-text using NVIDIA’s Riva Parakeet model. It emits interim results while the user is still speaking, and it exposes configurable recognition parameters that can improve accuracy.

Installation

To use ParakeetSTTService, install the required dependencies:

pip install "pipecat-ai[riva]"

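You’ll also need an NVIDIA API key. Store it in the NVIDIA_API_KEY environment variable. Because api_key is a required constructor parameter, your code must read the variable and pass its value to the service.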

You can obtain an NVIDIA API key by signing up through NVIDIA’s developer portal.
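A small helper can read the key and fail early with a clear message when it is missing, so the error doesn't surface later as an authentication failure. This is a minimal sketch. get_nvidia_api_key is a hypothetical helper and is not part of Pipecat.

```python
import os


def get_nvidia_api_key(env_var: str = "NVIDIA_API_KEY") -> str:
    """Return the NVIDIA API key from the environment, failing loudly if unset.

    Hypothetical helper for illustration; not a Pipecat API.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before starting the pipeline")
    return key


# Usage (assumes pipecat-ai[riva] is installed):
# from pipecat.services.riva import ParakeetSTTService
# stt = ParakeetSTTService(api_key=get_nvidia_api_key())
```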

Configuration

Constructor Parameters

api_key
str
required

Your NVIDIA API key

server
str
default:"grpc.nvcf.nvidia.com:443"

NVIDIA Riva server address

function_id
str
default:"1598d209-5e27-4d3c-8079-4751568b1081"

NVIDIA function identifier for the STT service

sample_rate
int
default:"None"

Audio sample rate in Hz. When None, the sample rate of the pipeline’s input audio is used.

params
InputParams
default:"InputParams()"

Additional configuration parameters

InputParams

language
Language
default:"Language.EN_US"

The language for speech recognition

Input

The service processes audio frames containing:

  • Raw PCM audio data
  • 16-bit depth
  • Single channel (mono)
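These three requirements fix the byte layout: each sample takes two bytes, and there is one channel. The following is a minimal sketch of producing that format. It assumes your source audio arrives as floats in [-1.0, 1.0] and that little-endian byte order is expected, which is the usual convention for raw PCM but is an assumption here. Both helper names are hypothetical.

```python
import sys
from array import array
from typing import List


def float_to_pcm16_mono(samples: List[float]) -> bytes:
    """Convert mono float samples in [-1.0, 1.0] to 16-bit PCM bytes (hypothetical helper)."""
    pcm = array("h")  # signed 16-bit integers
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip out-of-range values instead of overflowing
        pcm.append(int(round(s * 32767)))
    if sys.byteorder == "big":
        pcm.byteswap()  # emit little-endian bytes regardless of host byte order
    return pcm.tobytes()


def pcm16_mono_bytes(sample_rate: int, duration_ms: int) -> int:
    """Bytes needed for duration_ms of 16-bit mono audio at sample_rate Hz."""
    return sample_rate * duration_ms // 1000 * 2  # 2 bytes per sample, 1 channel
```

For example, 20 ms of audio at 16 kHz is 320 samples, which comes to 640 bytes.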

Output Frames

TranscriptionFrame

Generated for final transcriptions, containing:

text
string

Transcribed text

user_id
string

User identifier

timestamp
string

ISO 8601 formatted timestamp

language
Language

Language used for transcription

InterimTranscriptionFrame

Generated while speech is ongoing. It contains the same fields as TranscriptionFrame, but the text is a preliminary result that a later frame may revise.
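Interim frames are preliminary, so a consumer typically replaces its current interim text whenever a new interim frame arrives. It commits text only when a final transcription arrives. The sketch below shows that logic using hypothetical stand-in classes that mirror the documented fields, so it runs without Pipecat installed. In a real pipeline, the same branching would live in a custom frame processor that receives the actual TranscriptionFrame and InterimTranscriptionFrame objects.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class FinalTranscript:
    """Hypothetical stand-in for TranscriptionFrame (same documented fields)."""
    text: str
    user_id: str
    timestamp: str  # ISO 8601


@dataclass
class InterimTranscript:
    """Hypothetical stand-in for InterimTranscriptionFrame."""
    text: str
    user_id: str
    timestamp: str  # ISO 8601


class TranscriptCollector:
    """Commits final text and keeps only the most recent interim text."""

    def __init__(self) -> None:
        self.final: List[str] = []
        self.pending: str = ""

    def handle(self, frame: Union[FinalTranscript, InterimTranscript]) -> None:
        if isinstance(frame, InterimTranscript):
            # Interim results are preliminary: each one replaces the previous one.
            self.pending = frame.text
        else:
            # A final result is committed and clears any pending interim text.
            self.final.append(frame.text)
            self.pending = ""

    def display_text(self) -> str:
        """Committed text followed by the current interim guess, if any."""
        return " ".join(self.final + ([self.pending] if self.pending else []))
```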

Methods

See the STT base class methods for additional functionality.

Usage Example

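import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.riva import ParakeetSTTService
from pipecat.transcriptions.language import Language

# Configure the service; the API key is read from the NVIDIA_API_KEY environment variable
stt = ParakeetSTTService(
    api_key=os.getenv("NVIDIA_API_KEY"),
    params=ParakeetSTTService.InputParams(
        language=Language.EN_US
    )
)

# Use in a pipeline. `transport` and `llm` are assumed to be created elsewhere
# (a transport and an LLM service of your choice).
pipeline = Pipeline([
    transport.input(),  # audio from the user
    stt,                # audio -> transcription frames
    llm,                # consumes the transcriptions
    # ... further processors, e.g. TTS and transport.output()
])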

Language Support

Parakeet STT primarily supports English with various regional accents:

Language Code      Description      Service Code
Language.EN_US     English (US)     en-US

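Frame Flow

Audio frames from the transport flow into ParakeetSTTService. While speech is in progress, the service pushes InterimTranscriptionFrames downstream, and it pushes a TranscriptionFrame once a result is final. Downstream processors, such as the LLM, receive these transcriptions.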

Advanced Configuration

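The service also exposes several advanced recognition options. These are not constructor parameters. They are instance attributes, and the leading underscore marks them as internal and subject to change. Set them on the service after constructing it and before the pipeline starts: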

_profanity_filter
bool
default:"False"

Filter profanity from transcription

_automatic_punctuation
bool
default:"False"

Automatically add punctuation

_no_verbatim_transcripts
bool
default:"False"

Whether to disable verbatim transcripts

_boosted_lm_words
list
default:"None"

List of words to boost in the language model

_boosted_lm_score
float
default:"4.0"

Score applied to boosted words

Example with Advanced Configuration

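import os

from pipecat.services.riva import ParakeetSTTService
from pipecat.transcriptions.language import Language

# Configure service with advanced parameters
stt = ParakeetSTTService(
    api_key=os.getenv("NVIDIA_API_KEY"),
    params=ParakeetSTTService.InputParams(
        language=Language.EN_US
    )
)

# Configure advanced options before the pipeline starts
stt._profanity_filter = True
stt._automatic_punctuation = True
stt._boosted_lm_words = ["Pipecat", "AI", "speech"]
stt._boosted_lm_score = 4.0  # score applied to the boosted words (default 4.0)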
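These options are plain attributes, so a misspelled name such as stt._automatic_punctation is silently created as a new attribute and has no effect. The following is a minimal guard against that mistake. It assumes the five option names listed above are the complete set, and it checks each value against the documented type. apply_advanced_options is a hypothetical helper, not a Pipecat API.

```python
from typing import Any

# Documented advanced options and the value types they accept (assumed complete).
_ADVANCED_OPTIONS = {
    "_profanity_filter": bool,
    "_automatic_punctuation": bool,
    "_no_verbatim_transcripts": bool,
    "_boosted_lm_words": (list, type(None)),
    "_boosted_lm_score": (int, float),
}


def apply_advanced_options(service: Any, **options: Any) -> None:
    """Set documented advanced options, rejecting unknown names and wrong types.

    Names are passed without the leading underscore, e.g.
    apply_advanced_options(stt, automatic_punctuation=True).
    """
    for name, value in options.items():
        attr = "_" + name
        expected = _ADVANCED_OPTIONS.get(attr)
        if expected is None:
            # Catches typos that would otherwise create a new, unused attribute.
            raise AttributeError(f"unknown advanced option: {name!r}")
        if not isinstance(value, expected):
            raise TypeError(f"{name} expects {expected}, got {type(value).__name__}")
        setattr(service, attr, value)
```

Because the check runs before anything is assigned on the service, a typo fails immediately at configuration time rather than producing an unexpected transcript later.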

Notes

  • Uses NVIDIA’s Riva AI Services platform
  • Handles streaming audio input
  • Provides real-time transcription results
  • Manages connection lifecycle
  • Uses asyncio for asynchronous processing
  • Automatically cleans up resources on stop/cancel