Overview

NVIDIA Riva provides high-quality text-to-speech synthesis through cloud-based AI models accessible via a gRPC API. The service offers multilingual support, configurable quality settings, and streaming audio generation optimized for real-time applications.

Installation

To use NVIDIA Riva services, install the required dependencies:
pip install "pipecat-ai[riva]"
You’ll also need to set your NVIDIA API key as an environment variable: NVIDIA_API_KEY.
Get your API key from the NVIDIA Developer Portal, which grants access to Riva services.
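For example, you can export the key in your shell before starting your application (the value below is a placeholder, not a real key):

```shell
# Make the API key available to your process (placeholder value shown)
export NVIDIA_API_KEY="your-nvidia-api-key"
```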

Frames

Input

  • TextFrame - Text content to synthesize into speech
  • TTSSpeakFrame - Text that should be spoken immediately
  • TTSUpdateSettingsFrame - Runtime configuration updates
  • LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries

Output

  • TTSStartedFrame - Signals start of synthesis
  • TTSAudioRawFrame - Generated audio data chunks (streaming)
  • TTSStoppedFrame - Signals completion of synthesis
  • ErrorFrame - API or processing errors
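To illustrate the output contract, here is a toy consumer that accumulates audio chunks emitted between the start and stop frames. The minimal classes below are stand-ins for Pipecat's real frame types, not the library API:

```python
# Sketch of the output frame sequence; these dataclasses are
# simplified stand-ins for Pipecat's actual frame classes.
from dataclasses import dataclass


@dataclass
class TTSStartedFrame:
    pass


@dataclass
class TTSAudioRawFrame:
    audio: bytes
    sample_rate: int = 16000


@dataclass
class TTSStoppedFrame:
    pass


def collect_audio(frames):
    """Accumulate raw audio emitted between TTSStartedFrame and TTSStoppedFrame."""
    chunks = []
    for frame in frames:
        if isinstance(frame, TTSAudioRawFrame):
            chunks.append(frame.audio)
        elif isinstance(frame, TTSStoppedFrame):
            break
    return b"".join(chunks)


frames = [
    TTSStartedFrame(),
    TTSAudioRawFrame(b"\x00\x01"),
    TTSAudioRawFrame(b"\x02"),
    TTSStoppedFrame(),
]
print(collect_audio(frames))  # b'\x00\x01\x02'
```

In a real pipeline the downstream transport consumes these frames as they stream in, rather than buffering them as this sketch does.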

Available Models

| Model | Description | Best For |
| --- | --- | --- |
| magpie-tts-multilingual | Multilingual model with natural voices | Conversational AI, multiple languages |
| fastpitch-hifigan-tts | High-quality English synthesis | English-only applications |
The magpie-tts-multilingual model is the default and recommended for most use cases due to its multilingual capabilities and natural voice quality.

Language Support

The magpie-tts-multilingual model supports:
| Language Code | Description | Service Code |
| --- | --- | --- |
| Language.EN_US | English (US) | en-US |
| Language.ES_US | Spanish (US) | es-US |
| Language.FR_FR | French (France) | fr-FR |
| Language.DE_DE | German (Germany) | de-DE |
| Language.IT_IT | Italian (Italy) | it-IT |
| Language.ZH_CN | Chinese (China) | zh-CN |
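The table above can be expressed as a simple lookup, which is handy for validating user-supplied language settings before constructing the service. This is a plain-string sketch, not part of the Pipecat API; the real `Language` enum lives in `pipecat.transcriptions.language`:

```python
# Supported languages for magpie-tts-multilingual, mirroring the table
# above. Plain strings are used here so the sketch stands alone.
MAGPIE_LANGUAGES = {
    "EN_US": "en-US",
    "ES_US": "es-US",
    "FR_FR": "fr-FR",
    "DE_DE": "de-DE",
    "IT_IT": "it-IT",
    "ZH_CN": "zh-CN",
}


def service_code(language: str) -> str:
    """Return the Riva service code for a supported language, or raise."""
    try:
        return MAGPIE_LANGUAGES[language]
    except KeyError:
        raise ValueError(f"{language} is not supported by magpie-tts-multilingual")


print(service_code("ES_US"))  # es-US
```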

Usage Example

Basic Configuration

Initialize the Riva TTS service with your API key and desired voice:
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.riva.tts import RivaTTSService
from pipecat.transcriptions.language import Language
import os

# Configure with default multilingual model
tts = RivaTTSService(
    api_key=os.getenv("NVIDIA_API_KEY"),
    voice_id="Magpie-Multilingual.EN-US.Ray",
    params=RivaTTSService.InputParams(
        language=Language.EN_US,
        quality=20
    )
)

# Use in a pipeline (transport, stt, llm, and context_aggregator
# are assumed to be defined elsewhere in your application)
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

Dynamic Configuration

Update settings at runtime by pushing a TTSUpdateSettingsFrame to the RivaTTSService:
from pipecat.frames.frames import TTSUpdateSettingsFrame

await task.queue_frame(TTSUpdateSettingsFrame(
    voice_id="Magpie-Multilingual.ES-US.Luna",
    params=RivaTTSService.InputParams(
        language=Language.ES_US,
    )
))

Metrics

The service provides comprehensive metrics:
  • Time to First Byte (TTFB) - Latency from text input to first audio
  • Processing Duration - Total synthesis time
  • Character Usage - Text processed for billing
Learn how to enable Metrics in your Pipeline.
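Metrics collection is enabled on the pipeline task. A minimal sketch, assuming the standard Pipecat `PipelineTask`/`PipelineParams` pattern with a `pipeline` already constructed as shown earlier:

```python
from pipecat.pipeline.task import PipelineTask, PipelineParams

# Enable timing metrics (TTFB, processing duration) and usage
# metrics (character counts) for all services in the pipeline.
task = PipelineTask(
    pipeline,
    params=PipelineParams(
        enable_metrics=True,
        enable_usage_metrics=True,
    ),
)
```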

Additional Notes

  • Model Set at Initialization: Models cannot be changed after initialization - configure model_function_map during construction
  • Deprecated Classes: FastPitchTTSService is deprecated - use RivaTTSService instead
  • Quality vs Speed: Higher quality settings increase synthesis time but improve audio quality