Overview

NVIDIA Riva provides high-quality text-to-speech synthesis through cloud-based AI models accessible via a gRPC API. The service offers multilingual support, configurable quality settings, and streaming audio generation optimized for real-time applications.

Installation

To use NVIDIA Riva services, install the required dependencies:
pip install "pipecat-ai[riva]"
You’ll also need to set your NVIDIA API key as an environment variable: NVIDIA_API_KEY.
Get your API key from the NVIDIA Developer Portal, which grants access to Riva services.
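For example, you can export the key in your shell before starting your application (the value below is a placeholder, not a real key):

```shell
# Make the API key available to your process (placeholder value shown)
export NVIDIA_API_KEY="your-nvidia-api-key"
```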

Frames

Input

  • TextFrame - Text content to synthesize into speech
  • TTSSpeakFrame - Text that should be spoken immediately
  • TTSUpdateSettingsFrame - Runtime configuration updates
  • LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries

Output

  • TTSStartedFrame - Signals start of synthesis
  • TTSAudioRawFrame - Generated audio data chunks (streaming)
  • TTSStoppedFrame - Signals completion of synthesis
  • ErrorFrame - API or processing errors
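To illustrate the output contract, here is a toy consumer that accumulates audio chunks emitted between the start and stop frames. The minimal classes below are stand-ins for Pipecat's real frame types, not the library API:

```python
# Sketch of the output frame sequence; these dataclasses are
# simplified stand-ins for Pipecat's actual frame classes.
from dataclasses import dataclass


@dataclass
class TTSStartedFrame:
    pass


@dataclass
class TTSAudioRawFrame:
    audio: bytes
    sample_rate: int = 16000


@dataclass
class TTSStoppedFrame:
    pass


def collect_audio(frames):
    """Accumulate raw audio emitted between TTSStartedFrame and TTSStoppedFrame."""
    chunks = []
    for frame in frames:
        if isinstance(frame, TTSAudioRawFrame):
            chunks.append(frame.audio)
        elif isinstance(frame, TTSStoppedFrame):
            break
    return b"".join(chunks)


frames = [
    TTSStartedFrame(),
    TTSAudioRawFrame(b"\x00\x01"),
    TTSAudioRawFrame(b"\x02"),
    TTSStoppedFrame(),
]
print(collect_audio(frames))  # b'\x00\x01\x02'
```

In a real pipeline the downstream transport consumes these frames as they stream in, rather than buffering them as this sketch does.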

Available Models

| Model | Description | Best For |
| --- | --- | --- |
| magpie-tts-multilingual | Multilingual model with natural voices | Conversational AI, multiple languages |
| fastpitch-hifigan-tts | High-quality English synthesis | English-only applications |
The magpie-tts-multilingual model is the default and recommended for most use cases due to its multilingual capabilities and natural voice quality.

Language Support

The magpie-tts-multilingual model supports:
| Language Code | Description | Service Code |
| --- | --- | --- |
| Language.EN_US | English (US) | en-US |
| Language.ES_US | Spanish (US) | es-US |
| Language.FR_FR | French (France) | fr-FR |
| Language.DE_DE | German (Germany) | de-DE |
| Language.IT_IT | Italian (Italy) | it-IT |
| Language.ZH_CN | Chinese (China) | zh-CN |
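The table above can be expressed as a simple lookup, which is handy for validating user-supplied language settings before constructing the service. This is a plain-string sketch, not part of the Pipecat API; the real `Language` enum lives in `pipecat.transcriptions.language`:

```python
# Supported languages for magpie-tts-multilingual, mirroring the table
# above. Plain strings are used here so the sketch stands alone.
MAGPIE_LANGUAGES = {
    "EN_US": "en-US",
    "ES_US": "es-US",
    "FR_FR": "fr-FR",
    "DE_DE": "de-DE",
    "IT_IT": "it-IT",
    "ZH_CN": "zh-CN",
}


def service_code(language: str) -> str:
    """Return the Riva service code for a supported language, or raise."""
    try:
        return MAGPIE_LANGUAGES[language]
    except KeyError:
        raise ValueError(f"{language} is not supported by magpie-tts-multilingual")


print(service_code("ES_US"))  # es-US
```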

Usage Example

Basic Configuration

Initialize the Riva TTS service with your API key and desired voice:
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.riva.tts import RivaTTSService
from pipecat.transcriptions.language import Language
import os

# Configure with default multilingual model
tts = RivaTTSService(
    api_key=os.getenv("NVIDIA_API_KEY"),
    voice_id="Magpie-Multilingual.EN-US.Ray",
    params=RivaTTSService.InputParams(
        language=Language.EN_US,
        quality=20
    )
)

# Use in a pipeline (transport, stt, llm, and context_aggregator
# are assumed to be defined elsewhere in your application)
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

Dynamic Configuration

Update settings at runtime by pushing a TTSUpdateSettingsFrame to the RivaTTSService:
from pipecat.frames.frames import TTSUpdateSettingsFrame

await task.queue_frame(TTSUpdateSettingsFrame(
    voice_id="Magpie-Multilingual.ES-US.Luna",
    params=RivaTTSService.InputParams(
        language=Language.ES_US,
    )
))

Metrics

The service provides comprehensive metrics:
  • Time to First Byte (TTFB) - Latency from text input to first audio
  • Processing Duration - Total synthesis time
  • Character Usage - Text processed for billing
Learn how to enable Metrics in your Pipeline.
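Metrics collection is enabled on the pipeline task. A minimal sketch, assuming the standard Pipecat `PipelineTask`/`PipelineParams` pattern with a `pipeline` already constructed as shown earlier:

```python
from pipecat.pipeline.task import PipelineTask, PipelineParams

# Enable timing metrics (TTFB, processing duration) and usage
# metrics (character counts) for all services in the pipeline.
task = PipelineTask(
    pipeline,
    params=PipelineParams(
        enable_metrics=True,
        enable_usage_metrics=True,
    ),
)
```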

Additional Notes

  • Model Set at Initialization: Models cannot be changed after initialization - configure model_function_map during construction
  • Deprecated Classes: FastPitchTTSService is deprecated - use RivaTTSService instead
  • Quality vs Speed: Higher quality settings increase synthesis time but improve audio quality