Overview

FastPitchTTSService converts text to speech using NVIDIA’s Riva FastPitch TTS model. It provides high-quality text-to-speech synthesis with configurable voice options.

Installation

To use FastPitchTTSService, install the required dependencies:

pip install "pipecat-ai[riva]"

You’ll also need to set your NVIDIA API key in the NVIDIA_API_KEY environment variable.
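A minimal sketch of reading that key at startup and failing early when it is missing. The helper name load_nvidia_api_key is hypothetical and not part of pipecat; only the NVIDIA_API_KEY variable name comes from this page.

```python
import os
from typing import Mapping, Optional


def load_nvidia_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the NVIDIA API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of pipecat.
    """
    env = os.environ if env is None else env
    key = env.get("NVIDIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "NVIDIA_API_KEY is not set; export it before starting the bot."
        )
    return key
```

The returned string can then be passed as the api_key constructor parameter described below.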

Configuration

Constructor Parameters

api_key
str
required

Your NVIDIA API key

server
str
default:"grpc.nvcf.nvidia.com:443"

NVIDIA Riva server address

voice_id
str
default:"English-US.Female-1"

Voice identifier to use for synthesis

sample_rate
int
default:"None"

Output audio sample rate in Hz. When None, the pipeline’s output sample rate is used.

function_id
str
default:"0149dedb-2be8-4195-b9a0-e57e0e14f972"

NVIDIA function identifier for the TTS service

params
InputParams
default:"InputParams()"

Additional configuration parameters (language and quality)

InputParams

language
Language
default:"Language.EN_US"

The language for TTS generation

quality
int
default:"20"

Quality level for the generated audio
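The sketch below gathers the documented defaults in one place, which can help when settings are loaded from a config file. FastPitchSettings is a hypothetical container and not part of pipecat; the default values are copied from the parameter lists above, and the language is stored as its service code ("en-US") rather than as the Language enum so the example stays self-contained.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class FastPitchSettings:
    """Hypothetical settings holder mirroring the documented defaults."""

    api_key: str
    server: str = "grpc.nvcf.nvidia.com:443"
    voice_id: str = "English-US.Female-1"
    sample_rate: Optional[int] = None
    function_id: str = "0149dedb-2be8-4195-b9a0-e57e0e14f972"
    language: str = "en-US"  # service code for Language.EN_US
    quality: int = 20

    def constructor_kwargs(self) -> Dict[str, Any]:
        """Top-level constructor arguments only.

        language and quality are left out because they belong in InputParams.
        """
        kwargs = asdict(self)
        kwargs.pop("language")
        kwargs.pop("quality")
        return kwargs


settings = FastPitchSettings(api_key="your-nvidia-api-key", voice_id="English-US.Female-1")
print(sorted(settings.constructor_kwargs()))
```

With pipecat installed, the dictionary could be unpacked into the constructor alongside a params=InputParams(...) argument built from the language and quality fields.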

Input

The service accepts text input through its TTS pipeline.

Output Frames

TTSStartedFrame

Signals the start of audio generation.

TTSAudioRawFrame

Contains generated audio data:

audio
bytes

Raw audio data chunk

sample_rate
int

Audio sample rate

num_channels
int

Number of audio channels (1 for mono)

TTSStoppedFrame

Signals the completion of audio generation.
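As an illustration of how the three frame types fit together, the sketch below collects the audio between a start and a stop marker and wraps it in a WAV container. The classes Started, AudioChunk, and Stopped are stand-ins that only mirror the fields listed above; they are not the pipecat frame classes. The 16-bit PCM sample width is an assumption, since this page does not state the sample format.

```python
import io
import wave
from dataclasses import dataclass
from typing import Iterable, Union

BYTES_PER_SAMPLE = 2  # assumption: 16-bit signed PCM


@dataclass
class Started:
    """Stand-in for TTSStartedFrame."""


@dataclass
class Stopped:
    """Stand-in for TTSStoppedFrame."""


@dataclass
class AudioChunk:
    """Stand-in carrying the documented TTSAudioRawFrame fields."""

    audio: bytes
    sample_rate: int
    num_channels: int = 1


Frame = Union[Started, AudioChunk, Stopped]


def pcm_duration_seconds(num_bytes: int, sample_rate: int, num_channels: int = 1) -> float:
    """Duration of raw PCM audio, given its size and format."""
    return num_bytes / (sample_rate * num_channels * BYTES_PER_SAMPLE)


def utterance_to_wav(frames: Iterable[Frame]) -> bytes:
    """Concatenate the chunks of the first utterance into WAV bytes."""
    pcm = bytearray()
    sample_rate = None
    channels = 1
    in_utterance = False
    for frame in frames:
        if isinstance(frame, Started):
            in_utterance = True
        elif isinstance(frame, AudioChunk) and in_utterance:
            pcm.extend(frame.audio)
            sample_rate = frame.sample_rate
            channels = frame.num_channels
        elif isinstance(frame, Stopped):
            break
    if sample_rate is None:
        raise ValueError("no audio frames between start and stop")
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(BYTES_PER_SAMPLE)
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(pcm))
    return buffer.getvalue()


demo = [Started(), AudioChunk(b"\x00" * 16000, 16000), AudioChunk(b"\x00" * 16000, 16000), Stopped()]
wav_bytes = utterance_to_wav(demo)
print(pcm_duration_seconds(32000, 16000))  # 1.0
```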

Methods

See the TTS base class methods for additional functionality.

Language Support

FastPitch TTS primarily supports English with various regional accents:

Language Code     Description     Service Codes
Language.EN_US    English (US)    en-US

Usage Example

import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.riva import FastPitchTTSService
from pipecat.transcriptions.language import Language

# Configure service; the API key is read from the NVIDIA_API_KEY environment variable
tts = FastPitchTTSService(
    api_key=os.getenv("NVIDIA_API_KEY"),
    voice_id="English-US.Female-1",
    params=FastPitchTTSService.InputParams(
        language=Language.EN_US,
        quality=20
    )
)

# Use in pipeline (llm and transport are assumed to be created elsewhere)
pipeline = Pipeline([
    ...,
    llm,
    tts,
    transport.output(),
])

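Frame Flow

For each text input, the service emits a TTSStartedFrame, then one or more TTSAudioRawFrame chunks, then a TTSStoppedFrame.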

Metrics Support

The service supports metrics collection:

  • Time to First Byte (TTFB)
  • TTS usage metrics
  • Processing duration
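To make the first and last of these metrics concrete, here is a rough sketch of how time to first byte and total processing duration could be measured around any streaming audio source. It does not use pipecat's metrics machinery; fake_tts_stream and measure_stream are hypothetical names, and the stream yields silent bytes in place of real synthesis.

```python
import asyncio
import time
from typing import AsyncIterator, Optional, Tuple


async def fake_tts_stream(chunks: int = 3, delay: float = 0.01) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a streaming TTS call; yields silent chunks."""
    for _ in range(chunks):
        await asyncio.sleep(delay)
        yield b"\x00" * 320


async def measure_stream(stream: AsyncIterator[bytes]) -> Tuple[Optional[float], float, int]:
    """Return (ttfb_seconds, total_seconds, total_bytes) for an audio stream."""
    start = time.perf_counter()
    ttfb = None
    total_bytes = 0
    async for chunk in stream:
        if ttfb is None:
            # The first chunk has arrived: this is the time to first byte.
            ttfb = time.perf_counter() - start
        total_bytes += len(chunk)
    return ttfb, time.perf_counter() - start, total_bytes


ttfb, total, size = asyncio.run(measure_stream(fake_tts_stream()))
print(f"TTFB {ttfb:.3f}s, total {total:.3f}s, {size} bytes")
```

TTFB is usually the figure that matters most for conversational latency, because playback can begin as soon as the first chunk arrives.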

Audio Processing

  • Processes audio through the Riva API
  • Generates mono audio output
  • Handles asynchronous audio streaming
  • Configurable sampling rate

Notes

  • Uses NVIDIA’s Riva AI Services platform
  • Streams audio in chunks
  • Requires valid NVIDIA API key
  • Thread-safe processing with asyncio
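One common way to combine chunked streaming with asyncio is to pull each chunk from a blocking iterator in a worker thread, so the event loop stays responsive. The sketch below shows that general pattern only. Whether the service is implemented this way is not stated on this page, and blocking_chunks and consume_in_thread are hypothetical names.

```python
import asyncio
from typing import Iterator, List


def blocking_chunks() -> Iterator[bytes]:
    """Hypothetical blocking source standing in for a synchronous streaming client."""
    for i in range(3):
        yield bytes([i]) * 4


async def consume_in_thread(source: Iterator[bytes]) -> List[bytes]:
    """Read a blocking iterator without blocking the event loop."""
    sentinel = object()
    chunks: List[bytes] = []
    while True:
        # next() runs in a worker thread; the sentinel marks exhaustion.
        chunk = await asyncio.to_thread(next, source, sentinel)
        if chunk is sentinel:
            break
        chunks.append(chunk)
    return chunks


received = asyncio.run(consume_in_thread(blocking_chunks()))
print(len(received))  # 3
```

asyncio.to_thread requires Python 3.9 or later.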