# NVIDIA Nemotron Speech

> Text-to-speech service implementation using NVIDIA Nemotron Speech

## Overview

`NvidiaTTSService` provides text-to-speech synthesis using NVIDIA Nemotron Speech models over a gRPC API, through either NVIDIA's cloud endpoint or a local deployment. It supports multiple languages, a configurable quality setting, cross-sentence audio stitching, and streaming audio generation for real-time applications.

<CardGroup cols={2}>
  <Card title="NVIDIA Nemotron Speech TTS API Reference" icon="code" href="https://reference-server.pipecat.ai/en/latest/api/pipecat.services.nvidia.tts.html">
    Pipecat's API methods for NVIDIA Nemotron Speech TTS integration
  </Card>

  <Card title="Example Implementation" icon="play" href="https://github.com/pipecat-ai/pipecat/blob/main/examples/voice/voice-nvidia.py">
    Complete example with Nemotron Speech NIM
  </Card>

  <Card title="NVIDIA TTS NIM Documentation" icon="book" href="https://docs.nvidia.com/nim/speech/latest/tts/">
    Official NVIDIA TTS NIM documentation
  </Card>

  <Card title="NVIDIA Developer Portal" icon="microphone" href="https://developer.nvidia.com/">
    Access API keys and Nemotron Speech services
  </Card>
</CardGroup>

## Installation

To use NVIDIA Nemotron Speech services, install the required dependencies:

```bash theme={null}
uv add "pipecat-ai[nvidia]"
```

## Prerequisites

### NVIDIA Nemotron Speech Setup

Before using Nemotron Speech TTS services, you need:

1. **NVIDIA Developer Account**: Sign up at [NVIDIA Developer Portal](https://developer.nvidia.com/)
2. **API Key**: Generate an NVIDIA API key for Nemotron Speech services (required for the cloud endpoint)
3. **Nemotron Speech Access**: Ensure access to NVIDIA Nemotron Speech TTS services

For local deployments, see the [NVIDIA TTS NIM documentation](https://docs.nvidia.com/nim/speech/latest/tts/).

### Required Environment Variables

* `NVIDIA_API_KEY`: Your NVIDIA API key for authentication. Required for the cloud endpoint; not needed for local deployments.
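
The rule above (key required for the cloud endpoint, not for local deployments) can be checked at startup. The following is a minimal sketch, not part of Pipecat: `resolve_api_key` and `CLOUD_SERVER` are hypothetical names, and treating any non-default server address as a local deployment is an assumption made for illustration.

```python
import os
from typing import Optional

# Default cloud endpoint, as listed for the `server` parameter in this page.
CLOUD_SERVER = "grpc.nvcf.nvidia.com:443"


def resolve_api_key(server: str = CLOUD_SERVER) -> Optional[str]:
    """Return NVIDIA_API_KEY for the cloud endpoint, or None for other servers.

    Hypothetical helper: assumes any server other than the cloud default
    is a local deployment that does not need a key.
    """
    if server != CLOUD_SERVER:
        return None
    key = os.getenv("NVIDIA_API_KEY")
    if not key:
        # Fail early instead of at the first synthesis request.
        raise RuntimeError(
            "NVIDIA_API_KEY is not set; it is required for the cloud endpoint"
        )
    return key
```

The returned value can then be passed as `api_key` when constructing the service.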

## Configuration

### NvidiaTTSService

<ParamField path="api_key" type="str" default="None">
  NVIDIA API key for authentication. Required when using the cloud endpoint. Not
  needed for local deployments.
</ParamField>

<ParamField path="server" type="str" default="grpc.nvcf.nvidia.com:443">
  gRPC server endpoint. Defaults to NVIDIA's cloud endpoint. For local
  deployments, pass the local address (e.g. `localhost:50051`).
</ParamField>

<ParamField path="voice_id" type="str" default="Magpie-Multilingual.EN-US.Aria" deprecated>
  Voice model identifier.

  *Deprecated in v0.0.105. Use `settings=NvidiaTTSService.Settings(...)` instead.*
</ParamField>

<ParamField path="sample_rate" type="int" default="None">
  Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
  rate.
</ParamField>

<ParamField path="model_function_map" type="dict" default="{&#x22;function_id&#x22;: &#x22;877104f7-e885-42b9-8de8-f6e4c6303969&#x22;, &#x22;model_name&#x22;: &#x22;magpie-tts-multilingual&#x22;}">
  Dictionary containing `function_id` and `model_name` for the TTS model.
</ParamField>

<ParamField path="use_ssl" type="bool" default="True">
  Whether to use SSL for the gRPC connection. Keep `True` for the NVIDIA cloud
  endpoint; set to `False` for local deployments.
</ParamField>

<ParamField path="custom_dictionary" type="dict" default="None">
  Custom pronunciation dictionary mapping words (graphemes) to IPA phonetic
  representations (phonemes), e.g. `{"NVIDIA": "ɛn.vɪ.diː.ʌ"}`. See [NVIDIA TTS
  NIM phoneme
  support](https://docs.nvidia.com/nim/speech/latest/tts/phoneme-support.html)
  for the list of supported IPA phonemes.
</ParamField>

<ParamField path="encoding" type="AudioEncoding" default="AudioEncoding.LINEAR_PCM">
  Output audio encoding format. Defaults to `AudioEncoding.LINEAR_PCM`.
</ParamField>

<ParamField path="params" type="InputParams" default="None" deprecated>
  Runtime-configurable synthesis settings. See [InputParams](#inputparams)
  below.

  *Deprecated in v0.0.105. Use `settings=NvidiaTTSService.Settings(...)` instead.*
</ParamField>

<ParamField path="settings" type="NvidiaTTSService.Settings" default="None">
  Runtime-configurable settings. See [Settings](#settings) below.
</ParamField>

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `NvidiaTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter  | Type              | Default     | Description                            |
| ---------- | ----------------- | ----------- | -------------------------------------- |
| `model`    | `str`             | `None`      | Model identifier. *(Inherited.)*       |
| `voice`    | `str`             | `None`      | Voice identifier. *(Inherited.)*       |
| `language` | `Language \| str` | `None`      | Language for synthesis. *(Inherited.)* |
| `quality`  | `int`             | `NOT_GIVEN` | Audio quality setting.                 |

## Usage

### Basic Setup

```python theme={null}
import os

from pipecat.services.nvidia.tts import NvidiaTTSService

tts = NvidiaTTSService(
    api_key=os.getenv("NVIDIA_API_KEY"),
)
```

### With Custom Voice and Quality

```python theme={null}
import os

from pipecat.services.nvidia.tts import NvidiaTTSService
from pipecat.transcriptions.language import Language

tts = NvidiaTTSService(
    api_key=os.getenv("NVIDIA_API_KEY"),
    model_function_map={
        "function_id": "877104f7-e885-42b9-8de8-f6e4c6303969",
        "model_name": "magpie-tts-multilingual",
    },
    settings=NvidiaTTSService.Settings(
        voice="Magpie-Multilingual.EN-US.Aria",
        language=Language.EN_US,
        quality=40,
    ),
)
```

<Tip>
  The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
  `Settings` / `settings=` instead. See the [Service Settings
  guide](/pipecat/fundamentals/service-settings) for migration details.
</Tip>
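
### Local Deployment

For a local deployment, the parameter descriptions above suggest three changes from the basic setup: pass the local address as `server`, set `use_ssl=False`, and omit `api_key`. The sketch below collects those constructor arguments in a plain dictionary so the configuration can be inspected without a running server. `local_tts_kwargs` is a hypothetical helper, and `localhost:50051` is only the example address used in this page. Whether a local deployment also needs a different `model_function_map` is not covered here.

```python
from typing import Any, Dict, Optional


def local_tts_kwargs(
    server: str = "localhost:50051",
    custom_dictionary: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Build NvidiaTTSService constructor arguments for a local deployment.

    Hypothetical helper for illustration; it only assembles keyword arguments.
    """
    kwargs: Dict[str, Any] = {
        "server": server,  # local gRPC address instead of the cloud default
        "use_ssl": False,  # SSL is documented as a cloud-endpoint default
    }
    # api_key is intentionally omitted: it is not needed for local deployments.
    if custom_dictionary:
        kwargs["custom_dictionary"] = custom_dictionary
    return kwargs


kwargs = local_tts_kwargs(custom_dictionary={"NVIDIA": "ɛn.vɪ.diː.ʌ"})
print(kwargs)

# With pipecat-ai[nvidia] installed and a local server running, the service
# would be created from these arguments (import as in Basic Setup):
#   tts = NvidiaTTSService(**kwargs)
```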

## Notes

* **gRPC-based**: NVIDIA Nemotron Speech uses gRPC (not HTTP or WebSocket) for communication with the TTS service.
* **Cross-sentence stitching**: Multiple sentences within an LLM turn are fed into a single `SynthesizeOnline` gRPC stream for seamless audio across sentence boundaries (requires Magpie TTS model v1.7.0+).
* **Runtime settings updates**: Voice, language, and quality can be updated mid-conversation with `TTSUpdateSettingsFrame`. New values take effect on the next synthesis turn, not for the current turn's in-flight requests.
* **Model cannot be changed after initialization**: The model and function ID must be set during construction via `model_function_map`. Calling `set_model()` after initialization will log a warning and have no effect.
* **SSL enabled by default**: The service connects to NVIDIA's cloud endpoint with SSL. Set `use_ssl=False` only for local or custom Nemotron Speech deployments.
* **Metrics generation**: This service supports metric generation via `can_generate_metrics()`. Metrics are automatically stopped when an audio context is interrupted.
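
The "next synthesis turn" behavior described under runtime settings updates can be pictured with a small model. This is an illustrative sketch only, not Pipecat's implementation: `TurnSettings` and `DeferredSettings` are hypothetical classes, and the default field values are placeholders. It shows one way an update can be recorded immediately but applied only when a new turn starts, leaving the in-flight turn unchanged.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class TurnSettings:
    """Hypothetical stand-in for the voice, language, and quality settings."""

    voice: str = "Magpie-Multilingual.EN-US.Aria"
    language: str = "en-US"
    quality: Optional[int] = None  # placeholder default, not the service's


class DeferredSettings:
    """Keeps active settings and applies pending updates at turn start."""

    def __init__(self, initial: TurnSettings) -> None:
        self.active = initial
        self._pending: Dict[str, Any] = {}

    def update(self, **changes: Any) -> None:
        # Record the change; the turn already in flight keeps self.active.
        self._pending.update(changes)

    def start_turn(self) -> TurnSettings:
        # Fold pending changes into the active settings for the new turn.
        if self._pending:
            self.active = replace(self.active, **self._pending)
            self._pending = {}
        return self.active


settings = DeferredSettings(TurnSettings())
first_turn = settings.start_turn()
settings.update(quality=40)        # arrives while the first turn is in flight
second_turn = settings.start_turn()
print(first_turn.quality, second_turn.quality)
```

In this model the first turn keeps its original quality, and the second turn picks up the new value.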
