# NVIDIA Nemotron Speech

> Text-to-speech service implementation using NVIDIA Nemotron Speech

## Overview

NVIDIA Nemotron Speech provides three TTS service implementations:

* **`NvidiaTTSService`** -- High-quality TTS using both locally deployed and cloud-based NVIDIA TTS models. Supports multilingual synthesis, configurable quality settings, per-sentence and stitched synthesis modes, and zero-shot voice cloning.
* **`NvidiaSageMakerHTTPTTSService`** -- Single HTTP invocation to an AWS SageMaker endpoint, streaming raw PCM audio back for each text segment.
* **`NvidiaSageMakerTTSService`** -- Persistent HTTP/2 bidi-stream to an AWS SageMaker endpoint with full interruption support via `InterruptibleTTSService`.

## Installation

To use NVIDIA Nemotron Speech services, install the required dependencies:

```bash
uv add "pipecat-ai[nvidia]"
```

## Prerequisites

### NVIDIA Nemotron Speech Setup

Before using Nemotron Speech TTS services, you need:

1. **NVIDIA Developer Account**: Sign up at [NVIDIA Developer Portal](https://developer.nvidia.com/)
2. **API Key**: Generate an NVIDIA API key for Nemotron Speech services (required for cloud endpoint)
3. **Nemotron Speech Access**: Ensure access to NVIDIA Nemotron Speech TTS services

For local deployments, see the [NVIDIA TTS NIM documentation](https://docs.nvidia.com/nim/speech/latest/tts/).

### Required Environment Variables

* `NVIDIA_API_KEY`: Your NVIDIA API key for authentication (required for cloud endpoint, not needed for local deployments)

## Configuration

### NvidiaTTSService

* `api_key`: NVIDIA API key for authentication.
Required when using the cloud endpoint. Not needed for local deployments.
* gRPC server endpoint. Defaults to NVIDIA's cloud endpoint. For local deployments, pass the local address (e.g. `localhost:50051`).
* Voice model identifier. *Deprecated in v0.0.105. Use `settings=NvidiaTTSService.Settings(...)` instead.*
* Audio sample rate in Hz. When `None`, uses the pipeline's configured sample rate.
* `model_function_map`: Dictionary containing `function_id` and `model_name` for the TTS model.
* `use_ssl`: Whether to use SSL for the gRPC connection. Defaults to `True` for the NVIDIA cloud endpoint. Set to `False` for local deployments.
* Custom pronunciation dictionary mapping words (graphemes) to IPA phonetic representations (phonemes), e.g. `{"NVIDIA": "ɛn.vɪ.diː.ʌ"}`. See [NVIDIA TTS NIM phoneme support](https://docs.nvidia.com/nim/speech/latest/tts/phoneme-support.html) for the list of supported IPA phonemes.
* `zero_shot_audio_prompt_file`: Optional audio prompt file for Magpie zero-shot voice cloning. NVIDIA recommends a 16-bit mono WAV prompt with a sample rate of 22.05 kHz or higher and a duration of 3 to 10 seconds. Access to NVIDIA's hosted zero-shot models requires approval through [NVIDIA Riva TTS Zero-Shot Models](https://developer.nvidia.com/riva-tts-zeroshot-models).
* Audio encoding for `zero_shot_audio_prompt_file`. Use this when the server expects a specific prompt encoding for Magpie zero-shot voice cloning.
* Output audio encoding format. Defaults to `AudioEncoding.LINEAR_PCM`.
* Runtime-configurable synthesis settings. See [Settings](#settings) below. *Deprecated in v0.0.105. Use `settings=NvidiaTTSService.Settings(...)` instead.*
* `settings`: Runtime-configurable settings. See [Settings](#settings) below.

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `NvidiaTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
| Parameter        | Type                     | Default     | Description                                                                                  |
| ---------------- | ------------------------ | ----------- | -------------------------------------------------------------------------------------------- |
| `model`          | `str`                    | `None`      | Model identifier. *(Inherited.)*                                                             |
| `voice`          | `str`                    | `None`      | Voice identifier. *(Inherited.)*                                                             |
| `language`       | `Language \| str`        | `None`      | Language for synthesis. *(Inherited.)*                                                       |
| `quality`        | `int`                    | `NOT_GIVEN` | Audio quality setting (0-100). For Magpie zero-shot, NVIDIA expects values in range 1 to 40. |
| `synthesis_mode` | `NvidiaTTSSynthesisMode` | `NOT_GIVEN` | Whether to synthesize one sentence per request or stitch multiple sentences in one stream.   |

## Usage

### Basic Setup

```python
import os

from pipecat.services.nvidia import NvidiaTTSService

# Cloud endpoint: the API key is read from the environment.
tts = NvidiaTTSService(
    api_key=os.getenv("NVIDIA_API_KEY"),
)
```

### With Custom Voice and Quality

```python
import os

from pipecat.services.nvidia import NvidiaTTSService
from pipecat.transcriptions.language import Language

tts = NvidiaTTSService(
    api_key=os.getenv("NVIDIA_API_KEY"),
    # The model is fixed at construction time via model_function_map.
    model_function_map={
        "function_id": "877104f7-e885-42b9-8de8-f6e4c6303969",
        "model_name": "magpie-tts-multilingual",
    },
    settings=NvidiaTTSService.Settings(
        voice="Magpie-Multilingual.EN-US.Aria",
        language=Language.EN_US,
        quality=40,
    ),
)
```

The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use `Settings` / `settings=` instead. See the [Service Settings guide](/pipecat/fundamentals/service-settings) for migration details.

## Notes

* **gRPC-based**: NVIDIA Nemotron Speech uses gRPC (not HTTP or WebSocket) for communication with the TTS service.
* **Synthesis modes**: The service supports two synthesis modes via the `synthesis_mode` setting:
  * `PER_SENTENCE` (default): Opens a separate `SynthesizeOnline` call for each sentence. Compatible with all NVIDIA TTS NIMs, including Chatterbox, Magpie multilingual, and Magpie zero-shot.
  * `STITCHED`: Reuses one `SynthesizeOnline` stream across multiple sentences within the same LLM response for improved multi-sentence synthesis quality. Only use with models that support cross-sentence stitching, such as Magpie multilingual and Magpie zero-shot v1.7.0 or later.
* **Zero-shot voice cloning**: Magpie zero-shot models support voice cloning via the `zero_shot_audio_prompt_file` parameter. NVIDIA recommends a 16-bit mono WAV prompt (22.05 kHz or higher, 3-10 seconds duration). Access to hosted zero-shot models requires approval.
* **Runtime settings updates**: Voice, language, quality, and synthesis mode can be updated mid-conversation with `TTSUpdateSettingsFrame`. New values take effect on the next synthesis turn, not for the current turn's in-flight requests.
* **Model cannot be changed after initialization**: The model and function ID must be set during construction via `model_function_map`. Calling `set_model()` after initialization logs a warning and has no effect.
* **SSL enabled by default**: The service connects to NVIDIA's cloud endpoint with SSL. Set `use_ssl=False` only for local or custom Nemotron Speech deployments.
* **Metrics generation**: This service supports metric generation via `can_generate_metrics()`. Metrics are automatically stopped when an audio context is interrupted.

## NvidiaSageMakerHTTPTTSService

NVIDIA Magpie TTS service that calls a SageMaker HTTP endpoint for each text segment. Sends JSON to the endpoint's `/invocations` path and streams raw PCM audio back.

### Configuration

* `endpoint_name`: Name of the deployed SageMaker endpoint.
* `region`: AWS region where the endpoint is deployed.
* Audio sample rate in Hz. When `None`, uses the pipeline's configured sample rate.
* `settings`: Runtime-configurable settings. See [Settings](#settings-2) below.

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `NvidiaSageMakerHTTPTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`.
See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter  | Type              | Default                          | Description                         |
| ---------- | ----------------- | -------------------------------- | ----------------------------------- |
| `model`    | `str`             | `magpie`                         | Model identifier. *(Inherited.)*    |
| `voice`    | `str`             | `Magpie-Multilingual.EN-US.Aria` | Voice identifier. *(Inherited.)*    |
| `language` | `Language \| str` | `en-US`                          | BCP-47 language code for synthesis. |

### Usage

```python
import os

from pipecat.services.nvidia.sagemaker.tts import NvidiaSageMakerHTTPTTSService

# Each text segment is sent as a separate HTTP request to this endpoint.
tts = NvidiaSageMakerHTTPTTSService(
    endpoint_name=os.getenv("SAGEMAKER_MAGPIE_ENDPOINT_NAME"),
    region=os.getenv("AWS_REGION", "us-west-2"),
    settings=NvidiaSageMakerHTTPTTSService.Settings(
        voice="Magpie-Multilingual.EN-US.Aria",
        language="en-US",
    ),
)
```

### Notes

* **AWS SageMaker deployment required**: This service requires a deployed SageMaker endpoint running NVIDIA Magpie TTS NIM. See the [deployment example](https://github.com/pipecat-ai/pipecat-examples/tree/main/deployment/aws-sagemaker-nvidia) for setup instructions.
* **AWS credentials**: Requires `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables for SageMaker authentication.
* **Environment variables**: `SAGEMAKER_MAGPIE_ENDPOINT_NAME` for the endpoint name.
* **HTTP-based**: Each text segment triggers a new HTTP POST request to the SageMaker endpoint.
* **Metrics support**: This service supports metrics generation (`can_generate_metrics()` returns `True`).

## NvidiaSageMakerTTSService

NVIDIA Magpie TTS service using SageMaker bidirectional streaming. Maintains a persistent HTTP/2 bidi-stream connection for the lifetime of the pipeline with full interruption support.

### Configuration

* `endpoint_name`: Name of the deployed SageMaker endpoint.
* `region`: AWS region where the endpoint is deployed.
* Audio sample rate in Hz. When `None`, uses the pipeline's configured sample rate.
* `settings`: Runtime-configurable settings. See [Settings](#settings-3) below.
### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `NvidiaSageMakerTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter  | Type              | Default                          | Description                         |
| ---------- | ----------------- | -------------------------------- | ----------------------------------- |
| `model`    | `str`             | `magpie`                         | Model identifier. *(Inherited.)*    |
| `voice`    | `str`             | `Magpie-Multilingual.EN-US.Aria` | Voice identifier. *(Inherited.)*    |
| `language` | `Language \| str` | `en-US`                          | BCP-47 language code for synthesis. |

### Usage

```python
import os

from pipecat.services.nvidia.sagemaker.tts import NvidiaSageMakerTTSService

# One persistent bidi-stream session is kept for the pipeline's lifetime.
tts = NvidiaSageMakerTTSService(
    endpoint_name=os.getenv("SAGEMAKER_MAGPIE_ENDPOINT_NAME"),
    region=os.getenv("AWS_REGION", "us-west-2"),
    settings=NvidiaSageMakerTTSService.Settings(
        voice="Magpie-Multilingual.EN-US.Aria",
        language="en-US",
    ),
)
```

### Notes

* **AWS SageMaker deployment required**: This service requires a deployed SageMaker endpoint running NVIDIA Magpie TTS NIM. See the [deployment example](https://github.com/pipecat-ai/pipecat-examples/tree/main/deployment/aws-sagemaker-nvidia) for setup instructions.
* **AWS credentials**: Requires `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables for SageMaker authentication.
* **Environment variables**: `SAGEMAKER_MAGPIE_ENDPOINT_NAME` for the endpoint name.
* **Persistent connection**: Maintains a single HTTP/2 bidi-stream session for the pipeline's lifetime, reconnecting automatically on error.
* **Interruption support**: Extends `InterruptibleTTSService` for proper handling of user interruptions.
* **Metrics support**: This service supports metrics generation (`can_generate_metrics()` returns `True`).
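
Since both SageMaker services depend on the environment variables named in their notes, it can help to fail early when one is missing. The following is a minimal sketch using only the standard library. `missing_env_vars` and `REQUIRED_ENV_VARS` are hypothetical names introduced for illustration and are not part of Pipecat. The check covers only the variables this page names and does not validate the credentials themselves.

```python
import os
from typing import Mapping, Optional, Sequence

# Variable names taken from the SageMaker notes on this page.
REQUIRED_ENV_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "SAGEMAKER_MAGPIE_ENDPOINT_NAME",
)


def missing_env_vars(
    env: Optional[Mapping[str, str]] = None,
    required: Sequence[str] = REQUIRED_ENV_VARS,
) -> list[str]:
    """Return the required names that are unset or empty (hypothetical helper)."""
    # Fall back to the process environment when no mapping is supplied.
    source = os.environ if env is None else env
    return [name for name in required if not source.get(name)]
```

Calling `missing_env_vars()` before constructing either service returns an empty list when everything is set. A non-empty result lists the names to report in a startup error.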