# NVIDIA NIM

> LLM service implementation using NVIDIA's NIM (NVIDIA Inference Microservices) API with an OpenAI-compatible interface

## Overview

`NvidiaLLMService` provides access to NVIDIA's NIM language models through an OpenAI-compatible interface. It inherits from `OpenAILLMService` and supports streaming responses, function calling, and context management. It also handles NVIDIA's incremental token reporting and works with both the NVIDIA cloud endpoint and self-hosted NIM deployments.

<CardGroup cols={2}>
  <Card title="NVIDIA NIM LLM API Reference" icon="code" href="https://reference-server.pipecat.ai/en/latest/api/pipecat.services.nvidia.llm.html">
    Pipecat's API methods for NVIDIA NIM integration
  </Card>

  <Card title="Example Implementation" icon="play" href="https://github.com/pipecat-ai/pipecat/blob/main/examples/function-calling/function-calling-nvidia.py">
    Complete example with function calling
  </Card>

  <Card title="NVIDIA NIM Documentation" icon="book" href="https://docs.nvidia.com/nim/">
    Official NVIDIA NIM documentation and setup
  </Card>

  <Card title="NVIDIA Developer Portal" icon="microphone" href="https://developer.nvidia.com/">
    Access NIM services and manage API keys
  </Card>
</CardGroup>

## Installation

To use NVIDIA NIM services, install the required dependencies:

```bash theme={null}
uv add "pipecat-ai[nvidia]"
```

## Prerequisites

### NVIDIA NIM Setup

Before using NVIDIA NIM LLM services, you need:

1. **NVIDIA Developer Account** (cloud endpoint only): Sign up at [NVIDIA Developer Portal](https://developer.nvidia.com/)
2. **API Key** (cloud endpoint only): Generate an NVIDIA API key for NIM cloud services
3. **Model Selection**: Choose from available NIM-hosted models
4. **Local NIM Setup** (optional): For local deployments, configure NIM on-premises and set the `base_url` to your local endpoint

### Environment Variables

* `NVIDIA_API_KEY`: Your NVIDIA API key for authentication (required for cloud endpoint, not needed for local NIM deployments)
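
The rule above (key required for the cloud endpoint, optional for a local NIM) can be sketched as a small helper that builds the constructor arguments. This is only an illustration: `nvidia_llm_kwargs` and `CLOUD_BASE_URL` are hypothetical names and are not part of Pipecat, and the local address is the example value used elsewhere on this page.

```python
import os

# Default cloud endpoint, as documented for `base_url`.
CLOUD_BASE_URL = "https://integrate.api.nvidia.com/v1"


def nvidia_llm_kwargs(base_url=CLOUD_BASE_URL, env=None):
    """Hypothetical helper: build `api_key`/`base_url` constructor arguments.

    Raises RuntimeError if the cloud endpoint is selected without a key.
    """
    env = os.environ if env is None else env
    api_key = env.get("NVIDIA_API_KEY")
    # Only the cloud endpoint needs a key; a local NIM may run without one.
    if base_url == CLOUD_BASE_URL and not api_key:
        raise RuntimeError("NVIDIA_API_KEY must be set for the cloud endpoint")
    return {"api_key": api_key, "base_url": base_url}


# Local deployment: no key needed, so an empty environment is fine.
local_kwargs = nvidia_llm_kwargs("http://localhost:8000/v1", env={})

# With pipecat-ai[nvidia] installed, the result could be passed on like this:
#   from pipecat.services.nvidia import NvidiaLLMService
#   llm = NvidiaLLMService(**local_kwargs)
```

Failing early when the key is missing surfaces a configuration mistake at startup rather than on the first request.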

## Configuration

<ParamField path="api_key" type="str | None" default="None">
  NVIDIA API key for authentication. Required when using the cloud endpoint (`https://integrate.api.nvidia.com/v1`). Not needed for local NIM deployments.
</ParamField>

<ParamField path="base_url" type="str" default="https://integrate.api.nvidia.com/v1">
  Base URL for the NIM API endpoint. Defaults to NVIDIA's cloud endpoint. For local deployments, pass the local address (e.g., `http://localhost:8000/v1`).
</ParamField>

<ParamField path="model" type="str" default="None" deprecated>
  Model identifier to use.

  *Deprecated in v0.0.105. Use `settings=NvidiaLLMService.Settings(model=...)` instead.*
</ParamField>

<ParamField path="settings" type="NvidiaLLMService.Settings" default="None">
  Runtime-configurable settings. See [Settings](#settings) below.
</ParamField>

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `NvidiaLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

This service uses the same settings as `OpenAILLMService`. See [OpenAI LLM Settings](/api-reference/server/services/llm/openai#settings) for the full parameter reference.

## Usage

### Basic Setup

```python theme={null}
import os
from pipecat.services.nvidia import NvidiaLLMService

llm = NvidiaLLMService(
    api_key=os.getenv("NVIDIA_API_KEY"),
    settings=NvidiaLLMService.Settings(
        model="nvidia/nemotron-3-nano-30b-a3b",
    ),
)
```

### With Custom Settings

```python theme={null}
import os
from pipecat.services.nvidia import NvidiaLLMService

llm = NvidiaLLMService(
    api_key=os.getenv("NVIDIA_API_KEY"),
    settings=NvidiaLLMService.Settings(
        model="nvidia/nemotron-3-nano-30b-a3b",
        temperature=0.7,
        top_p=0.9,
        max_completion_tokens=1024,
    ),
)
```

## Notes

* **Token reporting**: NVIDIA NIM uses incremental token reporting. The service accumulates token usage metrics during processing and reports the final totals at the end of each request.
* **Cloud vs. local deployment**: NIM supports both cloud-hosted and on-premises deployments. For on-premises, override the `base_url` to point to your local NIM endpoint. API keys are only required for the cloud endpoint.
* **Reasoning content**: The service automatically detects and filters reasoning content from model responses, emitting it as `LLMThought*Frame` objects. This applies to:

  * Models with API-level reasoning separation (e.g., Nemotron Nano models) that include a `reasoning_content` field
  * Models that emit reasoning inline using `<think>...</think>` tags (e.g., DeepSeek-R1, some Nemotron models)

  Reasoning frames are accessible to observers and logging but are not sent to TTS, keeping the spoken output clean while preserving visibility into the model's thought process.
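
To make the inline `<think>...</think>` case concrete, here is a minimal sketch of separating reasoning from spoken text in a complete response string. It is not the service's implementation: the real service works on streamed chunks and emits frames, and `split_reasoning` is a hypothetical name used only for illustration.

```python
import re

# Non-greedy match so several <think> blocks are handled separately;
# DOTALL lets the reasoning span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, spoken) for a complete model response."""
    reasoning = "\n".join(part.strip() for part in THINK_RE.findall(text))
    spoken = THINK_RE.sub("", text).strip()
    return reasoning, spoken


reasoning, spoken = split_reasoning(
    "<think>User wants the time.</think>It is noon."
)
# `reasoning` holds the thought text; only `spoken` would go on to TTS.
```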

<Tip>
  The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
  `Settings` / `settings=` instead. See the [Service Settings
  guide](/pipecat/fundamentals/service-settings) for migration details.
</Tip>
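
The token-reporting note above can be illustrated with a small accumulator. This is a sketch under two assumptions that this page does not confirm: that the prompt token count is the same in every usage update for a request, and that completion token counts are cumulative. `UsageAccumulator` is a hypothetical class, not a Pipecat API.

```python
from dataclasses import dataclass


@dataclass
class UsageAccumulator:
    """Collects incremental usage updates into one final total."""

    prompt_tokens: int = 0
    completion_tokens: int = 0

    def update(self, prompt_tokens, completion_tokens):
        # Assumption: the prompt size is fixed, so keep the first non-zero value.
        if self.prompt_tokens == 0 and prompt_tokens > 0:
            self.prompt_tokens = prompt_tokens
        # Assumption: completion counts are cumulative, so keep the largest seen.
        if completion_tokens > self.completion_tokens:
            self.completion_tokens = completion_tokens

    @property
    def total_tokens(self):
        return self.prompt_tokens + self.completion_tokens


acc = UsageAccumulator()
# Simulated usage updates arriving during one streamed request.
for prompt, completion in [(12, 0), (12, 3), (12, 7)]:
    acc.update(prompt, completion)

# Report once, at the end of the request.
final_usage = (acc.prompt_tokens, acc.completion_tokens, acc.total_tokens)
```

Reporting once at the end avoids counting the same tokens several times when each update repeats earlier totals.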
