NVIDIA NIM
LLM service implementation using NVIDIA's NIM (NVIDIA Inference Microservices) API with an OpenAI-compatible interface
Overview
`NimLLMService` provides access to NVIDIA's NIM language models through an OpenAI-compatible interface. It inherits from `OpenAILLMService` and supports streaming responses, function calling, and context management, with special handling for NVIDIA's incremental token-usage reporting.
Installation
To use `NimLLMService`, install the required dependencies (shown here assuming Pipecat's `nim` extras group):
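```bash
# assumes Pipecat's optional "nim" dependency group
pip install "pipecat-ai[nim]"
```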
You'll also need to set your NVIDIA NIM API key as the `NIM_API_KEY` environment variable:
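```bash
export NIM_API_KEY=your_api_key
```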
Configuration
Constructor Parameters
| Parameter | Type | Description |
| --- | --- | --- |
| `api_key` | str | Your NVIDIA NIM API key |
| `model` | str | Model identifier (e.g. `nvidia/llama-3.1-nemotron-70b-instruct`) |
| `base_url` | str | NVIDIA NIM API endpoint (typically `https://integrate.api.nvidia.com/v1`) |
Input Parameters
Inherits OpenAI-compatible parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| `frequency_penalty` | float | Reduces likelihood of repeating tokens based on their frequency. Range: [-2.0, 2.0] |
| `max_tokens` | int | Maximum number of tokens to generate. Must be greater than or equal to 1 |
| `presence_penalty` | float | Reduces likelihood of repeating any tokens that have already appeared. Range: [-2.0, 2.0] |
| `temperature` | float | Controls randomness in the output. Range: [0.0, 2.0] |
| `top_p` | float | Controls diversity via nucleus sampling. Range: [0.0, 1.0] |
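For example, a minimal sketch of tuning these parameters via the inherited `InputParams` (field names assume the OpenAI-compatible model from `OpenAILLMService`; the import path may differ across Pipecat versions):

```python
import os

from pipecat.services.nim.llm import NimLLMService

# Tune generation via the inherited OpenAI-compatible InputParams
llm = NimLLMService(
    api_key=os.getenv("NIM_API_KEY"),
    params=NimLLMService.InputParams(
        temperature=0.7,  # lower values make output more deterministic
        max_tokens=1000,  # cap the completion length
    ),
)
```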
Usage Example
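A minimal sketch of wiring the service into a pipeline (the `transport`, `stt`, and `tts` processors are placeholders for your own setup, and the import path may differ across Pipecat versions):

```python
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.nim.llm import NimLLMService

llm = NimLLMService(
    api_key=os.getenv("NIM_API_KEY"),
    model="nvidia/llama-3.1-nemotron-70b-instruct",
)

# Shared conversation context plus user/assistant aggregators
context = OpenAILLMContext(
    messages=[{"role": "system", "content": "You are a helpful assistant."}],
)
context_aggregator = llm.create_context_aggregator(context)

# The LLM sits between the user and assistant context aggregators
pipeline = Pipeline([
    transport.input(),   # placeholder: audio/text in
    stt,                 # placeholder: speech-to-text
    context_aggregator.user(),
    llm,
    tts,                 # placeholder: text-to-speech
    transport.output(),  # placeholder: audio/text out
    context_aggregator.assistant(),
])
```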
Methods
See the LLM base class methods for additional functionality.
Function Calling
This service supports function calling (also known as tool calling), which allows the LLM to request information from external services and APIs. For example, you can enable your bot to:
- Check current weather conditions
- Query databases
- Access external APIs
- Perform custom actions
Function Calling Guide
Learn how to implement function calling with standardized schemas, register handlers, manage context properly, and control execution flow in your conversational AI applications.
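As a sketch of the pattern (the `get_weather` tool and `fetch_weather` handler are hypothetical, and the handler signature shown follows Pipecat's callback style, which may differ across versions):

```python
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext

# Hypothetical weather tool, declared with an OpenAI-style function schema
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

async def fetch_weather(function_name, tool_call_id, args, llm, context, result_callback):
    # A real handler would query a weather API with args["location"]
    await result_callback({"conditions": "sunny", "temperature_f": 75})

context = OpenAILLMContext(
    messages=[{"role": "system", "content": "You can report the weather."}],
    tools=tools,
)
llm.register_function("get_weather", fetch_weather)
```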
Available Models
NVIDIA NIM provides access to various models:
| Model Name | Description |
| --- | --- |
| `nvidia/llama-3.1-nemotron-70b-instruct` | Llama 3.1 70B Nemotron instruct |
| `nvidia/llama-3.1-nemotron-13b-instruct` | Llama 3.1 13B Nemotron instruct |
| `nvidia/llama-3.1-nemotron-8b-instruct` | Llama 3.1 8B Nemotron instruct |
See NVIDIA’s NIM console for a complete list of supported models.
Token Usage Handling
NimLLMService includes special handling for token usage metrics:
- Accumulates incremental token updates from NIM
- Records prompt tokens on first appearance
- Tracks completion tokens as they increase
- Reports final totals at the end of processing
This ensures compatibility with OpenAI’s token reporting format while maintaining accurate metrics.
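A simplified sketch of that accumulation logic (illustrative only, not the service's actual implementation):

```python
class TokenUsageAccumulator:
    """Illustrative only: reconcile NIM's incremental usage updates
    into OpenAI-style final totals."""

    def __init__(self):
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def update(self, prompt_tokens: int, completion_tokens: int):
        # Prompt tokens are recorded once, on first appearance
        if self.prompt_tokens == 0 and prompt_tokens:
            self.prompt_tokens = prompt_tokens
        # Completion tokens only grow; keep the running maximum
        if completion_tokens > self.completion_tokens:
            self.completion_tokens = completion_tokens

    @property
    def total_tokens(self) -> int:
        # Final total reported at the end of processing
        return self.prompt_tokens + self.completion_tokens
```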
Frame Flow
Inherits the OpenAI LLM Service frame flow:
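In broad strokes, the service consumes `OpenAILLMContextFrame` or `LLMMessagesFrame` inputs and emits an `LLMFullResponseStartFrame`, a stream of text frames, and an `LLMFullResponseEndFrame`, pushing an `ErrorFrame` if the request fails; see the OpenAI LLM service documentation for the full diagram.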
Metrics Support
The service collects standard LLM metrics:
- Token usage (prompt and completion)
- Processing duration
- Time to First Byte (TTFB)
- Function call metrics
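A minimal sketch of turning metrics on for a pipeline task (assuming a `pipeline` has already been built):

```python
from pipecat.pipeline.task import PipelineParams, PipelineTask

# Enable timing and token-usage metrics for all services in the pipeline
task = PipelineTask(
    pipeline,
    params=PipelineParams(
        enable_metrics=True,        # TTFB and processing duration
        enable_usage_metrics=True,  # prompt/completion token usage
    ),
)
```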
Notes
- OpenAI-compatible interface
- Supports streaming responses
- Handles function calling
- Manages conversation context
- Custom token usage tracking for NIM’s incremental reporting
- Thread-safe processing
- Automatic error handling