# Ollama

LLM service implementation using Ollama with an OpenAI-compatible interface.
## Overview

`OLLamaLLMService` provides access to locally run Ollama models through an OpenAI-compatible interface. It inherits from `BaseOpenAILLMService`, so you can run a variety of open-source models locally while maintaining compatibility with OpenAI's API format.
## Installation

To use `OLLamaLLMService`, you need to:

- Install Ollama on your system (see the Ollama installation guide)
- Install the Pipecat Ollama dependencies:
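For example, assuming the published `ollama` extra:

```shell
pip install "pipecat-ai[ollama]"
```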
## Configuration

### Constructor Parameters

- `model` (str): Ollama model identifier
- `base_url` (str): Local Ollama API endpoint (typically `http://localhost:11434/v1`)
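As a minimal sketch of construction (the import path and default endpoint shown here are assumptions that may vary across Pipecat versions):

```python
from pipecat.services.ollama import OLLamaLLMService

# Assumes Ollama is running locally and the model has already been
# pulled (`ollama pull llama2`). The base_url shown is Ollama's default
# OpenAI-compatible endpoint.
llm = OLLamaLLMService(
    model="llama2",
    base_url="http://localhost:11434/v1",
)
```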
### Input Parameters

Inherits all input parameters from `BaseOpenAILLMService`:

- `extra` (dict, optional): Additional parameters to pass to the model
- `frequency_penalty` (float, optional): Reduces the likelihood of repeating tokens based on their frequency. Range: [-2.0, 2.0]
- `max_completion_tokens` (int, optional): Maximum number of tokens in the completion. Must be greater than or equal to 1
- `max_tokens` (int, optional): Maximum number of tokens to generate. Must be greater than or equal to 1
- `presence_penalty` (float, optional): Reduces the likelihood of repeating any tokens that have already appeared. Range: [-2.0, 2.0]
- `seed` (int, optional): Random seed for deterministic generation. Must be greater than or equal to 0
- `temperature` (float, optional): Controls randomness in the output. Range: [0.0, 2.0]
- `top_p` (float, optional): Controls diversity via nucleus sampling. Range: [0.0, 1.0]
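These can be supplied through the `InputParams` structure inherited from the OpenAI base service; a sketch:

```python
from pipecat.services.ollama import OLLamaLLMService

# InputParams comes from the OpenAI base service; set only the fields
# you need, the rest keep their defaults.
llm = OLLamaLLMService(
    model="llama2",
    params=OLLamaLLMService.InputParams(
        temperature=0.7,
        max_tokens=1000,
        frequency_penalty=0.3,
        seed=42,
    ),
)
```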
## Input Frames

- `OpenAILLMContextFrame`: Contains OpenAI-specific conversation context
- `LLMMessagesFrame`: Contains conversation messages
- `VisionImageRawFrame`: Contains an image for vision model processing
- `LLMUpdateSettingsFrame`: Updates model settings
## Output Frames

- `TextFrame`: Contains generated text chunks
- `FunctionCallInProgressFrame`: Indicates the start of a function call
- `FunctionCallResultFrame`: Contains function call results
## Context Management

The Ollama service relies on the OpenAI base class for context management, which maintains the conversation history, system prompts, and tool calls, and converts between OpenAI and Ollama message formats.
### OpenAILLMContext

`OpenAILLMContext` is the base context manager for OpenAI-style conversations:
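A minimal sketch of creating a context; messages follow OpenAI's chat format:

```python
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext

# Seed the context with a system prompt; messages use OpenAI's chat
# format, and an optional tools list can be supplied for function calling.
context = OpenAILLMContext(
    messages=[
        {"role": "system", "content": "You are a concise, helpful assistant."},
    ]
)
```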
### Context Aggregators

Context aggregators handle message format conversion and management. The service provides a method to create a paired set of user and assistant aggregators:
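Here `llm` and `context` are the objects created in the snippets above:

```python
# Create the paired aggregators from an existing context. The user side
# goes before the LLM in the pipeline; the assistant side goes after it.
context_aggregator = llm.create_context_aggregator(context)

user_aggregator = context_aggregator.user()
assistant_aggregator = context_aggregator.assistant()
```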
### Usage Example

The context management system ensures proper message formatting and history tracking throughout the conversation, as in the sketch below.
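A sketch of typical pipeline wiring; `transport`, `stt`, and `tts` are hypothetical stand-ins for services defined elsewhere in your app:

```python
from pipecat.pipeline.pipeline import Pipeline

# `transport`, `stt`, and `tts` are hypothetical placeholders; `llm` and
# `context_aggregator` come from the snippets above. The aggregators
# bracket the LLM so both sides of the conversation land in the context.
pipeline = Pipeline(
    [
        transport.input(),               # incoming audio/text
        stt,                             # speech-to-text
        context_aggregator.user(),       # record user messages
        llm,                             # Ollama completion
        tts,                             # text-to-speech
        transport.output(),              # outgoing audio/text
        context_aggregator.assistant(),  # record assistant responses
    ]
)
```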
## Methods
See the LLM base class methods for additional functionality.
## Usage Example
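A minimal end-to-end sketch: it queues one user turn, runs it through a locally served `llama2`, and shuts down. Import paths may differ slightly across Pipecat versions.

```python
import asyncio

from pipecat.frames.frames import EndFrame, LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.ollama import OLLamaLLMService


async def main():
    # Talk to a locally pulled model through Ollama's OpenAI-compatible API.
    llm = OLLamaLLMService(model="llama2")

    # Shared conversation context plus paired user/assistant aggregators.
    context = OpenAILLMContext(
        messages=[{"role": "system", "content": "You are a helpful assistant."}]
    )
    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline(
        [
            context_aggregator.user(),       # user messages -> context
            llm,                             # context -> streamed completion
            context_aggregator.assistant(),  # completion -> context
        ]
    )

    task = PipelineTask(pipeline)

    # Queue one user turn, then shut the pipeline down.
    await task.queue_frames(
        [
            LLMMessagesFrame([{"role": "user", "content": "Tell me a joke."}]),
            EndFrame(),
        ]
    )

    await PipelineRunner().run(task)


if __name__ == "__main__":
    asyncio.run(main())
```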
## Available Models

Ollama supports various open-source models. Here are some popular options:

| Model Name | Description |
| --- | --- |
| `llama2` | Meta's Llama 2 base model |
| `codellama` | Specialized for code generation |
| `mistral` | Mistral AI's base model |
| `mixtral` | Mistral's mixture-of-experts model |
| `neural-chat` | Intel's neural chat model |
| `phi` | Microsoft's Phi model series |
| `vicuna` | Berkeley's Vicuna model |
See Ollama’s documentation for a full list of available models.
To use a specific model:
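For example, assuming the model has already been pulled locally:

```python
# Any locally pulled model works, e.g. after `ollama pull mistral`.
llm = OLLamaLLMService(model="mistral")
```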
## Frame Flow

Inherits the `BaseOpenAILLMService` frame flow.
## Metrics Support
The service collects standard metrics:
- Token usage (prompt and completion)
- Processing duration
- Time to First Byte (TTFB)
- Function call metrics
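Metrics collection is controlled at the pipeline level; a sketch, where `pipeline` is a `Pipeline` built as in the examples above:

```python
from pipecat.pipeline.task import PipelineParams, PipelineTask

# Enable processing/TTFB metrics and token-usage metrics for the task;
# `pipeline` is assumed to be constructed elsewhere.
task = PipelineTask(
    pipeline,
    params=PipelineParams(enable_metrics=True, enable_usage_metrics=True),
)
```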
## Key Features

- **Local Execution**
  - No internet connection required after model download
  - Complete data privacy
  - Lower latency for many use cases
- **Model Management**
  - Easy model switching
  - Local model customization
  - Version control through Ollama
- **Resource Control**
  - CPU/GPU utilization management
  - Memory usage control
  - Concurrent request handling
## Common Use Cases

- **Development and Testing**
  - Rapid prototyping
  - Offline development
  - Cost-effective testing
- **Privacy-Sensitive Applications**
  - Healthcare applications
  - Financial services
  - Personal data processing
- **Edge Computing**
  - IoT applications
  - Embedded systems
  - Low-latency requirements
## Notes
- Runs models locally through Ollama
- OpenAI-compatible interface
- Supports streaming responses
- Handles function calling
- Manages conversation context
- Includes token usage tracking
- Thread-safe processing