Overview

OLLamaLLMService provides access to locally run Ollama models through an OpenAI-compatible interface. It inherits from BaseOpenAILLMService and allows you to run various open-source models on your own hardware while maintaining compatibility with OpenAI’s API format.

Installation

To use Ollama services, you need to install both Ollama and the Pipecat dependency:

  1. Install Ollama on your system from ollama.com/download
  2. Install the Pipecat dependency:
pip install "pipecat-ai[ollama]"
  3. Pull a model (first time only):
ollama pull llama2
Ollama runs as a local service on port 11434. No API key required!
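
If you want to confirm the service is reachable before wiring up a pipeline, a quick request against the default endpoint is enough. A minimal sketch using only the standard library (the port and response text assume a default Ollama install):

import urllib.request

# Hedged check: a default Ollama install answers on port 11434 with a short
# status message ("Ollama is running").
try:
    with urllib.request.urlopen("http://localhost:11434", timeout=2) as resp:
        print(resp.read().decode())
except OSError:
    print("Ollama does not appear to be running on localhost:11434")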

Frames

Input

  • OpenAILLMContextFrame - Conversation context and history
  • LLMMessagesFrame - Direct message list (see the sketch after the Output frames)
  • VisionImageRawFrame - Images for vision models
  • LLMUpdateSettingsFrame - Runtime parameter updates

Output

  • LLMFullResponseStartFrame / LLMFullResponseEndFrame - Response boundaries
  • LLMTextFrame - Streamed completion chunks
  • FunctionCallInProgressFrame / FunctionCallResultFrame - Function call lifecycle
  • ErrorFrame - Connection or processing errors
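
As a rough illustration of the input frames above, the sketch below queues an LLMMessagesFrame to start a completion and an LLMUpdateSettingsFrame to change sampling parameters at runtime. It assumes a PipelineTask named task (created as in the Metrics section below); the settings field name follows the frame's purpose and should be checked against pipecat.frames.frames if your version differs.

from pipecat.frames.frames import LLMMessagesFrame, LLMUpdateSettingsFrame

# Kick off a completion with an explicit message list.
await task.queue_frame(LLMMessagesFrame([
    {"role": "system", "content": "You are a concise local assistant."},
    {"role": "user", "content": "Say hello."},
]))

# Adjust sampling parameters while the pipeline is running.
await task.queue_frame(LLMUpdateSettingsFrame(settings={"temperature": 0.3}))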

Function Calling

Function Calling Guide

Learn how to implement function calling with standardized schemas, register handlers, manage context properly, and control execution flow in your conversational AI applications.

Context Management

Context Management Guide

Learn how to manage conversation context, handle message history, and integrate context aggregators for consistent conversational experiences.

Usage Example

from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.ollama.llm import OLLamaLLMService

# Configure local Ollama service
llm = OLLamaLLMService(
    model="llama3.1",  # Must be pulled first: ollama pull llama3.1
    base_url="http://localhost:11434/v1",  # Default Ollama endpoint
    params=OLLamaLLMService.InputParams(
        temperature=0.7,
        max_tokens=1000
    )
)

# Define function for local processing
weather_function = FunctionSchema(
    name="get_current_weather",
    description="Get current weather information",
    properties={
        "location": {
            "type": "string",
            "description": "City and state, e.g. San Francisco, CA"
        },
        "format": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"],
            "description": "Temperature unit to use"
        }
    },
    required=["location", "format"]
)

tools = ToolsSchema(standard_tools=[weather_function])

# Create context optimized for local model
context = OpenAILLMContext(
    messages=[
        {
            "role": "system",
            "content": """You are a helpful assistant running locally.
            Be concise and efficient in your responses while maintaining helpfulness."""
        }
    ],
    tools=tools
)

# Create context aggregators
context_aggregator = llm.create_context_aggregator(context)

# Register function handler - all processing stays local
async def fetch_weather(params):
    location = params.arguments["location"]
    # Local weather lookup or cached data
    await params.result_callback({"conditions": "sunny", "temperature": "22°C"})

llm.register_function("get_current_weather", fetch_weather)

# Use in pipeline - completely offline capable
# (transport, stt, and tts are assumed to be configured elsewhere)
pipeline = Pipeline([
    transport.input(),
    stt,  # Can use local STT too
    context_aggregator.user(),
    llm,  # All inference happens locally
    tts,  # Can use local TTS too
    transport.output(),
    context_aggregator.assistant()
])
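
To execute the pipeline, wrap it in a PipelineTask and hand that to a PipelineRunner inside your async entry point. A minimal sketch (transport, stt, and tts are assumed to be configured as above; pass PipelineParams here if you also want the metrics described below):

from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask

task = PipelineTask(pipeline)
runner = PipelineRunner()
await runner.run(task)  # call from an async function / running event loop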

Metrics

Inherits all OpenAI metrics capabilities for local monitoring:

  • Time to First Byte (TTFB) - Local inference latency
  • Processing Duration - Model execution time
  • Token Usage - Local token counting (if supported by model)

Enable with:

from pipecat.pipeline.task import PipelineParams, PipelineTask

task = PipelineTask(
    pipeline,
    params=PipelineParams(
        enable_metrics=True,
        enable_usage_metrics=True
    )
)

Additional Notes

  • Run models locally: Ollama allows you to run various open-source models on your own hardware, providing flexibility and control.
  • OpenAI compatibility: Full compatibility with OpenAI API features and parameters.
  • Privacy-centric: All processing happens locally, ensuring data privacy and security.
