Overview

MoondreamService provides local image analysis and question answering using the Moondream model. It runs entirely on your own machine, with optional hardware acceleration through NVIDIA CUDA, Intel XPU, or Apple MPS, which makes it a good fit for privacy-focused computer vision applications.

Installation

To use Moondream services, install the required dependencies:
pip install "pipecat-ai[moondream]"

Prerequisites

Local Model Setup

Before using Moondream vision services, check the following:
  1. Model Download: The first run automatically downloads the Moondream model from Hugging Face, so network access is required
  2. Hardware Configuration: Set up CUDA, Intel XPU, or Apple MPS for optimal performance
  3. Storage Space: Ensure sufficient disk space for the model files
  4. Memory Requirements: Ensure adequate RAM/VRAM for model inference
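
As a rough pre-flight check for the storage prerequisite, the sketch below tests whether a directory has a given amount of free space. The helper name `has_free_space` and the threshold are hypothetical: this page does not state the model's size, so pick a value that matches the model files you expect to download.

```python
import shutil


def has_free_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem containing `path` has at least `required_bytes` free."""
    # shutil.disk_usage returns a named tuple with total, used, and free bytes.
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    # Hypothetical threshold of 5 GiB; not an official requirement.
    five_gib = 5 * 1024**3
    print("Enough space:", has_free_space(".", five_gib))
```

Point `path` at the directory where your Hugging Face cache lives to check the volume that will actually receive the download.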

Hardware Acceleration

The service automatically detects and uses the best available hardware, checked in this order:
  • Intel XPU: Requires intel_extension_for_pytorch
  • NVIDIA CUDA: For GPU acceleration
  • Apple Metal (MPS): For Apple Silicon optimization
  • CPU: Fallback option for any system
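
The selection logic can be pictured with a small sketch. This is not the service's actual implementation: `pick_device` is a hypothetical helper that takes availability flags as plain booleans and returns device and dtype names as strings, following the priority order and data types given in the Notes section of this page.

```python
from typing import Tuple


def pick_device(xpu: bool, cuda: bool, mps: bool, use_cpu: bool = False) -> Tuple[str, str]:
    """Return (device, dtype) names using the documented priority order."""
    if use_cpu:
        # Explicit override: skip accelerator detection entirely.
        return ("cpu", "float32")
    if xpu:
        return ("xpu", "float32")   # Intel XPU is checked first and uses float32
    if cuda:
        return ("cuda", "float16")  # NVIDIA CUDA uses float16
    if mps:
        return ("mps", "float16")   # Apple Metal (MPS) uses float16
    return ("cpu", "float32")       # Fallback when no accelerator is available


if __name__ == "__main__":
    print(pick_device(xpu=False, cuda=True, mps=False))
```

In a real environment the flags would come from PyTorch availability checks such as `torch.cuda.is_available()` and `torch.backends.mps.is_available()`. They are passed in here so the sketch runs without PyTorch installed.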

Configuration Options

  • Model Selection: Choose Moondream model version and revision
  • Hardware Override: Force CPU usage if needed
  • Local Processing: Complete privacy with no external API calls

No API keys required: Moondream runs entirely locally for complete privacy and control.

Configuration

model (str, default: "vikhyatk/moondream2")
Hugging Face model identifier for the Moondream model.

revision (str, default: "2025-01-09")
Specific model revision to use.

use_cpu (bool, default: False)
Whether to force CPU usage instead of hardware acceleration. When False, the service automatically detects and uses the best available device (Intel XPU, CUDA, MPS, or CPU).

Usage

Basic Setup

from pipecat.services.moondream import MoondreamService

vision = MoondreamService()

With Custom Model and CPU Override

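from pipecat.services.moondream import MoondreamService

# Pin the model and revision explicitly, and force CPU inference.
vision = MoondreamService(
    model="vikhyatk/moondream2",
    revision="2025-01-09",
    use_cpu=True,
)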

Notes

  • First-run download: The model is automatically downloaded from Hugging Face on first use. Ensure sufficient disk space and network access.
  • Hardware auto-detection: When use_cpu=False (the default), the service detects available hardware in this priority order: Intel XPU, NVIDIA CUDA, Apple Metal (MPS), then CPU.
  • Data types: CUDA and MPS use float16 for faster inference, while XPU and CPU use float32.
  • Blocking inference: Image analysis runs in a separate thread via asyncio.to_thread to avoid blocking the event loop.
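
To illustrate the last note, here is a minimal sketch of offloading a blocking call with `asyncio.to_thread`. The function `analyze_image_blocking` is a hypothetical stand-in for model inference (it only sleeps), and the heartbeat task exists solely to show that the event loop keeps running while the worker thread is busy. It is not the service's internal code.

```python
import asyncio
import time


def analyze_image_blocking(image_bytes: bytes, question: str) -> str:
    """Hypothetical stand-in for blocking model inference."""
    time.sleep(0.05)  # Simulate CPU/GPU-bound work that would block the loop
    return f"{len(image_bytes)} bytes analyzed for: {question}"


async def describe(image_bytes: bytes, question: str) -> str:
    # Run the blocking function in a worker thread so the event loop stays free.
    return await asyncio.to_thread(analyze_image_blocking, image_bytes, question)


async def main():
    ticks = 0

    async def heartbeat():
        # Counts loop iterations that happen while inference is in progress.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    hb = asyncio.create_task(heartbeat())
    result = await describe(b"\x00" * 16, "What is in the image?")
    hb.cancel()
    return result, ticks


if __name__ == "__main__":
    print(asyncio.run(main()))
```

If the blocking function were called directly inside `describe`, the heartbeat would not advance until inference finished. That stall is what the thread offload avoids.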