Overview
MoondreamService provides local image analysis and question-answering capabilities using the Moondream model. It runs entirely on your local machine, supporting various hardware acceleration options including CUDA, Intel XPU, and Apple MPS for privacy-focused computer vision applications.
Related resources:
- Moondream Vision API Reference: Pipecat's API methods for Moondream vision integration
- Example Implementation: browse examples using Moondream vision
- Moondream Documentation: official Moondream model documentation
- Hugging Face Model: access the Moondream model on Hugging Face
Installation
To use Moondream services, install the required dependencies:

Prerequisites
Local Model Setup
Before using Moondream vision services, you need:
- Model Download: The first run will automatically download the Moondream model from Hugging Face
- Hardware Configuration: Set up CUDA, Intel XPU, or Apple MPS for optimal performance
- Storage Space: Ensure sufficient disk space for model files
- Memory Requirements: Adequate RAM/VRAM for model inference
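With the prerequisites in place, installing the dependencies typically looks like the following (the `moondream` extra name is an assumption; check Pipecat's installation docs for the exact extra):

```shell
pip install "pipecat-ai[moondream]"
```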
Hardware Acceleration
The service automatically detects and uses the best available hardware:
- Intel XPU: Requires intel_extension_for_pytorch
- NVIDIA CUDA: For GPU acceleration
- Apple Metal (MPS): For Apple Silicon optimization
- CPU: Fallback option for any system
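The detection order above can be sketched as follows. This is an illustrative reimplementation, not Pipecat's actual code, and it degrades gracefully when torch or the Intel extension is not installed:

```python
def detect_device() -> str:
    """Sketch of the detection priority described above:
    Intel XPU, then NVIDIA CUDA, then Apple MPS, then CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed; nothing to accelerate
    try:
        # Importing the extension registers the XPU backend.
        import intel_extension_for_pytorch  # noqa: F401
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            return "xpu"
    except ImportError:
        pass
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

print(detect_device())
```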
Configuration Options
- Model Selection: Choose Moondream model version and revision
- Hardware Override: Force CPU usage if needed
- Local Processing: Complete privacy with no external API calls
Configuration
- model: Hugging Face model identifier for the Moondream model.
- revision: Specific model revision to use.
- use_cpu: Whether to force CPU usage instead of hardware acceleration. When False, the service automatically detects and uses the best available device (Intel XPU, CUDA, MPS, or CPU).

Usage
Basic Setup
With Custom Model and CPU Override
Notes
- First-run download: The model is automatically downloaded from Hugging Face on first use. Ensure sufficient disk space and network access.
- Hardware auto-detection: When use_cpu=False (the default), the service detects available hardware in this priority order: Intel XPU, NVIDIA CUDA, Apple Metal (MPS), then CPU.
- Data types: CUDA and MPS use float16 for faster inference, while XPU and CPU use float32.
- Blocking inference: Image analysis runs in a separate thread via asyncio.to_thread to avoid blocking the event loop.
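The asyncio.to_thread pattern from the last note can be demonstrated in isolation. The run_inference function below is a stand-in for the blocking model call, not the actual Moondream API:

```python
import asyncio
import time

def run_inference(image_bytes: bytes) -> str:
    # Stand-in for a blocking model call (e.g. Moondream inference).
    time.sleep(0.01)
    return f"analyzed {len(image_bytes)} bytes"

async def analyze(image_bytes: bytes) -> str:
    # Offload the blocking call to a worker thread so the
    # asyncio event loop stays responsive during inference.
    return await asyncio.to_thread(run_inference, image_bytes)

result = asyncio.run(analyze(b"\x00" * 16))
print(result)  # analyzed 16 bytes
```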