
Overview

WhisperSTTService provides offline speech recognition using OpenAI’s Whisper models running locally. It supports multiple model sizes and hardware acceleration options, including CPU, CUDA, and Apple Silicon (MLX), for privacy-focused transcription without external API calls.

Installation

Choose your installation based on your hardware:

Standard Whisper (CPU/CUDA)

pip install "pipecat-ai[whisper]"

MLX Whisper (Apple Silicon)

pip install "pipecat-ai[mlx-whisper]"

Prerequisites

Local Model Setup

Before using Whisper STT services, you need:
  1. Model Selection: Choose an appropriate Whisper model size (tiny, base, small, medium, large)
  2. Hardware Configuration: Set up CPU, CUDA, or Apple Silicon acceleration (see the sketch after this list)
  3. Storage Space: Ensure sufficient disk space for model downloads
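
As a starting point, here is a minimal sketch of creating the service. The import path (pipecat.services.whisper.stt) and the Model/device constructor arguments are assumptions based on recent Pipecat releases; check the API reference for your installed version.

```python
# Minimal sketch: instantiate the local Whisper STT service.
# The import path and the model/device arguments are assumptions based on
# recent Pipecat releases; verify against your installed version.
from pipecat.services.whisper.stt import Model, WhisperSTTService

stt = WhisperSTTService(
    model=Model.BASE,   # trade accuracy for speed: tiny, base, small, medium, large
    device="auto",      # "cpu", "cuda", or "auto" to let the backend decide
)
```

The first run downloads the selected model weights, so expect a one-time delay and confirm that enough disk space is available.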

Configuration Options

  • Model Size: Balance between accuracy and performance based on your hardware
  • Hardware Acceleration: Configure CUDA for NVIDIA GPUs or MLX for Apple Silicon (see the MLX sketch below)
  • Language Support: Whisper supports 99+ languages out of the box

No API keys required - Whisper runs entirely locally for complete privacy.
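
On Apple Silicon, the MLX-backed variant follows the same pattern. The class and enum names below (WhisperSTTServiceMLX, MLXModel) are assumptions tied to the mlx-whisper extra and may differ between Pipecat versions; confirm them against your version's API reference.

```python
# Hedged sketch of the Apple Silicon (MLX) path. Class and enum names are
# assumptions; verify against your installed Pipecat version.
from pipecat.services.whisper.stt import MLXModel, WhisperSTTServiceMLX

stt = WhisperSTTServiceMLX(model=MLXModel.MEDIUM)
```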