Overview

SileroVADAnalyzer is a Voice Activity Detection (VAD) analyzer that uses the Silero VAD ONNX model to detect speech in audio streams. It detects speech with high accuracy and runs inference efficiently through ONNX Runtime.

Installation

The Silero VAD analyzer requires additional dependencies:

pip install "pipecat-ai[silero]"

Constructor Parameters

sample_rate
int
default: 16000

Audio sample rate in Hz. Must be either 8000 or 16000.

params
VADParams
default: VADParams()

Voice Activity Detection parameters, passed as a VADParams object.

Usage Example

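# Import paths may differ between pipecat versions; adjust them to match your install.
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.transports.services.daily import DailyParams, DailyTransport

# room_url and token are assumed to be defined elsewhere
# (your Daily room URL and meeting token).
transport = DailyTransport(
    room_url,
    token,
    "Respond bot",
    DailyParams(
        vad_enabled=True,
        vad_analyzer=SileroVADAnalyzer(
            sample_rate=16000,
            params=VADParams(
                threshold=0.5,
                min_speech_duration_ms=250,
                min_silence_duration_ms=100,
            ),
        ),
        vad_audio_passthrough=True,
    ),
)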

Technical Details

Sample Rate Requirements

The analyzer supports two sample rates:

  • 8000 Hz (256 samples per frame)
  • 16000 Hz (512 samples per frame)
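
As a rough illustration of the frame sizes above, the following sketch splits raw audio into analyzer-sized frames. The helper names (`FRAME_SAMPLES`, `frame_duration_ms`, `split_into_frames`) are hypothetical and not part of pipecat, and the code assumes 16-bit mono PCM input. Both supported rates work out to 32 ms of audio per frame.

```python
# Hypothetical helpers, not part of pipecat: they only illustrate the frame sizes above.
FRAME_SAMPLES = {8000: 256, 16000: 512}  # samples per frame, from the list above
BYTES_PER_SAMPLE = 2  # assumption: 16-bit mono PCM


def frame_duration_ms(sample_rate: int) -> float:
    """Return the duration of one frame in milliseconds."""
    if sample_rate not in FRAME_SAMPLES:
        raise ValueError("sample_rate must be 8000 or 16000")
    return FRAME_SAMPLES[sample_rate] * 1000 / sample_rate


def split_into_frames(pcm: bytes, sample_rate: int) -> list[bytes]:
    """Split PCM bytes into complete frames; a trailing partial frame is dropped."""
    if sample_rate not in FRAME_SAMPLES:
        raise ValueError("sample_rate must be 8000 or 16000")
    frame_bytes = FRAME_SAMPLES[sample_rate] * BYTES_PER_SAMPLE
    usable = len(pcm) - len(pcm) % frame_bytes
    return [pcm[i:i + frame_bytes] for i in range(0, usable, frame_bytes)]
```

For example, one second of 16 kHz audio (32000 bytes) yields 31 complete frames of 1024 bytes each, with the remaining 256 bytes left over for the next batch.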

Model Management

  • Uses ONNX Runtime for efficient inference
  • Automatically resets model state every 5 seconds to manage memory
  • Runs on CPU by default for consistent performance
  • Includes built-in model file
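
The periodic reset described above could look roughly like the sketch below. It is not pipecat's actual implementation: `PeriodicResetModel`, the `infer` callable and its assumed signature, and the injected `clock` are hypothetical names used only to show the pattern of clearing recurrent model state once a fixed interval has elapsed.

```python
import time
from typing import Callable, Optional

RESET_INTERVAL_SECS = 5.0  # interval taken from the list above


class PeriodicResetModel:
    """Hypothetical wrapper that clears model state once the reset interval has elapsed."""

    def __init__(
        self,
        infer: Callable[[bytes, Optional[object]], tuple],
        clock: Callable[[], float] = time.monotonic,
    ):
        # Assumed signature for infer: (frame, state) -> (confidence, new_state)
        self._infer = infer
        self._clock = clock
        self._state: Optional[object] = None
        self._last_reset = clock()
        self.reset_count = 0

    def __call__(self, frame: bytes) -> float:
        confidence, self._state = self._infer(frame, self._state)
        now = self._clock()
        if now - self._last_reset >= RESET_INTERVAL_SECS:
            # Drop accumulated state so it cannot grow or drift indefinitely.
            self._state = None
            self._last_reset = now
            self.reset_count += 1
        return confidence
```

Injecting the clock keeps the sketch easy to test; a real analyzer would more likely rely on `time.monotonic` or on a count of processed frames.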

Notes

  • High-accuracy speech detection
  • Efficient ONNX-based processing
  • Automatic memory management
  • Thread-safe for pipeline processing
  • Built-in model file included
  • CPU-optimized inference
  • Supports 8kHz and 16kHz audio