FIRE RED VAD

Overview

FireVadAnalyzer is a Pipecat VAD analyzer backed by FireRedVAD, a streaming voice activity detection model that supports 100+ languages. It processes audio one 10 ms frame at a time and reports speech probability to Pipecat’s VAD layer, letting transports detect when a user starts and stops speaking.

Source Repository

Source code, examples, and issues for the FireRedVAD integration

PyPI Package

The pipecat-firered-vad package on PyPI

FireRedVAD Model

The upstream FireRedVAD model and benchmarks

Model Weights

Download the FireRedVAD model weights from Hugging Face

Installation

This is a community-maintained package distributed separately from pipecat-ai:

pip install pipecat-firered-vad

Prerequisites

This integration requires no API key. It does, however, depend on the upstream FireRedVAD package (not published to PyPI) and locally downloaded model weights.

1. Install FireRedVAD

fireredvad is not on PyPI. Clone and install it from GitHub:

git clone https://github.com/FireRedTeam/FireRedVAD.git
cd FireRedVAD
pip install -r requirements.txt
export PYTHONPATH=$PWD:$PYTHONPATH

2. Download model weights

pip install -U "huggingface_hub[cli]"
huggingface-cli download FireRedTeam/FireRedVAD \
    --local-dir ./pretrained_models/FireRedVAD

3. Audio requirements

FireRedVAD only accepts 16 kHz, 16-bit mono PCM audio (enforced at construction time). When using a transport such as DailyTransport, set sample_rate=16000.
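The frame size implied by these requirements can be worked out directly. The sketch below is only an illustration of the arithmetic for 16 kHz, 16-bit mono audio split into 10 ms frames. The helper `split_into_frames` is hypothetical and not part of pipecat-firered-vad, and the analyzer may buffer audio differently internally.

```python
SAMPLE_RATE_HZ = 16_000   # required by FireRedVAD
SAMPLE_WIDTH_BYTES = 2    # 16-bit PCM
CHANNELS = 1              # mono
FRAME_MS = 10             # frame length stated in the overview

SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000
BYTES_PER_FRAME = SAMPLES_PER_FRAME * SAMPLE_WIDTH_BYTES * CHANNELS


def split_into_frames(pcm: bytes) -> list[bytes]:
    """Split raw PCM into whole 10 ms frames, dropping any trailing partial frame."""
    usable = len(pcm) - len(pcm) % BYTES_PER_FRAME
    return [pcm[i:i + BYTES_PER_FRAME] for i in range(0, usable, BYTES_PER_FRAME)]


one_second = bytes(SAMPLE_RATE_HZ * SAMPLE_WIDTH_BYTES * CHANNELS)  # 1 s of silence
frames = split_into_frames(one_second)
print(SAMPLES_PER_FRAME, BYTES_PER_FRAME, len(frames))  # 160 320 100
```

Each 10 ms frame is therefore 160 samples, or 320 bytes, and one second of audio yields 100 frames.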

Environment Variables

The integration does not read environment variables directly. The example uses the following for convenience:

FIREREDVAD_MODEL_DIR: Path to the downloaded Stream-VAD model directory, passed to the analyzer’s model_dir argument.
FIREREDVAD_USE_GPU: Set to 1 to enable GPU inference (default: 0).
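One way to read these two variables in your own application is sketched below. `FireRedVadSettings` and `load_settings` are hypothetical names introduced for this example and are not part of the package. The only behavior taken from this page is that `FIREREDVAD_USE_GPU` defaults to `0` and is enabled by `1`.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class FireRedVadSettings:  # hypothetical helper, not part of pipecat-firered-vad
    model_dir: str
    use_gpu: bool


def load_settings(env: Mapping[str, str] = os.environ) -> FireRedVadSettings:
    """Read the example's environment variables and fail early if the path is missing."""
    model_dir = env.get("FIREREDVAD_MODEL_DIR")
    if not model_dir:
        raise RuntimeError("Set FIREREDVAD_MODEL_DIR to the Stream-VAD model directory")
    # GPU inference is opt-in: only the literal value "1" enables it.
    use_gpu = env.get("FIREREDVAD_USE_GPU", "0") == "1"
    return FireRedVadSettings(model_dir=model_dir, use_gpu=use_gpu)
```

The resulting values can then be passed to the analyzer's model_dir and use_gpu arguments.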

Configuration

Constructor parameters for FireVadAnalyzer (all keyword-only):

model_dir: str, required

Path to the downloaded Stream-VAD model directory, e.g. "pretrained_models/FireRedVAD/Stream-VAD".

sample_rate: int, default: None

Audio sample rate in Hz. Must be 16000 if provided (enforced at construction time).

params: VADParams, default: None

Pipecat-level VAD parameters controlling turn-detection smoothing (confidence, start_secs, stop_secs).

mode: int, default: None

Optional VadMode sensitivity preset (0–3). When set, it overrides the individual threshold and frame parameters below. See VAD modes.

use_gpu: bool, default: False

Run DFSMN inference on GPU (requires CUDA).

int, default: 5

Number of frames in the model’s internal sliding-window smoother. Larger values reduce jitter at the cost of slightly more onset latency.

speech_threshold: float, default: 0.4

Model-level gate. Frames with a smoothed probability above this value are considered speech. Range 0.0–1.0.

int, default: 5

Extra frames prepended at speech onset to avoid clipping the leading edge of a word.

min_speech_frame: int, default: 8

Minimum number of consecutive speech frames before a segment is confirmed. Prevents single-frame false positives.

int, default: 2000

Maximum number of frames in one speech segment before a forced split.

min_silence_frame: int, default: 20

Number of silence frames required to close a speech segment. Higher values make the bot wait longer before deciding the turn has ended.
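The frame-based defaults are easier to reason about as durations. The sketch below assumes each frame count refers to the 10 ms frames mentioned in the overview. That unit is an assumption, since this page does not state it explicitly for these parameters. Labels other than min_speech_frame and min_silence_frame are descriptive only, not parameter names.

```python
FRAME_MS = 10  # assumption: frame counts use the 10 ms frame from the overview


def frames_to_ms(frames: int, frame_ms: int = FRAME_MS) -> int:
    """Convert a frame count to milliseconds."""
    return frames * frame_ms


# Keys other than min_speech_frame and min_silence_frame are descriptive
# labels for this example, not parameter names.
defaults = {
    "smoothing window": 5,
    "onset padding": 5,
    "min_speech_frame": 8,
    "max segment length": 2000,
    "min_silence_frame": 20,
}

for label, frames in defaults.items():
    print(f"{label}: {frames} frames = {frames_to_ms(frames)} ms")
```

Under this assumption, the default min_silence_frame of 20 corresponds to 200 ms of silence before a segment closes, and the default maximum segment length of 2000 frames corresponds to 20 seconds.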

VAD modes

VadMode provides pre-tuned sensitivity presets. Passing one to the mode argument adjusts speech_threshold, min_speech_frame, and min_silence_frame together as a matched set.

Preset	Value	Description
`VadMode.VERY_PERMISSIVE`	`0`	Catches soft/distant speech. May increase false alarms.
`VadMode.PERMISSIVE`	`1`	Balanced — a good starting point for most use cases.
`VadMode.AGGRESSIVE`	`2`	Suppresses background noise well. May clip quiet speech.
`VadMode.VERY_AGGRESSIVE`	`3`	Maximum noise rejection. Best for loud environments.
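To see why speech_threshold, min_speech_frame, and min_silence_frame are tuned as a set, it can help to look at a simplified gate that uses all three. The sketch below is an illustration only and is not FireRedVAD's actual algorithm. The function `detect_segments` and its `smooth_window` argument are hypothetical, and the numeric values of each preset are not shown on this page, so none are assumed here.

```python
from collections import deque


def detect_segments(probs, smooth_window=5, speech_threshold=0.4,
                    min_speech_frame=8, min_silence_frame=20):
    """Return (start, end) frame indices of speech segments; end is exclusive.

    Simplified illustration only; not the FireRedVAD algorithm.
    """
    window = deque(maxlen=smooth_window)
    segments = []
    in_speech = False
    speech_run = 0    # consecutive frames above the threshold
    silence_run = 0   # consecutive frames at or below the threshold
    start = 0

    for i, p in enumerate(probs):
        window.append(p)
        smoothed = sum(window) / len(window)  # moving average over recent frames
        if smoothed > speech_threshold:
            speech_run += 1
            silence_run = 0
        else:
            silence_run += 1
            speech_run = 0

        if not in_speech and speech_run >= min_speech_frame:
            in_speech = True
            start = i - speech_run + 1          # first frame of the run
        elif in_speech and silence_run >= min_silence_frame:
            in_speech = False
            segments.append((start, i - silence_run + 1))

    if in_speech:  # audio ended while still in speech
        segments.append((start, len(probs)))
    return segments


probs = [0.0] * 10 + [0.9] * 30 + [0.0] * 40
print(detect_segments(probs, smooth_window=1))  # [(10, 40)]
```

In this toy model, a higher threshold or a larger min_speech_frame rejects short bursts of noise but can also drop brief or quiet speech, which matches the trade-off the preset descriptions above express.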

Usage

Pass the analyzer to a transport via vad_analyzer, the same way you would use SileroVADAnalyzer:

import os

from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat_firered_vad import FireVadAnalyzer, VadMode

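vad = FireVadAnalyzer(
    # Path to the downloaded Stream-VAD model directory.
    model_dir=os.environ["FIREREDVAD_MODEL_DIR"],
    # FireRedVAD only accepts 16 kHz audio.
    sample_rate=16000,
    # Pipecat-level turn-detection smoothing.
    params=VADParams(
        confidence=0.7,
        start_secs=0.2,
        stop_secs=0.3,
    ),
    # Sensitivity preset; see VAD modes.
    mode=VadMode.PERMISSIVE,
    # Set FIREREDVAD_USE_GPU=1 to run inference on GPU (requires CUDA).
    use_gpu=os.getenv("FIREREDVAD_USE_GPU", "0") == "1",
)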

transport = DailyTransport(
    os.environ["DAILY_ROOM_URL"],
    os.getenv("DAILY_TOKEN"),
    "FireRed VAD Bot",
    DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        vad_enabled=True,
        vad_analyzer=vad,
        vad_audio_passthrough=True,
    ),
)

# ... build your pipeline with transport.input() / transport.output().

Call vad.reset() between sessions (for example, in an on_participant_left handler) so that one caller’s audio context does not bleed into the next.
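A minimal sketch of wiring up the reset is shown below. It assumes the transport exposes Pipecat's `event_handler` decorator and that the `on_participant_left` handler receives the transport, the participant, and a reason, as DailyTransport does. The helper name `register_vad_reset` is hypothetical.

```python
def register_vad_reset(transport, vad):
    """Reset the analyzer whenever a participant leaves the session."""

    # Assumed handler signature: (transport, participant, reason).
    @transport.event_handler("on_participant_left")
    async def _on_participant_left(transport, participant, reason):
        # Clear accumulated audio context before the next caller joins.
        vad.reset()
```

Call `register_vad_reset(transport, vad)` once after constructing the transport, before the pipeline starts running.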

Compatibility

Requires pipecat-ai >= 0.0.90. Check the source repository for the latest tested version and changelog.
