AudioBufferProcessor
Process and buffer audio frames from conversations with flexible event handling
Overview
The `AudioBufferProcessor` captures and buffers audio frames from both input (user) and output (bot) sources during conversations. It provides synchronized audio streams with configurable sample rates, supports both mono and stereo output, and offers flexible event handlers for various audio processing workflows.
Constructor
Parameters
`sample_rate`
The desired output sample rate in Hz. If `None`, uses the transport's sample rate from the `StartFrame`.
`num_channels`
Number of output audio channels:
- `1`: Mono output (user and bot audio are mixed together)
- `2`: Stereo output (user audio on left channel, bot audio on right channel)
`buffer_size`
Buffer size in bytes that triggers audio data events:
- `0`: Events only trigger when recording stops
- `>0`: Events trigger whenever the buffer reaches this size (useful for chunked processing)
`enable_turn_audio`
Whether to enable per-turn audio event handlers (`on_user_turn_audio_data` and `on_bot_turn_audio_data`).
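A minimal construction sketch (the import path assumes Pipecat's `pipecat.processors.audio.audio_buffer_processor` module; adjust for your installed version):

```python
from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor

# Stereo recording, resampled to 24 kHz, with chunked events roughly
# every 10 seconds of audio (24000 Hz * 2 channels * 2 bytes * 10 s).
audiobuffer = AudioBufferProcessor(
    sample_rate=24000,
    num_channels=2,
    buffer_size=960000,
    enable_turn_audio=True,
)
```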
Properties
`sample_rate`
The current sample rate of the audio processor in Hz.
`num_channels`
The number of channels in the audio output (1 for mono, 2 for stereo).
Methods
`start_recording()`
Start recording audio from both user and bot sources. Initializes the recording state and resets the audio buffers.
`stop_recording()`
Stop recording and trigger the final audio data handlers with any remaining buffered audio.
`has_audio()`
Check whether both the user and bot audio buffers contain data.
Returns: `True` if both buffers contain audio data.
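A typical lifecycle ties recording to the client connection. The sketch below assumes a transport that emits `on_client_connected` and `on_client_disconnected` events; the recording methods are coroutines, so they must be awaited:

```python
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
    # Begin buffering user and bot audio as soon as a client joins.
    await audiobuffer.start_recording()

@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
    # Flush remaining audio and fire the final data handlers.
    await audiobuffer.stop_recording()
```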
Event Handlers
The processor supports multiple event handlers for different audio processing workflows. Register handlers using the `@processor.event_handler()` decorator.
`on_audio_data`
Triggered when `buffer_size` is reached or recording stops, providing merged audio.
Parameters:
- `buffer`: The AudioBufferProcessor instance
- `audio`: Merged audio data (format depends on the `num_channels` setting)
- `sample_rate`: Sample rate in Hz
- `num_channels`: Number of channels (1 or 2)
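For example, a handler that saves each merged chunk as a WAV file (a sketch assuming 16-bit PCM samples):

```python
import datetime
import wave

@audiobuffer.event_handler("on_audio_data")
async def on_audio_data(buffer, audio, sample_rate, num_channels):
    # Wrap the raw PCM bytes in a timestamped WAV container.
    filename = f"recording_{datetime.datetime.now():%Y%m%d_%H%M%S}.wav"
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(num_channels)
        wf.setsampwidth(2)  # 16-bit samples assumed
        wf.setframerate(sample_rate)
        wf.writeframes(audio)
```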
`on_track_audio_data`
Triggered alongside `on_audio_data`, providing separate user and bot audio tracks.
Parameters:
- `buffer`: The AudioBufferProcessor instance
- `user_audio`: Raw user audio bytes (always mono)
- `bot_audio`: Raw bot audio bytes (always mono)
- `sample_rate`: Sample rate in Hz
- `num_channels`: Always 1 for individual tracks
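The same pattern works for the separate tracks, for example writing user and bot audio to their own files (again assuming 16-bit PCM):

```python
import wave

@audiobuffer.event_handler("on_track_audio_data")
async def on_track_audio_data(buffer, user_audio, bot_audio, sample_rate, num_channels):
    # Each track is mono, so every file uses a single channel.
    for name, audio in (("user", user_audio), ("bot", bot_audio)):
        with wave.open(f"{name}_track.wav", "wb") as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)  # 16-bit samples assumed
            wf.setframerate(sample_rate)
            wf.writeframes(audio)
```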
`on_user_turn_audio_data`
Triggered when a user speaking turn ends. Requires `enable_turn_audio=True`.
Parameters:
- `buffer`: The AudioBufferProcessor instance
- `audio`: Audio data from the user's speaking turn
- `sample_rate`: Sample rate in Hz
- `num_channels`: Always 1 (mono)
`on_bot_turn_audio_data`
Triggered when a bot speaking turn ends. Requires `enable_turn_audio=True`.
Parameters:
- `buffer`: The AudioBufferProcessor instance
- `audio`: Audio data from the bot's speaking turn
- `sample_rate`: Sample rate in Hz
- `num_channels`: Always 1 (mono)
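Both turn handlers share the same signature, so a single helper can serve them. A sketch, where `handle_turn` is a hypothetical helper of your own and the processor was constructed with `enable_turn_audio=True`:

```python
@audiobuffer.event_handler("on_user_turn_audio_data")
async def on_user_turn_audio_data(buffer, audio, sample_rate, num_channels):
    await handle_turn("user", audio, sample_rate)

@audiobuffer.event_handler("on_bot_turn_audio_data")
async def on_bot_turn_audio_data(buffer, audio, sample_rate, num_channels):
    await handle_turn("bot", audio, sample_rate)

async def handle_turn(speaker, audio, sample_rate):
    # Hypothetical helper: run per-turn analytics, persist the turn, etc.
    duration = len(audio) / (sample_rate * 2)  # mono 16-bit PCM assumed
    print(f"{speaker} turn ended: {duration:.1f}s of audio")
```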
Audio Processing Features
- Automatic resampling: Converts incoming audio to the specified sample rate
- Buffer synchronization: Aligns user and bot audio streams temporally
- Silence insertion: Fills gaps in non-continuous audio streams to maintain timing
- Turn tracking: Monitors speaking turns when `enable_turn_audio=True`
Integration Notes
STT Audio Passthrough
If you're using an STT service in your pipeline, audio passthrough must be enabled so that audio frames continue downstream to the AudioBufferProcessor. `audio_passthrough` is enabled by default.
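For example (a sketch; `DeepgramSTTService` and its import path stand in for whichever STT service you use, and the parameter is shown explicitly even though it defaults to `True`):

```python
import os

from pipecat.services.deepgram.stt import DeepgramSTTService

# Keep forwarding audio frames downstream after transcription.
stt = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    audio_passthrough=True,
)
```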
Pipeline Placement
Add the AudioBufferProcessor after `transport.output()` to capture both user and bot audio:
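A placement sketch (the `stt`, `llm`, `tts`, and `context_aggregator` names are placeholders for your own pipeline components):

```python
from pipecat.pipeline.pipeline import Pipeline

pipeline = Pipeline(
    [
        transport.input(),
        stt,
        context_aggregator.user(),
        llm,
        tts,
        transport.output(),
        audiobuffer,  # after transport.output() so bot audio is captured too
        context_aggregator.assistant(),
    ]
)
```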