Overview

The AudioBufferProcessor captures and buffers audio frames from both input (user) and output (bot) sources during conversations. It provides synchronized audio streams with configurable sample rates, supports both mono and stereo output, and offers flexible event handlers for various audio processing workflows.

Constructor

AudioBufferProcessor(
    sample_rate=None,
    num_channels=1,
    buffer_size=0,
    enable_turn_audio=False,
    **kwargs
)

Parameters

sample_rate
Optional[int]
default:"None"

The desired output sample rate in Hz. If None, uses the transport’s sample rate from the StartFrame.

num_channels
int
default:"1"

Number of output audio channels:

  • 1: Mono output (user and bot audio are mixed together)
  • 2: Stereo output (user audio on left channel, bot audio on right channel)

buffer_size
int
default:"0"

Buffer size in bytes that triggers audio data events:

  • 0: Events only trigger when recording stops
  • >0: Events trigger whenever buffer reaches this size (useful for chunked processing)
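
Because buffer_size is given in bytes, it can help to derive it from a target chunk duration. The sketch below assumes 16-bit PCM (2 bytes per sample). Whether the threshold is measured against a single track or the merged output is not stated here, so treat the channel factor as an assumption. `buffer_size_for` is a hypothetical helper, not part of the library.

```python
BYTES_PER_SAMPLE = 2  # assumption: 16-bit PCM


def buffer_size_for(seconds: float, sample_rate: int, num_channels: int = 1) -> int:
    """Return the number of bytes holding `seconds` of audio at the given format."""
    frames = round(seconds * sample_rate)
    return frames * num_channels * BYTES_PER_SAMPLE


# One second of 16 kHz mono audio
one_second_mono = buffer_size_for(1.0, 16000)
```

The resulting value could then be passed as `buffer_size` when constructing the processor.
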

enable_turn_audio
bool
default:"False"

Whether to enable per-turn audio event handlers (on_user_turn_audio_data and on_bot_turn_audio_data).

Properties

sample_rate

@property
def sample_rate(self) -> int

The current sample rate of the audio processor in Hz.

num_channels

@property
def num_channels(self) -> int

The number of channels in the audio output (1 for mono, 2 for stereo).

Methods

start_recording()

async def start_recording()

Start recording audio from both user and bot sources. Initializes recording state and resets audio buffers.

stop_recording()

async def stop_recording()

Stop recording and trigger final audio data handlers with any remaining buffered audio.

has_audio()

def has_audio() -> bool

Check if both user and bot audio buffers contain data.

Returns: True if both buffers contain audio data, False otherwise.

Event Handlers

The processor supports multiple event handlers for different audio processing workflows. Register handlers using the @audiobuffer.event_handler("<event_name>") decorator, where audiobuffer is your AudioBufferProcessor instance.

on_audio_data

Triggered when buffer_size is reached or recording stops, providing merged audio.

@audiobuffer.event_handler("on_audio_data")
async def on_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int):
    # Handle merged audio data
    pass

Parameters:

  • buffer: The AudioBufferProcessor instance
  • audio: Merged audio data (format depends on num_channels setting)
  • sample_rate: Sample rate in Hz
  • num_channels: Number of channels (1 or 2)
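
A common use of the merged audio is writing it to a WAV container. The following is a minimal sketch using Python's standard `wave` module; it assumes the `audio` bytes are 16-bit PCM, and `pcm_to_wav_bytes` is a hypothetical helper name rather than a library function.

```python
import io
import wave


def pcm_to_wav_bytes(audio: bytes, sample_rate: int, num_channels: int) -> bytes:
    """Wrap raw PCM bytes in a WAV container held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(num_channels)
        wf.setsampwidth(2)  # assumption: 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(audio)
    return buf.getvalue()
```

Inside an on_audio_data handler, the returned bytes could be written to disk or uploaded, using the `sample_rate` and `num_channels` values passed to the handler.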

on_track_audio_data

Triggered alongside on_audio_data, providing separate user and bot audio tracks.

@audiobuffer.event_handler("on_track_audio_data")
async def on_track_audio_data(buffer, user_audio: bytes, bot_audio: bytes,
                             sample_rate: int, num_channels: int):
    # Handle separate audio tracks
    pass

Parameters:

  • buffer: The AudioBufferProcessor instance
  • user_audio: Raw user audio bytes (always mono)
  • bot_audio: Raw bot audio bytes (always mono)
  • sample_rate: Sample rate in Hz
  • num_channels: Always 1 for individual tracks

on_user_turn_audio_data

Triggered when a user speaking turn ends. Requires enable_turn_audio=True.

@audiobuffer.event_handler("on_user_turn_audio_data")
async def on_user_turn_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int):
    # Handle user turn audio
    pass

Parameters:

  • buffer: The AudioBufferProcessor instance
  • audio: Audio data from the user’s speaking turn
  • sample_rate: Sample rate in Hz
  • num_channels: Always 1 (mono)

on_bot_turn_audio_data

Triggered when a bot speaking turn ends. Requires enable_turn_audio=True.

@audiobuffer.event_handler("on_bot_turn_audio_data")
async def on_bot_turn_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int):
    # Handle bot turn audio
    pass

Parameters:

  • buffer: The AudioBufferProcessor instance
  • audio: Audio data from the bot’s speaking turn
  • sample_rate: Sample rate in Hz
  • num_channels: Always 1 (mono)
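
Per-turn handlers are often used to log or filter turns by length. As a rough sketch, the duration of a turn can be derived from the byte count, assuming 16-bit PCM. `pcm_duration_seconds` is a hypothetical helper introduced for illustration.

```python
def pcm_duration_seconds(
    audio: bytes,
    sample_rate: int,
    num_channels: int = 1,
    sample_width: int = 2,  # assumption: 16-bit PCM
) -> float:
    """Return the playback duration of raw PCM audio in seconds."""
    bytes_per_frame = num_channels * sample_width
    return len(audio) / (sample_rate * bytes_per_frame)
```

A turn handler could call this with its `audio`, `sample_rate`, and `num_channels` arguments and skip turns shorter than some threshold.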

Audio Processing Features

  • Automatic resampling: Converts incoming audio to the specified sample rate
  • Buffer synchronization: Aligns user and bot audio streams temporally
  • Silence insertion: Fills gaps in non-continuous audio streams to maintain timing
  • Turn tracking: Monitors speaking turns when enable_turn_audio=True
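
To make the silence insertion idea concrete, here is a simplified sketch of how a gap between two chunks might be filled with zero-valued samples so that later audio keeps its position in time. It assumes 16-bit mono PCM and is not the processor's actual implementation; `silence_for_gap` and `append_with_gap` are hypothetical names.

```python
BYTES_PER_SAMPLE = 2  # assumption: 16-bit mono PCM


def silence_for_gap(gap_seconds: float, sample_rate: int) -> bytes:
    """Return zeroed PCM bytes covering a gap of the given length."""
    if gap_seconds <= 0:
        return b""
    frames = round(gap_seconds * sample_rate)
    return b"\x00" * (frames * BYTES_PER_SAMPLE)


def append_with_gap(
    buffer: bytearray, chunk: bytes, gap_seconds: float, sample_rate: int
) -> None:
    """Append a chunk to the buffer, preceded by silence for the elapsed gap."""
    buffer.extend(silence_for_gap(gap_seconds, sample_rate))
    buffer.extend(chunk)
```

Padding each track this way keeps user and bot buffers the same length per unit of time, which is what allows them to be merged or interleaved later.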

Integration Notes

STT Audio Passthrough

If using an STT service in your pipeline, enable audio passthrough to make audio available to the AudioBufferProcessor:

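import os

# DeepgramSTTService is assumed to be imported from your Pipecat installation.
stt = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    audio_passthrough=True,  # forward audio frames downstream after transcription
)

audio_passthrough is enabled by default; set it explicitly only if your configuration disables it.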

Pipeline Placement

Add the AudioBufferProcessor after transport.output() to capture both user and bot audio:

pipeline = Pipeline([
    transport.input(),
    # ... other processors ...
    transport.output(),
    audiobuffer,  # Place after audio output
    # ... remaining processors ...
])