AudioBufferProcessor
Process and buffer audio frames from conversations with flexible event handling
Overview
The `AudioBufferProcessor` captures and buffers audio frames from both input (user) and output (bot) sources during conversations. It provides synchronized audio streams with configurable sample rates, supports both mono and stereo output, and offers flexible event handlers for various audio processing workflows.
Constructor
Parameters
`sample_rate`
The desired output sample rate in Hz. If `None`, uses the transport's sample rate from the `StartFrame`.
`num_channels`
Number of output audio channels:
- `1`: Mono output (user and bot audio are mixed together)
- `2`: Stereo output (user audio on left channel, bot audio on right channel)
`buffer_size`
Buffer size in bytes that triggers audio data events:
- `0`: Events only trigger when recording stops
- `>0`: Events trigger whenever the buffer reaches this size (useful for chunked processing)
`enable_turn_audio`
Whether to enable per-turn audio event handlers (`on_user_turn_audio_data` and `on_bot_turn_audio_data`).
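A minimal construction sketch (the import path assumes Pipecat's `pipecat.processors.audio.audio_buffer_processor` module; adjust for your installed version):

```python
from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor

# Stereo recording, resampled to 24 kHz, with chunked events roughly
# every 10 seconds of audio (24000 Hz * 2 channels * 2 bytes * 10 s).
audiobuffer = AudioBufferProcessor(
    sample_rate=24000,
    num_channels=2,
    buffer_size=960000,
    enable_turn_audio=True,
)
```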
Properties
`sample_rate`
The current sample rate of the audio processor in Hz.
`num_channels`
The number of channels in the audio output (1 for mono, 2 for stereo).
Methods
`start_recording()`
Start recording audio from both user and bot sources. Initializes the recording state and resets the audio buffers.
`stop_recording()`
Stop recording and trigger the final audio data handlers with any remaining buffered audio.
`has_audio()`
Check whether both the user and bot audio buffers contain data.
Returns: `True` if both buffers contain audio data.
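A typical lifecycle ties recording to the client connection. The sketch below assumes a transport that emits `on_client_connected` and `on_client_disconnected` events; the recording methods are coroutines, so they must be awaited:

```python
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
    # Begin buffering user and bot audio as soon as a client joins.
    await audiobuffer.start_recording()

@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
    # Flush remaining audio and fire the final data handlers.
    await audiobuffer.stop_recording()
```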
Event Handlers
The processor supports multiple event handlers for different audio processing workflows. Register handlers using the `@processor.event_handler()` decorator.
`on_audio_data`
Triggered when `buffer_size` is reached or recording stops, providing merged audio.
Parameters:
- `buffer`: The AudioBufferProcessor instance
- `audio`: Merged audio data (format depends on the `num_channels` setting)
- `sample_rate`: Sample rate in Hz
- `num_channels`: Number of channels (1 or 2)
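For example, a handler that saves each merged chunk as a WAV file (a sketch assuming 16-bit PCM samples):

```python
import datetime
import wave

@audiobuffer.event_handler("on_audio_data")
async def on_audio_data(buffer, audio, sample_rate, num_channels):
    # Wrap the raw PCM bytes in a timestamped WAV container.
    filename = f"recording_{datetime.datetime.now():%Y%m%d_%H%M%S}.wav"
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(num_channels)
        wf.setsampwidth(2)  # 16-bit samples assumed
        wf.setframerate(sample_rate)
        wf.writeframes(audio)
```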
`on_track_audio_data`
Triggered alongside `on_audio_data`, providing separate user and bot audio tracks.
Parameters:
- `buffer`: The AudioBufferProcessor instance
- `user_audio`: Raw user audio bytes (always mono)
- `bot_audio`: Raw bot audio bytes (always mono)
- `sample_rate`: Sample rate in Hz
- `num_channels`: Always 1 for individual tracks
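The same pattern works for the separate tracks, for example writing user and bot audio to their own files (again assuming 16-bit PCM):

```python
import wave

@audiobuffer.event_handler("on_track_audio_data")
async def on_track_audio_data(buffer, user_audio, bot_audio, sample_rate, num_channels):
    # Each track is mono, so every file uses a single channel.
    for name, audio in (("user", user_audio), ("bot", bot_audio)):
        with wave.open(f"{name}_track.wav", "wb") as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)  # 16-bit samples assumed
            wf.setframerate(sample_rate)
            wf.writeframes(audio)
```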
`on_user_turn_audio_data`
Triggered when a user speaking turn ends. Requires `enable_turn_audio=True`.
Parameters:
- `buffer`: The AudioBufferProcessor instance
- `audio`: Audio data from the user's speaking turn
- `sample_rate`: Sample rate in Hz
- `num_channels`: Always 1 (mono)
`on_bot_turn_audio_data`
Triggered when a bot speaking turn ends. Requires `enable_turn_audio=True`.
Parameters:
- `buffer`: The AudioBufferProcessor instance
- `audio`: Audio data from the bot's speaking turn
- `sample_rate`: Sample rate in Hz
- `num_channels`: Always 1 (mono)
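Both turn handlers share the same signature, so a single helper can serve them. A sketch, where `handle_turn` is a hypothetical helper of your own and the processor was constructed with `enable_turn_audio=True`:

```python
@audiobuffer.event_handler("on_user_turn_audio_data")
async def on_user_turn_audio_data(buffer, audio, sample_rate, num_channels):
    await handle_turn("user", audio, sample_rate)

@audiobuffer.event_handler("on_bot_turn_audio_data")
async def on_bot_turn_audio_data(buffer, audio, sample_rate, num_channels):
    await handle_turn("bot", audio, sample_rate)

async def handle_turn(speaker, audio, sample_rate):
    # Hypothetical helper: run per-turn analytics, persist the turn, etc.
    duration = len(audio) / (sample_rate * 2)  # mono 16-bit PCM assumed
    print(f"{speaker} turn ended: {duration:.1f}s of audio")
```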
Audio Processing Features
- Automatic resampling: Converts incoming audio to the specified sample rate
- Buffer synchronization: Aligns user and bot audio streams temporally
- Silence insertion: Fills gaps in non-continuous audio streams to maintain timing
- Turn tracking: Monitors speaking turns when `enable_turn_audio=True`
Integration Notes
STT Audio Passthrough
If you're using an STT service in your pipeline, audio passthrough must be enabled so that audio frames continue downstream to the AudioBufferProcessor. `audio_passthrough` is enabled by default.
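For example (a sketch; `DeepgramSTTService` and its import path stand in for whichever STT service you use, and the parameter is shown explicitly even though it defaults to `True`):

```python
import os

from pipecat.services.deepgram.stt import DeepgramSTTService

# Keep forwarding audio frames downstream after transcription.
stt = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    audio_passthrough=True,
)
```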
Pipeline Placement
Add the AudioBufferProcessor after `transport.output()` to capture both user and bot audio:
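A placement sketch (the `stt`, `llm`, `tts`, and `context_aggregator` names are placeholders for your own pipeline components):

```python
from pipecat.pipeline.pipeline import Pipeline

pipeline = Pipeline(
    [
        transport.input(),
        stt,
        context_aggregator.user(),
        llm,
        tts,
        transport.output(),
        audiobuffer,  # after transport.output() so bot audio is captured too
        context_aggregator.assistant(),
    ]
)
```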