# AudioBufferProcessor

Process and buffer audio frames from conversations with flexible event handling.

`AudioBufferProcessor` captures and buffers audio frames from both input (user) and output (bot) sources during conversations. It provides synchronized audio streams with configurable sample rates, supports both mono and stereo output, and offers flexible event handlers for various audio processing workflows.
## Constructor parameters

- `sample_rate`: Output sample rate in Hz. If `None` (the default), uses the transport's sample rate from the `StartFrame`.
- `num_channels`: Number of output channels.
  - `1`: Mono output (user and bot audio are mixed together)
  - `2`: Stereo output (user audio on left channel, bot audio on right channel)
- `buffer_size`: Buffer size in bytes that triggers audio data events.
  - `0`: Events only trigger when recording stops
  - `>0`: Events trigger whenever the buffer reaches this size (useful for chunked processing)
- `enable_turn_audio`: Whether turn audio event handlers should be triggered (`on_user_turn_audio_data` and `on_bot_turn_audio_data`).

## Methods

- `has_audio()`: Returns `True` if both the user and bot buffers contain audio data.
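The stereo layout described for `num_channels=2` can be illustrated independently of the library. This is a minimal sketch (the `interleave_stereo` helper is our own, not part of the API) showing how two mono 16-bit PCM tracks interleave into one stereo stream, user samples on the left channel and bot samples on the right:

```python
import struct

def interleave_stereo(user_audio: bytes, bot_audio: bytes) -> bytes:
    """Interleave two mono 16-bit PCM streams into one stereo stream."""
    user = struct.unpack(f"<{len(user_audio) // 2}h", user_audio)
    bot = struct.unpack(f"<{len(bot_audio) // 2}h", bot_audio)
    frames = []
    for left, right in zip(user, bot):  # left = user, right = bot
        frames.append(struct.pack("<hh", left, right))
    return b"".join(frames)

user = struct.pack("<3h", 100, 200, 300)  # three mono user samples
bot = struct.pack("<3h", -1, -2, -3)      # three mono bot samples
stereo = interleave_stereo(user, bot)
print(struct.unpack("<6h", stereo))  # (100, -1, 200, -2, 300, -3)
```

With `num_channels=1` the two tracks are instead mixed into a single channel.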
## Event handlers

Event handlers are registered with the `@processor.event_handler()` decorator.
### `on_audio_data`

Triggered when `buffer_size` is reached or recording stops, providing merged audio.

Parameters:

- `buffer`: The AudioBufferProcessor instance
- `audio`: Merged audio data (format depends on the `num_channels` setting)
- `sample_rate`: Sample rate in Hz
- `num_channels`: Number of channels (1 or 2)

### `on_track_audio_data`

Triggered in parallel with `on_audio_data`, providing separate user and bot audio tracks.
Parameters:

- `buffer`: The AudioBufferProcessor instance
- `user_audio`: Raw user audio bytes (always mono)
- `bot_audio`: Raw bot audio bytes (always mono)
- `sample_rate`: Sample rate in Hz
- `num_channels`: Always 1 for individual tracks

### `on_user_turn_audio_data`

Triggered when a user speaking turn ends. Requires `enable_turn_audio=True`.
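A handler receiving raw PCM bytes will typically persist them somewhere. As a minimal sketch, this standalone helper (our own, not part of the library) writes the `audio`, `sample_rate`, and `num_channels` values a handler receives out to a WAV file, assuming 16-bit PCM:

```python
import wave

def save_audio_wav(path: str, audio: bytes, sample_rate: int, num_channels: int) -> None:
    """Write raw 16-bit PCM bytes to a WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(num_channels)
        wf.setsampwidth(2)  # 16-bit PCM = 2 bytes per sample
        wf.setframerate(sample_rate)
        wf.writeframes(audio)

# Example: one second of silence at 24 kHz, mono.
silence = b"\x00\x00" * 24000
save_audio_wav("conversation.wav", silence, 24000, 1)

with wave.open("conversation.wav", "rb") as wf:
    print(wf.getframerate(), wf.getnchannels(), wf.getnframes())  # 24000 1 24000
```

The same helper works for both merged audio (`on_audio_data`) and individual tracks (`on_track_audio_data`, always mono).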
Parameters:

- `buffer`: The AudioBufferProcessor instance
- `audio`: Audio data from the user's speaking turn
- `sample_rate`: Sample rate in Hz
- `num_channels`: Always 1 (mono)

### `on_bot_turn_audio_data`

Triggered when a bot speaking turn ends. Requires `enable_turn_audio=True`.
Parameters:

- `buffer`: The AudioBufferProcessor instance
- `audio`: Audio data from the bot's speaking turn
- `sample_rate`: Sample rate in Hz
- `num_channels`: Always 1 (mono)

## Integration notes

- Turn audio events require `enable_turn_audio=True`.
- `audio_passthrough` is enabled by default.
- Place the processor after `transport.output()` to capture both user and bot audio:
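A sketch of that placement in a Pipecat pipeline follows. Names like `transport` and the other processors stand in for objects created elsewhere in your app, and module paths may differ across library versions:

```python
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor

audiobuffer = AudioBufferProcessor(num_channels=2)  # stereo: user left, bot right

pipeline = Pipeline([
    transport.input(),
    # ... STT, LLM, TTS processors ...
    transport.output(),
    audiobuffer,  # after transport.output(), so both user and bot audio are captured
])

@audiobuffer.event_handler("on_audio_data")
async def on_audio_data(buffer, audio, sample_rate, num_channels):
    # Receives merged audio when recording stops (or when buffer_size is reached).
    ...
```

Recording must be started explicitly (e.g. `await audiobuffer.start_recording()` once the client connects) before any audio is buffered.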