Audio Recording
Record and buffer audio from conversations
Overview
The AudioBufferProcessor captures and buffers audio from both input (user) and output (bot) sources during conversations. It synchronizes these audio streams, supports both mono and stereo recording with configurable sample rates, and provides flexible event handlers for various audio processing needs. The processor can operate in continuous or speech-only recording modes and produce either combined or separate audio tracks.
Usage
To record audio, create an instance of AudioBufferProcessor and add it to your pipeline:
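A minimal sketch of this wiring, assuming the import paths shown in the comments match your installed Pipecat version and that `transport`, `stt`, `llm`, and `tts` are processors you have already built (hypothetical names). Placing the recorder after `transport.output()` is an assumption intended to let it see both user and bot audio.

```python
def build_recording_pipeline(transport, stt, llm, tts):
    """Return (pipeline, audiobuffer) with the recorder placed after the output transport."""
    # Assumed import paths; adjust them to your installed Pipecat version.
    from pipecat.pipeline.pipeline import Pipeline
    from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor

    audiobuffer = AudioBufferProcessor()  # default settings
    pipeline = Pipeline(
        [
            transport.input(),   # user audio enters here
            stt,
            llm,
            tts,
            transport.output(),  # bot audio leaves here
            audiobuffer,         # assumed position: after both audio sources
        ]
    )
    return pipeline, audiobuffer
```

Returning the processor alongside the pipeline keeps a handle available for registering event handlers and for starting or stopping the recording later.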
STT Audio Passthrough
If you have an STT service in your pipeline, you will need to pass through the audio so that it’s available to the AudioBufferProcessor. You can do this by adding audio_passthrough=True to the STT service:
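As an illustration, the flag goes into the STT service's constructor. The sketch below assumes a Deepgram service, the import path `pipecat.services.deepgram.stt`, and a `DEEPGRAM_API_KEY` environment variable; none of these come from this page, and any STT service that accepts `audio_passthrough` would be configured the same way.

```python
import os


def build_stt():
    """Create an STT service that forwards audio frames downstream."""
    # Assumed service and import path; they may differ in your Pipecat version.
    from pipecat.services.deepgram.stt import DeepgramSTTService

    return DeepgramSTTService(
        api_key=os.environ["DEEPGRAM_API_KEY"],  # hypothetical variable name
        audio_passthrough=True,  # keep audio flowing to the AudioBufferProcessor
    )
```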
Configuration Options
The desired output sample rate in Hz. If not specified, uses the transport’s sample rate.
Number of audio channels:
- 1: Mono (mixed user and bot audio)
- 2: Stereo (user audio in left channel, bot audio in right channel)
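The two layouts can be pictured with plain 16-bit PCM buffers. This is only an illustration of the channel arrangement, not the processor's own code: the helper names are hypothetical, and mixing by saturating addition is an assumption about how a mono mix might be produced.

```python
from array import array


def _padded(user: bytes, bot: bytes):
    """Decode two 16-bit PCM tracks and zero-pad the shorter one."""
    u, b = array("h", user), array("h", bot)
    n = max(len(u), len(b))
    u.extend([0] * (n - len(u)))
    b.extend([0] * (n - len(b)))
    return u, b


def mix_mono(user: bytes, bot: bytes) -> bytes:
    """One channel: add the tracks sample by sample, clamped to the int16 range."""
    u, b = _padded(user, bot)
    mixed = (max(-32768, min(32767, x + y)) for x, y in zip(u, b))
    return array("h", mixed).tobytes()


def interleave_stereo(user: bytes, bot: bytes) -> bytes:
    """Two channels: user samples on the left, bot samples on the right."""
    u, b = _padded(user, bot)
    out = array("h")
    for left, right in zip(u, b):
        out.append(left)
        out.append(right)
    return out.tobytes()
```

A stereo buffer is twice the size of the mono mix for the same duration, which matters when choosing a byte-based buffer threshold.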
Size in bytes that triggers the on_audio_data event:
- 0: Only triggers when recording stops
- >0: Triggers whenever buffer reaches this size
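Because the threshold is expressed in bytes, it helps to convert a target duration into a byte count. The helper below is a hypothetical convenience that assumes 16-bit PCM samples. Whether the processor measures the threshold on a single track or on the merged output is not stated here, so `num_channels` defaults to 1 and should be adjusted after checking.

```python
def pcm_bytes(seconds: float, sample_rate: int, num_channels: int = 1,
              bytes_per_sample: int = 2) -> int:
    """Number of bytes occupied by `seconds` of PCM audio."""
    frames = round(seconds * sample_rate)
    return frames * num_channels * bytes_per_sample


# Five seconds of 16 kHz mono audio
five_seconds = pcm_bytes(5, 16000)  # 160000
```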
Whether user audio is continuous or speech-only:
- True: Expects continuous audio stream
- False: Expects only speech segments with silence between them
Whether to enable separate event handling for user and bot turns:
- True: Triggers per-turn audio events
- False: Only triggers combined audio events
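Putting several of these options together might look like the sketch below. Only `buffer_size` is named on this page; the keyword names `sample_rate`, `num_channels`, and `enable_turn_audio`, as well as the import path, are assumptions to verify against the API reference for your version.

```python
def build_stereo_recorder():
    """Create a recorder configured for stereo output with per-turn events."""
    # Assumed import path; adjust to your installed Pipecat version.
    from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor

    return AudioBufferProcessor(
        sample_rate=24000,       # assumed name: output rate in Hz
        num_channels=2,          # assumed name: user left, bot right
        buffer_size=96000,       # bytes buffered before on_audio_data fires
        enable_turn_audio=True,  # assumed name: also emit per-turn events
    )
```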
Recording Controls
Start Recording
Begin recording audio from the conversation:
Stop Recording
Stop the current recording session:
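Both controls are often tied to the connection lifecycle. The sketch below assumes the processor exposes awaitable `start_recording()` and `stop_recording()` methods and that the transport emits `on_client_connected` and `on_client_disconnected` events through an `event_handler` decorator; event names vary by transport, so treat them as placeholders.

```python
def register_recording_controls(transport, audiobuffer):
    """Start recording when a client connects and stop when it disconnects."""

    @transport.event_handler("on_client_connected")  # assumed event name
    async def on_client_connected(_transport, client):
        await audiobuffer.start_recording()

    @transport.event_handler("on_client_disconnected")  # assumed event name
    async def on_client_disconnected(_transport, client):
        # Stopping is also the point at which any remaining buffered audio is emitted.
        await audiobuffer.stop_recording()

    return on_client_connected, on_client_disconnected
```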
Event Handlers
The processor supports multiple event handlers for different audio processing needs:
on_audio_data
Triggered when buffer_size is reached or recording stops, providing merged audio:
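A handler for this event could write each merged chunk to a WAV file. The sketch assumes the handler receives `(buffer, audio, sample_rate, num_channels)` with `audio` as raw 16-bit PCM bytes; the output directory and file naming are hypothetical.

```python
import time
import wave
from pathlib import Path

OUTPUT_DIR = Path("recordings")  # hypothetical output location


def write_wav(path, audio: bytes, sample_rate: int, num_channels: int) -> None:
    """Write raw PCM bytes to a WAV file."""
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(num_channels)
        wf.setsampwidth(2)  # assumption: 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(audio)


async def on_audio_data(buffer, audio, sample_rate, num_channels):
    """Save one merged chunk; skip empty buffers."""
    if not audio:
        return
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    # A timestamped name avoids overwriting when the event fires more than once.
    write_wav(OUTPUT_DIR / f"merged_{time.time_ns()}.wav", audio, sample_rate, num_channels)


# Registration, assuming `audiobuffer` exposes an `event_handler` decorator:
# audiobuffer.event_handler("on_audio_data")(on_audio_data)
```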
on_track_audio_data
Triggered alongside on_audio_data, providing separate user and bot audio tracks:
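With separate tracks, each speaker can be stored in its own file. The sketch assumes the handler receives `(buffer, user_audio, bot_audio, sample_rate, num_channels)` and that each track is mono 16-bit PCM regardless of the configured channel count; both points are assumptions, and the file names are hypothetical.

```python
import io
import wave
from pathlib import Path


def pcm_to_wav_bytes(audio: bytes, sample_rate: int, num_channels: int = 1) -> bytes:
    """Wrap raw PCM bytes in an in-memory WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(num_channels)
        wf.setsampwidth(2)  # assumption: 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(audio)
    return buf.getvalue()


async def on_track_audio_data(buffer, user_audio, bot_audio, sample_rate, num_channels):
    """Write the user and bot tracks to separate mono files."""
    Path("user.wav").write_bytes(pcm_to_wav_bytes(user_audio, sample_rate))
    Path("bot.wav").write_bytes(pcm_to_wav_bytes(bot_audio, sample_rate))


# Registration, assuming `audiobuffer` exposes an `event_handler` decorator:
# audiobuffer.event_handler("on_track_audio_data")(on_track_audio_data)
```

Encoding to bytes first, rather than writing straight to disk, makes the same helper usable for uploads to object storage.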
on_user_turn_audio_data
Triggered when a user speaking turn ends, providing that turn’s audio:
on_bot_turn_audio_data
Triggered when a bot speaking turn ends, providing that turn’s audio:
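The two turn-level events can share one handler factory. The sketch below assumes both events pass `(buffer, audio, sample_rate, num_channels)` with 16-bit PCM audio; the `Turn` record and `make_turn_handler` are hypothetical helpers that collect each turn with its duration.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    seconds: float
    audio: bytes


def make_turn_handler(speaker: str, turns: list):
    """Build an async handler that appends each finished turn to `turns`."""

    async def on_turn_audio(buffer, audio, sample_rate, num_channels):
        # 2 bytes per sample is an assumption (16-bit PCM).
        seconds = len(audio) / (sample_rate * num_channels * 2)
        turns.append(Turn(speaker, seconds, audio))

    return on_turn_audio


turns: list = []
on_user_turn_audio_data = make_turn_handler("user", turns)
on_bot_turn_audio_data = make_turn_handler("bot", turns)

# Registration, assuming `audiobuffer` exposes an `event_handler` decorator:
# audiobuffer.event_handler("on_user_turn_audio_data")(on_user_turn_audio_data)
# audiobuffer.event_handler("on_bot_turn_audio_data")(on_bot_turn_audio_data)
```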
Audio Processing Features
- Automatic resampling of audio to specified sample rate
- Buffer synchronization between user and bot audio streams
- Silence insertion for non-continuous audio streams
- Support for both continuous and speech-only recording modes
- Separate tracking of user and bot speaking turns
- Stereo channel separation for user and bot audio
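Silence insertion and buffer synchronization from the list above can be pictured as zero-padding. This is a conceptual sketch with hypothetical helper names, assuming 16-bit PCM where silence is a run of zero bytes; it does not reproduce the processor's internal logic.

```python
def silence_for_gap(gap_seconds: float, sample_rate: int, bytes_per_sample: int = 2) -> bytes:
    """Return zero-valued PCM covering a gap between two speech segments."""
    frames = max(0, round(gap_seconds * sample_rate))
    return b"\x00" * (frames * bytes_per_sample)


def sync_tracks(user: bytearray, bot: bytearray) -> None:
    """Pad the shorter track with silence, in place, so both have equal length."""
    target = max(len(user), len(bot))
    user.extend(b"\x00" * (target - len(user)))
    bot.extend(b"\x00" * (target - len(bot)))
```

Keeping both tracks the same length is what allows them to be mixed or interleaved sample by sample without drifting out of alignment.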