Overview
Recording audio from conversations provides valuable data for analysis, debugging, and quality control. You have two options for how to record with Pipecat:
Option 1: Record using your transport service provider
Record without writing custom code by using your transport provider’s recording capabilities. In addition to saving you development time, some providers offer unique recording capabilities. Refer to your service provider’s documentation to learn more.
Option 2: Create your own recording pipeline
Pipecat’s AudioBufferProcessor makes it easy to capture high-quality audio recordings of both the user and bot during interactions. Opt for this approach if you want more control over your recording.
This guide focuses on how to record using the AudioBufferProcessor, including high-level guidance on setting up post-processing jobs for longer recordings.
How the AudioBufferProcessor Works
The AudioBufferProcessor captures audio by:
- Collecting audio frames from both the user (input) and bot (output)
- Emitting events with recorded audio data
- Providing options for composite or separate track recordings
Add the processor to your pipeline after transport.output() to capture both the user audio and the bot audio as it’s spoken.
Audio Recording Options
The AudioBufferProcessor offers several configuration options:
- Composite recording: Combined audio from both user and bot
  - on_audio_data event handler
- Track-level recording: Separate audio files for user and bot
  - on_track_audio_data event handler
- Turn-based recording: Individual audio clips for each speaking turn
  - on_user_turn_audio_data and on_bot_turn_audio_data event handlers
- Mono or stereo output: Single channel mixing or two-channel separation
  - num_channels=1 for mono; num_channels=2 for stereo
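To make the mono versus stereo distinction concrete, here is a minimal sketch of what single-channel mixing and two-channel separation mean for two 16-bit PCM tracks. It illustrates the concept only and is not Pipecat's internal implementation: the helper names mix_to_mono and interleave_stereo are hypothetical, and placing the user on the left channel and the bot on the right is an assumption made for this example.

```python
from array import array


def _samples(pcm: bytes) -> array:
    """Interpret raw bytes as signed 16-bit samples (native byte order)."""
    samples = array("h")
    samples.frombytes(pcm)
    return samples


def _pad(a: array, b: array):
    """Pad the shorter track with silence so both have equal length."""
    n = max(len(a), len(b))
    a.extend([0] * (n - len(a)))
    b.extend([0] * (n - len(b)))
    return a, b


def mix_to_mono(user_pcm: bytes, bot_pcm: bytes) -> bytes:
    """Sum both tracks sample by sample, clamping to the int16 range."""
    user, bot = _pad(_samples(user_pcm), _samples(bot_pcm))
    mixed = array("h", (max(-32768, min(32767, u + b)) for u, b in zip(user, bot)))
    return mixed.tobytes()


def interleave_stereo(user_pcm: bytes, bot_pcm: bytes) -> bytes:
    """Alternate samples so each speaker occupies its own channel."""
    user, bot = _pad(_samples(user_pcm), _samples(bot_pcm))
    stereo = array("h")
    for u, b in zip(user, bot):
        stereo.append(u)  # left channel: user (assumed ordering)
        stereo.append(b)  # right channel: bot
    return stereo.tobytes()
```

Mono output is half the size of stereo for the same duration, but once the tracks are summed the speakers can no longer be separated, which is why stereo or track-level recording is the safer choice when per-speaker analysis is planned.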
Basic Implementation
Step 1: Create an Audio Buffer Processor
Initialize the audio buffer processor with your desired configuration.
Step 2: Add to Your Pipeline
Place the processor in your pipeline after all audio-producing components.
Step 3: Start Recording
Explicitly start recording when needed, typically when a session begins.
Step 4: Handle Audio Data
Register an event handler to process audio data.
Recording Longer Conversations
For conversations that last a few minutes, it may be sufficient to just buffer the audio in memory. However, for longer sessions, storing audio in memory poses two challenges:
- Memory Usage: Long recordings can consume significant memory, leading to potential crashes or performance issues.
- Conversation Loss: If the application crashes or the connection drops, you may lose all recorded audio.
To address both, use the buffer_size parameter to record audio in manageable segments. This allows you to periodically save audio data to disk or upload it to cloud storage, reducing memory usage and ensuring data persistence.
See an example of how to upload chunked audio to AWS cloud storage here.
Chunked Recording
Set a reasonable buffer_size to trigger periodic uploads, for example one that holds roughly 30 seconds of audio.
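The following is a sketch of how chunk sizing and a chunk handler might look. It assumes that buffer_size is measured in bytes and that audio arrives as 16-bit PCM; both are assumptions here, so confirm them against the AudioBufferProcessor reference. The helpers buffer_size_for_seconds and ChunkWriter are hypothetical names, and the (audio, sample_rate, num_channels) arguments are modeled on the on_audio_data event rather than copied from the API.

```python
import wave
from pathlib import Path

BYTES_PER_SAMPLE = 2  # assumption: 16-bit PCM


def buffer_size_for_seconds(seconds: float, sample_rate: int, num_channels: int) -> int:
    """Return the number of bytes needed to hold `seconds` of PCM audio."""
    return int(seconds * sample_rate * num_channels * BYTES_PER_SAMPLE)


class ChunkWriter:
    """Writes each audio chunk to its own sequence-numbered WAV file."""

    def __init__(self, directory: str, prefix: str = "recording") -> None:
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.prefix = prefix
        self.next_index = 0

    def write(self, audio: bytes, sample_rate: int, num_channels: int) -> Path:
        """Save one chunk and return the path it was written to."""
        # Zero-padded index keeps files in order when sorted by name.
        path = self.directory / f"{self.prefix}_{self.next_index:05d}.wav"
        with wave.open(str(path), "wb") as wav:
            wav.setnchannels(num_channels)
            wav.setsampwidth(BYTES_PER_SAMPLE)
            wav.setframerate(sample_rate)
            wav.writeframes(audio)
        self.next_index += 1
        return path
```

Under these assumptions, buffer_size_for_seconds(30, 16000, 1) yields 960,000 bytes for 30 seconds of 16 kHz mono audio. Calling writer.write(audio, sample_rate, num_channels) from your audio event handler would then produce recording_00000.wav, recording_00001.wav, and so on.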
Multipart Upload Strategy
For cloud storage, use multipart uploads to stream audio chunks. For example, with AWS cloud storage, use the S3 multipart upload API. If you are rolling your own multipart upload code, consider the following:
Conceptual Approach:
- Initialize multipart upload when recording starts
- Upload chunks as parts when buffers fill (every ~30 seconds)
- Complete multipart upload when recording ends
- Post-process to create the final WAV file(s) by concatenating audio chunks
Benefits:
- Memory efficient for long sessions
- Fault tolerant (chunks already uploaded are not lost if the connection drops)
- Enables real-time processing and analysis
- Parallel upload of multiple tracks
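One detail worth planning for when rolling your own: Amazon S3 requires every part of a multipart upload except the last to be at least 5 MiB, while 30 seconds of 16 kHz mono 16-bit audio is under 1 MB. The sketch below shows one way to group small audio chunks into large-enough parts. PartBuffer and the upload_part callback are hypothetical names; in a real setup the callback would wrap your storage client's part-upload call.

```python
from typing import Callable

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for all parts except the last


class PartBuffer:
    """Groups small audio chunks into parts that meet a minimum size."""

    def __init__(
        self,
        upload_part: Callable[[int, bytes], None],
        min_part_size: int = MIN_PART_SIZE,
    ) -> None:
        self._upload_part = upload_part  # hypothetical callback: (part_number, data)
        self._min_part_size = min_part_size
        self._pending = bytearray()
        self._next_part_number = 1  # part numbers start at 1

    def add_chunk(self, audio: bytes) -> None:
        """Buffer a chunk; upload a part once enough data has accumulated."""
        self._pending.extend(audio)
        if len(self._pending) >= self._min_part_size:
            self._flush()

    def finish(self) -> int:
        """Upload any remaining data as the final part; return the part count."""
        if self._pending:
            self._flush()
        return self._next_part_number - 1

    def _flush(self) -> None:
        self._upload_part(self._next_part_number, bytes(self._pending))
        self._next_part_number += 1
        self._pending.clear()
```

With this design, at most one part's worth of audio sits in memory at a time. Audio that is still buffered when the process crashes is not yet uploaded, so a smaller minimum (where the storage provider allows it) narrows that window.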
[Optional] Post-Processing Pipeline
If you are not using a managed multipart upload framework like AWS S3 multipart upload, concatenate the audio chunks to create the final audio files. This can be done with tools like FFmpeg.
Concatenating Audio Files:
- Use sequence numbers in chunk filenames for proper ordering
- Include metadata (sample rate, channels, duration) with each chunk
- Implement retry logic for failed uploads
- Consider using cloud functions/lambdas for automatic post-processing
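For simple cases, the chunks can also be joined with Python's standard wave module instead of FFmpeg. This sketch assumes the chunks are WAV files whose zero-padded sequence numbers make them sort correctly by name (for example recording_00000.wav) and that all chunks share the same format; concatenate_wav_chunks is a hypothetical helper name.

```python
import wave
from pathlib import Path


def concatenate_wav_chunks(chunk_dir: str, output_path: str, pattern: str = "*.wav") -> int:
    """Join WAV chunks from chunk_dir into output_path; return the chunk count."""
    chunks = sorted(Path(chunk_dir).glob(pattern))  # zero-padded names sort in order
    if not chunks:
        raise FileNotFoundError(f"no chunks matching {pattern!r} in {chunk_dir}")

    with wave.open(str(output_path), "wb") as out:
        expected = None
        for chunk in chunks:
            with wave.open(str(chunk), "rb") as src:
                fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if expected is None:
                    # The first chunk defines the output format.
                    expected = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != expected:
                    raise ValueError(f"{chunk.name} has format {fmt}, expected {expected}")
                out.writeframes(src.readframes(src.getnframes()))
    return len(chunks)
```

Write the output to a location outside the chunk directory so that a rerun does not pick up the previous result as a chunk. The format check is the reason to store sample rate and channel metadata with each chunk: a mismatch is caught before it produces a distorted recording.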
Next Steps
Try the Audio Recording Example
Explore a complete working example that demonstrates how to record and save
both composite and track-level audio with Pipecat.
AudioBufferProcessor Reference
Read the complete API reference documentation for advanced configuration
options and event handlers.