Recording Conversation Audio

Overview

Recording audio from conversations provides valuable data for analysis, debugging, and quality control. You have two options for how to record with Pipecat:

Option 1: Record using your transport service provider

Record without writing custom code by using your transport provider’s recording capabilities. In addition to saving you development time, some providers offer unique recording capabilities.

Refer to your service provider’s documentation to learn more.

Option 2: Create your own recording pipeline

Pipecat’s AudioBufferProcessor makes it easy to capture high-quality audio recordings of both the user and bot during interactions. Opt for this approach if you want more control over your recording. This guide focuses on how to recording using the AudioBufferProcessor, including high-level guidance for how to set up post-processing jobs for longer recordings.

How the AudioBufferProcessor Works

The AudioBufferProcessor captures audio by:

Collecting audio frames from both the user (input) and bot (output)
Emitting events with recorded audio data
Providing options for composite or separate track recordings

Add the processor to your pipeline after the transport.output() to capture both the user audio and the bot audio as it’s spoken.

Audio Recording Options

The AudioBufferProcessor offers several configuration options:

Composite recording: Combined audio from both user and bot
Track-level recording: Separate audio files for user and bot
Turn-based recording: Individual audio clips for each speaking turn
Mono or stereo output: Single channel mixing or two-channel separation

Basic Implementation

Step 1: Create an Audio Buffer Processor

Initialize the audio buffer processor with your desired configuration:

from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor

# Create audio buffer processor with default settings
audiobuffer = AudioBufferProcessor(
    num_channels=1,               # 1 for mono, 2 for stereo (user left, bot right)
    enable_turn_audio=False,      # Enable per-turn audio recording
    user_continuous_stream=True,  # User has continuous audio stream
)

Step 2: Add to Your Pipeline

Place the processor in your pipeline after all audio-producing components:

pipeline = Pipeline(
    [
        transport.input(),
        stt,
        context_aggregator.user(),
        llm,
        tts,
        transport.output(),
        audiobuffer,          # Add after all audio components
        context_aggregator.assistant(),
    ]
)

Step 3: Start Recording

Explicitly start recording when needed, typically when a session begins:

@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
    logger.info(f"Client connected")
    # Important: Start recording explicitly
    await audiobuffer.start_recording()
    # Continue with session initialization...

You must call start_recording() explicitly to begin capturing audio. The processor won’t record automatically when initialized.

Step 4: Handle Audio Data

@audiobuffer.event_handler("on_audio_data")
async def on_audio_data(buffer, audio, sample_rate, num_channels):
    # Save or process the composite audio
    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"recordings/conversation_{timestamp}.wav"

    # Create the WAV file
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(num_channels)
        wf.setsampwidth(2)  # 16-bit audio
        wf.setframerate(sample_rate)
        wf.writeframes(audio)

    logger.info(f"Saved recording to {filename}")

If recording separate tracks, you can use the on_track_audio_data event handler to save user and bot audio separately.

Recording Longer Conversations

For conversations that last a few minutes, it may be sufficient to just buffer the audio in memory. However, for longer sessions, storing audio in memory poses two challenges:

Memory Usage: Long recordings can consume significant memory, leading to potential crashes or performance issues.
Conversation Loss: If the application crashes or the connection drops, you may lose all recorded audio.

Instead, consider using a chunked approach to record audio in manageable segments. This allows you to periodically save audio data to disk or upload it to cloud storage, reducing memory usage and ensuring data persistence.

Chunked Recording

Set a reasonable buffer_size to trigger periodic uploads:

# 30-second chunks (recommended for most use cases)
SAMPLE_RATE = 24000
CHUNK_DURATION = 30  # seconds
audiobuffer = AudioBufferProcessor(
    sample_rate=SAMPLE_RATE,
    buffer_size=SAMPLE_RATE * 2 * CHUNK_DURATION  # 2 bytes per sample (16-bit)
)

chunk_counter = 0

@audiobuffer.event_handler("on_track_audio_data")
async def on_chunk_ready(buffer, user_audio, bot_audio, sample_rate, num_channels):
    global chunk_counter

    # Upload or save individual chunks
    await upload_audio_chunk(f"user_chunk_{chunk_counter:03d}.wav", user_audio, sample_rate, 1)
    await upload_audio_chunk(f"bot_chunk_{chunk_counter:03d}.wav", bot_audio, sample_rate, 1)

    chunk_counter += 1

Multipart Upload Strategy

For cloud storage, consider using multipart uploads to stream audio chunks: Conceptual Approach:

Initialize multipart upload when recording starts
Upload chunks as parts when buffers fill (every ~30 seconds)
Complete multipart upload when recording ends
Post-process to create final WAV file(s)

Benefits:

Memory efficient for long sessions
Fault tolerant (no data loss if connection drops)
Enables real-time processing and analysis
Parallel upload of multiple tracks

Post-Processing Pipeline

After uploading chunks, create final audio files using tools like FFmpeg: Concatenating Audio Files:

# Method 1: Simple concatenation (same format)
ffmpeg -i "concat:chunk_001.wav|chunk_002.wav|chunk_003.wav" -acodec copy final.wav

# Method 2: Using file list (recommended for many chunks)
# Create filelist.txt with format:
# file 'chunk_001.wav'
# file 'chunk_002.wav'
# ...
ffmpeg -f concat -safe 0 -i filelist.txt -c copy final_recording.wav

Automation Considerations:

Use sequence numbers in chunk filenames for proper ordering
Include metadata (sample rate, channels, duration) with each chunk
Implement retry logic for failed uploads
Consider using cloud functions/lambdas for automatic post-processing

Next Steps

Try the Audio Recording Example

Explore a complete working example that demonstrates how to record and save both composite and track-level audio with Pipecat.

AudioBufferProcessor Reference

Read the complete API reference documentation for advanced configuration options and event handlers.

Consider implementing audio recording in your application for quality assurance, training data collection, or creating conversation archives. The recorded audio can be stored locally, uploaded to cloud storage, or processed in real-time for further analysis.

Fundamentals

Features

Telephony

Deploying your bot

Recording Conversation Audio

Overview

Option 1: Record using your transport service provider

Option 2: Create your own recording pipeline

How the AudioBufferProcessor Works

Audio Recording Options

Basic Implementation

Step 1: Create an Audio Buffer Processor

Step 2: Add to Your Pipeline

Step 3: Start Recording

Step 4: Handle Audio Data

Recording Longer Conversations

Chunked Recording

Multipart Upload Strategy

Post-Processing Pipeline

Next Steps

Try the Audio Recording Example

AudioBufferProcessor Reference

Fundamentals

Features

Telephony

Deploying your bot

​Overview

​Option 1: Record using your transport service provider

​Option 2: Create your own recording pipeline

​How the AudioBufferProcessor Works

​Audio Recording Options

​Basic Implementation

​Step 1: Create an Audio Buffer Processor

​Step 2: Add to Your Pipeline

​Step 3: Start Recording

​Step 4: Handle Audio Data

​Recording Longer Conversations

​Chunked Recording

​Multipart Upload Strategy

​Post-Processing Pipeline

​Next Steps

Try the Audio Recording Example

AudioBufferProcessor Reference

Overview

Option 1: Record using your transport service provider

Option 2: Create your own recording pipeline

How the AudioBufferProcessor Works

Audio Recording Options

Basic Implementation

Step 1: Create an Audio Buffer Processor

Step 2: Add to Your Pipeline

Step 3: Start Recording

Step 4: Handle Audio Data

Recording Longer Conversations

Chunked Recording

Multipart Upload Strategy

Post-Processing Pipeline

Next Steps