Overview

Recording audio from conversations provides valuable data for analysis, debugging, and quality control. Pipecat’s AudioBufferProcessor makes it easy to capture high-quality audio recordings of both the user and bot during interactions.

How It Works

The AudioBufferProcessor captures audio by:

  1. Collecting audio frames from both the user (input) and bot (output)
  2. Emitting events with recorded audio data
  3. Providing options for composite or separate track recordings

Add the processor to your pipeline after transport.output() so it captures both the user's audio and the bot's audio as it is spoken.

Audio Recording Options

The AudioBufferProcessor offers several configuration options:

  • Composite recording: Combined audio from both user and bot
  • Track-level recording: Separate audio files for user and bot
  • Turn-based recording: Individual audio clips for each speaking turn
  • Mono or stereo output: Single channel mixing or two-channel separation
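For example, stereo output and per-turn clips can be combined in a single processor. A minimal sketch, using the same constructor parameters shown in Step 1 below (this particular combination is illustrative, not required):

# Sketch: stereo recording with per-turn clips enabled
stereo_audiobuffer = AudioBufferProcessor(
    num_channels=2,          # Stereo: user on the left channel, bot on the right
    enable_turn_audio=True,  # Also emit audio for each speaking turn
)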

Basic Implementation

Step 1: Create an Audio Buffer Processor

Initialize the audio buffer processor with your desired configuration:

from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor

# Create audio buffer processor with default settings
audiobuffer = AudioBufferProcessor(
    num_channels=1,               # 1 for mono, 2 for stereo (user left, bot right)
    enable_turn_audio=False,      # Enable per-turn audio recording
    user_continuous_stream=True,  # User has continuous audio stream
)

Step 2: Add to Your Pipeline

Place the processor in your pipeline after all audio-producing components:

pipeline = Pipeline(
    [
        transport.input(),
        stt,
        context_aggregator.user(),
        llm,
        tts,
        transport.output(),
        audiobuffer,          # Add after all audio components
        context_aggregator.assistant(),
    ]
)

Step 3: Start Recording

Explicitly start recording when needed, typically when a session begins:

@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
    logger.info(f"Client connected")
    # Important: Start recording explicitly
    await audiobuffer.start_recording()
    # Continue with session initialization...

You must call start_recording() explicitly to begin capturing audio. The processor won’t record automatically when initialized.
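
When the session ends, stop the recording so any remaining buffered audio is flushed to your event handlers. A minimal sketch, assuming your transport emits an on_client_disconnected event that mirrors the connect handler above:

@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
    logger.info("Client disconnected")
    # Stop recording and flush any remaining buffered audio
    await audiobuffer.stop_recording()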

Step 4: Handle Audio Data

Register an event handler to process audio data:

@audiobuffer.event_handler("on_audio_data")
async def on_audio_data(buffer, audio, sample_rate, num_channels):
    # Save or process the composite audio
    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"recordings/conversation_{timestamp}.wav"

    # Create the WAV file
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(num_channels)
        wf.setsampwidth(2)  # 16-bit audio
        wf.setframerate(sample_rate)
        wf.writeframes(audio)

    logger.info(f"Saved recording to {filename}")

If recording separate tracks, you can use the on_track_audio_data event handler to save user and bot audio separately.
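
The sketch below shows one way such a handler might look. It assumes on_track_audio_data delivers separate user and bot byte strings alongside the shared sample rate, and that each track is mono; confirm the event signature against the Pipecat version you're using:

@audiobuffer.event_handler("on_track_audio_data")
async def on_track_audio_data(buffer, user_audio, bot_audio, sample_rate, num_channels):
    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    os.makedirs("recordings", exist_ok=True)

    # Write each track to its own WAV file (assumes each track is mono)
    for name, audio in (("user", user_audio), ("bot", bot_audio)):
        filename = f"recordings/{name}_{timestamp}.wav"
        with wave.open(filename, "wb") as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)  # 16-bit PCM audio
            wf.setframerate(sample_rate)
            wf.writeframes(audio)
        logger.info(f"Saved {name} track to {filename}")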

Next Steps

Consider implementing audio recording in your application for quality assurance, training data collection, or creating conversation archives. The recorded audio can be stored locally, uploaded to cloud storage, or processed in real-time for further analysis.