Overview

Frames are the fundamental units of data in Pipecat. Every piece of information that moves through a pipeline — audio, text, images, control signals — is wrapped in a frame. Frame processors receive frames, act on them, and push new or modified frames along to the next processor. All frames inherit from the base Frame class and are Python dataclasses.
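Because frames are plain dataclasses, defining an application-specific frame is just subclassing. A minimal sketch (using an illustrative stand-in base class; in real code you would subclass DataFrame from pipecat.frames.frames, and the SentimentFrame name and fields here are hypothetical):

```python
from dataclasses import dataclass


# Stand-in base class for illustration; real code subclasses
# pipecat.frames.frames.DataFrame instead.
@dataclass
class DataFrame:
    pass


# A hypothetical application-specific frame: any data you want to move
# through the pipeline becomes ordinary dataclass fields.
@dataclass
class SentimentFrame(DataFrame):
    score: float = 0.0
    label: str = "neutral"


frame = SentimentFrame(score=0.9, label="positive")
```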

Frame Categories

Pipecat has three base frame types, each with different processing behavior:
| Base Type | Processing | Interruption Behavior |
| --- | --- | --- |
| DataFrame | Queued, processed in order with non-SystemFrames | Cancelled on user interruption |
| ControlFrame | Queued, processed in order with non-SystemFrames | Cancelled on user interruption |
| SystemFrame | Higher priority; queued, processed in order with other SystemFrames | Not cancelled on user interruption |

DataFrame

Data frames carry the main content flowing through a pipeline: audio chunks, text, images, and LLM messages. They are queued and processed in order with other DataFrames and ControlFrames. If a user interrupts (starts speaking while the bot is responding), any pending data frames are discarded so the new input can be handled immediately. Examples: TextFrame, OutputAudioRawFrame, LLMMessagesAppendFrame, TTSSpeakFrame

ControlFrame

ControlFrames signal processing boundaries and configuration changes: response start/end markers, settings updates, and state transitions. They are queued and processed in order alongside DataFrames, and like DataFrames, any pending ControlFrames are discarded when a user interrupts unless combined with UninterruptibleFrame. Examples: EndFrame, LLMFullResponseStartFrame, TTSStartedFrame, ServiceUpdateSettingsFrame

SystemFrame

SystemFrames are high-priority signals that must always be delivered: interruptions, user input, error notifications, and pipeline lifecycle events. They are queued and processed in order with other SystemFrames. Unlike DataFrames and ControlFrames, they are never discarded when a user interrupts. Examples: StartFrame, CancelFrame, InterruptionFrame, UserStartedSpeakingFrame, InputAudioRawFrame

Frame Properties

Every frame has these properties set automatically:
- id (int): Unique identifier for the frame instance.
- name (str): Human-readable name combining class name and instance count (e.g., TextFrame#3). Useful for debugging.
- pts (Optional[int]): Presentation timestamp in nanoseconds. Used for audio/video synchronization.
- metadata (Dict[str, Any]): Dictionary for arbitrary frame metadata.
- transport_source (Optional[str]): Name of the transport source that created this frame.
- transport_destination (Optional[str]): Name of the transport destination for this frame. Used when a transport supports multiple output tracks.
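The automatic id/name behavior can be sketched with a pair of counters: a global one for id and a per-class one for name. This is an illustrative stand-in, not Pipecat's actual implementation (which lives in pipecat.frames.frames and may differ in detail):

```python
import itertools
from dataclasses import dataclass, field
from typing import Any, Dict

_ids = itertools.count(1)          # global counter for unique ids
_name_counts: Dict[str, Any] = {}  # per-class counters for names


@dataclass
class Frame:
    id: int = field(init=False, default=0)
    name: str = field(init=False, default="")
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self):
        self.id = next(_ids)
        counter = _name_counts.setdefault(type(self).__name__, itertools.count(1))
        self.name = f"{type(self).__name__}#{next(counter)}"


@dataclass
class TextFrame(Frame):
    text: str = ""


a = TextFrame(text="hi")
b = TextFrame(text="there")
# a.name == "TextFrame#1", b.name == "TextFrame#2"
```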

Frame Direction

Frames flow through the pipeline in one of two directions:
from pipecat.processors.frame_processor import FrameDirection

class FrameDirection(Enum):
    DOWNSTREAM = 1  # Input → Output (default)
    UPSTREAM = 2    # Output → Input
Downstream is the default. In a typical voice AI pipeline, audio enters from the transport input, gets transcribed, runs through the LLM, converts to speech, and reaches the transport output. Upstream lets processors send information back toward the start of the pipeline. The most common example: the assistant context aggregator at the end of the pipeline pushes context frames upstream so they flow back to the LLM.

Pushing Frames

Within a frame processor, call push_frame() to send a frame to the next processor:
# Push downstream (default)
await self.push_frame(frame, FrameDirection.DOWNSTREAM)

# Push upstream
await self.push_frame(frame, FrameDirection.UPSTREAM)

Broadcasting Frames

To send a frame in both directions simultaneously, use broadcast_frame():
# Create and push instances upstream and downstream
await self.broadcast_frame(UserStartedSpeakingFrame)
Each direction receives its own frame instance, linked by broadcast_sibling_id. To broadcast an existing frame instance (when you are not the original creator of the frame), use broadcast_frame_instance():
# Broadcast an existing frame instance in both directions
await self.broadcast_frame_instance(frame)
This creates two new instances by shallow-copying all fields from the original frame except id and name, which get fresh values.
Prefer broadcast_frame() when possible, as it is more efficient.

Mixins

Mixins add cross-cutting behavior or shared data fields to frames without changing their base type.

UninterruptibleFrame

Occasionally a DataFrame or ControlFrame is too important to discard during an interruption. Adding the UninterruptibleFrame mixin protects it: the frame stays in internal queues and any task processing it will not be cancelled.
@dataclass
class FunctionCallResultFrame(DataFrame, UninterruptibleFrame):
    """Must be delivered even if the user interrupts."""
    ...
Examples: EndFrame, StopFrame, FunctionCallResultFrame, FunctionCallInProgressFrame

AudioRawFrame

Carries raw audio fields shared by both input and output audio frames.
- audio (bytes): Raw audio bytes in PCM format.
- sample_rate (int): Audio sample rate in Hz (e.g., 16000).
- num_channels (int): Number of audio channels (e.g., 1 for mono).
- num_frames (int): Number of audio frames. Calculated automatically from the audio data.
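Since the audio is 16-bit PCM, each audio frame occupies num_channels * 2 bytes, so num_frames can be derived from the byte length. A sketch of that calculation (Pipecat performs it automatically; the helper name here is illustrative):

```python
def pcm_num_frames(audio: bytes, num_channels: int, bytes_per_sample: int = 2) -> int:
    """Number of audio frames in a raw PCM buffer (16-bit samples by default)."""
    return len(audio) // (num_channels * bytes_per_sample)


# 20 ms of 16 kHz mono audio: 16000 * 0.02 = 320 frames = 640 bytes
chunk = bytes(640)
assert pcm_num_frames(chunk, num_channels=1) == 320
```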

ImageRawFrame

Carries raw image fields shared by both input and output image frames.
- image (bytes): Raw image bytes.
- size (Tuple[int, int]): Image dimensions as (width, height).
- format (Optional[str]): Image format (e.g., "RGB", "RGBA").
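For raw RGB data, the byte buffer holds width * height * 3 bytes (one byte per channel). A sketch using an illustrative stand-in dataclass with the mixin's fields (real frame classes such as OutputImageRawFrame live in pipecat.frames.frames):

```python
from dataclasses import dataclass
from typing import Optional, Tuple


# Stand-in carrying the same fields the mixin describes.
@dataclass
class ImageRawFrame:
    image: bytes
    size: Tuple[int, int]
    format: Optional[str] = None


# A 2x2 solid-red image as raw RGB bytes (3 bytes per pixel)
width, height = 2, 2
frame = ImageRawFrame(
    image=bytes([255, 0, 0] * (width * height)),
    size=(width, height),
    format="RGB",
)
assert len(frame.image) == width * height * 3
```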

Common Patterns

Pipecat prefers pushing frames over calling methods directly between processors. Routing data through the pipeline as frames ensures correct processing order, which is critical for real-time use cases. Most frames are produced and consumed by Pipecat’s built-in services. The patterns below cover the frames you’re most likely to push yourself in application code.

Starting a Conversation

Add an initial message to the context, then push LLMRunFrame to kick off processing:
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
    context.add_message({"role": "user", "content": "Please introduce yourself."})
    await task.queue_frames([LLMRunFrame()])

Injecting a Prompt

LLMMessagesAppendFrame adds messages to the context without replacing what’s already there. Set run_llm=True to trigger a response immediately:
message = {
    "role": "user",
    "content": "The user has been quiet. Ask if they're still there.",
}
await aggregator.push_frame(LLMMessagesAppendFrame([message], run_llm=True))

Speaking Without the LLM

TTSSpeakFrame sends text directly to the TTS service as a standalone utterance, bypassing the LLM entirely:
@llm.event_handler("on_function_calls_started")
async def on_function_calls_started(service, function_calls):
    await tts.queue_frame(TTSSpeakFrame("Let me check on that."))

Ending a Conversation

Push EndTaskFrame upstream to gracefully shut down the pipeline. Pair it with a TTSSpeakFrame to say goodbye first:
await aggregator.push_frame(
    TTSSpeakFrame("It seems like you're busy. Have a nice day!")
)
await aggregator.push_frame(EndTaskFrame(), FrameDirection.UPSTREAM)

Changing Service Settings at Runtime

Push settings frames to adjust LLM, TTS, or STT configuration mid-conversation:
await task.queue_frame(
    LLMUpdateSettingsFrame(delta=OpenAILLMService.Settings(temperature=0.1))
)

Updating Tools at Runtime

Add or replace available function-calling tools while the conversation is active:
new_tools = ToolsSchema(
    standard_tools=[weather_function, restaurant_function]
)
await task.queue_frames([LLMSetToolsFrame(tools=new_tools)])

Playing Sound Effects

Load audio files and push OutputAudioRawFrame directly from a custom processor:
with wave.open("ding.wav", "rb") as f:
    ding_audio = f.readframes(-1)
    ding_rate = f.getframerate()
    ding_channels = f.getnchannels()

class SoundEffect(FrameProcessor):
    async def process_frame(self, frame, direction):
        await super().process_frame(frame, direction)
        if isinstance(frame, LLMFullResponseEndFrame):
            # Create a fresh frame for each playback rather than
            # reusing a single instance
            await self.push_frame(
                OutputAudioRawFrame(ding_audio, ding_rate, ding_channels)
            )
        await self.push_frame(frame, direction)

Reacting to LLM Response Boundaries

LLMFullResponseStartFrame and LLMFullResponseEndFrame bracket every LLM response. Custom processors can watch for these to trigger side effects:
class ResponseLogger(FrameProcessor):
    async def process_frame(self, frame, direction):
        await super().process_frame(frame, direction)
        if isinstance(frame, LLMFullResponseStartFrame):
            logger.info("LLM response started")
        elif isinstance(frame, LLMFullResponseEndFrame):
            logger.info("LLM response finished")
        await self.push_frame(frame, direction)

Frame Type Reference

The individual reference pages below document every frame class, organized by function:

Data Frames

Audio, image, text, transcription, and transport message frames that carry content through the pipeline.

Control Frames

Pipeline lifecycle, LLM response boundaries, TTS state, service settings, and filter/mixer configuration.

System Frames

Interruptions, user/bot speaking state, VAD events, errors, metrics, and raw input frames.

LLM Frames

LLM context frame, function calling helper dataclasses, and links to LLM-related frames on other pages.