A guide to working with Pipecat’s Context and Context Aggregators
In Pipecat, context refers to the text that the LLM uses to perform an inference. Commonly, this is the text sent to the LLM and the text the LLM generates in response. The context consists of a list of alternating user/assistant messages representing the information you want the LLM to respond to. Since Pipecat is a real-time voice (and multimodal) AI framework, the context serves as the collective history of the entire conversation.
After every user and bot turn in the conversation, processors in the pipeline push frames to update the context:

- The STT service pushes `TranscriptionFrame` objects that represent what the user says.
- The LLM service pushes `LLMTextFrame`s to the TTS service, which outputs `TTSTextFrame`s representing the bot's spoken words.

Pipecat includes a context aggregator class that creates and manages context for both user and assistant messages. Here's how to set it up:
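A minimal setup sketch (this assumes the OpenAI-flavored context classes and the OpenAI LLM service from recent `pipecat-ai` releases; import paths and service names vary by version and provider):

```python
import os

from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai.llm import OpenAILLMService

# The context holds the system prompt plus the running message history.
messages = [{"role": "system", "content": "You are a friendly voice assistant."}]
context = OpenAILLMContext(messages)

# The LLM service creates a matched pair of aggregators that share this context.
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
context_aggregator = llm.create_context_aggregator(context)

# context_aggregator.user() and context_aggregator.assistant() are the two
# processors you place in the pipeline, as described below.
```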
The context (which represents the conversation) is passed to the context aggregator. This ensures that both user and assistant instances of the context aggregators have access to the shared conversation context.
The placement of context aggregator instances in your pipeline is crucial for proper operation:
- Place the user context aggregator downstream of the STT service. The user's speech results in `TranscriptionFrame` objects pushed by the STT service, so the user aggregator must be positioned after it to collect those frames.
- Place the assistant context aggregator after `transport.output()`. This positioning is important because the bot's words reach the aggregator as they are actually spoken, which ensures proper word-level context updates during interruptions: if the user interrupts, the context retains only the words the bot got to say.
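Putting the placement rules together, a typical pipeline looks like the following wiring sketch (it assumes `transport`, `stt`, `llm`, `tts`, and `context_aggregator` have already been constructed for your chosen services):

```python
from pipecat.pipeline.pipeline import Pipeline

# Ordering matters: the user aggregator sits downstream of STT, and the
# assistant aggregator sits after transport.output().
pipeline = Pipeline([
    transport.input(),               # audio in from the user
    stt,                             # pushes TranscriptionFrames
    context_aggregator.user(),       # collects TranscriptionFrames into context
    llm,                             # pushes LLMTextFrames
    tts,                             # pushes TTSTextFrames (spoken words)
    transport.output(),              # audio out to the user
    context_aggregator.assistant(),  # records what the bot actually said
])
```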
You can programmatically add new messages to the context by pushing or queueing specific frames:
- `LLMMessagesAppendFrame`: appends a new message to the existing context
- `LLMMessagesUpdateFrame`: completely replaces the existing context with the new context provided in the frame

The context aggregator provides a `get_context_frame()` method to obtain the latest context:
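The way these frames interact with the shared context can be illustrated with a simplified pure-Python model (these are stand-in classes for illustration, not the real Pipecat implementations):

```python
from dataclasses import dataclass


@dataclass
class LLMMessagesAppendFrame:
    messages: list  # messages to append to the existing context


@dataclass
class LLMMessagesUpdateFrame:
    messages: list  # messages that replace the existing context


@dataclass
class ContextFrame:
    messages: list  # snapshot of the context handed to the LLM


class ContextAggregator:
    """Toy aggregator: holds the message list and applies incoming frames."""

    def __init__(self, messages=None):
        self._messages = list(messages or [])

    def process_frame(self, frame):
        if isinstance(frame, LLMMessagesAppendFrame):
            self._messages.extend(frame.messages)
        elif isinstance(frame, LLMMessagesUpdateFrame):
            self._messages = list(frame.messages)

    def get_context_frame(self):
        # Returns a snapshot of the latest context.
        return ContextFrame(messages=list(self._messages))


aggregator = ContextAggregator([{"role": "system", "content": "Be brief."}])
aggregator.process_frame(
    LLMMessagesAppendFrame([{"role": "user", "content": "Hello!"}])
)
frame = aggregator.get_context_frame()
print(len(frame.messages))  # 2: the system prompt plus the appended user turn
```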
You’ll commonly use this manual mechanism—obtaining the current context and pushing/queueing it—to trigger the bot to speak in two scenarios:
- At the start of the conversation, to prompt the bot to speak first
- After updating the context with `LLMMessagesAppendFrame` or `LLMMessagesUpdateFrame`, to have the bot respond to the new messages
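For example, to inject a message mid-conversation and have the bot respond to it, you can queue both frames together. This sketch assumes a running `PipelineTask` named `task` and runs inside an async handler; exact frame constructors may differ across Pipecat versions:

```python
from pipecat.frames.frames import LLMMessagesAppendFrame

# Append a new user message to the context, then queue the refreshed
# context frame to trigger an inference on it.
await task.queue_frames([
    LLMMessagesAppendFrame(
        messages=[{"role": "user", "content": "Summarize what we discussed."}]
    ),
    context_aggregator.user().get_context_frame(),
])
```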
This gives you fine-grained control over when and how the bot responds during the conversation flow.