A guide to working with Pipecat’s Context and Context Aggregators
InputAudioRawFrame
→ STT Service → TranscriptionFrame
context_aggregator.user()
receives TranscriptionFrame
and adds user message to contextLLMTextFrame
→ TTS Service → TTSTextFrame
context_aggregator.assistant()
receives TTSTextFrame
and adds assistant message to contextTranscriptionFrame
: Contains user speech converted to text by STT serviceLLMTextFrame
: Contains LLM-generated responsesTTSTextFrame
: Contains bot responses converted to text by TTS service (represents what was actually spoken)LLMTextFrame
s but outputs TTSTextFrame
s, which
represent the actual spoken text returned by the TTS provider. This ensures
context matches what users actually hear.TranscriptionFrame
objects pushed by the STT service, the user aggregator needs to be positioned to collect these frames.
transport.output()
. This positioning is important because:
TTSTextFrame
s in addition to audiotransport.output()
to ensure proper word-level context updates during interruptions.LLMMessagesAppendFrame
: Appends a new message to the existing contextLLMMessagesUpdateFrame
: Completely replaces the existing context with new messagesget_context_frame()
method to obtain the latest context:
LLMMessagesAppendFrame
or LLMMessagesUpdateFrame
TranscriptionFrame
for users, TTSTextFrame
for assistants