Overview
DataFrames carry the main content flowing through a pipeline: audio chunks, text, images, transcriptions, and messages. They are queued and processed in order with other DataFrames and ControlFrames, and any pending DataFrames are discarded when a user interrupts. See the Frames overview for base class details, mixin fields, and frame properties common to all frames.Audio Frames
These frames carry raw audio through the pipeline toward the output transport. Each inherits theaudio, sample_rate, num_channels, and num_frames fields from the AudioRawFrame mixin.
OutputAudioRawFrame
A chunk of raw audio destined for the output transport. Use the inheritedtransport_destination field when your transport supports multiple audio tracks.
Inherits from AudioRawFrame.
TTSAudioRawFrame
Audio generated by a TTS service, ready for playback. Inherits fromOutputAudioRawFrame.
Identifier for the TTS context that generated this audio.
SpeechOutputAudioRawFrame
Audio from a continuous speech stream. The stream may contain silence frames intermixed with speech, so downstream processors may need to distinguish between the two. Inherits fromOutputAudioRawFrame.
Image Frames
Frames for carrying image data to the output transport. Each inheritsimage, size, and format from the ImageRawFrame mixin.
OutputImageRawFrame
An image for display by the output transport. Supports thetransport_destination field for transports with multiple video tracks.
Inherits from ImageRawFrame.
The
sync_with_audio field (default False) is set internally, not via the
constructor. When True, the image is queued with audio frames so it displays
only after all preceding audio has been sent. When False, the transport
displays it immediately.URLImageRawFrame
An output image with an associated download URL, typically from a third-party image generation service. Inherits fromOutputImageRawFrame.
URL where the image can be downloaded.
AssistantImageRawFrame
An image generated by the assistant for both display and inclusion in LLM context. The superclass handles display; the additional fields here carry the original image data in a format suitable for direct use in LLM context messages. Inherits fromOutputImageRawFrame.
Original image data for use in LLM context messages without further encoding.
MIME type of the original image data.
SpriteFrame
An animated sprite composed of multiple image frames. The transport plays the images at the framerate specified by the transport’scamera_out_framerate parameter.
Ordered list of image frames that make up the sprite animation.
Text Frames
Text content at various stages of processing: raw text, LLM output, aggregated results, TTS input, and transcriptions.TextFrame
The fundamental text container. Emitted by LLM services, consumed by context aggregators, TTS services, and other processors.The text content.
Several non-constructor fields control downstream behavior: -
skip_tts
(default None): when set, tells the TTS service to skip this text -
includes_inter_frame_spaces (default False): indicates whether
leading/trailing spaces are already included - append_to_context (default
True): whether this text should be appended to the LLM contextLLMTextFrame
Text generated by an LLM service. Behaves like aTextFrame with includes_inter_frame_spaces set to True, since LLM services include all necessary spacing.
Inherits from TextFrame.
AggregatedTextFrame
Multiple text frames combined into a single frame for processing or output. Inherits fromTextFrame.
Method used to aggregate the text frames.
Identifier for the TTS context associated with this text.
VisionTextFrame
Text output from a vision model. Functionally identical toLLMTextFrame but distinguished by type for routing purposes.
Inherits from LLMTextFrame.
TTSTextFrame
Text that has been sent to a TTS service for synthesis. Inherits fromAggregatedTextFrame.
Identifier for the TTS context that generated this text.
Transcriptions
Frames produced by speech-to-text services at different stages of recognition. All inherit fromTextFrame, so they flow through text aggregators and other TextFrame handlers.
TranscriptionFrame
A non-interim transcription result from an STT service: the service’s best recognition of what the user said, as opposed to the streaming partial results inInterimTranscriptionFrame.
Identifier for the user who spoke.
When the transcription occurred.
Detected or specified language of the speech.
Raw result object from the STT service.
Whether the STT service has explicitly committed this transcription via a
finalize signal. Some services (AssemblyAI, Deepgram, Soniox, Speechmatics)
support this; others don’t, so it defaults to
False. Turn detection
strategies can use this flag to trigger the bot’s response immediately rather
than waiting for a timeout.InterimTranscriptionFrame
A partial, in-progress transcription. These frames update frequently while the user is still speaking, and are superseded by aTranscriptionFrame once the STT service produces its result.
The partial transcription text.
Identifier for the user who spoke.
When the interim transcription occurred.
Detected or specified language of the speech.
Raw result object from the STT service.
TranslationFrame
A translated transcription, typically placed in the transport’s receive queue when a participant speaks in a different language.Identifier for the user who spoke.
When the translation occurred.
Target language of the translation.
TTS Frames
TTSSpeakFrame
Sends text to the pipeline’s TTS service as a standalone utterance, independent of any LLM response turn. The TTS service creates a fresh audio context for eachTTSSpeakFrame, whereas TextFrames produced during an LLM response are grouped under the same turn context.
The text to be spoken.
Whether to append the spoken text to the LLM context.
Transport Message Frames
OutputTransportMessageFrame
A transport-specific message payload for sending data through the output transport. The message format depends on the transport implementation.The transport message payload.
DTMF Frames
OutputDTMFFrame
A DTMF (Dual-Tone Multi-Frequency) keypress queued for output. Inherits thebutton field from the DTMFFrame mixin, which holds the keypad entry that was pressed.
Inherits from DTMFFrame.
The DTMF keypad entry to send.
For transports that support multiple dial-out destinations, set the
transport_destination field (inherited from Frame) to specify which
destination receives the DTMF tone.LLM Context Management
Frames that modify or trigger processing of the LLM conversation context.LLMMessagesAppendFrame
Appends messages to the current conversation context without replacing existing ones.List of message dictionaries to append.
Whether the LLM should process the updated context immediately. When
None,
the default behavior of the context aggregator applies.LLMMessagesUpdateFrame
Replaces the current context messages entirely with a new set.List of message dictionaries to replace the current context.
Whether the LLM should process the updated context immediately. When
None,
the default behavior of the context aggregator applies.LLMRunFrame
Triggers LLM processing with the current context. Push this frame when you want the LLM to generate a response using whatever context has already been assembled.LLMContextAssistantTimestampFrame
Records when an assistant message was created. Used internally to track timing of assistant responses in the conversation context.Timestamp when the assistant message was created.
LLM Thinking
LLMThoughtTextFrame
A chunk of thought or reasoning text from the LLM. This is aDataFrame, not a TextFrame subclass — TTS services and text aggregators will not process it.
The text (or text chunk) of the thought.
LLM Tool Configuration
Frames for configuring LLM function calling behavior and output settings at runtime.LLMSetToolsFrame
Sets the available tools for LLM function calling. The format of tool definitions typically follows JSON Schema conventions, though the exact structure depends on the LLM provider.List of tool/function definitions for the LLM.
LLMSetToolChoiceFrame
Configures how the LLM selects tools during function calling.Tool choice setting:
"none" disables tool use, "auto" lets the LLM decide,
"required" forces a tool call, or a dict specifying a particular tool.LLMEnablePromptCachingFrame
Toggles prompt caching for LLMs that support it.Whether to enable prompt caching.
LLMConfigureOutputFrame
Configures how the LLM produces output. Useful for scenarios where you want the LLM to generate tokens that update context but should not be spoken aloud.When
True, LLM tokens are added to context but not passed to TTS.Function Call Results
FunctionCallResultFrame
Contains the result of a completed function call execution. Inherits fromUninterruptibleFrame to ensure the result always reaches the context aggregator.
Name of the function that was executed.
Unique identifier for the function call.
Arguments that were passed to the function.
The result returned by the function.
Whether to run the LLM after this result. Overrides the default behavior.
Additional properties for result handling.