Skip to main content

Overview

DataFrames carry the main content flowing through a pipeline: audio chunks, text, images, transcriptions, and messages. They are queued and processed in order with other DataFrames and ControlFrames, and any pending DataFrames are discarded when a user interrupts. See the Frames overview for base class details, mixin fields, and frame properties common to all frames.

Audio Frames

These frames carry raw audio through the pipeline toward the output transport. Each inherits the audio, sample_rate, num_channels, and num_frames fields from the AudioRawFrame mixin.

OutputAudioRawFrame

A chunk of raw audio destined for the output transport. Use the inherited transport_destination field when your transport supports multiple audio tracks. Inherits from AudioRawFrame.

TTSAudioRawFrame

Audio generated by a TTS service, ready for playback. Inherits from OutputAudioRawFrame.
context_id
Optional[str]
default:"None"
Identifier for the TTS context that generated this audio.

SpeechOutputAudioRawFrame

Audio from a continuous speech stream. The stream may contain silence frames intermixed with speech, so downstream processors may need to distinguish between the two. Inherits from OutputAudioRawFrame.

Image Frames

Frames for carrying image data to the output transport. Each inherits image, size, and format from the ImageRawFrame mixin.

OutputImageRawFrame

An image for display by the output transport. Supports the transport_destination field for transports with multiple video tracks. Inherits from ImageRawFrame.
The sync_with_audio field (default False) is set internally, not via the constructor. When True, the image is queued with audio frames so it displays only after all preceding audio has been sent. When False, the transport displays it immediately.

URLImageRawFrame

An output image with an associated download URL, typically from a third-party image generation service. Inherits from OutputImageRawFrame.
url
Optional[str]
default:"None"
URL where the image can be downloaded.

AssistantImageRawFrame

An image generated by the assistant for both display and inclusion in LLM context. The superclass handles display; the additional fields here carry the original image data in a format suitable for direct use in LLM context messages. Inherits from OutputImageRawFrame.
original_data
Optional[bytes]
default:"None"
Original image data for use in LLM context messages without further encoding.
original_mime_type
Optional[str]
default:"None"
MIME type of the original image data.

SpriteFrame

An animated sprite composed of multiple image frames. The transport plays the images at the framerate specified by the transport’s camera_out_framerate parameter.
images
List[OutputImageRawFrame]
required
Ordered list of image frames that make up the sprite animation.

Text Frames

Text content at various stages of processing: raw text, LLM output, aggregated results, TTS input, and transcriptions.

TextFrame

The fundamental text container. Emitted by LLM services, consumed by context aggregators, TTS services, and other processors.
text
str
required
The text content.
Several non-constructor fields control downstream behavior: - skip_tts (default None): when set, tells the TTS service to skip this text - includes_inter_frame_spaces (default False): indicates whether leading/trailing spaces are already included - append_to_context (default True): whether this text should be appended to the LLM context

LLMTextFrame

Text generated by an LLM service. Behaves like a TextFrame with includes_inter_frame_spaces set to True, since LLM services include all necessary spacing. Inherits from TextFrame.

AggregatedTextFrame

Multiple text frames combined into a single frame for processing or output. Inherits from TextFrame.
aggregated_by
AggregationType | str
required
Method used to aggregate the text frames.
context_id
Optional[str]
default:"None"
Identifier for the TTS context associated with this text.

VisionTextFrame

Text output from a vision model. Functionally identical to LLMTextFrame but distinguished by type for routing purposes. Inherits from LLMTextFrame.

TTSTextFrame

Text that has been sent to a TTS service for synthesis. Inherits from AggregatedTextFrame.
context_id
Optional[str]
default:"None"
Identifier for the TTS context that generated this text.

Transcriptions

Frames produced by speech-to-text services at different stages of recognition. All inherit from TextFrame, so they flow through text aggregators and other TextFrame handlers.

TranscriptionFrame

A non-interim transcription result from an STT service: the service’s best recognition of what the user said, as opposed to the streaming partial results in InterimTranscriptionFrame.
user_id
str
required
Identifier for the user who spoke.
timestamp
str
required
When the transcription occurred.
language
Optional[Language]
default:"None"
Detected or specified language of the speech.
result
Optional[Any]
default:"None"
Raw result object from the STT service.
finalized
bool
default:"False"
Whether the STT service has explicitly committed this transcription via a finalize signal. Some services (AssemblyAI, Deepgram, Soniox, Speechmatics) support this; others don’t, so it defaults to False. Turn detection strategies can use this flag to trigger the bot’s response immediately rather than waiting for a timeout.

InterimTranscriptionFrame

A partial, in-progress transcription. These frames update frequently while the user is still speaking, and are superseded by a TranscriptionFrame once the STT service produces its result.
text
str
required
The partial transcription text.
user_id
str
required
Identifier for the user who spoke.
timestamp
str
required
When the interim transcription occurred.
language
Optional[Language]
default:"None"
Detected or specified language of the speech.
result
Optional[Any]
default:"None"
Raw result object from the STT service.

TranslationFrame

A translated transcription, typically placed in the transport’s receive queue when a participant speaks in a different language.
user_id
str
required
Identifier for the user who spoke.
timestamp
str
required
When the translation occurred.
language
Optional[Language]
default:"None"
Target language of the translation.

TTS Frames

TTSSpeakFrame

Sends text to the pipeline’s TTS service as a standalone utterance, independent of any LLM response turn. The TTS service creates a fresh audio context for each TTSSpeakFrame, whereas TextFrames produced during an LLM response are grouped under the same turn context.
text
str
required
The text to be spoken.
append_to_context
Optional[bool]
default:"None"
Whether to append the spoken text to the LLM context.

Transport Message Frames

OutputTransportMessageFrame

A transport-specific message payload for sending data through the output transport. The message format depends on the transport implementation.
message
Any
required
The transport message payload.

DTMF Frames

OutputDTMFFrame

A DTMF (Dual-Tone Multi-Frequency) keypress queued for output. Inherits the button field from the DTMFFrame mixin, which holds the keypad entry that was pressed. Inherits from DTMFFrame.
button
NewKeypadEntry
required
The DTMF keypad entry to send.
For transports that support multiple dial-out destinations, set the transport_destination field (inherited from Frame) to specify which destination receives the DTMF tone.

LLM Context Management

Frames that modify or trigger processing of the LLM conversation context.

LLMMessagesAppendFrame

Appends messages to the current conversation context without replacing existing ones.
messages
List[dict]
required
List of message dictionaries to append.
run_llm
Optional[bool]
default:"None"
Whether the LLM should process the updated context immediately. When None, the default behavior of the context aggregator applies.

LLMMessagesUpdateFrame

Replaces the current context messages entirely with a new set.
messages
List[dict]
required
List of message dictionaries to replace the current context.
run_llm
Optional[bool]
default:"None"
Whether the LLM should process the updated context immediately. When None, the default behavior of the context aggregator applies.

LLMRunFrame

Triggers LLM processing with the current context. Push this frame when you want the LLM to generate a response using whatever context has already been assembled.

LLMContextAssistantTimestampFrame

Records when an assistant message was created. Used internally to track timing of assistant responses in the conversation context.
timestamp
str
required
Timestamp when the assistant message was created.

LLM Thinking

LLMThoughtTextFrame

A chunk of thought or reasoning text from the LLM. This is a DataFrame, not a TextFrame subclass — TTS services and text aggregators will not process it.
text
str
required
The text (or text chunk) of the thought.

LLM Tool Configuration

Frames for configuring LLM function calling behavior and output settings at runtime.

LLMSetToolsFrame

Sets the available tools for LLM function calling. The format of tool definitions typically follows JSON Schema conventions, though the exact structure depends on the LLM provider.
tools
List[dict] | ToolsSchema | NotGiven
required
List of tool/function definitions for the LLM.

LLMSetToolChoiceFrame

Configures how the LLM selects tools during function calling.
tool_choice
"none" | "auto" | "required" | dict
required
Tool choice setting: "none" disables tool use, "auto" lets the LLM decide, "required" forces a tool call, or a dict specifying a particular tool.

LLMEnablePromptCachingFrame

Toggles prompt caching for LLMs that support it.
enable
bool
required
Whether to enable prompt caching.

LLMConfigureOutputFrame

Configures how the LLM produces output. Useful for scenarios where you want the LLM to generate tokens that update context but should not be spoken aloud.
skip_tts
bool
required
When True, LLM tokens are added to context but not passed to TTS.

Function Call Results

FunctionCallResultFrame

Contains the result of a completed function call execution. Inherits from UninterruptibleFrame to ensure the result always reaches the context aggregator.
function_name
str
required
Name of the function that was executed.
tool_call_id
str
required
Unique identifier for the function call.
arguments
Any
required
Arguments that were passed to the function.
result
Any
required
The result returned by the function.
run_llm
Optional[bool]
default:"None"
Whether to run the LLM after this result. Overrides the default behavior.
properties
Optional[FunctionCallResultProperties]
default:"None"
Additional properties for result handling.