Data Frames - Pipecat

Overview

DataFrames carry the main content flowing through a pipeline: audio chunks, text, images, transcriptions, and messages. They are queued and processed in order with other DataFrames and ControlFrames, and any pending DataFrames are discarded when a user interrupts. See the Frames overview for base class details, mixin fields, and frame properties common to all frames.

Audio Frames

These frames carry raw audio through the pipeline toward the output transport. Each inherits the audio, sample_rate, num_channels, and num_frames fields from the AudioRawFrame mixin.

OutputAudioRawFrame

A chunk of raw audio destined for the output transport. Use the inherited transport_destination field when your transport supports multiple audio tracks. Inherits from AudioRawFrame.

TTSAudioRawFrame

Audio generated by a TTS service, ready for playback. Inherits from OutputAudioRawFrame.

context_id

Optional[str]

default:"None"

Identifier for the TTS context that generated this audio.

SpeechOutputAudioRawFrame

Audio from a continuous speech stream. The stream may contain silence frames intermixed with speech, so downstream processors may need to distinguish between the two. Inherits from OutputAudioRawFrame.

Image Frames

Frames for carrying image data to the output transport. Each inherits image, size, and format from the ImageRawFrame mixin.

OutputImageRawFrame

An image for display by the output transport. Supports the transport_destination field for transports with multiple video tracks. Inherits from ImageRawFrame.

The sync_with_audio field (default False) is set internally, not via the constructor. When True, the image is queued with audio frames so it displays only after all preceding audio has been sent. When False, the transport displays it immediately.

URLImageRawFrame

An output image with an associated download URL, typically from a third-party image generation service. Inherits from OutputImageRawFrame.

url

Optional[str]

default:"None"

URL where the image can be downloaded.

AssistantImageRawFrame

An image generated by the assistant for both display and inclusion in LLM context. The superclass handles display; the additional fields here carry the original image data in a format suitable for direct use in LLM context messages. Inherits from OutputImageRawFrame.

original_data

Optional[bytes]

default:"None"

Original image data for use in LLM context messages without further encoding.

original_mime_type

Optional[str]

default:"None"

MIME type of the original image data.

SpriteFrame

An animated sprite composed of multiple image frames. The transport plays the images at the framerate specified by the transport’s camera_out_framerate parameter.

images

List[OutputImageRawFrame]

required

Ordered list of image frames that make up the sprite animation.

Text Frames

Text content at various stages of processing: raw text, LLM output, aggregated results, TTS input, and transcriptions.

TextFrame

The fundamental text container. Emitted by LLM services, consumed by context aggregators, TTS services, and other processors.

text

str

required

The text content.

Several non-constructor fields control downstream behavior: - skip_tts (default None): when set, tells the TTS service to skip this text - includes_inter_frame_spaces (default False): indicates whether leading/trailing spaces are already included - append_to_context (default True): whether this text should be appended to the LLM context

LLMTextFrame

Text generated by an LLM service. Behaves like a TextFrame with includes_inter_frame_spaces set to True, since LLM services include all necessary spacing. Inherits from TextFrame.

AggregatedTextFrame

Multiple text frames combined into a single frame for processing or output. Inherits from TextFrame.

aggregated_by

AggregationType | str

required

Method used to aggregate the text frames.

context_id

Optional[str]

default:"None"

Identifier for the TTS context associated with this text.

VisionTextFrame

Text output from a vision model. Functionally identical to LLMTextFrame but distinguished by type for routing purposes. Inherits from LLMTextFrame.

TTSTextFrame

Text that has been sent to a TTS service for synthesis. Inherits from AggregatedTextFrame.

context_id

Optional[str]

default:"None"

Identifier for the TTS context that generated this text.

Transcriptions

Frames produced by speech-to-text services at different stages of recognition. All inherit from TextFrame, so they flow through text aggregators and other TextFrame handlers.

TranscriptionFrame

A non-interim transcription result from an STT service: the service’s best recognition of what the user said, as opposed to the streaming partial results in InterimTranscriptionFrame.

user_id

str

required

Identifier for the user who spoke.

timestamp

str

required

When the transcription occurred.

language

Optional[Language]

default:"None"

Detected or specified language of the speech.

result

Optional[Any]

default:"None"

Raw result object from the STT service.

finalized

bool

default:"False"

Whether the STT service has explicitly committed this transcription via a finalize signal. Some services (AssemblyAI, Deepgram, Soniox, Speechmatics) support this; others don’t, so it defaults to False. Turn detection strategies can use this flag to trigger the bot’s response immediately rather than waiting for a timeout.

InterimTranscriptionFrame

A partial, in-progress transcription. These frames update frequently while the user is still speaking, and are superseded by a TranscriptionFrame once the STT service produces its result.

text

str

required

The partial transcription text.

user_id

str

required

Identifier for the user who spoke.

timestamp

str

required

When the interim transcription occurred.

language

Optional[Language]

default:"None"

Detected or specified language of the speech.

result

Optional[Any]

default:"None"

Raw result object from the STT service.

TranslationFrame

A translated transcription, typically placed in the transport’s receive queue when a participant speaks in a different language.

user_id

str

required

Identifier for the user who spoke.

timestamp

str

required

When the translation occurred.

language

Optional[Language]

default:"None"

Target language of the translation.

TTS Frames

TTSSpeakFrame

Sends text to the pipeline’s TTS service as a standalone utterance, independent of any LLM response turn. The TTS service creates a fresh audio context for each TTSSpeakFrame, whereas TextFrames produced during an LLM response are grouped under the same turn context.

text

str

required

The text to be spoken.

append_to_context

Optional[bool]

default:"None"

Whether to append the spoken text to the LLM context.

Transport Message Frames

OutputTransportMessageFrame

A transport-specific message payload for sending data through the output transport. The message format depends on the transport implementation.

message

Any

required

The transport message payload.

DTMF Frames

OutputDTMFFrame

A DTMF (Dual-Tone Multi-Frequency) keypress queued for output. Inherits the button field from the DTMFFrame mixin, which holds the keypad entry that was pressed. Inherits from DTMFFrame.

button

NewKeypadEntry

required

The DTMF keypad entry to send.

For transports that support multiple dial-out destinations, set the transport_destination field (inherited from Frame) to specify which destination receives the DTMF tone.

LLM Context Management

Frames that modify or trigger processing of the LLM conversation context.

LLMMessagesAppendFrame

Appends messages to the current conversation context without replacing existing ones.

messages

List[dict]

required

List of message dictionaries to append.

run_llm

Optional[bool]

default:"None"

Whether the LLM should process the updated context immediately. When None, the default behavior of the context aggregator applies.

LLMMessagesUpdateFrame

Replaces the current context messages entirely with a new set.

messages

List[dict]

required

List of message dictionaries to replace the current context.

run_llm

Optional[bool]

default:"None"

Whether the LLM should process the updated context immediately. When None, the default behavior of the context aggregator applies.

LLMRunFrame

Triggers LLM processing with the current context. Push this frame when you want the LLM to generate a response using whatever context has already been assembled.

LLMContextAssistantTimestampFrame

Records when an assistant message was created. Used internally to track timing of assistant responses in the conversation context.

timestamp

str

required

Timestamp when the assistant message was created.

LLM Thinking

LLMThoughtTextFrame

A chunk of thought or reasoning text from the LLM. This is a DataFrame, not a TextFrame subclass — TTS services and text aggregators will not process it.

text

str

required

The text (or text chunk) of the thought.

LLM Tool Configuration

Frames for configuring LLM function calling behavior and output settings at runtime.

LLMSetToolsFrame

Sets the available tools for LLM function calling. The format of tool definitions typically follows JSON Schema conventions, though the exact structure depends on the LLM provider.

tools

List[dict] | ToolsSchema | NotGiven

required

List of tool/function definitions for the LLM.

LLMSetToolChoiceFrame

Configures how the LLM selects tools during function calling.

tool_choice

"none" | "auto" | "required" | dict

required

Tool choice setting: "none" disables tool use, "auto" lets the LLM decide, "required" forces a tool call, or a dict specifying a particular tool.

LLMEnablePromptCachingFrame

Toggles prompt caching for LLMs that support it.

enable

bool

required

Whether to enable prompt caching.

LLMConfigureOutputFrame

Configures how the LLM produces output. Useful for scenarios where you want the LLM to generate tokens that update context but should not be spoken aloud.

skip_tts

bool

required

When True, LLM tokens are added to context but not passed to TTS.

Function Call Results

FunctionCallResultFrame

Contains the result of a completed function call execution. Inherits from UninterruptibleFrame to ensure the result always reaches the context aggregator.

function_name

str

required

Name of the function that was executed.

tool_call_id

str

required

Unique identifier for the function call.

arguments

Any

required

Arguments that were passed to the function.

result

Any

required

The result returned by the function.

run_llm

Optional[bool]

default:"None"

Whether to run the LLM after this result. Overrides the default behavior.

properties

Optional[FunctionCallResultProperties]

default:"None"

Additional properties for result handling.

API Reference

Services

Utilities

Frameworks

Frames

Pipeline

​Overview

​Audio Frames

​OutputAudioRawFrame

​TTSAudioRawFrame

​SpeechOutputAudioRawFrame

​Image Frames

​OutputImageRawFrame

​URLImageRawFrame

​AssistantImageRawFrame

​SpriteFrame

​Text Frames

​TextFrame

​LLMTextFrame

​AggregatedTextFrame

​VisionTextFrame

​TTSTextFrame

​Transcriptions

​TranscriptionFrame

​InterimTranscriptionFrame

​TranslationFrame

​TTS Frames

​TTSSpeakFrame

​Transport Message Frames

​OutputTransportMessageFrame

​DTMF Frames

​OutputDTMFFrame

​LLM Context Management

​LLMMessagesAppendFrame

​LLMMessagesUpdateFrame

​LLMRunFrame

​LLMContextAssistantTimestampFrame

​LLM Thinking

​LLMThoughtTextFrame

​LLM Tool Configuration

​LLMSetToolsFrame

​LLMSetToolChoiceFrame

​LLMEnablePromptCachingFrame

​LLMConfigureOutputFrame

​Function Call Results

​FunctionCallResultFrame

Overview

Audio Frames

OutputAudioRawFrame

TTSAudioRawFrame

SpeechOutputAudioRawFrame

Image Frames

OutputImageRawFrame

URLImageRawFrame

AssistantImageRawFrame

SpriteFrame

Text Frames

TextFrame

LLMTextFrame

AggregatedTextFrame

VisionTextFrame

TTSTextFrame

Transcriptions

TranscriptionFrame

InterimTranscriptionFrame

TranslationFrame

TTS Frames

TTSSpeakFrame

Transport Message Frames

OutputTransportMessageFrame

DTMF Frames

OutputDTMFFrame

LLM Context Management

LLMMessagesAppendFrame

LLMMessagesUpdateFrame

LLMRunFrame

LLMContextAssistantTimestampFrame

LLM Thinking

LLMThoughtTextFrame

LLM Tool Configuration

LLMSetToolsFrame

LLMSetToolChoiceFrame

LLMEnablePromptCachingFrame

LLMConfigureOutputFrame

Function Call Results

FunctionCallResultFrame