VisionImageFrameAggregator

Overview

VisionImageFrameAggregator is a processor that pairs text prompts with images to create vision processing requests. It waits for consecutive text and image frames, combining them into a single vision frame for processing by multimodal models.

Constructor

aggregator = VisionImageFrameAggregator()

The processor maintains internal state to track the most recent text prompt.

Input Frames

Text Prompt

TextFrame (Frame): Contains the text prompt or question about the image.

Image Data

InputImageRawFrame (Frame): Contains the image to be analyzed, including:
- Raw image data
- Image dimensions
- Format information

Output Frames

VisionImageRawFrame (Frame): Combined frame containing:
- Text prompt
- Image data
- Image dimensions
- Format information

Processing Pattern

The aggregator follows a specific sequence:

1. Receives a TextFrame → stores the prompt
2. Receives an InputImageRawFrame → combines it with the stored prompt
3. Outputs a VisionImageRawFrame
4. Resets the stored prompt
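The four steps above can be modeled as a small state machine. The sketch below is plain Python, not Pipecat: the dataclasses are hypothetical stand-ins carrying the fields this page describes, the `push` callback stands in for forwarding frames downstream, and `process_frame` is synchronous here rather than async. It assumes the text and image frames are consumed rather than forwarded, and that an image arriving with no stored prompt is dropped; this page does not specify that case.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


# Hypothetical stand-ins for the Pipecat frame types (fields as described above).
@dataclass
class TextFrame:
    text: str


@dataclass
class InputImageRawFrame:
    image: bytes
    size: Tuple[int, int]
    format: str


@dataclass
class VisionImageRawFrame:
    text: str
    image: bytes
    size: Tuple[int, int]
    format: str


class VisionAggregatorSketch:
    """Pairs the most recent text prompt with the next image (sketch only)."""

    def __init__(self, push: Callable[[object], None]):
        self._push = push                   # stands in for pushing frames downstream
        self._prompt: Optional[str] = None  # most recent unmatched prompt

    def process_frame(self, frame: object) -> None:
        if isinstance(frame, TextFrame):
            # Step 1: store the prompt (a newer prompt replaces an older one).
            self._prompt = frame.text
        elif isinstance(frame, InputImageRawFrame):
            if self._prompt is None:
                return  # assumption: an image with no stored prompt is dropped
            # Steps 2-3: combine prompt and image into one output frame.
            self._push(VisionImageRawFrame(
                text=self._prompt,
                image=frame.image,
                size=frame.size,
                format=frame.format,
            ))
            # Step 4: reset so the prompt is not reused for the next image.
            self._prompt = None
        else:
            # Any other frame type passes through unchanged.
            self._push(frame)


out: List[object] = []
agg = VisionAggregatorSketch(out.append)
agg.process_frame(TextFrame("What do you see?"))
agg.process_frame(InputImageRawFrame(image=b"\x00" * 4, size=(640, 480), format="jpeg"))
```

After these two calls, `out` holds a single combined frame and the stored prompt is back to `None`.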

Usage Examples

Basic Usage

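from pipecat.frames.frames import InputImageRawFrame, TextFrame
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection

# Create aggregator
aggregator = VisionImageFrameAggregator()

# Process frames (process_frame takes the frame and its direction)
await aggregator.process_frame(TextFrame("What do you see?"), FrameDirection.DOWNSTREAM)
await aggregator.process_frame(
    InputImageRawFrame(image=image_bytes, size=(640, 480), format="jpeg"),
    FrameDirection.DOWNSTREAM,
)
# Output: VisionImageRawFrame with combined data, pushed downstream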

Pipeline Integration

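from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator

# Vision processing pipeline
pipeline = Pipeline([
    text_input,          # Generates text prompts
    image_input,         # Provides images
    VisionImageFrameAggregator(),
    vision_model,        # Processes combined frames
    response_handler
])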
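Placement matters because frames flow through the list in order: the aggregator has to sit upstream of the vision model so that the model receives combined frames. The sketch below illustrates this with hypothetical stand-ins rather than Pipecat classes: each processor is a function mapping one frame to a list of output frames, `run_linear` feeds frames through the stages in order, and `fake_vision_model` is a placeholder that "answers" only combined frames. Dropping an image that has no stored prompt is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


# Hypothetical stand-in frame types (not the Pipecat classes).
@dataclass
class TextFrame:
    text: str


@dataclass
class InputImageRawFrame:
    image: bytes
    size: Tuple[int, int]
    format: str


@dataclass
class VisionImageRawFrame:
    text: str
    image: bytes
    size: Tuple[int, int]
    format: str


Processor = Callable[[object], List[object]]


def make_aggregator() -> Processor:
    prompt: Optional[str] = None

    def step(frame: object) -> List[object]:
        nonlocal prompt
        if isinstance(frame, TextFrame):
            prompt = frame.text  # store the prompt, emit nothing
            return []
        if isinstance(frame, InputImageRawFrame):
            if prompt is None:
                return []  # assumption: unmatched image is dropped
            combined = VisionImageRawFrame(prompt, frame.image, frame.size, frame.format)
            prompt = None  # reset after output
            return [combined]
        return [frame]  # everything else passes through

    return step


def fake_vision_model(frame: object) -> List[object]:
    # Hypothetical placeholder: replies to combined frames only.
    if isinstance(frame, VisionImageRawFrame):
        return [TextFrame(f"answer to: {frame.text}")]
    return [frame]


def run_linear(stages: List[Processor], frames: List[object]) -> List[object]:
    # Push every frame through each stage in order (a simplified linear pipeline).
    for stage in stages:
        frames = [out for f in frames for out in stage(f)]
    return frames


inputs = [TextFrame("Describe the objects in this image"),
          InputImageRawFrame(b"<png bytes>", (1024, 768), "png")]
good = run_linear([make_aggregator(), fake_vision_model], inputs)
bad = run_linear([fake_vision_model, make_aggregator()], inputs)
```

With the aggregator first, `good` contains the model's reply; with the order reversed, the model only sees the separate text and image frames, so `bad` ends with an unanswered combined frame.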

Frame Flow

Example Sequence

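from pipecat.frames.frames import InputImageRawFrame, TextFrame
from pipecat.pipeline.task import PipelineTask

# Frames enter the pipeline through its task
# (the task must be running, e.g. via PipelineRunner)
task = PipelineTask(pipeline)

# Text prompt
await task.queue_frame(TextFrame("Describe the objects in this image"))

# Image data
await task.queue_frame(InputImageRawFrame(
    image=image_data,
    size=(1024, 768),
    format="png"
))

# Results in a VisionImageRawFrame output
# containing both the prompt and the image data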

Notes

- Text prompts must precede their corresponding images.
- Only the most recent text prompt is stored.
- An unmatched text prompt is replaced by any newer prompt.
- Frames of other types are passed through unchanged.
- The stored prompt is reset automatically after each output.
- Thread-safe for pipeline processing.
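The rules in these notes can be checked against a compact generator-based sketch. This is not the Pipecat implementation: the frame classes and the `aggregate` function are hypothetical, an unrelated frame is represented by a stand-in `OtherFrame`, and dropping an image that has no pending prompt is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


# Hypothetical stand-in frame types (not the Pipecat classes).
@dataclass
class TextFrame:
    text: str


@dataclass
class InputImageRawFrame:
    image: bytes
    size: Tuple[int, int]
    format: str


@dataclass
class VisionImageRawFrame:
    text: str
    image: bytes
    size: Tuple[int, int]
    format: str


@dataclass
class OtherFrame:
    # Hypothetical frame type the aggregator does not handle.
    payload: str


def aggregate(frames: Iterable[object]) -> Iterator[object]:
    prompt: Optional[str] = None
    for frame in frames:
        if isinstance(frame, TextFrame):
            prompt = frame.text  # only the most recent prompt is kept
        elif isinstance(frame, InputImageRawFrame):
            if prompt is not None:
                yield VisionImageRawFrame(prompt, frame.image, frame.size, frame.format)
                prompt = None  # state resets after each output
            # assumption: an image with no pending prompt is dropped
        else:
            yield frame  # other frame types pass through unchanged


img = InputImageRawFrame(b"<bytes>", (640, 480), "jpeg")
result = list(aggregate([
    TextFrame("first prompt"),   # replaced before any image arrives
    OtherFrame("ping"),          # passed through
    TextFrame("second prompt"),
    img,                         # paired with "second prompt"
    img,                         # no pending prompt: dropped in this sketch
]))
```

The stream yields the pass-through frame followed by one combined frame that carries "second prompt"; the first prompt never reaches the output.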
