Overview
PatternPairAggregator is a specialized text aggregator that buffers streaming text until it can identify complete pattern pairs (like XML tags, markdown formatting, or custom delimiters). It processes the content between these patterns using a set of pre-defined actions (remove, keep, or aggregate) and returns text outside those patterns at sentence boundaries. The aggregator supports registering callback functions that are invoked when specific pattern pairs are matched, allowing for custom processing when matches occur. Note: These callbacks do not support modifying the text being aggregated; they are intended for side effects like logging or updating state.
This aggregator is particularly useful for applications such as voice switching, structured content processing, and extracting metadata from LLM outputs. It ensures that patterns spanning multiple text chunks are identified correctly, and it can categorize text by embedded markers so that downstream services and processors treat different segments appropriately. For example, it can identify URL patterns, code blocks, or special formatting in LLM responses that need special speech handling in the TTS service or client-side handling via RTVI.
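To illustrate why buffering is needed when a pattern pair is split across streamed chunks, here is a minimal sketch. It is not Pipecat's implementation: `complete_pair_span` is a hypothetical helper, and the `<voice>` delimiters are only an example. A pair is reported only once both the start and end delimiters are in the buffer.

```python
from typing import List, Optional, Tuple


def complete_pair_span(buffer: str, start: str, end: str) -> Optional[Tuple[int, int]]:
    """Return (begin, finish) indices of the first complete start...end pair, or None.

    Hypothetical helper for illustration only; not part of Pipecat.
    """
    begin = buffer.find(start)
    if begin == -1:
        return None  # No start delimiter yet.
    finish = buffer.find(end, begin + len(start))
    if finish == -1:
        return None  # Pair is still open: keep buffering.
    return begin, finish + len(end)


# The streamed chunks split both delimiters across chunk boundaries.
buffer = ""
spans: List[Optional[Tuple[int, int]]] = []
for chunk in ["Hello <voi", "ce>narrator</vo", "ice> world."]:
    buffer += chunk  # Keep everything until a complete pair can be identified.
    spans.append(complete_pair_span(buffer, "<voice>", "</voice>"))
# spans == [None, None, (6, 29)]
```

Checking each chunk in isolation would never see `<voice>` or `</voice>`, which is why the aggregator holds text until the pair is complete.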
Constructor
Methods
add_pattern
- Unique identifier for this pattern pair that should also represent what the text between the tags represents (e.g., “voice”, “xml”, “credit_card”). This value is returned as part of both the PatternMatch provided to callbacks and the Aggregation object returned from aggregate().
- Pattern that marks the beginning of content.
- Pattern that marks the end of content.
- What to do with the matched pattern and its content:
  - MatchAction.REMOVE: The text, along with its delimiters, is removed from the streaming text. Sentence aggregation continues as if this text did not exist.
  - MatchAction.KEEP: The delimiters are removed, but the content between them is kept. Sentence aggregation continues with the inner text included. This is helpful if you want to keep the content but be notified via a callback when it occurs.
  - MatchAction.AGGREGATE: The matched pattern and its content are aggregated separately. The matched content is returned in an Aggregation object with the specified type when the pattern is completed. When the start of this pattern is detected, any buffered text up to that point is returned as a standard “sentence” aggregation.
Returns
Self for method chaining
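The three actions can be sketched on a single, already-complete match. This assumes a local `MatchAction` enum that only mirrors the documented names (it is not imported from Pipecat) and a hypothetical `apply_action` helper that returns the separately emitted aggregations plus the text left for sentence aggregation. The usage at the end applies AGGREGATE to a JSON block, as in the structured-data use case.

```python
import json
from enum import Enum
from typing import List, Tuple


class MatchAction(Enum):
    # Local stand-in mirroring the documented action names.
    REMOVE = "remove"
    KEEP = "keep"
    AGGREGATE = "aggregate"


def apply_action(
    text: str, start: str, end: str, action: MatchAction, pattern_type: str
) -> Tuple[List[Tuple[str, str]], str]:
    """Apply `action` to the first start...end pair in `text` (hypothetical helper).

    Returns (emitted, remainder): `emitted` holds (type, text) aggregations produced
    right away; `remainder` is the text left for ordinary sentence aggregation.
    """
    begin = text.find(start)
    finish = text.find(end, begin + len(start)) if begin != -1 else -1
    if begin == -1 or finish == -1:
        return [], text  # No complete pair: leave the text untouched.
    before = text[:begin]
    content = text[begin + len(start):finish]
    after = text[finish + len(end):]
    if action is MatchAction.REMOVE:
        # Delimiters and content disappear; sentence aggregation continues.
        return [], before + after
    if action is MatchAction.KEEP:
        # Delimiters disappear; the inner content stays in the sentence stream.
        return [], before + content + after
    # AGGREGATE: flush buffered text as a "sentence", then emit the content separately.
    emitted: List[Tuple[str, str]] = []
    if before.strip():
        emitted.append(("sentence", before.strip()))
    emitted.append((pattern_type, content))
    return emitted, after


# Example: pull a JSON block out of an LLM reply with AGGREGATE.
reply = 'Here you go. <json>{"city": "Paris"}</json> Anything else?'
emitted, remainder = apply_action(reply, "<json>", "</json>", MatchAction.AGGREGATE, "json")
data = json.loads(emitted[-1][1])  # The "json" aggregation's text parses as JSON.
```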
add_pattern_pair
on_pattern_match
- The pattern pair type to listen for (as defined in add_pattern).
- Function to call when the pattern is matched. The function should accept a PatternMatch object.
Returns
Self for method chaining
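Because both registration methods return the aggregator itself, calls can be chained. The sketch below shows that idea with a hypothetical `HandlerRegistry` class that reuses the documented method names but is not Pipecat's class. Its `dispatch` method only simulates what happens internally when a pair completes, and handlers receive a plain dict rather than a real PatternMatch. The handler's return value is ignored, matching the note that callbacks are for side effects, not for rewriting text.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[dict], None]


class HandlerRegistry:
    """Hypothetical stand-in reusing the documented method names; not Pipecat's class."""

    def __init__(self) -> None:
        self._patterns: Dict[str, Tuple[str, str]] = {}
        self._handlers: Dict[str, Handler] = {}

    def add_pattern(self, pattern_type: str, start: str, end: str) -> "HandlerRegistry":
        self._patterns[pattern_type] = (start, end)
        return self  # Returning self is what makes chaining possible.

    def on_pattern_match(self, pattern_type: str, handler: Handler) -> "HandlerRegistry":
        self._handlers[pattern_type] = handler
        return self

    def dispatch(self, pattern_type: str, content: str) -> None:
        """Simulate a completed match: call the handler and ignore its return value."""
        handler = self._handlers.get(pattern_type)
        if handler is None:
            return
        start, end = self._patterns[pattern_type]
        handler({"type": pattern_type, "text": content, "full_match": start + content + end})


seen: List[str] = []
registry = (
    HandlerRegistry()
    .add_pattern("voice", "<voice>", "</voice>")
    .on_pattern_match("voice", lambda match: seen.append(match["full_match"]))
)
registry.dispatch("voice", "narrator")
# seen == ["<voice>narrator</voice>"]
```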
Pattern Match Object
When a pattern is matched, the handler function receives a PatternMatch object, which is a subclass of Aggregation. It contains the following fields:
- The identifier and descriptor of the matched pattern pair. This field is part of the Aggregation base class.
- The text content between the start and end patterns. This field is part of the Aggregation base class.
- The complete text, including the start and end patterns.
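The subclass relationship described above can be sketched with dataclasses. These are local stand-ins that reuse the documented class names, and the field names `type`, `text`, and `full_match` are assumptions chosen for illustration; check the PatternMatch definition in your installed version for the actual attribute names.

```python
from dataclasses import dataclass


@dataclass
class Aggregation:
    # Identifier/descriptor of the aggregation, e.g. "sentence" or "voice".
    type: str
    # Text content of the aggregation (for a match: the text between the delimiters).
    text: str


@dataclass
class PatternMatch(Aggregation):
    # Complete matched text, including the start and end patterns.
    full_match: str


match = PatternMatch(type="voice", text="narrator", full_match="<voice>narrator</voice>")
```

Because PatternMatch extends Aggregation, code that consumes aggregations can also accept a match without special-casing it.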
Usage Examples
Voice Switching in TTS
This example demonstrates finding custom <voice> tags in streaming text to switch voices dynamically in a TTS service such as Cartesia. It removes the tags and the content between them, so the content is treated as if it does not exist: it is not spoken by the TTS, not added to the context, and not sent to clients via RTVI. Instead, it only triggers a voice-switch side effect.
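The REMOVE-plus-callback flow can be sketched without any real TTS service. This assumes the text between the tags is the voice name, uses a hypothetical `VoiceState` object in place of a TTS service, and a hypothetical `strip_voice_tags` function that works on already-complete text (it skips the streaming buffer). No Cartesia or Pipecat API is called.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

# Non-greedy match for <voice>...</voice>; DOTALL lets the content span lines.
VOICE_TAG = re.compile(r"<voice>(.*?)</voice>", re.DOTALL)


@dataclass
class VoiceState:
    # Hypothetical stand-in for the TTS service's current voice setting.
    current: str = "default"
    history: List[str] = field(default_factory=list)

    def switch(self, voice: str) -> None:
        self.current = voice
        self.history.append(voice)


def strip_voice_tags(text: str, on_match: Callable[[str], None]) -> str:
    """REMOVE semantics: fire the callback, then drop the tags and their content."""

    def _replace(m: "re.Match[str]") -> str:
        on_match(m.group(1).strip())  # Side effect only: record the requested voice.
        return ""  # Nothing from the tagged span reaches the spoken text.

    return VOICE_TAG.sub(_replace, text)


state = VoiceState()
spoken = strip_voice_tags("<voice>narrator</voice>Once upon a time.", state.switch)
# spoken == "Once upon a time." and state.current == "narrator"
```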
Extracting Structured Data from LLM Outputs
This example shows how to extract JSON data blocks from LLM outputs, aggregating them separately so they are removed from the spoken text, but not from the context or client display.
Handling Special Values in LLM Output
This example demonstrates how to identify and process custom tags in LLM output that denote special content, such as credit card numbers that downstream services should handle differently. In this case, the TTS should spell the number out, while RTVI should obfuscate it.
How It Works
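The credit-card scenario can be sketched as a streaming loop that also shows the general buffering idea: text before a start delimiter is flushed as a sentence, an open pair is held until its end delimiter arrives, and other text is emitted at sentence boundaries. `MiniPairAggregator`, the `<card>` delimiters, the naive sentence regex, and the two view functions are all hypothetical. This is not Pipecat's implementation, and in a real pipeline the downstream handling happens in the TTS and RTVI services.

```python
import re
from typing import Iterator, List, Tuple

# Naive sentence end: ., ! or ? followed by whitespace or the end of the buffer.
SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


class MiniPairAggregator:
    """Hypothetical streaming aggregator for one AGGREGATE-style pattern pair."""

    def __init__(self, pattern_type: str, start: str, end: str) -> None:
        self.pattern_type = pattern_type
        self.start = start
        self.end = end
        self.buffer = ""

    def feed(self, chunk: str) -> Iterator[Tuple[str, str]]:
        """Add a chunk and yield every (type, text) aggregation that is now complete."""
        self.buffer += chunk
        while True:
            begin = self.buffer.find(self.start)
            if begin != -1:
                # Start delimiter seen: flush the preceding text as a sentence.
                before = self.buffer[:begin].strip()
                if before:
                    yield ("sentence", before)
                self.buffer = self.buffer[begin:]
                finish = self.buffer.find(self.end, len(self.start))
                if finish == -1:
                    return  # Pair still open; wait for more chunks.
                yield (self.pattern_type, self.buffer[len(self.start):finish])
                self.buffer = self.buffer[finish + len(self.end):]
                continue
            match = SENTENCE_END.search(self.buffer)
            if match is None:
                return  # No complete sentence yet.
            sentence = self.buffer[: match.end()].strip()
            self.buffer = self.buffer[match.end():]
            if sentence:
                yield ("sentence", sentence)

    def flush(self) -> Iterator[Tuple[str, str]]:
        """Emit whatever text remains when the stream ends."""
        rest, self.buffer = self.buffer.strip(), ""
        if rest:
            yield ("sentence", rest)


def tts_view(kind: str, text: str) -> str:
    # Hypothetical TTS handling: spell out card digits one by one.
    return " ".join(text) if kind == "credit_card" else text


def client_view(kind: str, text: str) -> str:
    # Hypothetical client handling: mask all but the last four digits.
    return "*" * (len(text) - 4) + text[-4:] if kind == "credit_card" else text


agg = MiniPairAggregator("credit_card", "<card>", "</card>")
results: List[Tuple[str, str]] = []
for chunk in ["Your card ", "<card>4111", "111111111111</card>", " is saved. Thanks!"]:
    results.extend(agg.feed(chunk))
results.extend(agg.flush())
```

Each entry in `results` carries its type, so each consumer can pick its own treatment: `tts_view` for speech and `client_view` for display.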
Notes
- Patterns are processed in the order they appear in the text
- Handlers are called when complete patterns are found
- Patterns can span multiple sentences of text, but be aware that encoding many “reasoning” tokens may slow down the LLM response
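The first two notes can be illustrated together. In this sketch, when several pattern types are registered, the one whose start delimiter appears earliest is handled first, and a handler fires only once its pair is complete. `process_in_order` and its dict-based registration are hypothetical, not Pipecat's API.

```python
from typing import Callable, Dict, List, Optional, Tuple


def process_in_order(
    text: str,
    patterns: Dict[str, Tuple[str, str]],
    handler: Callable[[str, str], None],
) -> None:
    """Call handler(type, content) for each complete pair, in text order (hypothetical)."""
    pos = 0
    while True:
        # Pick the registered pattern whose start delimiter appears earliest.
        best: Optional[Tuple[int, str]] = None
        for pattern_type, (start, _end) in patterns.items():
            idx = text.find(start, pos)
            if idx != -1 and (best is None or idx < best[0]):
                best = (idx, pattern_type)
        if best is None:
            return
        idx, pattern_type = best
        start, end = patterns[pattern_type]
        finish = text.find(end, idx + len(start))
        if finish == -1:
            return  # Incomplete pair: no handler call until it closes.
        handler(pattern_type, text[idx + len(start):finish])
        pos = finish + len(end)


calls: List[Tuple[str, str]] = []
process_in_order(
    "<voice>narrator</voice> Hi <json>{}</json> and <voice>child",
    {"json": ("<json>", "</json>"), "voice": ("<voice>", "</voice>")},
    lambda t, c: calls.append((t, c)),
)
# calls == [("voice", "narrator"), ("json", "{}")]; the unclosed <voice> is not handled.
```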