DTMFAggregator
Aggregates DTMF (phone keypad) input into meaningful sequences for LLM processing
Overview
DTMFAggregator processes incoming DTMF (Dual-Tone Multi-Frequency) frames from phone keypad input and aggregates them into complete sequences that LLM services can understand. It buffers individual digit presses and flushes them as a transcription frame when a termination digit is pressed, a timeout occurs, or an interruption happens.
This aggregator is essential for telephony applications where users interact via phone keypad buttons, converting raw DTMF input into structured text that LLMs can process alongside voice transcriptions.
Constructor
- Idle timeout in seconds before the aggregated digits are flushed (default: 2 seconds)
- Digit that triggers an immediate flush of the aggregation (default: #)
- Prefix added to the DTMF sequence in the output transcription
Input Frames
- InputDTMFFrame: Contains a single keypad button press with a KeypadEntry value
- StartInterruptionFrame: Flushes any pending aggregation when a user interruption begins
- EndFrame: Flushes any pending aggregation and stops the aggregation task
Output Frames
- TranscriptionFrame: Contains the aggregated DTMF sequence as text with the configured prefix
All input frames are passed through downstream, including the original InputDTMFFrame instances.
Keypad Entries
The aggregator processes these standard phone keypad entries:
KeypadEntry | Value | Description |
---|---|---|
ZERO through NINE | "0" - "9" | Numeric digits |
STAR | "*" | Star/asterisk key |
POUND | "#" | Pound/hash key |
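
To make the table concrete, here is a minimal stand-in for the keypad values as a Python enum. It is a sketch for illustration only, not the library's own definition; the `parse_key` helper is hypothetical and simply shows one way an unknown key could be rejected.

```python
from enum import Enum
from typing import Optional


class KeypadEntry(str, Enum):
    """Illustrative stand-in mirroring the table above (not the library's class)."""

    ZERO = "0"
    ONE = "1"
    TWO = "2"
    THREE = "3"
    FOUR = "4"
    FIVE = "5"
    SIX = "6"
    SEVEN = "7"
    EIGHT = "8"
    NINE = "9"
    STAR = "*"
    POUND = "#"


def parse_key(char: str) -> Optional[KeypadEntry]:
    """Hypothetical helper: map a character to a keypad entry, or None if invalid."""
    try:
        return KeypadEntry(char)
    except ValueError:
        return None
```

Because the enum also subclasses `str`, each member compares equal to its character value, which keeps buffering the digits into a text sequence straightforward.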
Aggregation Behavior
The aggregator flushes (emits a TranscriptionFrame) when:
- Termination digit: The configured termination digit is pressed (default: #)
- Timeout: No new digits are received within the timeout period (default: 2 seconds)
- Interruption: A StartInterruptionFrame is received
- Pipeline end: An EndFrame is received
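
The idle timeout is the least obvious of these triggers, so the following sketch shows one way it could work using `asyncio.wait_for` on a queue of key presses. It is a simplified model, not the aggregator's actual implementation: the `aggregate` function, its parameter names, and the use of `None` as a stand-in for pipeline shutdown are all assumptions made for illustration.

```python
import asyncio
from typing import List


async def aggregate(queue: asyncio.Queue, timeout: float = 2.0,
                    termination_digit: str = "#",
                    prefix: str = "DTMF: ") -> List[str]:
    """Hypothetical model: collect key presses and return the flushed strings."""
    flushed: List[str] = []
    buffer = ""
    while True:
        try:
            # Wait for the next key press, but no longer than the idle timeout.
            digit = await asyncio.wait_for(queue.get(), timeout)
        except asyncio.TimeoutError:
            # Idle period elapsed: flush whatever has been buffered so far.
            if buffer:
                flushed.append(prefix + buffer)
                buffer = ""
            continue
        if digit is None:
            # None models pipeline shutdown: flush the remainder and stop.
            if buffer:
                flushed.append(prefix + buffer)
            return flushed
        buffer += digit
        if digit == termination_digit:
            # The termination digit is kept in the output, then the buffer resets.
            flushed.append(prefix + buffer)
            buffer = ""


async def demo() -> List[str]:
    queue: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(aggregate(queue, timeout=0.05))
    for key in ("*", "0"):
        await queue.put(key)
    await asyncio.sleep(0.2)  # stay idle for longer than the timeout
    for key in ("9", "9"):
        await queue.put(key)
    await queue.put(None)  # shut down with digits still pending
    return await task
```

Running `asyncio.run(demo())` exercises the timeout and shutdown paths with a short 0.05-second timeout so the example finishes quickly.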
Usage Examples
Basic Telephony Integration
Custom Configuration for Menu Systems
Sequence Examples
User Input | Aggregation Trigger | Output TranscriptionFrame |
---|---|---|
1 , 2 , 3 , # | Termination digit | "DTMF: 123#" |
* , 0 | 2-second timeout | "DTMF: *0" |
5 , interruption | StartInterruptionFrame | "DTMF: 5" |
9 , 9 , EndFrame | Pipeline shutdown | "DTMF: 99" |
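
The rows above can be replayed against a small deterministic model. The sketch below is an assumption-laden illustration rather than the aggregator's code: `replay` and its event tuples `(time_seconds, kind, value)` are hypothetical, and a `"tick"` event exists only to let simulated time advance so the timeout can fire.

```python
from typing import List, Optional, Tuple

Event = Tuple[float, str, Optional[str]]


def replay(events: List[Event], timeout: float = 2.0,
           termination_digit: str = "#", prefix: str = "DTMF: ") -> List[str]:
    """Hypothetical model: return the text of each flush for a timed event list."""
    out: List[str] = []
    buffer = ""
    last_press: Optional[float] = None

    def flush() -> None:
        nonlocal buffer
        if buffer:
            out.append(prefix + buffer)
            buffer = ""

    for now, kind, value in events:
        # Timeout rule: flush if the keypad has been idle for long enough.
        if buffer and last_press is not None and now - last_press >= timeout:
            flush()
        if kind == "digit" and value is not None:
            buffer += value
            last_press = now
            if value == termination_digit:
                flush()  # termination digit: immediate flush
        elif kind in ("interrupt", "end"):
            flush()  # interruption or pipeline end
        # "tick" events only advance the simulated clock
    return out


rows = {
    "termination": [(0.0, "digit", "1"), (0.5, "digit", "2"),
                    (1.0, "digit", "3"), (1.5, "digit", "#")],
    "timeout": [(0.0, "digit", "*"), (0.5, "digit", "0"), (2.5, "tick", None)],
    "interruption": [(0.0, "digit", "5"), (0.3, "interrupt", None)],
    "shutdown": [(0.0, "digit", "9"), (0.4, "digit", "9"), (0.8, "end", None)],
}
results = {name: replay(events) for name, events in rows.items()}
```

Each entry in `results` holds the single flushed string for the corresponding table row.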
Frame Flow
Error Handling
The aggregator gracefully handles:
- Invalid DTMF digits (logged and ignored)
- Pipeline interruptions (flushes pending sequences)
- Rapid key presses (buffers efficiently)
- Mixed voice and DTMF input (processes independently)
Best Practices
- System Prompt Design: Instruct your LLM, via the system prompt, to recognize and respond to DTMF-prefixed input
- Timeout Configuration: Use shorter timeouts (1-2s) for rapid entry, longer (3-5s) for menu selection
- Termination Strategy: Use # for confirmation, * for cancel/back operations
- Pipeline Placement: Always place the aggregator before the user context aggregator to ensure proper frame ordering
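
As an illustration of the system prompt and termination advice, the sketch below pairs a sample prompt hint with a helper that interprets prefixed keypad text. Both `SYSTEM_PROMPT_HINT` and `interpret` are hypothetical names, the "DTMF: " prefix is taken from the sequence examples, and the confirm/cancel convention is the one suggested above rather than behavior built into the aggregator.

```python
from typing import Optional, Tuple

# Example wording only; adapt it to your own application.
SYSTEM_PROMPT_HINT = (
    'Messages that start with "DTMF: " are keypad presses, not speech. '
    'Treat a trailing "#" as confirmation and a trailing "*" as cancel or go back.'
)


def interpret(text: str, prefix: str = "DTMF: ") -> Optional[Tuple[str, str]]:
    """Return (action, digits) for prefixed keypad text, or None for ordinary speech."""
    if not text.startswith(prefix):
        return None
    keys = text[len(prefix):]
    if keys.endswith("#"):
        return ("confirm", keys[:-1])
    if keys.endswith("*"):
        return ("cancel", keys[:-1])
    # No confirm or cancel key: the sequence was flushed by a timeout or interruption.
    return ("pending", keys)
```

A helper like this is optional when the LLM handles the prefixed text directly, but it can be useful for routing menu selections in application code before or alongside the LLM.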