Interruption Strategies
Configure when users can interrupt the bot to prevent unwanted interruptions from brief affirmations
Overview
Interruption strategies allow you to control when users can interrupt the bot during speech. By default, any user speech immediately interrupts the bot, but this can be problematic when users engage in backchanneling—brief vocal responses like “yeah”, “okay”, or “mm-hmm” that indicate they’re listening without intending to interrupt.
With interruption strategies, you can require users to meet specific criteria (such as speaking a minimum number of words or reaching a certain audio volume) before their speech will interrupt the bot, creating a more natural conversation flow.
Want to try it out? Check out the interruption strategies foundational demo
Configuration
Interruption strategies are configured via the interruption_strategies
parameter in PipelineParams
. When specified, the normal immediate interruption behavior is replaced with conditional interruption based on your criteria.
List of interruption strategies to apply. When multiple strategies are provided, the first one that evaluates to true will trigger the interruption. If empty, normal interruption behavior applies.
Base Strategy Interface
All interruption strategies inherit from BaseInterruptionStrategy
, which provides a common interface for evaluating interruption conditions.
Appends audio data to the strategy for analysis. Not all strategies handle audio.
Appends text to the strategy for analysis. Not all strategies handle text.
Called when the user stops speaking to determine if interruption should occur based on accumulated audio and/or text.
Resets accumulated text and/or audio data.
Available Strategies
MinWordsInterruptionStrategy
Requires users to speak a minimum number of words before interrupting the bot.
Minimum number of words the user must speak to interrupt the bot. Must be greater than 0.
How It Works
When interruption strategies are configured:
- Bot not speaking: User speech interrupts immediately (normal behavior)
- Bot speaking: User speech and audio are collected and fed to strategies
- User stops speaking: Strategies are evaluated in order
- First match wins: The first strategy that returns
True
triggers interruption - No matches: User speech is discarded
The system automatically handles both audio and text input:
- Audio frames (
InputAudioRawFrame
) are fed toappend_audio()
- Transcription text is fed to
append_text()
- Strategies can use either or both data types
Usage Examples
Basic Word Count Interruption
Require users to speak at least 3 words to interrupt the bot:
Multiple Strategies with Priority
Strategies are evaluated in order, with the first match triggering interruption:
Behavior Comparison
Scenario | Without Strategy | With MinWordsInterruptionStrategy(min_words=3) |
---|---|---|
User says “okay” while bot speaks | ✅ Interrupts immediately | ❌ Ignored (only 1 word) |
User says “yes that’s right” while bot speaks | ✅ Interrupts immediately | ✅ Interrupts (3 words) |
User speaks while bot is silent | ✅ Processed immediately | ✅ Processed immediately |
Notes
- Interruption strategies only affect behavior when the bot is actively speaking
- When the bot is not speaking, user input is processed immediately regardless of strategy configuration
- The
allow_interruptions
parameter must beTrue
for interruption strategies to work - User speech that doesn’t meet interruption criteria is discarded, not queued
- Strategies are evaluated in order - first match wins
- Both audio and text data are automatically fed to strategies based on their implementation
- Word counting uses simple whitespace splitting for word boundaries