Overview
This document provides technical implementation details for the VoicemailDetector, including its parallel pipeline architecture and performance considerations for production use.

Related resources:
- API Reference: complete API documentation for all VoicemailDetector classes and methods
- Example Implementation: a working example showing VoicemailDetector integration in a complete pipeline
Architecture
The VoicemailDetector uses a parallel pipeline architecture with two processing branches:
- Conversation Branch: Handles normal conversation flow and can be blocked when voicemail is detected, preventing the main LLM from processing additional input.
- Classification Branch: Contains the LLM classifier and decision logic. This branch closes after making a classification decision to prevent unnecessary LLM calls.
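The gating behavior described above can be sketched in plain Python. This is an illustrative model, not the Pipecat API: the class and method names here are assumptions, and the real gate is a frame processor operating on pipeline frames rather than strings. It shows the core idea: output is buffered until the classification branch reports a decision, then either flushed (conversation) or dropped (voicemail).

```python
class BufferingGate:
    """Minimal sketch of the gate's behavior (illustrative only; the real
    VoicemailDetector gate is a Pipecat frame processor). Frames are held
    until the classifier branch decides; on a conversation decision the
    buffer is flushed, on a voicemail decision it is discarded so the
    conversation branch stays blocked."""

    def __init__(self) -> None:
        self._decided = False
        self._is_voicemail = False
        self._held = []
        self.emitted = []

    def push(self, frame: str) -> None:
        if not self._decided:
            self._held.append(frame)        # classification pending: hold
        elif not self._is_voicemail:
            self.emitted.append(frame)      # live call: pass through

    def decide(self, is_voicemail: bool) -> None:
        self._decided = True
        self._is_voicemail = is_voicemail
        if not is_voicemail:
            self.emitted.extend(self._held)  # flush held conversation output
        self._held.clear()


gate = BufferingGate()
gate.push("hello")             # held: no decision yet
gate.decide(is_voicemail=False)
gate.push("how can I help?")   # passes through after the decision
print(gate.emitted)            # -> ['hello', 'how can I help?']
```

On a voicemail decision the held frames are cleared instead of flushed, which is what keeps the main LLM's output from reaching a voicemail greeting.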
Performance Considerations
LLM Selection
The VoicemailDetector works with two separate LLMs in your pipeline:

Conversation LLM (main pipeline):
- Must be text-based for compatibility with the VoicemailDetector's gating system
- Realtime LLMs are not compatible, as they don't allow the required level of output control
- This is the LLM that handles normal conversation after classification

Classifier LLM (classification branch):
- Can be either a text-based or a realtime LLM
- For realtime LLMs: set the output modality to `text` to ensure text-based responses
- Must be able to output the "CONVERSATION" or "VOICEMAIL" keywords for classification
- Recommended models: OpenAILLMService with `gpt-4o`, GoogleLLMService with `gemini-2.0-flash`
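Because the classifier must emit one of two keywords, its raw completion has to be mapped onto a decision. The sketch below is illustrative (the function name and fallback behavior are assumptions, not the VoicemailDetector's internal parser), but it shows why keyword output matters: anything else leaves the call unclassified.

```python
from typing import Optional


def parse_classification(llm_output: str) -> Optional[str]:
    """Illustrative sketch: map the classifier LLM's raw completion onto
    the two expected keywords. Not the VoicemailDetector's internal code."""
    text = llm_output.strip().upper()
    if "VOICEMAIL" in text:
        return "VOICEMAIL"
    if "CONVERSATION" in text:
        return "CONVERSATION"
    return None  # model drifted off the expected keywords


print(parse_classification("voicemail"))              # -> VOICEMAIL
print(parse_classification("CONVERSATION\n"))         # -> CONVERSATION
print(parse_classification("I think it's a person"))  # -> None
```

A model that pads its answer with extra prose can still be parsed this way, but a model that never emits the keywords cannot, which is why the recommended models are ones that follow the classifier instruction reliably.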
Response Timing
The `voicemail_response_delay` parameter should be tuned to your target voicemail systems. The default value of 2 seconds is a good starting point, but you may need to adjust it for the greeting lengths you observe in practice.
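The role of the delay can be sketched with `asyncio`. The helper below is hypothetical (the function name and `play_message` callback are not part of the Pipecat API): after a voicemail decision, the bot waits out the configured delay so the greeting and beep can finish before it speaks.

```python
import asyncio


async def respond_after_delay(play_message, voicemail_response_delay: float = 2.0) -> None:
    """Hypothetical helper (not the Pipecat API): after a VOICEMAIL
    decision, wait out the configured delay before leaving the message,
    so the greeting and beep have time to finish."""
    await asyncio.sleep(voicemail_response_delay)
    await play_message()


async def main() -> None:
    spoken = []

    async def play_message():
        spoken.append("Hi, this is a confirmation call. Please call us back.")

    # A short delay keeps this example fast; production would use ~2.0s.
    await respond_after_delay(play_message, voicemail_response_delay=0.05)
    print(spoken[0])


asyncio.run(main())
```

Too short a delay risks talking over the greeting; too long wastes call time after the beep, which is the trade-off the tuning advice above is about.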
Common Issues
Classification not working:
- Confirm both `detector()` and `gate()` are correctly placed in the pipeline
- Check that the STT service is producing text input for classification
- Ensure the classifier LLM produces text output

Detection timing issues:
- Adjust `voicemail_response_delay` based on observed voicemail greeting patterns
- Monitor classification speed; slow LLM responses affect conversation latency

Audio gating issues:
- Ensure `gate()` is placed immediately after the TTS service
- Verify that TTS frames are being generated before classification completes
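The two placement checks above can be automated when debugging. The helper below is hypothetical (it is not part of Pipecat and works on plain processor names rather than real processor objects), but it captures the ordering rules: the detector must come after STT, and the gate must immediately follow TTS.

```python
def check_placement(processors):
    """Hypothetical troubleshooting helper (not part of Pipecat): given
    the ordered names of pipeline processors, flag the two placement
    mistakes described above."""
    problems = []
    if ("stt" in processors and "detector" in processors
            and processors.index("detector") < processors.index("stt")):
        problems.append("detector() must come after the STT service")
    if ("tts" in processors and "gate" in processors
            and processors.index("gate") != processors.index("tts") + 1):
        problems.append("gate() must immediately follow the TTS service")
    return problems


# Misplaced gate: transport output sits between tts and gate.
print(check_placement(["transport_in", "stt", "detector", "llm", "tts", "transport_out", "gate"]))
# -> ['gate() must immediately follow the TTS service']

# A correct ordering produces no problems.
print(check_placement(["transport_in", "stt", "detector", "llm", "tts", "gate", "transport_out"]))
# -> []
```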
When customizing the classifier prompt, include the `CLASSIFIER_RESPONSE_INSTRUCTION` constant to ensure proper functionality.