Overview

This document provides technical implementation details for the VoicemailDetector, including its parallel pipeline architecture and performance considerations for production use.

Architecture

The VoicemailDetector uses a parallel pipeline architecture with two processing branches:
  • Conversation Branch: Handles normal conversation flow and can be blocked when voicemail is detected to prevent the main LLM from processing additional input.
  • Classification Branch: Contains the LLM classifier and decision logic. This branch closes after making a classification decision to prevent unnecessary LLM calls.
The system coordinates between branches using a notification system and gates that control frame flow based on classification decisions. TTS frames are buffered during classification and either released (for conversations) or cleared (for voicemail) based on the decision.
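The buffering behavior described above can be illustrated with a small stand-alone sketch. It is not the VoicemailDetector's actual implementation: the names BufferingGate, Decision, push, and notify are hypothetical, frames are modeled as plain strings, and the sketch assumes that frames pass straight through once any decision has been made.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Decision(Enum):
    PENDING = "pending"
    CONVERSATION = "conversation"
    VOICEMAIL = "voicemail"


@dataclass
class BufferingGate:
    """Toy stand-in for a TTS gate (hypothetical, not the library class)."""

    decision: Decision = Decision.PENDING
    buffered: List[str] = field(default_factory=list)
    emitted: List[str] = field(default_factory=list)

    def push(self, frame: str) -> None:
        if self.decision is Decision.PENDING:
            # No decision yet: hold the frame instead of playing it.
            self.buffered.append(frame)
        else:
            # Assumption: after a decision, frames are no longer held.
            self.emitted.append(frame)

    def notify(self, decision: Decision) -> None:
        """Called by the classification side once it has decided."""
        self.decision = decision
        if decision is Decision.CONVERSATION:
            # Release held frames in their original order.
            self.emitted.extend(self.buffered)
        # For VOICEMAIL the held frames are discarded; either way the
        # buffer ends up empty.
        self.buffered.clear()
```

Pushing frames before notify() leaves emitted empty; notify(Decision.CONVERSATION) then releases them in order, while notify(Decision.VOICEMAIL) discards them.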

Performance Considerations

LLM Selection

The VoicemailDetector works with two separate LLMs in your pipeline:
Conversation LLM (main pipeline):
  • Must be text-based for compatibility with the VoicemailDetector’s gating system
  • Realtime LLMs are not compatible as they don’t allow the required level of output control
  • This is the LLM that handles normal conversation after classification
Classifier LLM (VoicemailDetector parameter):
  • Can be either a text-based or a realtime LLM
  • For realtime LLMs: Set output modality to text to ensure text-based responses
  • Must be able to output “CONVERSATION” or “VOICEMAIL” keywords for classification
  • Recommended models: OpenAILLMService with gpt-4o, GoogleLLMService with gemini-2.0-flash
The two LLMs operate independently. You can use a realtime LLM for classification (with text output) while using a text-based LLM for conversation, or use text-based LLMs for both.
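Because the classifier only needs to emit one of two keywords, its text output can be reduced to a decision with simple string matching. The function below is a minimal sketch of that idea, not the detector's own parsing logic; parse_classification is a hypothetical name, and treating a response that contains both keywords (or neither) as undecided is an assumption of this example.

```python
from typing import Optional


def parse_classification(response_text: str) -> Optional[str]:
    """Map raw classifier text to a keyword, or None if it is ambiguous."""
    upper = response_text.upper()
    has_conversation = "CONVERSATION" in upper
    has_voicemail = "VOICEMAIL" in upper
    if has_conversation == has_voicemail:
        # Neither keyword or both keywords: no usable decision.
        return None
    return "CONVERSATION" if has_conversation else "VOICEMAIL"
```

A realtime classifier configured for text output can be handled by the same kind of parsing, since only the returned text matters.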

Response Timing

Tune the voicemail_response_delay parameter to the voicemail systems you expect to reach. The default of 2 seconds is a good starting point; lengthen or shorten it based on the greeting patterns you observe.
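To reason about how a response delay interacts with a greeting, it can help to model it as a timer that restarts whenever speech is heard. The sketch below assumes the delay is measured from the last moment of detected speech, which this document does not state; ResponseDelay, on_speech, and ready are hypothetical names, and timestamps are passed in explicitly so the example stays deterministic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResponseDelay:
    """Toy model of a response delay (assumed semantics, not the library's)."""

    delay_secs: float = 2.0  # mirrors the documented 2-second default
    last_speech_at: Optional[float] = None

    def on_speech(self, now: float) -> None:
        # Every new burst of speech restarts the wait.
        self.last_speech_at = now

    def ready(self, now: float) -> bool:
        # Respond only after delay_secs of silence since the last speech.
        if self.last_speech_at is None:
            return False
        return now - self.last_speech_at >= self.delay_secs
```

Under this model, a greeting with pauses longer than delay_secs would trigger a response mid-greeting, which is one reason to lengthen the value for systems with slow or segmented greetings.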

Common Issues

Classification not working:
  • Confirm both detector() and gate() are correctly placed in the pipeline
  • Check that the STT service is producing text input for classification
  • Ensure the conversation LLM is text-based, and that a realtime classifier LLM has its output modality set to text
Timing problems:
  • Adjust voicemail_response_delay based on observed voicemail greeting patterns
  • Monitor classification speed; slow LLM responses affect conversation latency
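One way to monitor classification speed is to wrap the classifier call and record how long it takes. The following is a minimal sketch under stated assumptions: timed_classification is a hypothetical helper, and fake_classifier is a stand-in that simulates an LLM round trip with a short sleep rather than calling a real service.

```python
import asyncio
import time
from typing import Awaitable, Callable, Tuple


async def timed_classification(
    classify: Callable[[str], Awaitable[str]], transcript: str
) -> Tuple[str, float]:
    """Run a classifier coroutine and return (label, elapsed_seconds)."""
    start = time.perf_counter()
    label = await classify(transcript)
    return label, time.perf_counter() - start


async def fake_classifier(transcript: str) -> str:
    """Stand-in for an LLM call; the sleep simulates network latency."""
    await asyncio.sleep(0.05)
    if "leave a message" in transcript.lower():
        return "VOICEMAIL"
    return "CONVERSATION"


if __name__ == "__main__":
    label, elapsed = asyncio.run(
        timed_classification(fake_classifier, "Please leave a message.")
    )
    print(f"{label} in {elapsed * 1000:.0f} ms")
```

Logging the elapsed time per call makes it easier to see whether latency comes from the classifier itself or from the response delay setting.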
Audio playback issues:
  • Ensure gate() is placed immediately after the TTS service
  • Verify TTS frames are being generated before classification completes
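The placement rules from the checklists above can be expressed as a quick sanity check over an ordered list of stage names. This is an illustrative sketch only: check_placement and the stage labels ("stt", "detector", "tts", "gate") are hypothetical, and the rule that the detector sits after STT is inferred from the requirement that STT supplies its text input.

```python
from typing import List


def check_placement(stages: List[str]) -> List[str]:
    """Return a list of placement problems; an empty list means none found."""
    problems = [
        f"missing stage: {name}"
        for name in ("stt", "detector", "tts", "gate")
        if name not in stages
    ]
    if problems:
        return problems
    # Assumption: the detector needs transcribed text, so it follows STT.
    if stages.index("detector") < stages.index("stt"):
        problems.append("detector must come after stt")
    # The gate is expected directly after the TTS stage.
    if stages.index("gate") != stages.index("tts") + 1:
        problems.append("gate must be placed immediately after tts")
    return problems
```

For example, check_placement(["input", "stt", "detector", "llm", "tts", "gate", "output"]) reports no problems, while moving "gate" after "output" reports the gate placement problem.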
Custom prompt validation: The system validates custom prompts and warns if they’re missing required response keywords (“CONVERSATION” and “VOICEMAIL”). Include the CLASSIFIER_RESPONSE_INSTRUCTION constant to ensure proper functionality.
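The kind of keyword check described above can be sketched in a few lines. This is not the system's own validator: missing_keywords and validate_prompt are hypothetical helpers, RESPONSE_INSTRUCTION is a stand-in string and not the real CLASSIFIER_RESPONSE_INSTRUCTION constant, and the case-sensitive match is an assumption of this example.

```python
import warnings
from typing import List

REQUIRED_KEYWORDS = ("CONVERSATION", "VOICEMAIL")

# Stand-in text for illustration; the real constant's wording may differ.
RESPONSE_INSTRUCTION = 'Respond with exactly one word: "CONVERSATION" or "VOICEMAIL".'


def missing_keywords(prompt: str) -> List[str]:
    """List the required keywords that do not appear in the prompt."""
    return [kw for kw in REQUIRED_KEYWORDS if kw not in prompt]


def validate_prompt(prompt: str) -> bool:
    """Warn and return False if the prompt lacks a required keyword."""
    missing = missing_keywords(prompt)
    if missing:
        warnings.warn(
            "Custom prompt is missing required keywords: " + ", ".join(missing)
        )
        return False
    return True
```

Appending the instruction text to a custom prompt, as in validate_prompt("Decide who answered the call. " + RESPONSE_INSTRUCTION), makes both keywords present, so the check passes without a warning.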