Overview
This document provides technical implementation details for the VoicemailDetector, including its parallel pipeline architecture and performance considerations for production use.

Related resources:
- API Reference: complete API documentation for all VoicemailDetector classes and methods
- Example Implementation: a working example showing VoicemailDetector integration in a complete pipeline
Architecture
The VoicemailDetector uses a parallel pipeline architecture with two processing branches:
- Conversation Branch: Handles normal conversation flow and can be blocked when voicemail is detected, preventing the main LLM from processing additional input.
- Classification Branch: Contains the LLM classifier and decision logic. This branch closes after making a classification decision to prevent unnecessary LLM calls.
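The gating behavior described above can be sketched in plain Python. This is an illustrative model, not the Pipecat API: the class and method names here are assumptions, and the real gate is a frame processor operating on pipeline frames rather than strings. It shows the core idea: output is buffered until the classification branch reports a decision, then either flushed (conversation) or dropped (voicemail).

```python
class BufferingGate:
    """Minimal sketch of the gate's behavior (illustrative only; the real
    VoicemailDetector gate is a Pipecat frame processor). Frames are held
    until the classifier branch decides; on a conversation decision the
    buffer is flushed, on a voicemail decision it is discarded so the
    conversation branch stays blocked."""

    def __init__(self) -> None:
        self._decided = False
        self._is_voicemail = False
        self._held = []
        self.emitted = []

    def push(self, frame: str) -> None:
        if not self._decided:
            self._held.append(frame)        # classification pending: hold
        elif not self._is_voicemail:
            self.emitted.append(frame)      # live call: pass through

    def decide(self, is_voicemail: bool) -> None:
        self._decided = True
        self._is_voicemail = is_voicemail
        if not is_voicemail:
            self.emitted.extend(self._held)  # flush held conversation output
        self._held.clear()


gate = BufferingGate()
gate.push("hello")             # held: no decision yet
gate.decide(is_voicemail=False)
gate.push("how can I help?")   # passes through after the decision
print(gate.emitted)            # -> ['hello', 'how can I help?']
```

On a voicemail decision the held frames are cleared instead of flushed, which is what keeps the main LLM's output from reaching a voicemail greeting.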
Performance Considerations
LLM Selection
The VoicemailDetector works with two separate LLMs in your pipeline:

Conversation LLM (main pipeline):
- Must be text-based for compatibility with the VoicemailDetector's gating system
- Realtime LLMs are not compatible, as they don't allow the required level of output control
- This is the LLM that handles normal conversation after classification

Classifier LLM (classification branch):
- Can be either a text-based or a realtime LLM
- For realtime LLMs: set the output modality to `text` to ensure text-based responses
- Must be able to output the "CONVERSATION" or "VOICEMAIL" keywords for classification
- Recommended models: OpenAILLMService with `gpt-4o`, GoogleLLMService with `gemini-2.0-flash`
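Because the classifier must emit one of two keywords, its raw completion has to be mapped onto a decision. The sketch below is illustrative (the function name and fallback behavior are assumptions, not the VoicemailDetector's internal parser), but it shows why keyword output matters: anything else leaves the call unclassified.

```python
from typing import Optional


def parse_classification(llm_output: str) -> Optional[str]:
    """Illustrative sketch: map the classifier LLM's raw completion onto
    the two expected keywords. Not the VoicemailDetector's internal code."""
    text = llm_output.strip().upper()
    if "VOICEMAIL" in text:
        return "VOICEMAIL"
    if "CONVERSATION" in text:
        return "CONVERSATION"
    return None  # model drifted off the expected keywords


print(parse_classification("voicemail"))              # -> VOICEMAIL
print(parse_classification("CONVERSATION\n"))         # -> CONVERSATION
print(parse_classification("I think it's a person"))  # -> None
```

A model that pads its answer with extra prose can still be parsed this way, but a model that never emits the keywords cannot, which is why the recommended models are ones that follow the classifier instruction reliably.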
Response Timing
The `voicemail_response_delay` parameter should be tuned to your target voicemail systems. The default value of 2 seconds is a good starting point, but you may need to adjust it for the greeting lengths you observe in practice.
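The role of the delay can be sketched with `asyncio`. The helper below is hypothetical (the function name and `play_message` callback are not part of the Pipecat API): after a voicemail decision, the bot waits out the configured delay so the greeting and beep can finish before it speaks.

```python
import asyncio


async def respond_after_delay(play_message, voicemail_response_delay: float = 2.0) -> None:
    """Hypothetical helper (not the Pipecat API): after a VOICEMAIL
    decision, wait out the configured delay before leaving the message,
    so the greeting and beep have time to finish."""
    await asyncio.sleep(voicemail_response_delay)
    await play_message()


async def main() -> None:
    spoken = []

    async def play_message():
        spoken.append("Hi, this is a confirmation call. Please call us back.")

    # A short delay keeps this example fast; production would use ~2.0s.
    await respond_after_delay(play_message, voicemail_response_delay=0.05)
    print(spoken[0])


asyncio.run(main())
```

Too short a delay risks talking over the greeting; too long wastes call time after the beep, which is the trade-off the tuning advice above is about.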
Common Issues
Classification not working:
- Confirm both `detector()` and `gate()` are correctly placed in the pipeline
- Check that the STT service is producing text input for classification
- Ensure the classifier LLM produces text output

Detection timing issues:
- Adjust `voicemail_response_delay` based on observed voicemail greeting patterns
- Monitor classification speed; slow LLM responses affect conversation latency

Audio gating issues:
- Ensure `gate()` is placed immediately after the TTS service
- Verify that TTS frames are being generated before classification completes
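The two placement checks above can be automated when debugging. The helper below is hypothetical (it is not part of Pipecat and works on plain processor names rather than real processor objects), but it captures the ordering rules: the detector must come after STT, and the gate must immediately follow TTS.

```python
def check_placement(processors):
    """Hypothetical troubleshooting helper (not part of Pipecat): given
    the ordered names of pipeline processors, flag the two placement
    mistakes described above."""
    problems = []
    if ("stt" in processors and "detector" in processors
            and processors.index("detector") < processors.index("stt")):
        problems.append("detector() must come after the STT service")
    if ("tts" in processors and "gate" in processors
            and processors.index("gate") != processors.index("tts") + 1):
        problems.append("gate() must immediately follow the TTS service")
    return problems


# Misplaced gate: transport output sits between tts and gate.
print(check_placement(["transport_in", "stt", "detector", "llm", "tts", "transport_out", "gate"]))
# -> ['gate() must immediately follow the TTS service']

# A correct ordering produces no problems.
print(check_placement(["transport_in", "stt", "detector", "llm", "tts", "gate", "transport_out"]))
# -> []
```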
When customizing the classifier prompt, include the `CLASSIFIER_RESPONSE_INSTRUCTION` constant to ensure proper functionality.