RTVI Frame Processors
Specialized processors for handling different aspects of RTVI communication
RTVI Frame Processors convert Pipecat frames into standardized RTVI messages that clients can understand. Each processor handles a specific aspect of the conversation.
Speaking State
The RTVISpeakingProcessor manages speaking state changes for both users and bots.
Converts these frames into RTVI messages:
- UserStartedSpeakingFrame → user-started-speaking
- UserStoppedSpeakingFrame → user-stopped-speaking
- BotStartedSpeakingFrame → bot-started-speaking
- BotStoppedSpeakingFrame → bot-stopped-speaking
Common uses:
- Turn-taking management
- Interruption detection
- UI state updates
- Speech coordination
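The frame-to-message mapping for speaking state can be sketched as a simple lookup table. This is a minimal illustration, not the processor's actual implementation: the frame classes below are empty stand-ins for the Pipecat classes of the same name, the helper `speaking_message` is hypothetical, and the message is reduced to a bare `type` field.

```python
# Stand-in frame classes for illustration only; a real application
# would import the corresponding classes from Pipecat.
class Frame:
    pass


class UserStartedSpeakingFrame(Frame):
    pass


class UserStoppedSpeakingFrame(Frame):
    pass


class BotStartedSpeakingFrame(Frame):
    pass


class BotStoppedSpeakingFrame(Frame):
    pass


# Frame class -> RTVI message type, as listed in this section.
SPEAKING_MESSAGE_TYPES = {
    UserStartedSpeakingFrame: "user-started-speaking",
    UserStoppedSpeakingFrame: "user-stopped-speaking",
    BotStartedSpeakingFrame: "bot-started-speaking",
    BotStoppedSpeakingFrame: "bot-stopped-speaking",
}


def speaking_message(frame):
    """Hypothetical helper: return a simplified message dict, or None
    when the frame is not a speaking-state frame."""
    message_type = SPEAKING_MESSAGE_TYPES.get(type(frame))
    if message_type is None:
        return None
    return {"type": message_type}
```

A client can switch on the `type` value to update its UI, for example showing a "listening" indicator between the user-started and user-stopped messages.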
User Transcription
The RTVIUserTranscriptionProcessor handles real-time transcriptions of user speech.
Converts these frames into RTVI messages:
- InterimTranscriptionFrame → user-transcription (final=false)
- TranscriptionFrame → user-transcription (final=true)
Message data includes:
- Transcribed text
- User ID
- Timestamp
- Final/interim status
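As a rough sketch of how the four fields above might be packed into a message, the example below builds a `user-transcription` dict from a transcription frame. The dataclasses are simplified stand-ins for the Pipecat frames, and the field names (`text`, `user_id`, `timestamp`, `final`) and the `data` wrapper are assumptions made for illustration.

```python
from dataclasses import dataclass


# Simplified stand-ins for the Pipecat frames; field names are assumed.
@dataclass
class TranscriptionFrame:
    text: str
    user_id: str
    timestamp: str


@dataclass
class InterimTranscriptionFrame:
    text: str
    user_id: str
    timestamp: str


def user_transcription_message(frame):
    """Hypothetical helper: build a simplified user-transcription message."""
    return {
        "type": "user-transcription",
        "data": {
            "text": frame.text,
            "user_id": frame.user_id,
            "timestamp": frame.timestamp,
            # Only a TranscriptionFrame carries a final result.
            "final": isinstance(frame, TranscriptionFrame),
        },
    }
```

Because interim and final results share one message type, a client can overwrite the displayed text on every interim message and commit it once `final` is true.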
Bot Transcription
The RTVIBotTranscriptionProcessor manages bot speech transcriptions.
Features:
- Aggregates text until end of sentence
- Handles partial utterances
- Manages text buffering
- Sends complete sentences
Converts these frames:
- TextFrame → bot-transcription (after aggregation)
- Clears buffer on UserStartedSpeakingFrame
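The aggregation behavior described above (buffer text, emit on end of sentence, clear on interruption) can be sketched as follows. The class `SentenceAggregator` and its method names are hypothetical, and the end-of-sentence check is a deliberately naive regular expression; the real processor's sentence detection may be more careful.

```python
import re


class SentenceAggregator:
    """Hypothetical sketch of sentence-level text buffering."""

    def __init__(self):
        self._buffer = ""

    def push_text(self, text):
        """Append a text chunk; return a complete sentence once the
        buffer ends with '.', '!' or '?', otherwise return None."""
        self._buffer += text
        if re.search(r"[.!?]\s*$", self._buffer):
            sentence = self._buffer.strip()
            self._buffer = ""
            return sentence
        return None

    def interrupt(self):
        """Drop any partial sentence, as on UserStartedSpeakingFrame."""
        self._buffer = ""
```

Clearing the buffer on interruption keeps a half-spoken sentence from being merged into the bot's next response.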
Bot LLM Processing
The RTVIBotLLMProcessor handles language model responses.
Converts these frames into RTVI messages:
- LLMFullResponseStartFrame → bot-llm-started
- TextFrame → bot-llm-text
- LLMFullResponseEndFrame → bot-llm-stopped
Use cases:
- Streaming responses
- Progress indication
- Response timing
- Client UI updates
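On the receiving side, the started/text/stopped sequence lets a client reassemble each streamed response. The sketch below shows one way to do that in Python; the function `collect_llm_responses` is hypothetical, and the assumption that each `bot-llm-text` message carries its chunk under `data.text` is for illustration only.

```python
def collect_llm_responses(messages):
    """Hypothetical helper: join bot-llm-text chunks that arrive between
    bot-llm-started and bot-llm-stopped into complete responses."""
    responses = []
    current = None  # None means no response is in progress
    for message in messages:
        kind = message["type"]
        if kind == "bot-llm-started":
            current = []
        elif kind == "bot-llm-text" and current is not None:
            current.append(message["data"]["text"])
        elif kind == "bot-llm-stopped" and current is not None:
            responses.append("".join(current))
            current = None
    return responses
```

The same start and stop markers can drive a progress indicator or be timestamped to measure response duration.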
User LLM Text Processing
The RTVIUserLLMTextProcessor manages user messages in the LLM context.
Handles:
- Context frame processing
- User message extraction
- Text content parsing
- Message formatting
Converts these frames:
- OpenAILLMContextFrame → user-llm-text (for user messages)
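User message extraction and text content parsing can be sketched against an OpenAI-style message list, where `content` is either a plain string or a list of typed parts. The helper `latest_user_text` is hypothetical, and inspecting only the most recent message is an assumption about the processor's behavior.

```python
def latest_user_text(messages):
    """Hypothetical helper: return the text of the most recent message
    if it came from the user, otherwise None."""
    if not messages:
        return None
    message = messages[-1]
    if message.get("role") != "user":
        return None
    content = message.get("content")
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        # Multi-part content: keep only the parts that carry text.
        return " ".join(
            part["text"]
            for part in content
            if isinstance(part, dict) and "text" in part
        )
    return None
```

The returned string is what would be sent to the client as the `user-llm-text` payload in this sketch.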
Bot TTS Processing
The RTVIBotTTSProcessor manages text-to-speech processing.
Converts these frames into RTVI messages:
- TTSStartedFrame → bot-tts-started
- TTSStoppedFrame → bot-tts-stopped
- TextFrame → bot-tts-text
Features:
- Speech synthesis state
- Text synchronization
- Audio timing
- Client coordination
Metrics Processing
The RTVIMetricsProcessor collects and reports performance metrics.
Processes these metric types:
- Time to First Byte (TTFB)
- Processing times
- LLM token usage
- TTS character counts
Converts MetricsFrame data into structured metrics messages for:
- Performance monitoring
- Usage tracking
- Optimization
- Billing
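One way to picture a "structured metrics message" is a dict that groups individual measurements by metric type. The sketch below is an illustration under assumptions: the helper `metrics_message`, the `(kind, processor, value)` tuple input, and the key names are hypothetical and do not claim to match the actual message schema.

```python
def metrics_message(entries):
    """Hypothetical helper: group (kind, processor, value) tuples by
    metric kind into a single simplified metrics message."""
    data = {}
    for kind, processor, value in entries:
        data.setdefault(kind, []).append(
            {"processor": processor, "value": value}
        )
    return {"type": "metrics", "data": data}
```

Grouping by kind lets a monitoring dashboard read all TTFB values in one place, while a billing job reads only the token and character counts.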
Pipeline Setup
These processors are combined in a single pipeline. The order of processors in the pipeline matters: each processor receives frames from the previous processor and passes them to the next.
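To illustrate why ordering matters, here is a toy model of a pipeline in which each stage is a function that takes one frame and returns a list of frames. It is not Pipecat's pipeline API: `run_pipeline`, `fake_tts`, and `make_observer` are hypothetical names, frames are plain strings, and stages run in batches rather than as a live stream.

```python
def run_pipeline(stages, frames):
    """Feed frames through each stage in order; every stage sees only
    what the previous stage emitted."""
    for stage in stages:
        emitted = []
        for frame in frames:
            emitted.extend(stage(frame))
        frames = emitted
    return frames


def fake_tts(frame):
    """Stand-in TTS stage: wraps text in started/stopped frames."""
    if frame == "TextFrame":
        return ["TTSStartedFrame", frame, "TTSStoppedFrame"]
    return [frame]


def make_observer(log):
    """Build a pass-through stage that records every frame it sees."""
    def observer(frame):
        log.append(frame)
        return [frame]
    return observer


# Observer placed after the TTS stage sees the TTS frames.
after_tts = []
run_pipeline([fake_tts, make_observer(after_tts)], ["TextFrame"])

# Observer placed before the TTS stage never sees them.
before_tts = []
run_pipeline([make_observer(before_tts), fake_tts], ["TextFrame"])
```

In this model, an observer placed ahead of the stage that produces the frames it cares about records nothing useful, which is the same reason a processor that converts TTS frames has to sit downstream of the TTS service.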