RTVI Frame Processors convert Pipecat frames into standardized RTVI messages that clients can understand. Each processor handles a specific aspect of the conversation.

Speaking State

The RTVISpeakingProcessor manages speaking state changes for both users and bots:

from pipecat.processors.rtvi import RTVISpeakingProcessor

speaking = RTVISpeakingProcessor()

Converts these frames into RTVI messages:

  • UserStartedSpeakingFrameuser-started-speaking
  • UserStoppedSpeakingFrameuser-stopped-speaking
  • BotStartedSpeakingFramebot-started-speaking
  • BotStoppedSpeakingFramebot-stopped-speaking

Common uses:

  • Turn-taking management
  • Interruption detection
  • UI state updates
  • Speech coordination

User Transcription

The RTVIUserTranscriptionProcessor handles real-time transcriptions of user speech:

from pipecat.processors.rtvi import RTVIUserTranscriptionProcessor

transcription = RTVIUserTranscriptionProcessor()

Converts these frames into RTVI messages:

  • InterimTranscriptionFrameuser-transcription (final=false)
  • TranscriptionFrameuser-transcription (final=true)

Message data includes:

  • Transcribed text
  • User ID
  • Timestamp
  • Final/interim status

Bot Transcription

The RTVIBotTranscriptionProcessor manages bot speech transcriptions:

from pipecat.processors.rtvi import RTVIBotTranscriptionProcessor

bot_transcription = RTVIBotTranscriptionProcessor()

Features:

  • Aggregates text until end of sentence
  • Handles partial utterances
  • Manages text buffering
  • Sends complete sentences

Converts these frames:

  • TextFramebot-transcription (after aggregation)
  • Clears buffer on UserStartedSpeakingFrame

Bot LLM Processing

The RTVIBotLLMProcessor handles language model responses:

from pipecat.processors.rtvi import RTVIBotLLMProcessor

llm = RTVIBotLLMProcessor()

Converts these frames into RTVI messages:

  • LLMFullResponseStartFramebot-llm-started
  • TextFramebot-llm-text
  • LLMFullResponseEndFramebot-llm-stopped

Use cases:

  • Streaming responses
  • Progress indication
  • Response timing
  • Client UI updates

User LLM Text Processing

The RTVIUserLLMTextProcessor manages user messages in the LLM context:

from pipecat.processors.rtvi import RTVIUserLLMTextProcessor

user_llm = RTVIUserLLMTextProcessor()

Handles:

  • Context frame processing
  • User message extraction
  • Text content parsing
  • Message formatting

Converts these frames:

  • OpenAILLMContextFrameuser-llm-text (for user messages)

Bot TTS Processing

The RTVIBotTTSProcessor manages text-to-speech processing:

from pipecat.processors.rtvi import RTVIBotTTSProcessor

tts = RTVIBotTTSProcessor()

Converts these frames into RTVI messages:

  • TTSStartedFramebot-tts-started
  • TTSStoppedFramebot-tts-stopped
  • TextFramebot-tts-text

Features:

  • Speech synthesis state
  • Text synchronization
  • Audio timing
  • Client coordination

Metrics Processing

The RTVIMetricsProcessor collects and reports performance metrics:

from pipecat.processors.rtvi import RTVIMetricsProcessor

metrics = RTVIMetricsProcessor()

Processes these metric types:

  • Time to First Byte (TTFB)
  • Processing times
  • LLM token usage
  • TTS character counts

Converts MetricsFrame data into structured metrics messages for:

  • Performance monitoring
  • Usage tracking
  • Optimization
  • Billing

Pipeline Setup

Here’s how to combine these processors in a pipeline:

# Create processors
rtvi_speaking = RTVISpeakingProcessor()
rtvi_user_transcription = RTVIUserTranscriptionProcessor()
rtvi_bot_transcription = RTVIBotTranscriptionProcessor()
rtvi_bot_llm = RTVIBotLLMProcessor()
rtvi_bot_tts = RTVIBotTTSProcessor()
rtvi_metrics = RTVIMetricsProcessor()

# Build pipeline
processors = [
    transport.input(),
    rtvi_speaking,
    stt,                      # Speech-to-text
    rtvi_user_transcription,  # User transcription
    user_aggregator,          # User context aggregation
    llm,                      # Language model
    rtvi_bot_llm,             # Bot LLM processing
    rtvi_bot_transcription,   # Bot transcription
    tts,                      # Text-to-speech
    rtvi_bot_tts,             # Bot TTS processing
    rtvi_metrics,             # Metrics collection
    transport.output()
]

pipeline = Pipeline(processors)

The order of processors in the pipeline matters! Each processor receives frames from the previous processor and passes them to the next.

Next Steps