Pipecat’s RTVI (Real-Time Voice Interaction) protocol provides a standardized communication layer between clients and servers for building real-time voice and multimodal applications. It handles the synchronization of user and bot interactions, transcriptions, LLM processing, and text-to-speech delivery.

Speaking States

Track when users and bots start/stop speaking for natural turn-taking

Transcription

Handle real-time transcriptions from both users and bots

LLM Processing

Manage LLM responses and function calls with proper client notifications

TTS Management

Control text-to-speech state and audio delivery

Architecture

RTVI operates with two primary components:

  1. RTVIProcessor - A frame processor that sits in the pipeline and serves as the entry point for messages exchanged with the client, both incoming and outgoing.

  2. RTVIObserver - An observer that monitors pipeline events and translates them into client-compatible messages, handling:

    • Speaking state changes
    • Transcription updates
    • LLM responses
    • TTS events
    • Performance metrics
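
To make the observer's role concrete, here is a minimal, library-free sketch of the idea: watch events as they pass, translate the relevant ones into client-facing messages, and leave everything else alone. The event classes, the `translate` function, `SketchObserver`, and the payload field names are all hypothetical stand-ins for illustration; they are not Pipecat's frame classes or the real RTVIObserver implementation.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical stand-ins for pipeline events (NOT Pipecat's frame classes).
@dataclass
class UserStartedSpeaking:
    pass


@dataclass
class UserStoppedSpeaking:
    pass


@dataclass
class Transcription:
    text: str
    final: bool


def translate(event: object) -> Optional[dict]:
    """Map a pipeline event to a client-facing message, or None if irrelevant."""
    if isinstance(event, UserStartedSpeaking):
        return {"type": "user-started-speaking"}
    if isinstance(event, UserStoppedSpeaking):
        return {"type": "user-stopped-speaking"}
    if isinstance(event, Transcription):
        # Payload field names here are illustrative assumptions.
        return {
            "type": "user-transcription",
            "data": {"text": event.text, "final": event.final},
        }
    return None  # Events the client does not need are dropped.


class SketchObserver:
    """Watches events without modifying them and forwards translated messages."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send  # delivers one serialized message to the client

    def on_event(self, event: object) -> None:
        message = translate(event)
        if message is not None:
            self._send(json.dumps(message))


# Usage: collect outgoing messages in a list instead of a real transport.
sent = []
observer = SketchObserver(sent.append)
for event in [
    UserStartedSpeaking(),
    Transcription("hello", True),
    object(),  # an unrelated event; the observer ignores it
    UserStoppedSpeaking(),
]:
    observer.on_event(event)
```

Keeping translation in an observer rather than in a pipeline stage means the pipeline's own data flow is unaffected by what the client needs to see.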

Basic Example

Here’s how to set up RTVI in your Pipecat application:

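from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.frameworks.rtvi import RTVIConfig, RTVIObserver, RTVIProcessor

# This snippet assumes transport, stt, llm, tts, and context_aggregator
# have already been created elsewhere in your application.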

# Create the RTVI processor
rtvi = RTVIProcessor(config=RTVIConfig(config=[]))

# Include the RTVIProcessor in your pipeline
pipeline = Pipeline(
    [
        transport.input(),
        rtvi,
        stt,
        context_aggregator.user(),
        llm,
        tts,
        transport.output(),
        context_aggregator.assistant(),
    ]
)

# Add the RTVIObserver to your pipeline task
task = PipelineTask(
    pipeline,
    params=PipelineParams(
        allow_interruptions=True,
        enable_metrics=True,
    ),
    observers=[RTVIObserver(rtvi)],
)

# Handle client connection
@rtvi.event_handler("on_client_ready")
async def on_client_ready(rtvi):
    # Signal bot is ready to receive messages
    await rtvi.set_bot_ready()
    # Initialize the conversation
    await task.queue_frames([context_aggregator.user().get_context_frame()])

# Handle participant disconnection
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
    await task.cancel()

# Run the pipeline (await requires this to be inside an async function)
runner = PipelineRunner()
await runner.run(task)

Protocol Flow

  1. Client connects and sends a client-ready message
  2. Server responds with bot-ready and initial configuration
  3. Client and server exchange real-time events:
    • Speaking state changes (user-started-speaking, user-stopped-speaking, bot-started-speaking, bot-stopped-speaking)
    • Transcriptions (user-transcription, bot-transcription)
    • LLM processing (bot-llm-text, llm-function-call)
    • TTS events (bot-tts-text, bot-tts-audio)
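
The ordering of this flow can be sketched as a small state machine: nothing is sent to the client until client-ready arrives, bot-ready is the first reply, and real-time events follow. This is a toy model under stated assumptions, not Pipecat's implementation. `SketchSession`, its methods, and the message dictionary layout are hypothetical; only the message type names come from the list above.

```python
from typing import List


class SketchSession:
    """Toy server-side session that enforces the handshake order (hypothetical)."""

    def __init__(self) -> None:
        self.ready = False
        self.outbox: List[dict] = []  # messages queued for the client, in order

    def receive(self, message: dict) -> None:
        """Handle an incoming client message."""
        if message.get("type") == "client-ready" and not self.ready:
            self.ready = True
            # Step 2: respond with bot-ready (configuration payload left empty here).
            self.outbox.append({"type": "bot-ready", "data": {}})

    def emit(self, msg_type: str, data: dict) -> bool:
        """Queue a real-time event; refuse it if the handshake has not completed."""
        if not self.ready:
            return False
        self.outbox.append({"type": msg_type, "data": data})
        return True


session = SketchSession()

# Before the handshake, events are refused.
early = session.emit("bot-started-speaking", {})

# Step 1: the client announces itself.
session.receive({"type": "client-ready"})

# Step 3: events now flow to the client.
late = session.emit("bot-llm-text", {"text": "Hi there"})
```

Gating events on the handshake keeps a client from receiving speaking or transcription updates before it has registered its handlers.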

Key Components

Client Integration

RTVI is implemented in the Pipecat client SDKs, which provide a high-level API for interacting with the protocol. See the Pipecat Client SDKs documentation:

Client SDKs

Learn how to implement RTVI on the client side with our JavaScript, React, and mobile SDKs.