RTVI (Real-Time Voice Interaction)

Pipecat’s RTVI (Real-Time Voice Interaction) protocol provides a standardized communication layer between clients and servers for building real-time voice and multimodal applications. It handles the synchronization of user and bot interactions, transcriptions, LLM processing, and text-to-speech delivery.

Speaking States

Track when users and bots start/stop speaking for natural turn-taking

Transcription

Handle real-time transcriptions from both users and bots

LLM Processing

Manage LLM responses and function calls with proper client notifications

TTS Management

Control text-to-speech state and audio delivery

Architecture

RTVI operates with two primary components:

RTVIProcessor - A frame processor residing in the pipeline that serves as the entry point for sending and receiving messages to/from the client.
RTVIObserver - An observer that monitors pipeline events and translates them into client-compatible messages, handling:
- Speaking state changes
- Transcription updates
- LLM responses
- TTS events
- Performance metrics
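
To make the observer's role concrete, the sketch below mimics the translation step in plain Python, without importing Pipecat. The internal event names, the `translate` helper, and the message envelope (including the `"rtvi-ai"` label) are illustrative assumptions, not the library's actual internals. The client-facing message types are the ones named in the Protocol Flow section of this page.

```python
from typing import Any, Optional

# Hypothetical internal event names (left) mapped to client message types (right).
# The right-hand names appear on this page; the left-hand names are invented here.
EVENT_TO_MESSAGE = {
    "user_started_speaking": "user-started-speaking",
    "user_stopped_speaking": "user-stopped-speaking",
    "bot_started_speaking": "bot-started-speaking",
    "bot_stopped_speaking": "bot-stopped-speaking",
    "user_transcription": "user-transcription",
    "llm_text": "bot-llm-text",
    "tts_text": "bot-tts-text",
}


def translate(event: str, payload: Optional[dict[str, Any]] = None) -> Optional[dict[str, Any]]:
    """Return a client-facing message for a pipeline event, or None if it is not mapped."""
    message_type = EVENT_TO_MESSAGE.get(event)
    if message_type is None:
        # Events with no client-facing equivalent are simply not forwarded.
        return None
    message: dict[str, Any] = {"label": "rtvi-ai", "type": message_type}  # assumed envelope
    if payload:
        message["data"] = payload
    return message
```

In this sketch an unmapped event yields `None`, which reflects the design idea that the observer watches everything in the pipeline but only forwards what the client can use.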

Basic Example

Here’s how to set up RTVI in your Pipecat application:

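from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.frameworks.rtvi import RTVIConfig, RTVIObserver, RTVIProcessor

# transport, stt, llm, tts, and context_aggregator are assumed to be created elsewhere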

# Create the RTVI processor
rtvi = RTVIProcessor(config=RTVIConfig(config=[]))

# Include the RTVIProcessor in your pipeline
pipeline = Pipeline(
    [
        transport.input(),
        rtvi,
        stt,
        context_aggregator.user(),
        llm,
        tts,
        transport.output(),
        context_aggregator.assistant(),
    ]
)

# Add the RTVIObserver to your pipeline task
task = PipelineTask(
    pipeline,
    params=PipelineParams(
        allow_interruptions=True,
        enable_metrics=True,
    ),
    observers=[RTVIObserver(rtvi)],
)

# Handle client connection
@rtvi.event_handler("on_client_ready")
async def on_client_ready(rtvi):
    # Signal bot is ready to receive messages
    await rtvi.set_bot_ready()
    # Initialize the conversation
    await task.queue_frames([context_aggregator.user().get_context_frame()])

# Handle participant disconnection
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
    await task.cancel()

# Run the pipeline
runner = PipelineRunner()
await runner.run(task)
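
The example above ends with a bare `await`, so it has to run inside a coroutine. A minimal sketch of one way to wrap it, using only the standard library, is shown here; `run_bot` and `main` are hypothetical names, and the body of `run_bot` is a stand-in for the pipeline setup rather than real Pipecat code.

```python
import asyncio


async def run_bot() -> str:
    # Stand-in for the setup above: build the pipeline and task here,
    # then `await runner.run(task)` instead of this placeholder sleep.
    await asyncio.sleep(0)
    return "finished"


def main() -> str:
    # asyncio.run creates an event loop, runs the coroutine to completion,
    # and closes the loop afterwards.
    return asyncio.run(run_bot())


if __name__ == "__main__":
    main()
```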

Protocol Flow

1. Client connects and sends a client-ready message.
2. Server responds with bot-ready and initial configuration.
3. Client and server exchange real-time events:
   - Speaking state changes (user-started-speaking, user-stopped-speaking, bot-started-speaking, bot-stopped-speaking)
   - Transcriptions (user-transcription, bot-transcription)
   - LLM processing (bot-llm-text, llm-function-call)
   - TTS events (bot-tts-text, bot-tts-audio)
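
The flow above can be sketched as a small server-side state machine. This is a toy model under stated assumptions, not Pipecat's implementation: the `ServerSession` class, the JSON envelope fields (`label`, `type`, `id`, `data`), and the shape of the bot-ready payload are all illustrative. Only the message type names come from this page.

```python
import json
import uuid
from typing import Any, Optional


class HandshakeError(Exception):
    """Raised when an event is emitted before the client-ready/bot-ready handshake."""


class ServerSession:
    """Toy server: wait for client-ready, answer bot-ready, then relay events."""

    def __init__(self) -> None:
        self.ready = False
        self.outbox: list[str] = []  # serialized messages "sent" to the client

    def _send(self, message_type: str, data: Optional[dict[str, Any]] = None) -> None:
        # Assumed envelope shape; the real wire format may differ.
        envelope: dict[str, Any] = {
            "label": "rtvi-ai",
            "type": message_type,
            "id": str(uuid.uuid4()),
        }
        if data is not None:
            envelope["data"] = data
        self.outbox.append(json.dumps(envelope))

    def on_client_message(self, raw: str) -> None:
        message = json.loads(raw)
        if message.get("type") == "client-ready":
            self.ready = True
            # Step 2: reply with bot-ready plus an (empty) initial configuration.
            self._send("bot-ready", {"config": []})

    def emit(self, message_type: str, data: Optional[dict[str, Any]] = None) -> None:
        # Step 3: real-time events only flow once the handshake has completed.
        if not self.ready:
            raise HandshakeError(f"cannot send {message_type!r} before client-ready")
        self._send(message_type, data)
```

A session would then call `on_client_message` with the client's client-ready message and afterwards call `emit("bot-started-speaking")`, `emit("user-transcription", {"text": "..."})`, and so on as events occur.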

Key Components

RTVIProcessor

Configure and manage RTVI services, actions, and client communication

RTVIObserver

Translate internal pipeline events to standardized client messages

Client Integration

The Pipecat client SDKs implement RTVI and provide a high-level API for interacting with the protocol. See the Pipecat Client SDKs documentation:

Client SDKs

Learn how to implement RTVI on the client side with our JavaScript, React, and mobile SDKs
