> ## Documentation Index
> Fetch the complete documentation index at: https://docs.pipecat.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# The RTVI Standard

> An overview of the RTVI standard and its key features

The RTVI (Real-Time Voice and Video Inference) standard defines a set of message types and structures sent between clients and servers. It is designed to facilitate real-time interactions between clients and AI applications that require voice, video, and text communication. It provides a consistent framework for building applications that can communicate with AI models and the backends running those models in real-time.

<Note>
  This page documents version 1.0 of the RTVI standard, released in June 2025.
</Note>

## Key Features

<CardGroup cols={2}>
  <Card title="Connection Management" icon="plug">
    RTVI provides a flexible connection model that allows clients to connect to
    AI services and coordinate state.
  </Card>

  <Card title="Transcriptions" icon="text">
    The standard includes built-in support for real-time transcription of audio
    streams.
  </Card>

  <Card title="Client-Server Messaging" icon="comments">
    The standard defines a messaging protocol for sending and receiving messages
    between clients and servers, allowing for efficient communication of
    requests and responses.
  </Card>

  <Card title="Advanced LLM Interactions" icon="robot">
    The standard supports advanced interactions with large language models
    (LLMs), including context management, function call handline, and search
    results.
  </Card>

  <Card title="Service-Specific Insights" icon="lightbulb">
    RTVI supports events to provide insight into the input/output and state for
    typical services that exist in speech-to-speech workflows.
  </Card>

  <Card title="Metrics and Monitoring" icon="chart-line">
    RTVI provides mechanisms for collecting metrics and monitoring the
    performance of server-side services.
  </Card>
</CardGroup>

## Terms

* **Client**: The front-end application or user interface that interacts with the RTVI server.
* **Server**: The backend-end service that runs the AI framework and processes requests from the client.
* **User**: The end user interacting with the client application.
* **Bot**: The AI interacting with the user, technically an amalgamation of a large language model (LLM) and a text-to-speech (TTS) service.

## RTVI Message Format

The messages defined as part of the RTVI protocol adhere to the following format:

```json theme={null}
{
    "id": string,
    "label": "rtvi-ai",
    "type": string,
    "data": unknown
}
```

<ParamField path="id" type="string" required={false}>
  A unique identifier for the message, used to correlate requests and responses.
</ParamField>

<ParamField path="label" type="string" required={true} default="rtvi-ai">
  A label that identifies this message as an RTVI message. This field is
  required and should always be set to `'rtvi-ai'`.
</ParamField>

<ParamField path="type" type="string" required={true}>
  The type of message being sent. This field is required and should be set to
  one of the predefined RTVI message types listed below.
</ParamField>

<ParamField path="data" type="unknown" required={false}>
  The payload of the message, which can be any data structure relevant to the
  message type.
</ParamField>

## RTVI Message Types

Following the above format, this section describes the various message types defined by the RTVI standard. Each message type has a specific purpose and structure, allowing for clear communication between clients and servers.

<Tip>
  Each message type below includes either a 🤖 or 🏄 emoji to denote whether the
  message is sent from the bot (🤖) or client (🏄).
</Tip>

### Connection Management

#### client-ready 🏄

Indicates that the client is ready to receive messages and interact with the server. Typically sent after the transport media channels have connected.

* **type**: `'client-ready'`
* **data**:
  * **version**: `string`

    The version of the RTVI standard being used. This is useful for ensuring compatibility between client and server implementations.

  * **about**: `AboutClient Object`

    An object containing information about the client, such as its rtvi-version, client library, and any other relevant metadata. The `AboutClient` object follows this structure:

    <Expandable title="AboutClient">
      <ParamField path="library" type="string" required />

      <ParamField path="library_version" type="string" />

      <ParamField path="platform" type="string" />

      <ParamField path="platform_version" type="string" />

      <ParamField path="platform_details" type="any">
        Any platform-specific details that may be relevant to the server. This
        could include information about the browser, operating system, or any
        other environment-specific data needed by the server. This field is
        optional and open-ended, so please be mindful of the data you include
        here and any security concerns that may arise from exposing sensitive or
        personal-identifiable information.
      </ParamField>
    </Expandable>

#### bot-ready 🤖

Indicates that the bot is ready to receive messages and interact with the client. Typically send after the transport media channels have connected.

* **type**: `'bot-ready'`
* **data**:
  * **version**: `string`

    The version of the RTVI standard being used. This is useful for ensuring compatibility between client and server implementations.

  * **about**: `any` (Optional)

    An object containing information about the server or bot. It's structure and value are both undefined by default. This provides flexibility to include any relevant metadata your client may need to know about the server at connection time, without any built-in security concerns. Please be mindful of the data you include here and any security concerns that may arise from exposing sensitive information.

#### disconnect-bot 🏄

Indicates that the client wishes to disconnect from the bot. Typically used when the client is shutting down or no longer needs to interact with the bot. Note: Disconnets should happen automatically when either the client or bot disconnects from the transport, so this message is intended for the case where a client may want to remain connected to the transport but no longer wishes to interact with the bot.

* **type**: `'disconnect-bot'`
* **data**: `undefined`

#### error 🤖

Indicates an error occurred during bot initialization or runtime.

* **type**: `'error'`
* **data**:
  * **message**: `string`

    Description of the error.

  * **fatal**: `boolean`

    Indicates if the error is fatal to the session.

### Speaking and Transcription

#### user-started-speaking 🤖

Emitted when the user begins speaking

* **type**: `'user-started-speaking'`
* **data**: None

#### user-stopped-speaking 🤖

Emitted when the user stops speaking

* **type**: `'user-stopped-speaking'`
* **data**: None

#### bot-started-speaking 🤖

Emitted when the bot begins speaking

* **type**: `'bot-started-speaking'`
* **data**: None

#### bot-stopped-speaking 🤖

Emitted when the bot stops speaking

* **type**: `'bot-stopped-speaking'`
* **data**: None

#### user-mute-started 🤖

<Note>
  Introduced in RTVI version 1.2.0 (Pipecat 0.0.102, client-js 1.6.0).
</Note>

Emitted when the server begins ignoring audio from the client (server-side muting). The client should continue sending audio normally but may want to show an indication to the user that their input is not being processed.

* **type**: `'user-mute-started'`
* **data**: None

#### user-mute-stopped 🤖

<Note>
  Introduced in RTVI version 1.2.0 (Pipecat 0.0.102, client-js 1.6.0).
</Note>

Emitted when the server stops ignoring audio from the client (server-side muting ends). The client can update its UI to indicate that the user's input is being processed again.

* **type**: `'user-mute-stopped'`
* **data**: None

#### user-transcription 🤖

Real-time transcription of user speech, including both partial and final results.

* **type**: `'user-transcription'`
* **data**:
  * **text**: `string`

    The transcribed text of the user.

  * **final**: `boolean`

    Indicates if this is a final transcription or a partial result.

  * **timestamp**: `string`

    The timestamp when the transcription was generated.

  * **user\_id**: `string`

    Identifier for the user who spoke.

#### bot-output 🤖

The best-effort representation of the bot's output text, including both spoken and unspoken text. In addition to transcriptions of spoken text, this message type may also include text that the bot outputs but does not speak (e.g., text sent to the client for display purposes only). Along with the text, this event includes a `spoken` flag to indicate whether the text was spoken by the bot or not and an `aggregated_by` field to indicate what the text represents (e.g. "sentence", "word", "code", "url").

* **type**: `'bot-output'`
* **data**:
  * **text**: `string`

    The output text from the bot.

  * **spoken**: `boolean`

    Indicates if this text was spoken by the bot.

  * **aggregated\_by**: `string`

    Indicates how the text was aggregated (e.g., "sentence", "word", "code", "url"). "sentence" and "word" are reserved aggregation types defined by the RTVI standard. Other aggregation types may be defined by custom text aggregators used by the server.

#### bot-transcription 🤖

<Warning>
  DEPRECATED in favor of `bot-output` in Pipecat version 0.0.95 and client-js
  version 1.5.0
</Warning>

Transcription of the bot's speech. Note: This protocol currently does not match the user transcription format to support real-time timestamping for bot transcriptions. Rather, the event is typically sent for each sentence of the bot's response. This difference is currently due to limitations in TTS services which mostly do not support (or support well), accurate timing information. If/when this changes, this protocol may be updated to include the necessary timing information. For now, if you want to attempt real-time transcription to match your bot's speaking, you can try using the `bot-tts-text` message type.

* **type**: `'bot-transcription'`
* **data**:
  * **text**: `string`

    The transcribed text from the bot, typically aggregated at a per-sentence level.

### Client-Server Messaging

#### server-message 🤖

An arbitrary message sent from the server to the client. This can be used for custom interactions or commands. This message may be coupled with the `client-message` message type to handle responses from the client.

* **type**: `'server-message'`
* **data**: `any`

  The `data` can be any JSON-serializable object, formatted according to your own specifications.

#### client-message 🏄

An arbitrary message sent from the client to the server. This can be used for custom interactions or commands. This message may be coupled with the `server-response` message type to handle responses from the server.

* **type**: `'client-message'`
* **data**:

  * **t**: `string`
  * **d**: `unknown` (optional)

  The data payload should contain a `t` field indicating the type of message and an optional `d` field containing any custom, corresponding data needed for the message.

#### server-response 🤖

An message sent from the server to the client in response to a `client-message`. **IMPORTANT**: The `id` should match the `id` of the original `client-message` to correlate the response with the request.

* **type**: `'client-message'`
* **data**:

  * **t**: `string`
  * **d**: `unknown` (optional)

  The data payload should contain a `t` field indicating the type of message and an optional `d` field containing any custom, corresponding data needed for the message.

#### error-response 🤖

Error response to a specific client message. **IMPORTANT**: The `id` should match the `id` of the original `client-message` to correlate the response with the request.

* **type**: `'error-response'`
* **data**:
  * **error**: `string`

### Advanced LLM Interactions

#### send-text 🏄

A message sent from the client to the server to send text input to the LLM, appended to the user's context.

* **type**: `'send-text'`
* **data**:
  * **content**: `string`

    The text content to be appended to the user context.

  * **options**: `object` (optional)
    * **run\_immediately**: `boolean` (optional)

      If `true`, the pipeline should be interrupted and the LLM should process the input immediately after appending it to the context. Defaults to `true`.

    * **audio\_response**: `boolean` (optional)

      If `false`, the bot should bypass the TTS so that the bot does not respond to the text in audio. Defaults to `true`.

#### llm-function-call-started 🤖

<Note>
  Introduced in RTVI version 1.2.0 (Pipecat 0.0.102, client-js 1.6.0).
</Note>

Indicates that a function call has been initiated by the LLM. The amount of metadata included in this event is controlled by the server's `function_call_report_level` configuration.

* **type**: `'llm-function-call-started'`
* **data**:
  * **function\_name**: `string` (optional)

    Name of the function being called. Only included if the server's report level is `NAME` or `FULL`.

#### llm-function-call-in-progress 🤖

<Note>
  Introduced in RTVI version 1.2.0 (Pipecat 0.0.102, client-js 1.6.0).
</Note>

Indicates that a function call is in progress. This is the successor to the deprecated `llm-function-call` message and is the event that triggers registered `FunctionCallHandler`s when a `function_name` is present.

* **type**: `'llm-function-call-in-progress'`
* **data**:
  * **function\_name**: `string` (optional)

    Name of the function being called. Only included if the server's report level is `NAME` or `FULL`.

  * **tool\_call\_id**: `string`

    Unique identifier for this function call.

  * **arguments**: `Record<string, unknown>` (optional)

    Arguments passed to the function. Only included if the server's report level is `FULL`.

#### llm-function-call-stopped 🤖

<Note>
  Introduced in RTVI version 1.2.0 (Pipecat 0.0.102, client-js 1.6.0).
</Note>

Indicates that a function call has completed or been cancelled.

* **type**: `'llm-function-call-stopped'`
* **data**:
  * **function\_name**: `string` (optional)

    Name of the function that was called. Only included if the server's report level is `NAME` or `FULL`.

  * **tool\_call\_id**: `string`

    Identifier matching the original function call.

  * **cancelled**: `boolean`

    Indicates whether the function call was cancelled before completing.

  * **result**: `unknown` (optional)

    The result of the function call, if available. Only included if the server's report level is `FULL`.

#### llm-function-call 🤖

<Warning>
  DEPRECATED in favor of `llm-function-call-in-progress` in Pipecat version
  0.0.102 and client-js version 1.6.0
</Warning>

A function call request from the LLM, sent from the bot to the client. Note that for most cases, an LLM function call will be handled completely server-side. However, in the event that the call requires input from the client or the client needs to be aware of the function call, this message/response schema is required.

* **type**: `'llm-function-call'`
* **data**:
  * **function\_name**: `string`

    Name of the function to be called.

  * **tool\_call\_id**: `string`

    Unique identifier for this function call.

  * **args**: `Record<string, unknown>`

    Arguments to be passed to the function.

#### llm-function-call-result 🏄

The result of the function call requested by the LLM, returned from the client.

* **type**: `'llm-function-call-result'`
* **data**:
  * **function\_name**: `string`

    Name of the called function.

  * **tool\_call\_id**: `string`

    Identifier matching the original function call.

  * **arguments**: `Record<string, unknown>`

    Arguments that were passed to the function.

  * **result**: `Record<string, unknown> | string`

    The result returned by the function.

#### bot-llm-search-response 🤖

Search results from the LLM's knowledge base.

<Note>
  Currently, Google Gemini is the only LLM that supports built-in search.
  However, we expect other LLMs to follow suite, which is why this message type
  is defined as part of the RTVI standard. As more LLMs add support for this
  feature, the format of this message type may evolve to accommodate
  discrepancies.
</Note>

* **type**: `'bot-llm-search-response'`
* **data**:
  * **search\_result**: `string` (optional)

    Raw search result text.

  * **rendered\_content**: `string` (optional)

    Formatted version of the search results.

  * **origins**: `Array<Origin Object>`

    Source information and confidence scores for search results. The `Origin Object` follows this structure:

    ```json theme={null}
    {
        "site_uri": string (optional),
        "site_title": string (optional),
        "results": Array<{
            "text": string,
            "confidence": number[]
        }>
    }
    ```

**Example:**

```json theme={null}
"id": undefined
"label": "rtvi-ai"
"type": "bot-llm-search-response"
"data": {
    "origins": [
    {
        "results": [
            {
                "confidence": [0.9881149530410768],
                "text": "* Juneteenth: A Freedom Celebration is scheduled for June 18th from 12:00 pm to 2:00 pm."
            },
            {
                "confidence": [0.9692034721374512],
                "ext": "* A Juneteenth celebration at Fort Negley Park will take place on June 19th from 5:00 pm to 9:30 pm."
            }
        ],
        "site_title": "vanderbilt.edu",
        "site_uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHwif83VK9KAzrbMSGSBsKwL8vWfSfC9pgEWYKmStHyqiRoV1oe8j1S0nbwRg_iWgqAr9wUkiegu3ATC8Ll-cuE-vpzwElRHiJ2KgRYcqnOQMoOeokVpWqi"
    },
    {
        "results": [
            {
                "confidence": [0.6554043292999268],
                "text": "In addition to these events, Vanderbilt University is a large research institution with ongoing activities across many fields."
            }
        ],
        "site_title": "wikipedia.org",
        "site_uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQESbF-ijx78QbaglrhflHCUWdPTD4M6tYOQigW5hgsHNctRlAHu9ktfPmJx7DfoP5QicE0y-OQY1cRl9w4Id0btiFgLYSKIm2-SPtOHXeNrAlgA7mBnclaGrD7rgnLIbrjl8DgUEJrrvT0CKzuo"
    }],
    "rendered_content": "<style>\n.container ... </div>\n</div>\n",
    "search_result": "Several events are happening at Vanderbilt University:\n\n* Juneteenth: A Freedom Celebration is scheduled for June 18th from 12:00 pm to 2:00 pm.\n* A Juneteenth celebration at Fort Negley Park will take place on June 19th from 5:00 pm to 9:30 pm.\n\n  In addition to these events, Vanderbilt University is a large research institution with ongoing activities across many fields.  For the most recent news, you should check Vanderbilt's official news website.\n"
}
```

### Service-Specific Insights

#### bot-llm-started 🤖

Indicates LLM processing has begun

* **type**: `bot-llm-started`
* **data**: None

#### bot-llm-stopped 🤖

Indicates LLM processing has completed

* **type**: `bot-llm-stopped`
* **data**: None

#### user-llm-text 🤖

Aggregated user input text that is sent to the LLM.

* **type**: `'user-llm-text'`
* **data**:
  * **text**: `string`

    The user's input text to be processed by the LLM.

#### bot-llm-text 🤖

Individual tokens streamed from the LLM as they are generated.

* **type**: `'bot-llm-text'`
* **data**:
  * **text**: `string`

    The token text from the LLM.

#### bot-tts-started 🤖

Indicates text-to-speech (TTS) processing has begun.

* **type**: `'bot-tts-started'`
* **data**: None

#### bot-tts-stopped 🤖

Indicates text-to-speech (TTS) processing has completed.

* **type**: `'bot-tts-stopped'`
* **data**: None

#### bot-tts-text 🤖

The per-token text output of the text-to-speech (TTS) service (what the TTS actually says).

* **type**: `'bot-tts-text'`
* **data**:
  * **text**: `string`

    The text representation of the generated bot speech.

### Metrics and Monitoring

#### metrics 🤖

Performance metrics for various processing stages and services. Each message will contain entries for one or more of the metrics types: `processing`, `ttfb`, `characters`.

* **type**: `'metrics'`
* **data**:
  * **processing**: \[See Below] (optional)

    Processing time metrics.

  * **ttfb**: \[See Below] (optional)

    Time to first byte metrics.

  * **characters**: \[See Below] (optional)

    Character processing metrics.

For each metric type, the data structure is an array of objects with the following structure:

* **processor**: `string`

  The name of the processor or service that generated the metric.

* **value**: `number`

  The value of the metric, typically in milliseconds or character count.

* **model**: `string` (optional)

  The model of the service that generated the metric, if applicable.

**Example:**

```json theme={null}
{
  "type": "metrics",
  "data": {
    "processing": [
      {
        "model": "eleven_flash_v2_5",
        "processor": "ElevenLabsTTSService#0",
        "value": 0.0005140304565429688
      }
    ],
    "ttfb": [
      {
        "model": "eleven_flash_v2_5",
        "processor": "ElevenLabsTTSService#0",
        "value": 0.1573178768157959
      }
    ],
    "characters": [
      {
        "model": "eleven_flash_v2_5",
        "processor": "ElevenLabsTTSService#0",
        "value": 43
      }
    ]
  }
}
```
