Overview

The OpenAIRealTimeWebRTCTransport is a fully functional RTVI Transport. It provides real-time, voice-to-voice communication directly with the OpenAI Realtime API over WebRTC, and it handles media device management, audio/video streams, and connection state management.

Usage

Basic Setup

import { RTVIClientOptions } from '@pipecat-ai/client-js';
import { OpenAIRealTimeWebRTCTransport, OpenAIServiceOptions } from '@pipecat-ai/openai-realtime-webrtc-transport';

const options: OpenAIServiceOptions = {
  api_key: 'YOUR_API_KEY',
  settings: {
    instructions: 'you are a confused jellyfish',
  }
};

const transport = new OpenAIRealTimeWebRTCTransport(options);
let RTVIConfig: RTVIClientOptions = {
  transport,
  ...
};
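
The transport is then handed to an RTVI client, which drives the connection. A minimal sketch, assuming the standard RTVIClient API from @pipecat-ai/client-js:

import { RTVIClient } from '@pipecat-ai/client-js';

// Construct the client with the config above and open the
// WebRTC connection to the OpenAI Realtime API.
const rtviClient = new RTVIClient(RTVIConfig);
await rtviClient.connect();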

Configuration Options

Below is the transport’s type definition for the OpenAI Session configuration. See the OpenAI Realtime API documentation for more details on each of the options and their defaults.

export type OpenAIFunctionTool = {
  type: "function";
  name: string;
  description: string;
  parameters: JSONSchema;
};

export type OpenAIServerVad = {
  type: "server_vad";
  create_response?: boolean;
  interrupt_response?: boolean;
  prefix_padding_ms?: number;
  silence_duration_ms?: number;
  threshold?: number;
};

export type OpenAISemanticVAD = {
  type: "semantic_vad";
  eagerness?: "low" | "medium" | "high" | "auto";
  create_response?: boolean; // defaults to true
  interrupt_response?: boolean; // defaults to true
};

export type OpenAISessionConfig = Partial<{
  modalities?: string;
  instructions?: string;
  voice?:
    | "alloy"
    | "ash"
    | "ballad"
    | "coral"
    | "echo"
    | "sage"
    | "shimmer"
    | "verse";
  input_audio_noise_reduction?: {
    type: "near_field" | "far_field";
  } | null; // defaults to null/off
  input_audio_transcription?: {
    model: "whisper-1" | "gpt-4o-transcribe" | "gpt-4o-mini-transcribe";
    language?: string;
    prompt?: string[] | string; // gpt-4o models take a string
  } | null; // we default this to gpt-4o-transcribe
  turn_detection?: OpenAIServerVad | OpenAISemanticVAD | null; // defaults to server_vad
  temperature?: number;
  max_tokens?: number | "inf";
  tools?: Array<OpenAIFunctionTool>;
}>;

export interface OpenAIServiceOptions {
  api_key: string;
  model?: string;
  initial_messages?: LLMContextMessage[];
  settings?: OpenAISessionConfig;
}
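
For reference, here is a fuller configuration sketch built from the types above. The model name, voice, VAD tuning, and tool are illustrative examples, not package defaults:

const options: OpenAIServiceOptions = {
  api_key: 'YOUR_API_KEY',
  // Illustrative model name.
  model: 'gpt-4o-realtime-preview',
  initial_messages: [{ role: 'user', content: 'Introduce yourself briefly.' }],
  settings: {
    instructions: 'You are a helpful, concise voice assistant.',
    voice: 'coral',
    input_audio_transcription: { model: 'gpt-4o-transcribe' },
    // Tune the default server-side VAD.
    turn_detection: { type: 'server_vad', silence_duration_ms: 800, threshold: 0.5 },
    tools: [
      {
        type: 'function',
        name: 'get_weather',
        description: 'Look up the current weather for a city.',
        parameters: {
          type: 'object',
          properties: { city: { type: 'string' } },
          required: ['city'],
        },
      },
    ],
  },
};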

Sending Messages

import { LLMHelper } from '@pipecat-ai/client-js';

// at setup time...
const llmHelper = new LLMHelper({});
rtviClient.registerHelper("llm", llmHelper);
// Note: the "llm" name in the call above isn't used by this transport;
// that value is only meaningful when working with a Pipecat pipeline.

// at time of sending a message...
// Send a text prompt to the model
llmHelper.appendToMessages({ role: "user", content: "Hello OpenAI!" });

Handling Events

The transport emits the standard RTVI events through the usual client callbacks and event handlers. See the Pipecat client docs or the example apps for the full list.
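
For example, a sketch of subscribing to transcript events, assuming the standard RTVIEvent names from @pipecat-ai/client-js:

import { RTVIEvent } from '@pipecat-ai/client-js';

// Event names and payload shapes come from @pipecat-ai/client-js,
// not from this transport specifically.
rtviClient.on(RTVIEvent.UserTranscript, (transcript) => {
  console.log('user said:', transcript.text);
});

rtviClient.on(RTVIEvent.BotTranscript, (transcript) => {
  console.log('bot said:', transcript.text);
});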

Updating Session Configuration

transport.updateSessionConfig({
  instructions: "you are a an over-sharing neighbor",
  input_audio_noise_reduction: {
    type: "near_field",
  },
});
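
Since OpenAISessionConfig is a partial type, updateSessionConfig can take any subset of the session fields. For example, a sketch (using the types above) of switching turn detection to semantic VAD mid-session:

// Switch from the default server_vad to semantic turn detection.
transport.updateSessionConfig({
  turn_detection: {
    type: "semantic_vad",
    eagerness: "low",
  },
});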

Currently, an invalid session configuration will cause the OpenAI connection to fail.

More Information

Package

@pipecat-ai/openai-realtime-webrtc-transport