OpenAIRealtimeBetaLLMService provides real-time, multimodal conversation capabilities using OpenAI’s Realtime Beta API. It supports speech-to-speech interactions with integrated LLM processing, function calling, and advanced conversation management.

Real-time Interaction

Stream audio in real time with low-latency responses

Speech Processing

Built-in speech-to-text and text-to-speech capabilities with voice options

Advanced Turn Detection

Multiple voice activity detection options including semantic turn detection

Powerful Function Calling

Seamless support for calling external functions and APIs

Installation

To use OpenAIRealtimeBetaLLMService, install the required dependencies:
pip install "pipecat-ai[openai]"
You’ll also need to set up your OpenAI API key as an environment variable: OPENAI_API_KEY.
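For example, a minimal construction that reads the key from the environment (a sketch):

import os

from pipecat.services.openai_realtime_beta import OpenAIRealtimeBetaLLMService

# Reads OPENAI_API_KEY exported in your shell
service = OpenAIRealtimeBetaLLMService(api_key=os.getenv("OPENAI_API_KEY"))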

Configuration

Constructor Parameters

api_key
str
required
Your OpenAI API key
model
str
default:"gpt-4o-realtime-preview-2025-06-03"
The speech-to-speech model used for processing
base_url
str
default:"wss://api.openai.com/v1/realtime"
WebSocket endpoint URL
session_properties
SessionProperties
Configuration for the realtime session
start_audio_paused
bool
default:"False"
Whether to start with audio input paused
send_transcription_frames
bool
default:"True"
Whether to emit transcription frames
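Putting the constructor parameters together (a sketch; defaults written out explicitly, and SessionProperties is described in the next section):

import os

from pipecat.services.openai_realtime_beta import OpenAIRealtimeBetaLLMService
from pipecat.services.openai_realtime_beta.events import SessionProperties

service = OpenAIRealtimeBetaLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4o-realtime-preview-2025-06-03",  # default
    base_url="wss://api.openai.com/v1/realtime",  # default
    session_properties=SessionProperties(),
    start_audio_paused=False,  # default
    send_transcription_frames=True,  # default
)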

Session Properties

The SessionProperties object configures the behavior of the realtime session:
modalities
List[Literal['text', 'audio']]
The modalities to enable (default includes both text and audio)
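For example, to restrict the session to text responses (a sketch):

session_properties = SessionProperties(modalities=["text"])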
instructions
str
System instructions that guide the model’s behavior
service = OpenAIRealtimeBetaLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    session_properties=SessionProperties(
        instructions="You are a helpful assistant. Be concise and friendly."
    )
)
voice
str
Voice ID for text-to-speech (e.g., alloy, echo, shimmer; see OpenAI’s Realtime API documentation for the current list of supported voices)
input_audio_format
Literal['pcm16', 'g711_ulaw', 'g711_alaw']
Format of the input audio
output_audio_format
Literal['pcm16', 'g711_ulaw', 'g711_alaw']
Format of the output audio
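For example, telephony transports commonly use 8 kHz G.711 (a sketch; whether you need this depends on your transport):

session_properties = SessionProperties(
    input_audio_format="g711_ulaw",
    output_audio_format="g711_ulaw",
)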
input_audio_transcription
InputAudioTranscription
Configuration for audio transcription
from pipecat.services.openai_realtime_beta.events import InputAudioTranscription

service = OpenAIRealtimeBetaLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    session_properties=SessionProperties(
        input_audio_transcription=InputAudioTranscription(
            model="gpt-4o-transcribe",
            language="en",
            prompt="This is a technical conversation about programming"
        )
    )
)
input_audio_noise_reduction
InputAudioNoiseReduction
Configuration for audio noise reduction
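A sketch, assuming the near_field/far_field options exposed by OpenAI's session API:

from pipecat.services.openai_realtime_beta.events import InputAudioNoiseReduction

session_properties = SessionProperties(
    # "near_field" is an assumed example value; "far_field" is the other documented option
    input_audio_noise_reduction=InputAudioNoiseReduction(type="near_field")
)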
turn_detection
Union[TurnDetection, SemanticTurnDetection, bool]
Configuration for turn detection (set to False to disable)
tools
List[Dict]
List of function definitions for tool/function calling (the standardized ToolsSchema shown under Function Calling below is also accepted)
tool_choice
Literal['auto', 'none', 'required']
Controls when the model calls functions
temperature
float
Controls randomness in responses (0.0 to 2.0)
max_response_output_tokens
Union[int, Literal['inf']]
Maximum number of tokens to generate
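For example, combining the sampling and length controls (a sketch):

session_properties = SessionProperties(
    temperature=0.7,
    max_response_output_tokens=1024,  # or "inf" for no cap
)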

Input Frames

Audio Input

InputAudioRawFrame
Frame
Raw audio data for speech input

Control Input

StartInterruptionFrame
Frame
Signals start of user interruption
UserStartedSpeakingFrame
Frame
Signals user started speaking
UserStoppedSpeakingFrame
Frame
Signals user stopped speaking

Context Input

OpenAILLMContextFrame
Frame
Contains conversation context
LLMMessagesAppendFrame
Frame
Appends messages to conversation
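For example, appending a message from application code (a sketch; assumes a running PipelineTask named task):

from pipecat.frames.frames import LLMMessagesAppendFrame

await task.queue_frames([
    LLMMessagesAppendFrame(messages=[{"role": "user", "content": "Hello!"}])
])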

Output Frames

Audio Output

TTSAudioRawFrame
Frame
Generated speech audio

Control Output

TTSStartedFrame
Frame
Signals start of speech synthesis
TTSStoppedFrame
Frame
Signals end of speech synthesis

Text Output

TextFrame
Frame
Generated text responses
TranscriptionFrame
Frame
Speech transcriptions
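For example, a small processor placed after the service can observe transcriptions as they pass downstream (a sketch using Pipecat's standard FrameProcessor pattern):

from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

class TranscriptLogger(FrameProcessor):
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, TranscriptionFrame):
            print(f"User: {frame.text}")
        # Always pass the frame along so the pipeline keeps flowing
        await self.push_frame(frame, direction)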

Events

on_conversation_item_created
event
Emitted when a conversation item is created on the server. Handler receives:
  • item_id: str
  • item: ConversationItem
on_conversation_item_updated
event
Emitted when a conversation item is updated on the server. Handler receives:
  • item_id: str
  • item: Optional[ConversationItem] (may not exist for some updates)
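Handlers are registered with the service's event_handler decorator and, following Pipecat's usual convention, receive the service as their first argument (a sketch):

@service.event_handler("on_conversation_item_created")
async def on_item_created(service, item_id, item):
    print(f"Conversation item created: {item_id}")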

Methods

retrieve_conversation_item
method
Retrieves a conversation item’s details from the server.
async def retrieve_conversation_item(self, item_id: str) -> ConversationItem
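For example, fetching the full item when an update event arrives without one (a sketch):

@service.event_handler("on_conversation_item_updated")
async def on_item_updated(service, item_id, item):
    if item is None:  # some updates don't include the item
        item = await service.retrieve_conversation_item(item_id)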

Usage Example

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.openai_realtime_beta import OpenAIRealtimeBetaLLMService
from pipecat.services.openai_realtime_beta.events import SessionProperties, TurnDetection

# Configure service
service = OpenAIRealtimeBetaLLMService(
    api_key="your-api-key",
    session_properties=SessionProperties(
        modalities=["audio", "text"],
        voice="alloy",
        turn_detection=TurnDetection(
            threshold=0.5,
            silence_duration_ms=800
        ),
        temperature=0.7
    )
)

# Use in pipeline
pipeline = Pipeline([
    audio_input,       # Produces InputAudioRawFrame
    service,           # Processes speech/generates responses
    audio_output       # Handles TTSAudioRawFrame
])

Function Calling

The service supports function calling with automatic response handling:
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.services.openai_realtime_beta import SessionProperties

# Define weather function using standardized schema
weather_function = FunctionSchema(
    name="get_weather",
    description="Get weather information",
    properties={
        "location": {"type": "string"}
    },
    required=["location"]
)

# Create tools schema
tools = ToolsSchema(standard_tools=[weather_function])

# Configure service with tools
llm = OpenAIRealtimeBetaLLMService(
    api_key="your-api-key",
    session_properties=SessionProperties(
        tools=tools,
        tool_choice="auto"
    )
)

llm.register_function("get_weather", fetch_weather_from_api)
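In recent Pipecat versions the registered handler receives a FunctionCallParams object and returns its result via a callback; a sketch (the weather values are hard-coded placeholders, not a real API call):

from pipecat.services.llm_service import FunctionCallParams

async def fetch_weather_from_api(params: FunctionCallParams):
    location = params.arguments["location"]
    # Call a real weather backend here; this result is a placeholder
    await params.result_callback({"location": location, "conditions": "sunny"})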
See the Function Calling guide for:
  • Detailed implementation instructions
  • Provider-specific function definitions
  • Handler registration examples
  • Control over function call behavior
  • Complete usage examples

Frame Flow

Audio, control, and context frames flow into the service; it emits TTSAudioRawFrame audio along with TextFrame, TranscriptionFrame, and TTS start/stop control frames (see Input Frames and Output Frames above).

Metrics Support

The service collects comprehensive metrics:
  • Token usage (prompt and completion)
  • Processing duration
  • Time to First Byte (TTFB)
  • Audio processing metrics
  • Function call metrics
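Metrics are reported through Pipecat's standard pipeline mechanisms; a sketch of enabling them on a task (assumes the usual PipelineTask setup):

from pipecat.pipeline.task import PipelineParams, PipelineTask

task = PipelineTask(
    pipeline,
    params=PipelineParams(
        enable_metrics=True,
        enable_usage_metrics=True,
    ),
)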

Advanced Features

Turn Detection

from pipecat.services.openai_realtime_beta.events import SemanticTurnDetection, TurnDetection

# Server-side basic VAD
turn_detection = TurnDetection(
    type="server_vad",
    threshold=0.5,
    prefix_padding_ms=300,
    silence_duration_ms=800
)

# Server-side semantic VAD
turn_detection = SemanticTurnDetection(
    type="semantic_vad",
    eagerness="auto",  # default; can also be "low", "medium", or "high"
    create_response=True,  # default
    interrupt_response=True  # default
)

# Disable turn detection
turn_detection = False

Context Management

# Create context
context = OpenAIRealtimeLLMContext(
    messages=[],
    tools=[],
    system="You are a helpful assistant"
)

# Create aggregators
aggregators = service.create_context_aggregator(context)
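The aggregators then wrap the service in a pipeline so user and assistant turns are captured in the context (a sketch; transport is an assumed placeholder):

pipeline = Pipeline([
    transport.input(),
    aggregators.user(),
    service,
    transport.output(),
    aggregators.assistant(),
])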

Foundational Examples

OpenAI Realtime Beta Example

Basic implementation showing core realtime features including audio streaming, turn detection, and function calling.

Notes

  • Supports real-time speech-to-speech conversation
  • Handles interruptions and turn-taking
  • Manages WebSocket connection lifecycle
  • Provides function calling capabilities
  • Supports conversation context management
  • Includes comprehensive error handling
  • Manages audio streaming and processing
  • Handles both text and audio modalities