OpenAI Realtime Beta
Real-time speech-to-speech service implementation using OpenAI’s Realtime Beta API
Overview
OpenAIRealtimeBetaLLMService provides real-time, multimodal conversation capabilities using OpenAI’s Realtime Beta API. It supports speech-to-speech interactions with integrated LLM processing, function calling, and advanced conversation management.
Installation
To use OpenAIRealtimeBetaLLMService, install the required dependencies.
You’ll also need to set your OpenAI API key in the OPENAI_API_KEY environment variable.
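As a minimal sketch of reading that environment variable before constructing the service, the helper below fails early with a clear message when the key is missing. The function name `load_openai_api_key` is hypothetical and not part of the service.

```python
import os


def load_openai_api_key(env=os.environ):
    """Return the key stored in OPENAI_API_KEY, or raise if it is unset or blank."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key
```

Passing a plain dictionary as `env` makes the helper easy to exercise without touching the real process environment.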
Configuration
Constructor Parameters
Your OpenAI API key
The speech-to-speech model used for processing
WebSocket endpoint URL
Configuration for the realtime session
Whether to start with audio input paused
Whether to emit transcription frames
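One way to keep these six settings together in application code is a small options object, sketched below. The field names and default values are assumptions chosen to mirror the descriptions above; they are not confirmed by this page, so check them against the constructor's actual signature before relying on them.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class RealtimeServiceOptions:
    """Hypothetical container for the constructor settings described above."""

    api_key: str                                   # your OpenAI API key
    model: Optional[str] = None                    # speech-to-speech model; None = service default
    base_url: Optional[str] = None                 # WebSocket endpoint URL; None = service default
    session_properties: Dict[str, Any] = field(default_factory=dict)
    start_audio_paused: bool = False               # assumed default
    send_transcription_frames: bool = True         # assumed default

    def to_kwargs(self) -> Dict[str, Any]:
        """Return keyword arguments, omitting settings left at None."""
        values = {
            "api_key": self.api_key,
            "model": self.model,
            "base_url": self.base_url,
            "session_properties": self.session_properties,
            "start_audio_paused": self.start_audio_paused,
            "send_transcription_frames": self.send_transcription_frames,
        }
        return {name: value for name, value in values.items() if value is not None}
```

Dropping the `None` entries lets the service fall back to its own defaults for the model and endpoint instead of receiving explicit empty values.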
Session Properties
Input Frames
Audio Input
Raw audio data for speech input
Control Input
Signals start of user interruption
Signals user started speaking
Signals user stopped speaking
Context Input
Contains conversation context
Appends messages to conversation
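The difference between the two context inputs can be sketched with a plain message list: one input replaces the conversation context, the other extends it. The class below is a hypothetical stand-in, not the service's own context type, and it assumes messages use the common role/content dictionary shape.

```python
from typing import Dict, Iterable, List, Optional

Message = Dict[str, str]


class ConversationContext:
    """Hypothetical holder for an ordered list of role/content messages."""

    def __init__(self, messages: Optional[Iterable[Message]] = None):
        self.messages: List[Message] = list(messages or [])

    def set_messages(self, messages: Iterable[Message]) -> None:
        # Replace the whole conversation, as a context frame would.
        self.messages = list(messages)

    def append_messages(self, messages: Iterable[Message]) -> None:
        # Keep the existing history and add to the end of it.
        self.messages.extend(messages)
```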
Output Frames
Audio Output
Generated speech audio
Control Output
Signals start of speech synthesis
Signals end of speech synthesis
Text Output
Generated text responses
Speech transcriptions
Usage Example
Function Calling
The service supports function calling with automatic response handling:
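The general pattern behind function calling can be sketched as follows: declare a tool with a JSON Schema for its arguments, then dispatch the model's call to a local handler and return the result as JSON. The tool dictionary follows the flat shape used by OpenAI's Realtime API. The handler, the `HANDLERS` registry, and `handle_function_call` are hypothetical names for illustration; the service's own registration mechanism is not shown on this page.

```python
import json

# Tool declaration: a name, a description, and a JSON Schema for the arguments.
weather_tool = {
    "type": "function",
    "name": "get_current_weather",
    "description": "Get the current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}


def get_current_weather(location):
    # Placeholder result; a real handler would query a weather service.
    return {"location": location, "conditions": "unknown"}


# Hypothetical registry mapping tool names to local handlers.
HANDLERS = {"get_current_weather": get_current_weather}


def handle_function_call(name, arguments_json):
    """Decode the model-supplied arguments, run the handler, return a JSON string."""
    handler = HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown function: {name}"})
    arguments = json.loads(arguments_json)
    return json.dumps(handler(**arguments))
```

Returning an error payload for an unknown name, rather than raising, keeps the conversation going and lets the model recover in its next turn.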
Frame Flow
Metrics Support
The service collects comprehensive metrics:
- Token usage (prompt and completion)
- Processing duration
- Time to First Byte (TTFB)
- Audio processing metrics
- Function call metrics
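To make the Time to First Byte metric concrete, the sketch below measures the interval between the start of a request and the first piece of output. `TTFBTimer` is a hypothetical helper for illustration and is not how the service itself records metrics.

```python
import time


class TTFBTimer:
    """Measure the time from start() to the first on_output() call."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = None
        self.ttfb = None

    def start(self):
        # Begin a new measurement and discard any earlier result.
        self._start = self._clock()
        self.ttfb = None

    def on_output(self):
        # Only the first output after start() sets the value; later calls reuse it.
        if self._start is not None and self.ttfb is None:
            self.ttfb = self._clock() - self._start
        return self.ttfb
```

A monotonic clock is used so the measurement is unaffected by system clock adjustments.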
Advanced Features
Turn Detection
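As a rough illustration of what turn detection settings look like at the protocol level, the sketch below builds a `session.update` event for OpenAI's Realtime API with server-side voice activity detection. The helper name and the default values are assumptions for illustration; how the service exposes these settings through its session properties is not shown on this page.

```python
import json


def build_turn_detection_update(enabled=True, threshold=0.5,
                                prefix_padding_ms=300, silence_duration_ms=500):
    """Return a JSON session.update event that configures turn detection."""
    if enabled:
        turn_detection = {
            "type": "server_vad",
            "threshold": threshold,                      # speech probability cutoff
            "prefix_padding_ms": prefix_padding_ms,      # audio kept before detected speech
            "silence_duration_ms": silence_duration_ms,  # silence that ends a turn
        }
    else:
        # A null value turns automatic turn detection off.
        turn_detection = None
    return json.dumps({"type": "session.update",
                       "session": {"turn_detection": turn_detection}})
```

With detection disabled, the application is responsible for signalling when the user has started and stopped speaking.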
Context Management
Notes
- Supports real-time speech-to-speech conversation
- Handles interruptions and turn-taking
- Manages WebSocket connection lifecycle
- Provides function calling capabilities
- Supports conversation context management
- Includes comprehensive error handling
- Manages audio streaming and processing
- Handles both text and audio modalities