OpenAI (Whisper)
Speech-to-text service implementation using OpenAI’s Whisper API
Overview
OpenAISTTService provides speech-to-text capabilities using OpenAI’s hosted Whisper API. It offers high-accuracy transcription with minimal setup.
Installation
To use OpenAISTTService, install the required dependencies and set your OpenAI API key in the OPENAI_API_KEY environment variable.
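For example (the package name and extra are assumptions based on common Pipecat setups; check the install instructions for your version):

```shell
# Assumed install command; the extra name may differ across versions.
pip install "pipecat-ai[openai]"

# Make your API key available to the service.
export OPENAI_API_KEY=sk-your-key-here
```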
You can obtain an OpenAI API key from the OpenAI platform.
Configuration
Constructor Parameters
- model: Whisper model to use. Currently only “whisper-1” is available.
- api_key: Your OpenAI API key. If not provided, the OPENAI_API_KEY environment variable is used.
- base_url: Custom base URL for OpenAI API requests.
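As a rough sketch of how these parameters might resolve, consider the helper below. The resolve_stt_config function and OPENAI_BASE_URL constant are hypothetical illustrations, not part of Pipecat:

```python
import os

# Hypothetical sketch of parameter resolution; the actual Pipecat
# implementation may differ.
OPENAI_BASE_URL = "https://api.openai.com/v1"

def resolve_stt_config(model="whisper-1", api_key=None, base_url=None):
    """Return the effective (model, api_key, base_url) triple."""
    # An explicit api_key wins; otherwise fall back to the environment.
    key = api_key or os.environ.get("OPENAI_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key or set OPENAI_API_KEY")
    return model, key, base_url or OPENAI_BASE_URL
```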
Input
The service processes audio data with the following requirements:
- PCM audio format
- 16-bit depth
- Single channel (mono)
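Audio of this shape can be produced with the standard library alone. The snippet below builds one second of mono, 16-bit PCM; the 16 kHz sample rate is an assumption for illustration, so use whatever rate your pipeline is configured for:

```python
import math
import struct

def make_pcm(freq_hz=440.0, sample_rate=16000, seconds=1.0):
    """Generate a sine tone as little-endian signed 16-bit mono PCM bytes."""
    n = int(sample_rate * seconds)
    samples = (
        int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n)
    )
    return struct.pack(f"<{n}h", *samples)

pcm = make_pcm()
# Mono 16-bit audio has exactly 2 bytes per sample.
assert len(pcm) == 16000 * 2
```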
Output Frames
The service produces two types of frames during transcription:
TranscriptionFrame
Generated for final transcriptions, containing:
- Transcribed text
- User identifier
- ISO 8601 formatted timestamp
- Detected language (if available)
ErrorFrame
Generated when transcription errors occur, containing error details.
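The two frame types can be pictured as simple records. These dataclasses are illustrative stand-ins, not the real Pipecat frame classes, which carry additional fields:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

def iso8601_now() -> str:
    # ISO 8601 timestamp in UTC, e.g. "2024-01-01T00:00:00+00:00".
    return datetime.now(timezone.utc).isoformat()

@dataclass
class TranscriptionFrame:
    text: str                       # transcribed text
    user_id: str                    # user identifier
    timestamp: str = field(default_factory=iso8601_now)  # ISO 8601
    language: Optional[str] = None  # detected language, if available

@dataclass
class ErrorFrame:
    error: str                      # descriptive error message
```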
Methods
Set Model
Sets the Whisper model used for transcription. Currently only “whisper-1” is available.
See the STT base class methods for additional functionality.
Usage Example
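A minimal sketch of wiring the service into a pipeline. Treat this as pseudocode: the import paths, the Pipeline class, and the transport object are assumptions based on typical Pipecat usage and are not verified against a specific version:

```python
# Sketch only: import paths and pipeline wiring are assumptions.
import os
from pipecat.services.openai import OpenAISTTService
from pipecat.pipeline.pipeline import Pipeline

stt = OpenAISTTService(
    model="whisper-1",
    api_key=os.environ["OPENAI_API_KEY"],
)

# Place the service after an audio source in your pipeline:
pipeline = Pipeline([
    transport.input(),  # audio frames from your transport
    stt,                # emits TranscriptionFrame / ErrorFrame
    ...,                # downstream processors
])
```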
Frame Flow
Audio frames flow into the service, which emits TranscriptionFrame objects for successful transcriptions and ErrorFrame objects when transcription fails.
Metrics Support
The service collects the following metrics:
- Time to First Byte (TTFB)
- Processing duration
- API response time
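Time to First Byte is typically measured with a monotonic clock: start timing when the request is sent and stop when the first response byte arrives. This TTFBTimer helper is hypothetical, shown only to make the metric concrete; Pipecat collects its metrics internally:

```python
import time

class TTFBTimer:
    """Record the delay between a request start and the first response byte."""

    def start(self):
        # Monotonic clock is immune to system clock adjustments.
        self._t0 = time.monotonic()
        self.ttfb = None

    def first_byte(self):
        # Only the first call after start() records the metric.
        if self.ttfb is None:
            self.ttfb = time.monotonic() - self._t0
```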
Notes
- Requires valid OpenAI API key
- Uses OpenAI’s hosted Whisper model
- Handles API rate limiting
- Automatic error handling
- Thread-safe processing
Error Handling
The service handles common API errors including:
- Authentication errors
- Rate limiting
- Invalid audio format
- Network connectivity issues
- API timeouts
Errors are propagated through ErrorFrames with descriptive messages.
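One way to picture this propagation is a mapping from each failure mode to the descriptive message an ErrorFrame would carry. The describe_error helper and the specific exception types below are illustrative assumptions, not Pipecat's internal logic:

```python
def describe_error(exc: Exception) -> str:
    """Map a caught exception to a descriptive error message."""
    if isinstance(exc, PermissionError):
        return f"Authentication error: {exc}"
    if isinstance(exc, TimeoutError):
        return f"API timeout: {exc}"
    if isinstance(exc, ConnectionError):
        return f"Network connectivity issue: {exc}"
    if isinstance(exc, ValueError):
        return f"Invalid audio format: {exc}"
    return f"Transcription error: {exc}"
```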