Overview
OpenAISTTService provides speech recognition using OpenAI’s transcription models, including the GPT-4o transcription model (gpt-4o-transcribe) and the Whisper API (whisper-1). It relies on Voice Activity Detection (VAD) to split incoming audio into speech segments and transcribes each complete segment.
API Reference
Complete API documentation and method details
OpenAI Docs
Official OpenAI transcription documentation and features
Example Code
Working example with OpenAI ecosystem integration
Installation
To use OpenAI services, install the required dependency and set your API key in the OPENAI_API_KEY environment variable.
Get your API key from the OpenAI Platform.
Frames
Input
- InputAudioRawFrame - Raw PCM audio data (16-bit, mono)
- UserStartedSpeakingFrame - VAD signal to start buffering audio
- UserStoppedSpeakingFrame - VAD signal to process buffered audio
Output
- TranscriptionFrame - Final transcription results (no interim results)
- ErrorFrame - API or processing errors
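The frame flow above amounts to a small state machine: audio is buffered only between the start and stop signals, and one final transcription is produced per utterance. A minimal sketch of that idea follows. It is not the service's actual implementation; the SegmentBuffer class, its method names, and the injected transcribe callable are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SegmentBuffer:
    """Hypothetical illustration of VAD-gated buffering (not the library's class)."""

    # Stand-in for the API call: takes raw PCM bytes, returns the final text.
    transcribe: Callable[[bytes], str]
    _speaking: bool = field(default=False, init=False)
    _audio: bytearray = field(default_factory=bytearray, init=False)

    def on_user_started_speaking(self) -> None:
        # Start signal: begin a fresh segment.
        self._speaking = True
        self._audio.clear()

    def on_audio(self, pcm: bytes) -> None:
        # Audio arriving outside a speech segment is dropped in this sketch.
        if self._speaking:
            self._audio.extend(pcm)

    def on_user_stopped_speaking(self) -> Optional[str]:
        # Stop signal: transcribe the whole segment once, no interim results.
        self._speaking = False
        if not self._audio:
            return None
        text = self.transcribe(bytes(self._audio))
        self._audio.clear()
        return text


if __name__ == "__main__":
    # A fake transcriber that only reports how much audio it received.
    segment = SegmentBuffer(transcribe=lambda pcm: f"{len(pcm)} bytes")
    segment.on_user_started_speaking()
    segment.on_audio(b"\x01\x00" * 4)
    print(segment.on_user_stopped_speaking())
```

Injecting the transcriber as a callable keeps the buffering logic testable without network access.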
Models
OpenAI offers two transcription models with different strengths:
Model | Description | Best For | Accuracy | Speed |
---|---|---|---|---|
gpt-4o-transcribe | Latest GPT-4o model fine-tuned for transcription | High accuracy, robustness to accents, context understanding | Highest | Fast |
whisper-1 | OpenAI’s proven Whisper model | Broad language support, clean audio | High | Fast |
Recommended: Use gpt-4o-transcribe for the best accuracy and context understanding, especially with challenging audio or technical content.
Language Support
OpenAI’s speech-to-text models support 60+ languages with automatic language detection:
Language Code | Description | Service Code |
---|---|---|
Language.AF | Afrikaans | af |
Language.AR | Arabic | ar |
Language.HY | Armenian | hy |
Language.AZ | Azerbaijani | az |
Language.BE | Belarusian | be |
Language.BS | Bosnian | bs |
Language.BG | Bulgarian | bg |
Language.CA | Catalan | ca |
Language.ZH | Chinese | zh |
Language.HR | Croatian | hr |
Language.CS | Czech | cs |
Language.DA | Danish | da |
Language.NL | Dutch | nl |
Language.EN | English | en |
Language.ET | Estonian | et |
Language.FI | Finnish | fi |
Language.FR | French | fr |
Language.GL | Galician | gl |
Language.DE | German | de |
Language.EL | Greek | el |
Language.HE | Hebrew | he |
Language.HI | Hindi | hi |
Language.HU | Hungarian | hu |
Language.IS | Icelandic | is |
Language.ID | Indonesian | id |
Language.IT | Italian | it |
Language.JA | Japanese | ja |
Language.KN | Kannada | kn |
Language.KK | Kazakh | kk |
Language.KO | Korean | ko |
Language.LV | Latvian | lv |
Language.LT | Lithuanian | lt |
Language.MK | Macedonian | mk |
Language.MS | Malay | ms |
Language.MR | Marathi | mr |
Language.MI | Maori | mi |
Language.NE | Nepali | ne |
Language.NO | Norwegian | no |
Language.FA | Persian | fa |
Language.PL | Polish | pl |
Language.PT | Portuguese | pt |
Language.RO | Romanian | ro |
Language.RU | Russian | ru |
Language.SR | Serbian | sr |
Language.SK | Slovak | sk |
Language.SL | Slovenian | sl |
Language.ES | Spanish | es |
Language.SW | Swahili | sw |
Language.SV | Swedish | sv |
Language.TL | Tagalog | tl |
Language.TA | Tamil | ta |
Language.TH | Thai | th |
Language.TR | Turkish | tr |
Language.UK | Ukrainian | uk |
Language.UR | Urdu | ur |
Language.VI | Vietnamese | vi |
Language.CY | Welsh | cy |
Common languages:
- Language.EN - English - en
- Language.ES - Spanish - es
- Language.FR - French - fr
- Language.DE - German - de
- Language.IT - Italian - it
- Language.JA - Japanese - ja
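Every service code listed here is a bare base-language code, so a regional variant reduces to its base by dropping the region suffix. The sketch below shows that reduction with a hypothetical helper, base_language_code. It is not the library's own mapping function, and it assumes variants are written in forms such as en-US or FR_CA.

```python
def base_language_code(code: str) -> str:
    """Return the lowercase base language for codes like 'en-US' or 'FR_CA'."""
    # Accept both separators, then keep only the part before the region.
    normalized = code.strip().replace("_", "-")
    return normalized.split("-", 1)[0].lower()


if __name__ == "__main__":
    for variant in ("en-US", "FR_CA", "pt-BR", "ja"):
        print(variant, "->", base_language_code(variant))
```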
Regional variants (like EN_US, FR_CA) are automatically mapped to their base language codes.
Usage Example
Basic Configuration
Initialize the OpenAISTTService and use it in a pipeline:
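A minimal sketch of this setup follows, under several assumptions that this page does not confirm: that the service ships in the Pipecat package, that it is importable from pipecat.services.openai.stt, that its constructor accepts api_key and model keyword arguments, and that the transport object exposes input() and output() processors. The helper names stt_options and build_pipeline are hypothetical. Check the API Reference for the exact import path and parameters in your installed version.

```python
import os


def stt_options() -> dict:
    """Constructor arguments for the STT service (assumed parameter names)."""
    return {
        "api_key": os.getenv("OPENAI_API_KEY"),  # key from the OpenAI Platform
        "model": "gpt-4o-transcribe",            # or "whisper-1"
    }


def build_pipeline(transport, *downstream):
    """Place the STT service right after the transport's audio input."""
    # Imported lazily so this module still loads where Pipecat is absent.
    # Both import paths are assumptions about the installed release.
    from pipecat.pipeline.pipeline import Pipeline
    from pipecat.services.openai.stt import OpenAISTTService

    stt = OpenAISTTService(**stt_options())
    # Downstream processors (LLM, TTS, and so on) consume TranscriptionFrames.
    return Pipeline([transport.input(), stt, *downstream, transport.output()])
```

Because the service waits for VAD start and stop signals, the transport used here needs a VAD analyzer configured, otherwise no audio segment is ever submitted.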
Advanced Configuration
Dynamic Configuration
Update settings at runtime by pushing an STTUpdateSettingsFrame to the OpenAISTTService:
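The following is only a sketch of such an update. It assumes the Pipecat package, where STTUpdateSettingsFrame is importable from pipecat.frames.frames and carries a settings mapping, and a pipeline task object named task with an async queue_frame method. The helper switch_language is hypothetical, and which settings keys the service honors (language is used here) should be confirmed against the API Reference.

```python
import asyncio


async def switch_language(task, language_code: str = "es") -> None:
    """Queue a settings update so later utterances use another language."""
    # Lazy imports: both paths are assumptions about the installed release.
    from pipecat.frames.frames import STTUpdateSettingsFrame
    from pipecat.transcriptions.language import Language

    # Language is assumed to be a string-valued enum, so "es" -> Language.ES.
    frame = STTUpdateSettingsFrame(settings={"language": Language(language_code)})
    await task.queue_frame(frame)
```

The update travels through the pipeline like any other frame, so it applies to segments transcribed after the service receives it, not to audio already submitted.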
Metrics
The service reports the following metrics:
- Time to First Byte (TTFB) - API response latency
- Processing Duration - Total transcription time
Learn how to enable Metrics in your Pipeline.
Additional Notes
- Segmented Processing: Processes complete utterances, not continuous streams
- No Interim Results: Only final transcriptions are provided (typical for batch APIs)
- Audio Buffer: Maintains a 1-second buffer to capture speech before VAD detection
- Language Variants: Regional language codes automatically map to base languages
- Context Prompts: GPT-4o especially benefits from domain-specific prompts
- Rate Limits: Check your OpenAI plan for concurrent request and usage limits
- Quality Focus: OpenAI prioritizes accuracy and context understanding over speed
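The Audio Buffer note above describes a pre-roll: because VAD fires slightly after speech begins, the most recent second of audio is kept so the first syllables are not lost. One way to picture this is a bounded buffer that is prepended to each new segment. The sketch below assumes 16-bit mono PCM at 16 kHz; the PreRollBuffer class is hypothetical and is not the service's internal code.

```python
from collections import deque


class PreRollBuffer:
    """Hypothetical pre-roll: keep the last N seconds heard before speech starts."""

    def __init__(self, sample_rate: int = 16000, seconds: float = 1.0) -> None:
        # 16-bit mono PCM uses 2 bytes per sample.
        max_bytes = int(sample_rate * 2 * seconds)
        # A bounded deque silently discards the oldest bytes when full.
        # Per-byte storage is slow but keeps the sketch short.
        self._pre_roll = deque(maxlen=max_bytes)
        self._segment = bytearray()
        self._speaking = False

    def add_audio(self, pcm: bytes) -> None:
        if self._speaking:
            self._segment.extend(pcm)
        else:
            self._pre_roll.extend(pcm)

    def start(self) -> None:
        # Seed the new segment with the audio heard just before detection.
        self._speaking = True
        self._segment = bytearray(self._pre_roll)
        self._pre_roll.clear()

    def stop(self) -> bytes:
        # Return the complete utterance, pre-roll included.
        self._speaking = False
        audio = bytes(self._segment)
        self._segment.clear()
        return audio
```

A production version would store whole audio chunks rather than single bytes, trimming by total length, to avoid the per-byte overhead.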