Speech-to-text service implementation using OpenAI’s Speech-to-Text APIs
`OpenAISTTService` provides high-accuracy speech recognition using OpenAI’s transcription models, including the GPT-4o-based `gpt-4o-transcribe` model and the Whisper API (`whisper-1`). It uses Voice Activity Detection (VAD) to detect when the user starts and stops speaking, buffers the audio in between, and sends each completed speech segment for transcription.
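The segment-based flow described above can be pictured as "buffer between the VAD start and stop signals, then send the whole segment at once." The sketch below models only that idea with a hypothetical `SegmentBuffer` class, which is not part of Pipecat. Its methods stand in for handling `UserStartedSpeakingFrame`, `InputAudioRawFrame`, and `UserStoppedSpeakingFrame`. It assumes 16-bit mono PCM at a 16 kHz sample rate and wraps each segment as a WAV file, which is one of the upload formats OpenAI’s transcription endpoint accepts.

```python
import io
import wave
from typing import List


class SegmentBuffer:
    """Hypothetical helper: collects 16-bit mono PCM between VAD start/stop events."""

    def __init__(self, sample_rate: int = 16000) -> None:
        self.sample_rate = sample_rate  # assumed input rate, not taken from Pipecat
        self._buffering = False
        self._chunks: List[bytes] = []

    def on_user_started_speaking(self) -> None:
        # Stands in for UserStartedSpeakingFrame: start a fresh segment.
        self._buffering = True
        self._chunks.clear()

    def on_audio(self, pcm: bytes) -> None:
        # Stands in for InputAudioRawFrame: keep audio only while the user speaks.
        if self._buffering:
            self._chunks.append(pcm)

    def on_user_stopped_speaking(self) -> bytes:
        # Stands in for UserStoppedSpeakingFrame: close the segment and
        # wrap the raw PCM in a WAV container ready for upload.
        self._buffering = False
        out = io.BytesIO()
        with wave.open(out, "wb") as wav:
            wav.setnchannels(1)  # mono
            wav.setsampwidth(2)  # 16-bit samples
            wav.setframerate(self.sample_rate)
            wav.writeframes(b"".join(self._chunks))
        self._chunks.clear()
        return out.getvalue()
```

Because only complete segments are sent, this design yields one final transcript per utterance and no interim results.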
The service requires an OpenAI API key, typically provided through the `OPENAI_API_KEY` environment variable.
Input frames:
- `InputAudioRawFrame` - Raw PCM audio data (16-bit, mono)
- `UserStartedSpeakingFrame` - VAD signal to start buffering audio
- `UserStoppedSpeakingFrame` - VAD signal to process buffered audio

Output frames:
- `TranscriptionFrame` - Final transcription results (no interim results)
- `ErrorFrame` - API or processing errors

Available models:

Model | Description | Best For | Accuracy | Speed |
---|---|---|---|---|
gpt-4o-transcribe | Latest GPT-4o model fine-tuned for transcription | High accuracy, robustness to accents, context understanding | Highest | Fast |
whisper-1 | OpenAI’s proven Whisper model | Broad language support, clean audio | High | Fast |
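Both models in the table are chosen through the `model` field of a request to OpenAI’s transcription endpoint, alongside optional fields such as `language` (one of the service codes listed below) and `prompt`. The sketch below only assembles those non-file form fields with a hypothetical `build_transcription_fields` helper; it does not call the API and is not how `OpenAISTTService` builds its requests internally.

```python
from typing import Dict, Optional

# The two models listed in the table above.
SUPPORTED_MODELS = {"gpt-4o-transcribe", "whisper-1"}


def build_transcription_fields(
    model: str,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: collect the form fields sent with the audio file."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    fields = {"model": model}
    if language is not None:
        fields["language"] = language  # a short service code such as "en"
    if prompt is not None:
        fields["prompt"] = prompt  # optional text to guide the model's style
    return fields
```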
Use `gpt-4o-transcribe` for the best accuracy and context understanding, especially with challenging audio or technical content.

Supported languages:
Language Code | Description | Service Code |
---|---|---|
Language.AF | Afrikaans | af |
Language.AR | Arabic | ar |
Language.HY | Armenian | hy |
Language.AZ | Azerbaijani | az |
Language.BE | Belarusian | be |
Language.BS | Bosnian | bs |
Language.BG | Bulgarian | bg |
Language.CA | Catalan | ca |
Language.ZH | Chinese | zh |
Language.HR | Croatian | hr |
Language.CS | Czech | cs |
Language.DA | Danish | da |
Language.NL | Dutch | nl |
Language.EN | English | en |
Language.ET | Estonian | et |
Language.FI | Finnish | fi |
Language.FR | French | fr |
Language.GL | Galician | gl |
Language.DE | German | de |
Language.EL | Greek | el |
Language.HE | Hebrew | he |
Language.HI | Hindi | hi |
Language.HU | Hungarian | hu |
Language.IS | Icelandic | is |
Language.ID | Indonesian | id |
Language.IT | Italian | it |
Language.JA | Japanese | ja |
Language.KN | Kannada | kn |
Language.KK | Kazakh | kk |
Language.KO | Korean | ko |
Language.LV | Latvian | lv |
Language.LT | Lithuanian | lt |
Language.MK | Macedonian | mk |
Language.MS | Malay | ms |
Language.MR | Marathi | mr |
Language.MI | Maori | mi |
Language.NE | Nepali | ne |
Language.NO | Norwegian | no |
Language.FA | Persian | fa |
Language.PL | Polish | pl |
Language.PT | Portuguese | pt |
Language.RO | Romanian | ro |
Language.RU | Russian | ru |
Language.SR | Serbian | sr |
Language.SK | Slovak | sk |
Language.SL | Slovenian | sl |
Language.ES | Spanish | es |
Language.SW | Swahili | sw |
Language.SV | Swedish | sv |
Language.TL | Tagalog | tl |
Language.TA | Tamil | ta |
Language.TH | Thai | th |
Language.TR | Turkish | tr |
Language.UK | Ukrainian | uk |
Language.UR | Urdu | ur |
Language.VI | Vietnamese | vi |
Language.CY | Welsh | cy |
Common languages:
- `Language.EN` - English (`en`)
- `Language.ES` - Spanish (`es`)
- `Language.FR` - French (`fr`)
- `Language.DE` - German (`de`)
- `Language.IT` - Italian (`it`)
- `Language.JA` - Japanese (`ja`)
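The tables above pair each `Language` value with the short code OpenAI expects. A minimal sketch of that lookup follows, including the fallback from a regional variant such as `EN_US` to its base language. The `Language` enum here is a hypothetical stand-in with only a few members (its `"en-US"`-style values are an assumption), and `to_service_code` is an illustrative name, not Pipecat’s API.

```python
from enum import Enum
from typing import Optional


class Language(str, Enum):
    # Hypothetical stand-in for the library's enum; only a few members shown.
    EN = "en"
    EN_US = "en-US"
    FR = "fr"
    FR_CA = "fr-CA"
    DE = "de"


# Subset of the "Service Code" column from the table above.
BASE_CODES = {"en", "fr", "de"}


def to_service_code(language: Language) -> Optional[str]:
    """Return the OpenAI code for `language`, or None if it is unsupported."""
    # Drop any region suffix, so "en-US" and "fr-CA" map to "en" and "fr".
    base = language.value.split("-")[0].lower()
    return base if base in BASE_CODES else None
```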
Regional variants (e.g., `EN_US`, `FR_CA`) are automatically mapped to their base language codes.

To transcribe audio, create an `OpenAISTTService` instance and add it to a pipeline.
Use an `STTUpdateSettingsFrame` to update the settings of the `OpenAISTTService` at runtime.
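A settings update can be thought of as a mapping from setting names (for example `language` or `model`) to new values that is merged into the service’s current settings. The sketch below illustrates that merge with a hypothetical `apply_settings_update` function. It does not show Pipecat’s internal handling; it simply assumes unknown keys should be reported rather than applied, so a bad key does not break a running pipeline.

```python
from typing import Any, Callable, Dict, Mapping


def apply_settings_update(
    current: Dict[str, Any],
    update: Mapping[str, Any],
    on_unknown: Callable[[str], None] = print,
) -> Dict[str, Any]:
    """Return a copy of `current` with every known key from `update` applied."""
    merged = dict(current)  # leave the caller's settings untouched
    for key, value in update.items():
        if key in merged:
            merged[key] = value
        else:
            on_unknown(key)  # report instead of raising mid-conversation
    return merged
```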