Speech-to-text service implementation using Speechmatics’ real-time transcription API.
SpeechmaticsSTTService enables real-time speech transcription using Speechmatics’ WebSocket API, with partial and final results, speaker diarization, and end-of-utterance detection (VAD).
To use SpeechmaticsSTTService, install the required dependencies:
Set your Speechmatics API key in the SPEECHMATICS_API_KEY environment variable.
- InputAudioRawFrame - Raw PCM audio data (16-bit, 16 kHz, mono)
- STTUpdateSettingsFrame - Runtime transcription configuration updates
- STTMuteFrame - Mute audio input for transcription
- InterimTranscriptionFrame - Real-time transcription updates
- TranscriptionFrame - Final transcription results
- ErrorFrame - Connection or processing errors

The following endpoints are supported (default: EU2):
Region | Environment | STT Endpoint |
---|---|---|
EU | EU1 | wss://eu1.rt.speechmatics.com/ |
EU | EU2 | wss://eu2.rt.speechmatics.com/ |
US | US1 | wss://us1.rt.speechmatics.com/ |
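
As a minimal sketch of how the endpoint table might be used in application code, the snippet below maps an environment name to its WebSocket URL. The `ENDPOINTS` dictionary and the `endpoint_for` helper are hypothetical names introduced for illustration and are not part of the service's API. The URLs are copied from the table, and the fallback to EU2 assumes that EU2 is the default environment.

```python
# Hypothetical lookup table built from the endpoint table above.
ENDPOINTS = {
    "EU1": "wss://eu1.rt.speechmatics.com/",
    "EU2": "wss://eu2.rt.speechmatics.com/",
    "US1": "wss://us1.rt.speechmatics.com/",
}


def endpoint_for(environment: str = "EU2") -> str:
    """Return the WebSocket URL for an environment name (case-insensitive).

    Assumption: EU2 is used when no environment is given.
    """
    key = environment.strip().upper()
    if key not in ENDPOINTS:
        raise ValueError(
            f"Unknown environment {environment!r}; expected one of {sorted(ENDPOINTS)}"
        )
    return ENDPOINTS[key]
```

Failing loudly on an unknown environment name, rather than silently falling back to the default, keeps a typo from routing audio to an unintended region.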
The languages below are set using the language parameter when creating the STT object. The exception is English / Mandarin, which has the code cmn_en and must be set using the language_code parameter.
Language Code | Description | Locales | Domain Options |
---|---|---|---|
Language.AR | Arabic | - | - |
Language.BA | Bashkir | - | - |
Language.EU | Basque | - | - |
Language.BE | Belarusian | - | - |
Language.BG | Bulgarian | - | - |
Language.BN | Bengali | - | - |
Language.YUE | Cantonese | - | - |
Language.CA | Catalan | - | - |
Language.HR | Croatian | - | - |
Language.CS | Czech | - | - |
Language.DA | Danish | - | - |
Language.NL | Dutch | - | - |
Language.EN | English | en-US , en-GB , en-AU | finance |
Language.EO | Esperanto | - | - |
Language.ET | Estonian | - | - |
Language.FA | Persian | - | - |
Language.FI | Finnish | - | - |
Language.FR | French | - | - |
Language.GL | Galician | - | - |
Language.DE | German | - | - |
Language.EL | Greek | - | - |
Language.HE | Hebrew | - | - |
Language.HI | Hindi | - | - |
Language.HU | Hungarian | - | - |
Language.IA | Interlingua | - | - |
Language.IT | Italian | - | - |
Language.ID | Indonesian | - | - |
Language.GA | Irish | - | - |
Language.JA | Japanese | - | - |
Language.KO | Korean | - | - |
Language.LV | Latvian | - | - |
Language.LT | Lithuanian | - | - |
Language.MS | Malay | - | - |
Language.MT | Maltese | - | - |
Language.CMN | Mandarin | cmn-Hans , cmn-Hant | - |
cmn_en | English / Mandarin | - | - |
Language.MR | Marathi | - | - |
Language.MN | Mongolian | - | - |
Language.NO | Norwegian | - | - |
Language.PL | Polish | - | - |
Language.PT | Portuguese | - | - |
Language.RO | Romanian | - | - |
Language.RU | Russian | - | - |
Language.SK | Slovakian | - | - |
Language.SL | Slovenian | - | - |
Language.ES | Spanish | - | bilingual-en |
Language.SV | Swedish | - | - |
Language.SW | Swahili | - | - |
Language.TA | Tamil | - | - |
Language.TH | Thai | - | - |
Language.TR | Turkish | - | - |
Language.UG | Uyghur | - | - |
Language.UK | Ukrainian | - | - |
Language.UR | Urdu | - | - |
Language.VI | Vietnamese | - | - |
Language.CY | Welsh | - | - |
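
The rule for choosing between the two parameters (most languages go through `language`, while English / Mandarin goes through `language_code`) can be sketched as a small helper that picks the right keyword argument. `language_kwargs` is a hypothetical helper, and plain strings stand in for the `Language` enum members so that the snippet runs on its own. How these arguments are passed on to the service is not shown here.

```python
def language_kwargs(code: str) -> dict:
    """Pick the keyword argument to use for a given language code.

    Assumption: "cmn_en" is the only code that must be passed as
    `language_code`; every other code is passed as `language`.
    """
    if code == "cmn_en":
        return {"language_code": "cmn_en"}
    return {"language": code}
```

For example, `language_kwargs("EN")` yields a `language` entry, while `language_kwargs("cmn_en")` yields a `language_code` entry.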
Language Code | Description |
---|---|
Language.BG | Bulgarian |
Language.CA | Catalan |
Language.CMN | Mandarin |
Language.CS | Czech |
Language.DA | Danish |
Language.DE | German |
Language.EL | Greek |
Language.EN | English |
Language.ES | Spanish |
Language.ET | Estonian |
Language.FI | Finnish |
Language.FR | French |
Language.GL | Galician |
Language.HI | Hindi |
Language.HR | Croatian |
Language.HU | Hungarian |
Language.ID | Indonesian |
Language.IT | Italian |
Language.JA | Japanese |
Language.KO | Korean |
Language.LT | Lithuanian |
Language.LV | Latvian |
Language.MS | Malay |
Language.NL | Dutch |
Language.NO | Norwegian |
Language.PL | Polish |
Language.PT | Portuguese |
Language.RO | Romanian |
Language.RU | Russian |
Language.SK | Slovakian |
Language.SL | Slovenian |
Language.SV | Swedish |
Language.TR | Turkish |
Language.UK | Ukrainian |
Language.VI | Vietnamese |
When speaker diarization is enabled, the speaker's ID is returned in the user_id attribute.
To enable this feature, set enable_speaker_diarization to True. Additionally, if a text_format is provided, the text output of the TranscriptionFrame is formatted to this specification. Your system context can then be updated to describe this format, so that it is clear which speaker spoke which words.
For example, if text_format = <{speaker_id}>{text}</{speaker_id}>, the output would be <S1>Good morning.</S1>.
Attribute | Description | Example |
---|---|---|
speaker_id | The ID of the speaker | S1 |
text | The transcribed text | Good morning. |
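
To make the template substitution concrete, here is a minimal sketch that fills the example `text_format` with Python's `str.format` and then parses tagged text back into speaker/text pairs. It assumes the template behaves like a standard Python format string with the two attributes from the table. `format_utterance` and `parse_utterances` are hypothetical helpers, not part of the service.

```python
import re

# The example template from the text above.
TEXT_FORMAT = "<{speaker_id}>{text}</{speaker_id}>"


def format_utterance(speaker_id: str, text: str, text_format: str = TEXT_FORMAT) -> str:
    """Fill the template with a speaker ID and the transcribed text."""
    return text_format.format(speaker_id=speaker_id, text=text)


# Matches <ID>...</ID>; the backreference \1 requires the closing tag
# to carry the same speaker ID as the opening tag.
_TAGGED = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def parse_utterances(transcript: str) -> list[tuple[str, str]]:
    """Split tagged text back into (speaker_id, text) pairs."""
    return [(m.group(1), m.group(2)) for m in _TAGGED.finditer(transcript)]


print(format_utterance("S1", "Good morning."))  # <S1>Good morning.</S1>
```

A tag-style format like this one is easy to describe in a system context and easy to parse back out if the speaker attribution is needed downstream.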
Initialize SpeechmaticsSTTService and use it in a pipeline:
Update settings at runtime with an STTUpdateSettingsFrame for the SpeechmaticsSTTService:
The audio sample rate is 16000 Hz, in pcm_s16le format.
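
For readers unfamiliar with the audio format, the sketch below shows what 16000 Hz `pcm_s16le` audio looks like in bytes: each sample is a signed 16-bit integer stored little-endian, so one second of mono audio occupies 32000 bytes. The helper names and the 20 ms chunk duration are assumptions chosen for illustration. The service itself does not require this code.

```python
import math
import struct

SAMPLE_RATE = 16000    # samples per second
BYTES_PER_SAMPLE = 2   # pcm_s16le: signed 16-bit, little-endian


def floats_to_pcm_s16le(samples) -> bytes:
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def chunk_size_bytes(duration_ms: int) -> int:
    """Number of bytes in a mono chunk of the given duration."""
    return SAMPLE_RATE * duration_ms // 1000 * BYTES_PER_SAMPLE


# 20 ms of a 440 Hz sine tone (the chunk duration is an arbitrary choice).
tone = [
    math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE * 20 // 1000)
]
pcm = floats_to_pcm_s16le(tone)
print(len(pcm))  # 640
```

Clamping before scaling keeps out-of-range input from overflowing the 16-bit range, which would otherwise make `struct.pack` raise an error.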