# Speechmatics

Speech-to-text service implementation using Speechmatics' real-time transcription STT API.

## Overview

`SpeechmaticsSTTService` enables real-time speech transcription using Speechmatics' WebSocket API, with partial and final results, speaker diarization, and end-of-utterance detection (VAD).
## Installation

To use `SpeechmaticsSTTService`, install the required dependencies. You'll also need to set your Speechmatics API key in the `SPEECHMATICS_API_KEY` environment variable.
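Assuming the usual Pipecat extras pattern, the dependency can be installed as follows (the `speechmatics` extra name is an assumption; check your Pipecat version):

```shell
pip install "pipecat-ai[speechmatics]"
```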
## Frames

### Input

- `InputAudioRawFrame` - Raw PCM audio data (16-bit, 16 kHz, mono)

### Output

- `InterimTranscriptionFrame` - Real-time transcription updates
- `TranscriptionFrame` - Final transcription results

## Endpoints

Speechmatics STT supports the following endpoints (the default is `EU2`):
Region | Environment | STT Endpoint | Access |
---|---|---|---|
EU | EU1 | wss://neu.rt.speechmatics.com/ | Self-Service / Enterprise |
EU | EU2 (Default) | wss://eu2.rt.speechmatics.com/ | Self-Service / Enterprise |
US | US1 | wss://wus.rt.speechmatics.com/ | Enterprise |
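The endpoint table above can be captured as a small lookup. This is purely illustrative — the names below are not part of the Pipecat API, which presumably accepts the endpoint via its own configuration:

```python
# Illustrative mapping of Speechmatics real-time STT endpoints by environment.
STT_ENDPOINTS = {
    "EU1": "wss://neu.rt.speechmatics.com/",
    "EU2": "wss://eu2.rt.speechmatics.com/",  # default
    "US1": "wss://wus.rt.speechmatics.com/",  # Enterprise only
}

def endpoint_for(environment: str = "EU2") -> str:
    """Return the WebSocket endpoint for a given environment (EU2 by default)."""
    return STT_ENDPOINTS[environment]

print(endpoint_for())       # -> wss://eu2.rt.speechmatics.com/
print(endpoint_for("US1"))  # -> wss://wus.rt.speechmatics.com/
```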
## Language support

Set the transcription language via the `language` parameter when creating the STT object. The exception to this is English / Mandarin, which has the code `cmn_en` and must be set using the `language_code` parameter.
Language Code | Description | Locales |
---|---|---|
Language.AR | Arabic | - |
Language.BA | Bashkir | - |
Language.EU | Basque | - |
Language.BE | Belarusian | - |
Language.BG | Bulgarian | - |
Language.BN | Bengali | - |
Language.YUE | Cantonese | - |
Language.CA | Catalan | - |
Language.HR | Croatian | - |
Language.CS | Czech | - |
Language.DA | Danish | - |
Language.NL | Dutch | - |
Language.EN | English | en-US, en-GB, en-AU |
Language.EO | Esperanto | - |
Language.ET | Estonian | - |
Language.FA | Persian | - |
Language.FI | Finnish | - |
Language.FR | French | - |
Language.GL | Galician | - |
Language.DE | German | - |
Language.EL | Greek | - |
Language.HE | Hebrew | - |
Language.HI | Hindi | - |
Language.HU | Hungarian | - |
Language.IA | Interlingua | - |
Language.IT | Italian | - |
Language.ID | Indonesian | - |
Language.GA | Irish | - |
Language.JA | Japanese | - |
Language.KO | Korean | - |
Language.LV | Latvian | - |
Language.LT | Lithuanian | - |
Language.MS | Malay | - |
Language.MT | Maltese | - |
Language.CMN | Mandarin | cmn-Hans, cmn-Hant |
Language.MR | Marathi | - |
Language.MN | Mongolian | - |
Language.NO | Norwegian | - |
Language.PL | Polish | - |
Language.PT | Portuguese | - |
Language.RO | Romanian | - |
Language.RU | Russian | - |
Language.SK | Slovakian | - |
Language.SL | Slovenian | - |
Language.ES | Spanish | - |
Language.SV | Swedish | - |
Language.SW | Swahili | - |
Language.TA | Tamil | - |
Language.TH | Thai | - |
Language.TR | Turkish | - |
Language.UG | Uyghur | - |
Language.UK | Ukrainian | - |
Language.UR | Urdu | - |
Language.VI | Vietnamese | - |
Language.CY | Welsh | - |
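For example, a French transcriber might be created as sketched below. The module paths are assumptions based on Pipecat's usual layout and may differ in your version:

```python
from pipecat.transcriptions.language import Language
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService  # path may differ by version

# Sketch: select a language from the table above via the `language` parameter.
stt = SpeechmaticsSTTService(
    api_key="...",         # your Speechmatics API key
    language=Language.FR,  # French
)
```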
## Bilingual language support

Bilingual language packs are configured using the `language_code` and `domain` parameters as follows:
Language Code | Description | Domain Options |
---|---|---|
cmn_en | English / Mandarin | - |
en_ms | English / Malay | - |
Language.ES | English / Spanish | bilingual-en |
en_ta | English / Tamil | - |
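A sketch of both configuration styles from the table above (import paths assumed as before):

```python
from pipecat.transcriptions.language import Language
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService  # path may differ by version

# English / Mandarin uses a dedicated language code:
stt = SpeechmaticsSTTService(api_key="...", language_code="cmn_en")

# English / Spanish combines a language with a domain:
stt = SpeechmaticsSTTService(api_key="...", language=Language.ES, domain="bilingual-en")
```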
## Speaker diarization

Speaker diarization identifies individual speakers in the audio; each speaker is tagged via the `user_id` attribute.

To enable this feature, set `enable_diarization` to `True`. Additionally, if `speaker_active_format` or `speaker_passive_format` is provided, the text output of the `TranscriptionFrame` is formatted to that specification. Your system context can then be updated with information about this format so the LLM understands which speaker spoke which words. The passive format is optional: when the engine has been told to focus on specific speakers, all other speakers are formatted using `speaker_passive_format`.
- `speaker_active_format` - the formatter for active speakers
- `speaker_passive_format` - the formatter for passive / background speakers

Example formats:

- `<{speaker_id}>{text}</{speaker_id}>` -> `<S1>Good morning.</S1>`
- `@{speaker_id}: {text}` -> `@S1: Good morning.`
The available attributes for the formatter strings are:

Attribute | Description | Example |
---|---|---|
speaker_id | The ID of the speaker | S1 |
text | The transcribed text | Good morning. |
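The templates above use Python format-string placeholders, so their expansion can be sketched with plain `str.format`. The helper below is illustrative only, not part of the Pipecat API:

```python
def apply_format(template: str, speaker_id: str, text: str) -> str:
    """Expand a speaker-format template using the attributes from the table above."""
    return template.format(speaker_id=speaker_id, text=text)

print(apply_format("<{speaker_id}>{text}</{speaker_id}>", "S1", "Good morning."))
# -> <S1>Good morning.</S1>
print(apply_format("@{speaker_id}: {text}", "S1", "Good morning."))
# -> @S1: Good morning.
```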
## Usage example

Initialize `SpeechmaticsSTTService` and use it in a pipeline:
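A minimal sketch, assuming the import paths below (they may differ by Pipecat version) and a transport, context aggregator, LLM, and TTS configured elsewhere as in the Pipecat foundational examples:

```python
import os

from pipecat.transcriptions.language import Language
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService  # path may differ by version

stt = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    language=Language.EN,
)

# In a pipeline, the service consumes InputAudioRawFrames from the transport
# and emits Interim/TranscriptionFrames downstream, e.g.:
#
#   pipeline = Pipeline([
#       transport.input(),
#       stt,
#       context_aggregator.user(),
#       llm,
#       tts,
#       transport.output(),
#   ])
```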
### Focus speakers

You can tell the service to focus on specific speakers (e.g. `S1`). Words from other speakers are still transcribed, but are only sent when the focused speaker speaks. When the `enable_vad` option is used, speaker diarization is used to determine when a speaker is speaking; you will need to disable the VAD options within the selected transport object to ensure this works correctly (see `07b-interruptible-speechmatics-vad.py` as an example).
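A sketch of this configuration. The `focus_speakers` parameter name is an assumption, as is the import path; check your Pipecat version:

```python
import os

from pipecat.services.speechmatics.stt import SpeechmaticsSTTService  # path may differ by version

stt = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    enable_diarization=True,
    enable_vad=True,        # end-of-utterance detection driven by diarization
    focus_speakers=["S1"],  # assumed parameter name: only emit when S1 speaks
)
```

Remember to disable the VAD options on the transport itself when using this mode.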
Initialize `SpeechmaticsSTTService` with diarization enabled and use it in a pipeline:
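A sketch combining diarization with the speaker formats described above (import path assumed as before):

```python
import os

from pipecat.services.speechmatics.stt import SpeechmaticsSTTService  # path may differ by version

stt = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    enable_diarization=True,
    # Active speakers render as <S1>Good morning.</S1>,
    # background speakers as @S2: Hello.
    speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
    speaker_passive_format="@{speaker_id}: {text}",
)
```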
Note that the service requires input audio with a sample rate of `16000` Hz in `pcm_s16le` format.