Speechmatics
Speech-to-text service implementation using Speechmatics’ real-time transcription API
Overview
`SpeechmaticsSTTService` enables real-time speech transcription using Speechmatics’ WebSocket API, with partial and final results, speaker diarization, and end-of-utterance detection (VAD).
- API Reference: Complete API documentation and method details
- Speechmatics Docs: Official Speechmatics documentation and features
- Speaker Diarization: Separating out different speakers in the audio
- Example Code: Working example with interruption handling
Installation
To use `SpeechmaticsSTTService`, install the required dependencies:
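```bash
# The extras name is assumed from current Pipecat packaging; adjust if needed.
pip install "pipecat-ai[speechmatics]"
```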
You’ll also need to set up your Speechmatics API key as an environment variable: `SPEECHMATICS_API_KEY`.
Get your API key from the Speechmatics Portal.
Frames
Input
- `InputAudioRawFrame` - Raw PCM audio data (16-bit, 16kHz, mono)
- `STTUpdateSettingsFrame` - Runtime transcription configuration updates
- `STTMuteFrame` - Mute audio input for transcription
Output
- `InterimTranscriptionFrame` - Real-time transcription updates
- `TranscriptionFrame` - Final transcription results
- `ErrorFrame` - Connection or processing errors
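As a minimal sketch of consuming these frames downstream (module paths taken from current Pipecat releases; verify against your installed version), a small processor can log interim and final transcriptions:

```python
from pipecat.frames.frames import Frame, InterimTranscriptionFrame, TranscriptionFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class TranscriptLogger(FrameProcessor):
    """Logs interim and final transcriptions flowing out of the STT service."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, TranscriptionFrame):
            # Final result; user_id carries the speaker when diarization is enabled.
            print(f"[final] {frame.user_id}: {frame.text}")
        elif isinstance(frame, InterimTranscriptionFrame):
            print(f"[interim] {frame.text}")

        # Always pass the frame along so downstream processors still receive it.
        await self.push_frame(frame, direction)
```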
Endpoints
Speechmatics STT supports the following endpoints (defaults to `EU2`):
Region | Environment | STT Endpoint |
---|---|---|
EU | EU1 | wss://eu1.rt.speechmatics.com/ |
EU | EU2 | wss://eu2.rt.speechmatics.com/ |
US | US1 | wss://us1.rt.speechmatics.com/ |
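The endpoint can presumably be overridden when constructing the service; the parameter name below (`base_url`) is hypothetical, so check the API reference for the actual option:

```python
import os

from pipecat.services.speechmatics.stt import SpeechmaticsSTTService

stt = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    # Hypothetical parameter name: consult the API reference for the real one.
    base_url="wss://us1.rt.speechmatics.com/",
)
```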
Feature Discovery
To check the languages and features supported by Speechmatics STT, you can use the following code:
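```python
# A minimal sketch, assuming the real-time feature discovery endpoint follows
# the pattern https://<endpoint>.rt.speechmatics.com/v1/discovery/features
# (verify the exact path against the Speechmatics docs).
import requests

response = requests.get("https://eu2.rt.speechmatics.com/v1/discovery/features")
response.raise_for_status()

# The response describes the languages and features available on this endpoint.
print(response.json())
```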
Language Support
Refer to the Speechmatics docs for more information on supported languages.
Speechmatics STT supports the languages and regional variants listed below. Set the language using the `language` parameter when creating the STT object; the exception is English / Mandarin, which has the code `cmn_en` and must be set using the `language_code` parameter (see the example after the table).
Language Code | Description | Locales | Domain Options |
---|---|---|---|
Language.AR | Arabic | - | - |
Language.BA | Bashkir | - | - |
Language.EU | Basque | - | - |
Language.BE | Belarusian | - | - |
Language.BG | Bulgarian | - | - |
Language.BN | Bengali | - | - |
Language.YUE | Cantonese | - | - |
Language.CA | Catalan | - | - |
Language.HR | Croatian | - | - |
Language.CS | Czech | - | - |
Language.DA | Danish | - | - |
Language.NL | Dutch | - | - |
Language.EN | English | en-US, en-GB, en-AU | finance |
Language.EO | Esperanto | - | - |
Language.ET | Estonian | - | - |
Language.FA | Persian | - | - |
Language.FI | Finnish | - | - |
Language.FR | French | - | - |
Language.GL | Galician | - | - |
Language.DE | German | - | - |
Language.EL | Greek | - | - |
Language.HE | Hebrew | - | - |
Language.HI | Hindi | - | - |
Language.HU | Hungarian | - | - |
Language.IA | Interlingua | - | - |
Language.IT | Italian | - | - |
Language.ID | Indonesian | - | - |
Language.GA | Irish | - | - |
Language.JA | Japanese | - | - |
Language.KO | Korean | - | - |
Language.LV | Latvian | - | - |
Language.LT | Lithuanian | - | - |
Language.MS | Malay | - | - |
Language.MT | Maltese | - | - |
Language.CMN | Mandarin | cmn-Hans, cmn-Hant | - |
cmn_en | English / Mandarin | - | - |
Language.MR | Marathi | - | - |
Language.MN | Mongolian | - | - |
Language.NO | Norwegian | - | - |
Language.PL | Polish | - | - |
Language.PT | Portuguese | - | - |
Language.RO | Romanian | - | - |
Language.RU | Russian | - | - |
Language.SK | Slovakian | - | - |
Language.SL | Slovenian | - | - |
Language.ES | Spanish | - | bilingual-en |
Language.SV | Swedish | - | - |
Language.SW | Swahili | - | - |
Language.TA | Tamil | - | - |
Language.TH | Thai | - | - |
Language.TR | Turkish | - | - |
Language.UG | Uyghur | - | - |
Language.UK | Ukrainian | - | - |
Language.UR | Urdu | - | - |
Language.VI | Vietnamese | - | - |
Language.CY | Welsh | - | - |
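As referenced above, a minimal sketch of selecting a language (import paths and constructor layout assumed from current Pipecat releases; these options may instead live on a params object in some versions):

```python
import os

from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.transcriptions.language import Language

# Most languages: pass a Language enum value via the `language` parameter.
stt_french = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    language=Language.FR,
)

# English / Mandarin is the exception: there is no Language enum entry, so the
# raw code is passed via the `language_code` parameter instead.
stt_bilingual = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    language_code="cmn_en",
)
```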
Translation Support
Speechmatics supports the translation of transcribed output into the following languages:
Language Code | Description |
---|---|
Language.BG | Bulgarian |
Language.CA | Catalan |
Language.CMN | Mandarin |
Language.CS | Czech |
Language.DA | Danish |
Language.DE | German |
Language.EL | Greek |
Language.EN | English |
Language.ES | Spanish |
Language.ET | Estonian |
Language.FI | Finnish |
Language.FR | French |
Language.GL | Galician |
Language.HI | Hindi |
Language.HR | Croatian |
Language.HU | Hungarian |
Language.ID | Indonesian |
Language.IT | Italian |
Language.JA | Japanese |
Language.KO | Korean |
Language.LT | Lithuanian |
Language.LV | Latvian |
Language.MS | Malay |
Language.NL | Dutch |
Language.NO | Norwegian |
Language.PL | Polish |
Language.PT | Portuguese |
Language.RO | Romanian |
Language.RU | Russian |
Language.SK | Slovakian |
Language.SL | Slovenian |
Language.SV | Swedish |
Language.TR | Turkish |
Language.UK | Ukrainian |
Language.VI | Vietnamese |
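As a sketch of how translation is requested at the Speechmatics API level (whether and how `SpeechmaticsSTTService` exposes this is an assumption; check the API reference), the target languages are listed in a translation configuration:

```python
# Shape of the Speechmatics real-time translation configuration, expressed as a
# Python dict. Whether SpeechmaticsSTTService surfaces this directly (and under
# which parameter) is an assumption to verify against the API reference.
translation_config = {
    "target_languages": ["es", "de"],  # translate transcripts into Spanish and German
    "enable_partials": True,           # also translate interim (partial) results
}
```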
Speaker Diarization
Speechmatics STT supports speaker diarization, which separates out different speakers in the audio. The identity of each speaker is returned in the `TranscriptionFrame` objects in the `user_id` attribute.
To enable this feature, set `enable_speaker_diarization` to `True`. Additionally, if a `text_format` is provided, the text output of each `TranscriptionFrame` is formatted to that specification. Your system context can then be updated with information about this format so the LLM knows which speaker spoke which words (see the example after the attributes table).
For example, with `text_format = "<{speaker_id}>{text}</{speaker_id}>"`, the output would be `<S1>Good morning.</S1>`.
Available attributes
Attribute | Description | Example |
---|---|---|
speaker_id | The ID of the speaker | S1 |
text | The transcribed text | Good morning. |
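A minimal sketch of enabling diarization with a custom text format (parameter placement assumed; depending on the Pipecat version these may live on a params object):

```python
import os

from pipecat.services.speechmatics.stt import SpeechmaticsSTTService

stt = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    enable_speaker_diarization=True,
    # Wrap each utterance in a tag carrying the speaker ID,
    # e.g. "<S1>Good morning.</S1>".
    text_format="<{speaker_id}>{text}</{speaker_id}>",
)
```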
Usage Example
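A minimal sketch of dropping the service into a Pipecat pipeline; the `transport`, `llm`, `tts`, and `context_aggregator` objects are assumed to be created elsewhere, and import paths may vary between Pipecat versions:

```python
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.transcriptions.language import Language

stt = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    language=Language.EN,
)

# transport, llm, tts, and context_aggregator are assumed to be created
# elsewhere (e.g. a Daily/WebRTC transport plus your chosen LLM and TTS services).
pipeline = Pipeline(
    [
        transport.input(),               # audio in from the user
        stt,                             # Speechmatics speech-to-text
        context_aggregator.user(),       # add transcripts to the LLM context
        llm,                             # language model
        tts,                             # text-to-speech
        transport.output(),              # audio out to the user
        context_aggregator.assistant(),  # add responses to the LLM context
    ]
)
```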
Additional Notes
- Connection Management: Automatically handles WebSocket connections and reconnections
- Sample Rate: The default sample rate is 16000 in `pcm_s16le` format
- VAD Integration: Supports Speechmatics’ built-in VAD and end-of-utterance detection