Overview

SpeechmaticsSTTService enables real-time speech transcription using Speechmatics’ WebSocket API, with support for partial and final transcription results, speaker diarization, and end-of-utterance detection via built-in VAD.

Installation

To use SpeechmaticsSTTService, install the required dependencies:

pip install "pipecat-ai[speechmatics]"

You’ll also need to set up your Speechmatics API key as an environment variable: SPEECHMATICS_API_KEY.

Get your API key from the Speechmatics Portal.

Frames

Input

  • InputAudioRawFrame - Raw PCM audio data (16-bit, 16kHz, mono)
  • STTUpdateSettingsFrame - Runtime transcription configuration updates
  • STTMuteFrame - Mute audio input for transcription

Output

  • InterimTranscriptionFrame - Real-time transcription updates
  • TranscriptionFrame - Final transcription results
  • ErrorFrame - Connection or processing errors
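
To consume these frames downstream, a processor can check frame types as they pass through. A minimal sketch (the TranscriptLogger class is illustrative, not part of Pipecat):

from pipecat.frames.frames import InterimTranscriptionFrame, TranscriptionFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class TranscriptLogger(FrameProcessor):
    """Illustrative processor that logs transcription frames passing through."""

    async def process_frame(self, frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, InterimTranscriptionFrame):
            print(f"interim: {frame.text}")
        elif isinstance(frame, TranscriptionFrame):
            print(f"final ({frame.user_id}): {frame.text}")
        # Pass every frame along so the rest of the pipeline still sees it
        await self.push_frame(frame, direction)

Place it after the STT service in the pipeline (e.g. between stt and llm) to observe both interim and final results.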

Endpoints

Speechmatics STT supports the following endpoints (defaults to EU2):

Region  Environment  STT Endpoint
EU      EU1          wss://eu1.rt.speechmatics.com/
EU      EU2          wss://eu2.rt.speechmatics.com/
US      US1          wss://us1.rt.speechmatics.com/
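
To target a specific region, pass the endpoint when constructing the service. A minimal sketch, assuming a base_url constructor argument (verify the parameter name against your Pipecat version):

from pipecat.services.speechmatics.stt import SpeechmaticsSTTService

stt = SpeechmaticsSTTService(
    api_key="your-api-key",
    # Assumption: base_url selects the regional endpoint (defaults to EU2)
    base_url="wss://us1.rt.speechmatics.com/",
)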

Feature Discovery

To check the languages and features supported by Speechmatics STT, query the feature discovery endpoint:

curl "https://eu2.rt.speechmatics.com/v1/discovery/features"

Language Support

Speechmatics STT supports the following languages and regional variants; refer to the Speechmatics docs for the most up-to-date list.

Set the language using the language parameter when creating the STT object. The exception is English / Mandarin, which has the code cmn_en and must be set using the language_code parameter.
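
For example, a minimal sketch using the parameters described above:

from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.transcriptions.language import Language

# Most languages are set with the language parameter
stt = SpeechmaticsSTTService(api_key="your-api-key", language=Language.ES)

# English / Mandarin is the exception: use language_code with "cmn_en"
stt = SpeechmaticsSTTService(api_key="your-api-key", language_code="cmn_en")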

Language Code    Description         Locales              Domain Options
Language.AR      Arabic              -                    -
Language.BA      Bashkir             -                    -
Language.EU      Basque              -                    -
Language.BE      Belarusian          -                    -
Language.BG      Bulgarian           -                    -
Language.BN      Bengali             -                    -
Language.YUE     Cantonese           -                    -
Language.CA      Catalan             -                    -
Language.HR      Croatian            -                    -
Language.CS      Czech               -                    -
Language.DA      Danish              -                    -
Language.NL      Dutch               -                    -
Language.EN      English             en-US, en-GB, en-AU  finance
Language.EO      Esperanto           -                    -
Language.ET      Estonian            -                    -
Language.FA      Persian             -                    -
Language.FI      Finnish             -                    -
Language.FR      French              -                    -
Language.GL      Galician            -                    -
Language.DE      German              -                    -
Language.EL      Greek               -                    -
Language.HE      Hebrew              -                    -
Language.HI      Hindi               -                    -
Language.HU      Hungarian           -                    -
Language.IA      Interlingua         -                    -
Language.IT      Italian             -                    -
Language.ID      Indonesian          -                    -
Language.GA      Irish               -                    -
Language.JA      Japanese            -                    -
Language.KO      Korean              -                    -
Language.LV      Latvian             -                    -
Language.LT      Lithuanian          -                    -
Language.MS      Malay               -                    -
Language.MT      Maltese             -                    -
Language.CMN     Mandarin            cmn-Hans, cmn-Hant   -
cmn_en           English / Mandarin  -                    -
Language.MR      Marathi             -                    -
Language.MN      Mongolian           -                    -
Language.NO      Norwegian           -                    -
Language.PL      Polish              -                    -
Language.PT      Portuguese          -                    -
Language.RO      Romanian            -                    -
Language.RU      Russian             -                    -
Language.SK      Slovakian           -                    -
Language.SL      Slovenian           -                    -
Language.ES      Spanish             -                    bilingual-en
Language.SV      Swedish             -                    -
Language.SW      Swahili             -                    -
Language.TA      Tamil               -                    -
Language.TH      Thai                -                    -
Language.TR      Turkish             -                    -
Language.UG      Uyghur              -                    -
Language.UK      Ukrainian           -                    -
Language.UR      Urdu                -                    -
Language.VI      Vietnamese          -                    -
Language.CY      Welsh               -                    -

Translation Support

Speechmatics supports the translation of transcribed output into the following languages:

Language Code    Description
Language.BG      Bulgarian
Language.CA      Catalan
Language.CMN     Mandarin
Language.CS      Czech
Language.DA      Danish
Language.DE      German
Language.EL      Greek
Language.EN      English
Language.ES      Spanish
Language.ET      Estonian
Language.FI      Finnish
Language.FR      French
Language.GL      Galician
Language.HI      Hindi
Language.HR      Croatian
Language.HU      Hungarian
Language.ID      Indonesian
Language.IT      Italian
Language.JA      Japanese
Language.KO      Korean
Language.LT      Lithuanian
Language.LV      Latvian
Language.MS      Malay
Language.NL      Dutch
Language.NO      Norwegian
Language.PL      Polish
Language.PT      Portuguese
Language.RO      Romanian
Language.RU      Russian
Language.SK      Slovakian
Language.SL      Slovenian
Language.SV      Swedish
Language.TR      Turkish
Language.UK      Ukrainian
Language.VI      Vietnamese

Speaker Diarization

Speechmatics STT supports speaker diarization, which separates different speakers in the audio. The identity of each speaker is returned in the user_id attribute of the TranscriptionFrame objects.

To enable this feature, set enable_speaker_diarization to True. Additionally, if a text_format is provided, the text output of each TranscriptionFrame is formatted to this specification. Your system context can then be updated to explain this format so the LLM knows which speaker spoke which words.

For example, if you have text_format = <{speaker_id}>{text}</{speaker_id}>, then the output would be <S1>Good morning.</S1>.

Available attributes

Attribute     Description              Example
speaker_id    The ID of the speaker    S1
text          The transcribed text     Good morning.
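
Putting this together, a minimal configuration sketch using the parameters described above:

from pipecat.services.speechmatics.stt import SpeechmaticsSTTService

stt = SpeechmaticsSTTService(
    api_key="your-api-key",
    enable_speaker_diarization=True,
    # Each utterance arrives wrapped in speaker tags, e.g. <S1>Good morning.</S1>
    text_format="<{speaker_id}>{text}</{speaker_id}>",
)

Your system prompt can then describe the <S1>...</S1> convention so the LLM can attribute words to speakers.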

Usage Example

import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.transcriptions.language import Language

# Configure service (reads the API key from the environment)
stt = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    language=Language.FR,
)

# Use in pipeline (transport, llm and tts are created elsewhere)
pipeline = Pipeline([
    transport.input(),
    stt,
    llm,
    tts,
    transport.output(),
])

Additional Notes

  • Connection Management: Automatically handles WebSocket connections and reconnections
  • Sample Rate: The default sample rate is 16000 Hz, with audio in pcm_s16le (16-bit little-endian PCM) format
  • VAD Integration: Supports Speechmatics’ built-in VAD and end-of-utterance detection