Overview

SpeechmaticsSTTService enables real-time speech transcription using Speechmatics’ WebSocket API, with support for interim and final results, speaker diarization, and end-of-utterance detection (VAD).

Installation

To use SpeechmaticsSTTService, install the required dependencies:
pip install "pipecat-ai[speechmatics]"
You’ll also need to set up your Speechmatics API key as an environment variable: SPEECHMATICS_API_KEY.
Get your API key from the Speechmatics Portal.
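In code, the key can be read from that environment variable rather than hard-coded. A minimal sketch (the api_key argument is the same one shown in the usage examples below):

import os

from pipecat.services.speechmatics.stt import SpeechmaticsSTTService

# Read the API key from the SPEECHMATICS_API_KEY environment variable
stt = SpeechmaticsSTTService(api_key=os.getenv("SPEECHMATICS_API_KEY"))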

Frames

Input

  • InputAudioRawFrame - Raw PCM audio data (16-bit, 16kHz, mono)

Output

  • InterimTranscriptionFrame - Real-time transcription updates
  • TranscriptionFrame - Final transcription results
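As a rough sketch of how these frames can be consumed downstream, the processor below logs interim and final results as they pass through the pipeline (TranscriptionLogger is a hypothetical name; the frame classes and FrameProcessor API are standard Pipecat):

from pipecat.frames.frames import InterimTranscriptionFrame, TranscriptionFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class TranscriptionLogger(FrameProcessor):
    """Logs transcription frames as they pass through the pipeline."""

    async def process_frame(self, frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, InterimTranscriptionFrame):
            print(f"[interim] {frame.text}")
        elif isinstance(frame, TranscriptionFrame):
            print(f"[final] {frame.user_id}: {frame.text}")
        # Always forward the frame to the next processor
        await self.push_frame(frame, direction)

A processor like this could be placed directly after the STT service in the pipeline examples below.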

Endpoints

Speechmatics STT supports the following endpoints (defaults to EU2):
Region | Environment   | STT Endpoint                   | Access
EU     | EU1           | wss://neu.rt.speechmatics.com/ | Self-Service / Enterprise
EU     | EU2 (Default) | wss://eu2.rt.speechmatics.com/ | Self-Service / Enterprise
US     | US1           | wss://wus.rt.speechmatics.com/ | Enterprise
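To use a non-default region, point the service at one of the endpoints above when constructing it. The base_url keyword in this sketch is an assumption, so check the SpeechmaticsSTTService constructor for the exact argument name:

from pipecat.services.speechmatics.stt import SpeechmaticsSTTService

# NOTE: `base_url` is assumed for illustration; verify the actual
# parameter name against the SpeechmaticsSTTService constructor.
stt = SpeechmaticsSTTService(
    api_key="your-api-key",
    base_url="wss://wus.rt.speechmatics.com/",  # US1 (Enterprise) endpoint
)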

Feature Discovery

To check the languages and features supported by Speechmatics STT, you can use the following code:
curl "https://eu2.rt.speechmatics.com/v1/discovery/features"

Language Support

Refer to the Speechmatics docs for more information on supported languages.
Speechmatics STT supports the following languages and regional variants. Set the language using the language parameter when creating the STT object. The exception is English / Mandarin, which has the code cmn_en and must be set using the language_code parameter.
Language Code | Description | Locales
Language.AR   | Arabic      | -
Language.BA   | Bashkir     | -
Language.EU   | Basque      | -
Language.BE   | Belarusian  | -
Language.BG   | Bulgarian   | -
Language.BN   | Bengali     | -
Language.YUE  | Cantonese   | -
Language.CA   | Catalan     | -
Language.HR   | Croatian    | -
Language.CS   | Czech       | -
Language.DA   | Danish      | -
Language.NL   | Dutch       | -
Language.EN   | English     | en-US, en-GB, en-AU
Language.EO   | Esperanto   | -
Language.ET   | Estonian    | -
Language.FA   | Persian     | -
Language.FI   | Finnish     | -
Language.FR   | French      | -
Language.GL   | Galician    | -
Language.DE   | German      | -
Language.EL   | Greek       | -
Language.HE   | Hebrew      | -
Language.HI   | Hindi       | -
Language.HU   | Hungarian   | -
Language.IA   | Interlingua | -
Language.IT   | Italian     | -
Language.ID   | Indonesian  | -
Language.GA   | Irish       | -
Language.JA   | Japanese    | -
Language.KO   | Korean      | -
Language.LV   | Latvian     | -
Language.LT   | Lithuanian  | -
Language.MS   | Malay       | -
Language.MT   | Maltese     | -
Language.CMN  | Mandarin    | cmn-Hans, cmn-Hant
Language.MR   | Marathi     | -
Language.MN   | Mongolian   | -
Language.NO   | Norwegian   | -
Language.PL   | Polish      | -
Language.PT   | Portuguese  | -
Language.RO   | Romanian    | -
Language.RU   | Russian     | -
Language.SK   | Slovakian   | -
Language.SL   | Slovenian   | -
Language.ES   | Spanish     | -
Language.SV   | Swedish     | -
Language.SW   | Swahili     | -
Language.TA   | Tamil       | -
Language.TH   | Thai        | -
Language.TR   | Turkish     | -
Language.UG   | Uyghur      | -
Language.UK   | Ukrainian   | -
Language.UR   | Urdu        | -
Language.VI   | Vietnamese  | -
Language.CY   | Welsh       | -
For bilingual transcription, use the language_code and domain parameters as follows (see the sketch after this table):
Language Code | Description        | Domain Options
cmn_en        | English / Mandarin | -
en_ms         | English / Malay    | -
Language.ES   | English / Spanish  | bilingual-en
en_ta         | English / Tamil    | -
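As a sketch of how these values fit together, bilingual English / Spanish uses the language and domain parameters, while English / Mandarin uses language_code (placing these fields on InputParams is assumed from the descriptions above):

from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.transcriptions.language import Language

# English / Spanish bilingual transcription
stt_es = SpeechmaticsSTTService(
    api_key="your-api-key",
    params=SpeechmaticsSTTService.InputParams(
        language=Language.ES,
        domain="bilingual-en",
    ),
)

# English / Mandarin bilingual transcription is selected via language_code
stt_cmn = SpeechmaticsSTTService(
    api_key="your-api-key",
    params=SpeechmaticsSTTService.InputParams(
        language_code="cmn_en",
    ),
)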

Speaker Diarization

Speechmatics STT supports speaker diarization, which separates out different speakers in the audio. The identity of each speaker is returned in the user_id attribute of the TranscriptionFrame objects. To enable this feature, set enable_diarization to True. If speaker_active_format or speaker_passive_format is provided, the text of each TranscriptionFrame is formatted to that specification; your system context can then be updated to describe the format so the LLM knows which speaker said which words (see the sketch after the attributes table below). The passive format is optional: when the engine has been told to focus on specific speakers, words from other speakers are formatted using speaker_passive_format.
  • speaker_active_format -> the formatter for active speakers
  • speaker_passive_format -> the formatter for passive / background speakers
Examples:
  • <{speaker_id}>{text}</{speaker_id}> -> <S1>Good morning.</S1>
  • @{speaker_id}: {text} -> @S1: Good morning.

Available attributes

Attribute  | Description           | Example
speaker_id | The ID of the speaker | S1
text       | The transcribed text  | Good morning.
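For example, a short note in your system context can tell the LLM how to read the formatted transcripts (the wording is only illustrative and assumes the active format shown above):

# Illustrative system prompt fragment describing the speaker format
system_instruction = (
    "User transcripts are tagged by speaker: text wrapped as <S1>...</S1> "
    "was spoken by speaker S1, <S2>...</S2> by speaker S2, and so on."
)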

Usage Examples

Usage examples are included in the Pipecat repository’s sample projects.

Basic Configuration

Initialize the SpeechmaticsSTTService and use it in a pipeline:
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.transcriptions.language import Language

# Configure service
stt = SpeechmaticsSTTService(
    api_key="your-api-key",
    params=SpeechmaticsSTTService.InputParams(
        language=Language.FR,
    )
)

# Use in pipeline
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

With Diarization

This example enables diarization and only forwards transcripts to the LLM when the first speaker (S1) speaks. Words from other speakers are still transcribed, but they are only sent when the first speaker speaks. When the enable_vad option is used, speaker diarization determines when a speaker is speaking; you will need to disable the VAD options within the selected transport object to ensure this works correctly (see 07b-interruptible-speechmatics-vad.py as an example). Initialize the SpeechmaticsSTTService and use it in a pipeline:
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.transcriptions.language import Language

# Configure service
stt = SpeechmaticsSTTService(
    api_key="your-api-key",
    params=SpeechmaticsSTTService.InputParams(
        language=Language.EN,
        enable_diarization=True,
        enable_vad=True,
        speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
        speaker_passive_format="<PASSIVE><{speaker_id}>{text}</{speaker_id}></PASSIVE>",
        focus_speakers=["S1"],
    )
)

# Use in pipeline
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

Additional Notes

  • Connection Management: Automatically handles WebSocket connections and reconnections
  • Sample Rate: The default sample rate is 16000 Hz with audio in pcm_s16le format (see the sketch after this list)
  • VAD Integration: Optionally supports Speechmatics’ built-in VAD and end of utterance detection
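If your transport delivers audio at a different rate, the sample rate can typically be set when constructing the service. The sample_rate keyword below is an assumption, so verify it against the SpeechmaticsSTTService constructor:

from pipecat.services.speechmatics.stt import SpeechmaticsSTTService

# NOTE: `sample_rate` is assumed for illustration; check the constructor
# for the supported argument.
stt = SpeechmaticsSTTService(
    api_key="your-api-key",
    sample_rate=16000,  # 16 kHz, 16-bit PCM (pcm_s16le)
)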