# Speechmatics

> Speech-to-text service implementation using Speechmatics' real-time transcription API

## Overview

`SpeechmaticsSTTService` provides real-time speech transcription over Speechmatics' WebSocket API, with partial and final results, speaker diarization, and end-of-utterance detection (VAD).

<Note>
  Since Speechmatics provides its own user turn start and end detection, you
  should use `ExternalUserTurnStrategies` to let Speechmatics handle turn
  management. See [User Turn
  Strategies](/api-reference/server/utilities/turn-management/user-turn-strategies)
  for configuration details. A VAD in the transport (such as `SileroVADAnalyzer`)
  is optional when Speechmatics drives turn detection via the default
  `TurnDetectionMode.EXTERNAL` mode; include it if you want useful STT metrics.
</Note>

<CardGroup cols={2}>
  <Card title="Speechmatics STT API Reference" icon="code" href="https://reference-server.pipecat.ai/en/latest/api/pipecat.services.speechmatics.stt.html">
    Pipecat's API methods for Speechmatics STT integration
  </Card>

  <Card title="Example Implementation" icon="play" href="https://github.com/pipecat-ai/pipecat/blob/main/examples/voice/voice-speechmatics.py">
    Complete example with interruption handling
  </Card>

  <Card title="Speechmatics Documentation" icon="book" href="https://docs.speechmatics.com/rt-api-ref">
    Official Speechmatics documentation and features
  </Card>

  <Card title="Speaker Diarization Guide" icon="microphone" href="https://docs.speechmatics.com/speech-to-text/features/diarization#speaker-diarization">
    Learn about separating different speakers in audio
  </Card>
</CardGroup>

## Installation

To use Speechmatics services, install the required dependencies:

```bash theme={null}
uv add "pipecat-ai[speechmatics]"
```

## Prerequisites

### Speechmatics Account Setup

Before using Speechmatics STT services, you need:

1. **Speechmatics Account**: Sign up at [Speechmatics](https://www.speechmatics.com/)
2. **API Key**: Generate an API key from your account dashboard
3. **Feature Selection**: Configure transcription features like speaker diarization

### Select Endpoint

Speechmatics STT supports the following endpoints (defaults to `EU2`):

| Region | Environment   | STT Endpoint                     | Access                    |
| ------ | ------------- | -------------------------------- | ------------------------- |
| EU     | EU1           | `wss://neu.rt.speechmatics.com/` | Self-Service / Enterprise |
| EU     | EU2 (Default) | `wss://eu2.rt.speechmatics.com/` | Self-Service / Enterprise |
| US     | US1           | `wss://wus.rt.speechmatics.com/` | Enterprise                |

### Required Environment Variables

* `SPEECHMATICS_API_KEY`: Your Speechmatics API key for authentication
* `SPEECHMATICS_RT_URL`: Speechmatics endpoint URL (optional, defaults to EU2)
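
As a rough illustration of the fallback order (explicit argument, then `SPEECHMATICS_RT_URL`, then the EU2 default), the lookup could be sketched as follows. `resolve_rt_url` is a hypothetical helper written for this page, not part of Pipecat; the default URL is the one documented for `base_url` in the Configuration section.

```python
import os

# Default documented for `base_url` (EU2 environment).
DEFAULT_RT_URL = "wss://eu2.rt.speechmatics.com/v2"


def resolve_rt_url(base_url=None, env=None):
    """Hypothetical helper: pick the endpoint URL using the documented fallback order."""
    env = os.environ if env is None else env
    # 1. explicit argument, 2. environment variable, 3. EU2 default
    return base_url or env.get("SPEECHMATICS_RT_URL") or DEFAULT_RT_URL
```

In practice you would normally just set the environment variable, or pass `base_url=` to the service, and let the service resolve the endpoint itself.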

## Configuration

### SpeechmaticsSTTService

<ParamField path="api_key" type="str" default="None">
  Speechmatics API key. Falls back to the `SPEECHMATICS_API_KEY` environment
  variable.
</ParamField>

<ParamField path="base_url" type="str" default="None">
  Base URL for the Speechmatics API. Falls back to `SPEECHMATICS_RT_URL`
  environment variable, then defaults to `wss://eu2.rt.speechmatics.com/v2`.
</ParamField>

<ParamField path="sample_rate" type="int" default="None">
  Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
  rate.
</ParamField>

<ParamField path="encoding" type="AudioEncoding" default="AudioEncoding.PCM_S16LE">
  Audio encoding format. Init-only -- not part of runtime-updatable settings.
</ParamField>

<ParamField path="params" type="SpeechmaticsSTTService.InputParams" default="None" deprecated>
  Additional configuration parameters. *Deprecated in v0.0.105. Use
  `settings=SpeechmaticsSTTService.Settings(...)` instead.*
</ParamField>

<ParamField path="settings" type="SpeechmaticsSTTService.Settings" default="None">
  Runtime-configurable settings for the STT service. See [Settings](#settings)
  below.
</ParamField>

<ParamField path="should_interrupt" type="bool" default="True">
  Whether to interrupt bot output when Speechmatics detects user speech. Only
  applies when `turn_detection_mode` is set to a mode in which Speechmatics
  detects speech (`ADAPTIVE` or `SMART_TURN`).
</ParamField>

<ParamField path="ttfs_p99_latency" type="float" default="0.74">
  P99 latency from speech end to final transcript in seconds. Override for your
  deployment. See [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark).
</ParamField>
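
If you measure speech-end-to-final-transcript latencies for your own deployment, one simple way to derive a value for `ttfs_p99_latency` is a nearest-rank 99th percentile. The sketch below is only an illustration: `p99_latency` is a hypothetical helper, the sample numbers are made up, and the stt-benchmark project may compute its figures differently.

```python
def p99_latency(samples):
    """Nearest-rank 99th percentile of latency samples (in seconds)."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    # Nearest-rank: the smallest value with at least 99% of samples at or below it.
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]


measured = [0.41, 0.52, 0.38, 0.74, 0.47]  # illustrative numbers, not benchmark results
ttfs_p99 = p99_latency(measured)
```

The result could then be passed as `ttfs_p99_latency=ttfs_p99` when constructing the service.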

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `SpeechmaticsSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter                          | Type                         | Default       | Description                                                                                                                                                  |
| ---------------------------------- | ---------------------------- | ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `model`                            | `str`                        | `None`        | STT model identifier. *(Inherited from base STT settings.)*                                                                                                  |
| `language`                         | `Language \| str`            | `Language.EN` | Language code for transcription. *(Inherited from base STT settings.)*                                                                                       |
| `domain`                           | `str`                        | `None`        | Domain for Speechmatics API (e.g. for bilingual transcription).                                                                                              |
| `turn_detection_mode`              | `TurnDetectionMode`          | `EXTERNAL`    | Endpoint handling mode. `EXTERNAL` (default) uses Pipecat's VAD, `ADAPTIVE` uses Speechmatics' VAD, `SMART_TURN` uses Speechmatics' ML-based turn detection. |
| `speaker_active_format`            | `str`                        | `"{text}"`    | Formatter for active speaker output. Available attributes: `{speaker_id}`, `{text}`. Example: `"@{speaker_id}: {text}"`.                                     |
| `speaker_passive_format`           | `str`                        | `"{text}"`    | Formatter for passive/background speaker output. Same attributes as active format.                                                                           |
| `focus_speakers`                   | `list[str]`                  | `[]`          | Speaker IDs to focus on. Only these speakers drive end of turn and conversation flow.                                                                        |
| `ignore_speakers`                  | `list[str]`                  | `[]`          | Speaker IDs to exclude from transcription entirely.                                                                                                          |
| `focus_mode`                       | `SpeakerFocusMode`           | `RETAIN`      | `RETAIN` keeps words from non-focused speakers; `IGNORE` drops them.                                                                                         |
| `known_speakers`                   | `list[SpeakerIdentifier]`    | `[]`          | Known speaker labels and identifiers for speaker attribution.                                                                                                |
| `additional_vocab`                 | `list[AdditionalVocabEntry]` | `[]`          | Additional vocabulary to boost recognition of specific words.                                                                                                |
| `operating_point`                  | `OperatingPoint`             | `None`        | Transcription accuracy vs. latency tradeoff. `ENHANCED` recommended for most use cases.                                                                      |
| `max_delay`                        | `float`                      | `None`        | Maximum delay in seconds for transcription. Lower values reduce latency but may impact accuracy.                                                             |
| `end_of_utterance_silence_trigger` | `float`                      | `None`        | Silence duration in seconds to trigger end of utterance. Must be lower than `max_delay`.                                                                     |
| `end_of_utterance_max_delay`       | `float`                      | `None`        | Maximum delay for end of utterance. Must be greater than `end_of_utterance_silence_trigger`.                                                                 |
| `punctuation_overrides`            | `dict`                       | `None`        | Custom punctuation overrides for the STT engine.                                                                                                             |
| `include_partials`                 | `bool`                       | `None`        | Include partial word fragments in partial segment output.                                                                                                    |
| `split_sentences`                  | `bool`                       | `None`        | Emit finalized sentences mid-turn as they are completed.                                                                                                     |
| `enable_diarization`               | `bool`                       | `None`        | Enable speaker diarization to attribute words to unique speakers.                                                                                            |
| `speaker_sensitivity`              | `float`                      | `None`        | Diarization sensitivity. Higher values help distinguish similar voices.                                                                                      |
| `max_speakers`                     | `int`                        | `None`        | Maximum number of speakers to detect. Only use when the speaker count is known.                                                                              |
| `prefer_current_speaker`           | `bool`                       | `None`        | Give extra weight to grouping nearby words as the same speaker.                                                                                              |
| `extra_params`                     | `dict`                       | `None`        | Additional parameters passed to the STT engine.                                                                                                              |
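
The table states two ordering constraints between the timing settings. A minimal pre-flight check, assuming you want to catch inconsistent values before building `Settings`, might look like this. `check_utterance_timing` is a hypothetical helper and not part of Pipecat or the Speechmatics SDK.

```python
def check_utterance_timing(max_delay=None, silence_trigger=None, eou_max_delay=None):
    """Hypothetical check for the timing constraints listed in the Settings table.

    Returns a list of problems; an empty list means the values are consistent.
    Values left as None are skipped, mirroring their `None` defaults.
    """
    problems = []
    if silence_trigger is not None and max_delay is not None:
        if not silence_trigger < max_delay:
            problems.append(
                "end_of_utterance_silence_trigger must be lower than max_delay"
            )
    if eou_max_delay is not None and silence_trigger is not None:
        if not eou_max_delay > silence_trigger:
            problems.append(
                "end_of_utterance_max_delay must be greater than "
                "end_of_utterance_silence_trigger"
            )
    return problems
```

For example, `check_utterance_timing(max_delay=1.0, silence_trigger=0.5, eou_max_delay=2.0)` returns an empty list, while swapping the first two values reports a problem.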

## End of Turn detection

The Speechmatics STT service supports Pipecat's own end of turn detection (Silero VAD and Smart Turn) without any additional configuration. When using Pipecat's features, the `turn_detection_mode` must be set to `TurnDetectionMode.EXTERNAL` (which is the default).

### Default mode

By default, Speechmatics uses signals from Pipecat's VAD / smart turn detection to trigger the end of turn and finalize the current transcript segment, so Pipecat's voice activity and turn detection work alongside Speechmatics' real-time processing.

<Note>
  If you want to use features such as focusing on specific speakers or
  ignoring others, you may benefit from the `TurnDetectionMode.ADAPTIVE` or
  `TurnDetectionMode.SMART_TURN` modes.
</Note>

### Adaptive End of Turn detection

This mode looks at the content of the speech, pace of speaking and other acoustic information (using VAD) to determine when the user has finished speaking. This is especially important when using the plugin's ability to focus on a specific speaker and not have other speakers interrupt the agent / conversation.

To use this mode, set the `turn_detection_mode` to `TurnDetectionMode.ADAPTIVE` in your STT configuration. You must also remove any other VAD / smart turn features within Pipecat to ensure that there is not a conflict.

```python theme={null}
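import os

from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import TransportParams

transport_params = TransportParams(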
    audio_in_enabled=True,
    audio_out_enabled=True,
    # vad_analyzer=... <- REMOVE (use Speechmatics' built-in VAD)
    # turn_analyzer=... <- REMOVE (use Speechmatics' built-in end-of-turn detection)
)

...

stt = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    settings=SpeechmaticsSTTService.Settings(
        language=Language.EN,
        turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
        speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
    ),
)
```

### Smart Turn detection

In addition to `ADAPTIVE`, Speechmatics provides its own smart turn detection, which combines VAD with Smart Turn v3 from Pipecat. Enable it by setting `turn_detection_mode` to `TurnDetectionMode.SMART_TURN`.

```python theme={null}
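import os

from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import TransportParams

transport_params = TransportParams(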
    audio_in_enabled=True,
    audio_out_enabled=True,
    # vad_analyzer=... <- REMOVE (use Speechmatics' built-in VAD)
    # turn_analyzer=... <- REMOVE (use Speechmatics' built-in end-of-turn detection)
)

...

stt = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    settings=SpeechmaticsSTTService.Settings(
        language=Language.EN,
        turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.SMART_TURN,
        speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
    ),
)
```

## Speaker Diarization

Speechmatics STT supports speaker diarization, which separates the different speakers in the audio. The identity of each speaker is returned in the `user_id` attribute of each `TranscriptionFrame`.

If `speaker_active_format` or `speaker_passive_format` is provided, the text of each `TranscriptionFrame` is formatted accordingly. You can then describe this format in your system context so the LLM understands which speaker said which words. The passive format is optional; it applies when the engine has been told to focus on specific speakers, in which case all other speakers are formatted with `speaker_passive_format`.

* `speaker_active_format` -> the formatter for active speakers
* `speaker_passive_format` -> the formatter for passive / background speakers

Examples:

* `<{speaker_id}>{text}</{speaker_id}>` -> `<S1>Good morning.</S1>`.
* `@{speaker_id}: {text}` -> `@S1: Good morning.`.

### Available attributes

| Attribute          | Description                                 | Example                         |
| ------------------ | ------------------------------------------- | ------------------------------- |
| `speaker_id`       | The label of the speaker                    | `S1`                            |
| `text` / `content` | The transcribed text                        | `Good morning.`                 |
| `ts`               | The timestamp of the transcription          | `2025-09-15T19:47:29.096+00:00` |
| `start_time`       | The start time of the transcription segment | `0.0`                           |
| `end_time`         | The end time of the transcription segment   | `2.5`                           |
| `lang`             | The language of the transcription segment   | `en`                            |

## Speaker Lock

In conjunction with speaker diarization, it is possible to decide at the start or during a conversation to focus on a specific speaker, ignore or retain words from other speakers, or implicitly ignore one or more speakers altogether.

In the example below, the following will happen:

* `S1` will be transcribed as normal and drive the end of turn and the conversation flow
* `S2` will be ignored completely
* All other speakers' words will be transcribed and emitted as tagged segments, but ONLY when a speaker in focus also speaks

What this means is that if `S3` says "Hello", then it is not until `S1` speaks again that the transcription will be emitted.

```python theme={null}
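import os

from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.transcriptions.language import Language

stt = SpeechmaticsSTTService(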
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    settings=SpeechmaticsSTTService.Settings(
        language=Language.EN,
        focus_speakers=["S1"],
        ignore_speakers=["S2"],
        focus_mode=SpeechmaticsSTTService.SpeakerFocusMode.RETAIN,
        speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
    ),
)
```
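
The behaviour described above (focused speakers drive emission, ignored speakers are dropped, everyone else is held back until a focused speaker talks) can be pictured with a small toy model. This is only a simulation of the documented behaviour, not the service's actual implementation; `route_segments` and its arguments are hypothetical names.

```python
def route_segments(segments, focus, ignore, retain=True):
    """Toy model of speaker focus: return the batches emitted over time.

    segments: iterable of (speaker_id, text) pairs in spoken order.
    focus:    speaker IDs that drive emission.
    ignore:   speaker IDs dropped entirely.
    retain:   True mimics RETAIN (keep other speakers' words until a focused
              speaker talks); False mimics IGNORE (drop them).
    """
    pending = []  # words from non-focused speakers waiting to be emitted
    emitted = []  # one batch per utterance from a focused speaker
    for speaker, text in segments:
        if speaker in ignore:
            continue
        if speaker in focus:
            emitted.append(pending + [(speaker, text)])
            pending = []
        elif retain:
            pending.append((speaker, text))
    return emitted


conversation = [
    ("S3", "Hello"),
    ("S2", "Ignore me"),
    ("S1", "Hi there"),
    ("S1", "How are you?"),
]
batches = route_segments(conversation, focus={"S1"}, ignore={"S2"})
```

In this model, `S3`'s "Hello" only appears in the first batch, alongside `S1`'s "Hi there", and `S2` never appears at all.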

## Language Support

<Note>
  Refer to the [Speechmatics
  docs](https://docs.speechmatics.com/introduction/supported-languages) for more
  information on supported languages.
</Note>

Speechmatics STT supports the following languages and regional variants.

Set the language with the `language` parameter when creating the STT service. The exception is English / Mandarin, which uses the string code `cmn_en` rather than a `Language` member.

| Language Code  | Description | Locales                   |
| -------------- | ----------- | ------------------------- |
| `Language.AR`  | Arabic      | -                         |
| `Language.BA`  | Bashkir     | -                         |
| `Language.EU`  | Basque      | -                         |
| `Language.BE`  | Belarusian  | -                         |
| `Language.BG`  | Bulgarian   | -                         |
| `Language.BN`  | Bengali     | -                         |
| `Language.YUE` | Cantonese   | -                         |
| `Language.CA`  | Catalan     | -                         |
| `Language.HR`  | Croatian    | -                         |
| `Language.CS`  | Czech       | -                         |
| `Language.DA`  | Danish      | -                         |
| `Language.NL`  | Dutch       | -                         |
| `Language.EN`  | English     | `en-US`, `en-GB`, `en-AU` |
| `Language.EO`  | Esperanto   | -                         |
| `Language.ET`  | Estonian    | -                         |
| `Language.FA`  | Persian     | -                         |
| `Language.FI`  | Finnish     | -                         |
| `Language.FR`  | French      | -                         |
| `Language.GL`  | Galician    | -                         |
| `Language.DE`  | German      | -                         |
| `Language.EL`  | Greek       | -                         |
| `Language.HE`  | Hebrew      | -                         |
| `Language.HI`  | Hindi       | -                         |
| `Language.HU`  | Hungarian   | -                         |
| `Language.IA`  | Interlingua | -                         |
| `Language.IT`  | Italian     | -                         |
| `Language.ID`  | Indonesian  | -                         |
| `Language.GA`  | Irish       | -                         |
| `Language.JA`  | Japanese    | -                         |
| `Language.KO`  | Korean      | -                         |
| `Language.LV`  | Latvian     | -                         |
| `Language.LT`  | Lithuanian  | -                         |
| `Language.MS`  | Malay       | -                         |
| `Language.MT`  | Maltese     | -                         |
| `Language.CMN` | Mandarin    | `cmn-Hans`, `cmn-Hant`    |
| `Language.MR`  | Marathi     | -                         |
| `Language.MN`  | Mongolian   | -                         |
| `Language.NO`  | Norwegian   | -                         |
| `Language.PL`  | Polish      | -                         |
| `Language.PT`  | Portuguese  | -                         |
| `Language.RO`  | Romanian    | -                         |
| `Language.RU`  | Russian     | -                         |
| `Language.SK`  | Slovakian   | -                         |
| `Language.SL`  | Slovenian   | -                         |
| `Language.ES`  | Spanish     | -                         |
| `Language.SV`  | Swedish     | -                         |
| `Language.SW`  | Swahili     | -                         |
| `Language.TA`  | Tamil       | -                         |
| `Language.TH`  | Thai        | -                         |
| `Language.TR`  | Turkish     | -                         |
| `Language.UG`  | Uyghur      | -                         |
| `Language.UK`  | Ukrainian   | -                         |
| `Language.UR`  | Urdu        | -                         |
| `Language.VI`  | Vietnamese  | -                         |
| `Language.CY`  | Welsh       | -                         |

For bilingual transcription, use the `language` and `domain` parameters as follows:

| Language Code | Description        | Domain Options |
| ------------- | ------------------ | -------------- |
| `cmn_en`      | English / Mandarin | -              |
| `en_ms`       | English / Malay    | -              |
| `Language.ES` | English / Spanish  | `bilingual-en` |
| `en_ta`       | English / Tamil    | -              |
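
One way to keep these combinations in one place is a small lookup of keyword arguments, which could then be unpacked into `SpeechmaticsSTTService.Settings(**kwargs)`. This is a sketch only: `BILINGUAL_SETTINGS` and `bilingual_kwargs` are hypothetical names, and `"es"` is assumed to be the string equivalent of `Language.ES`.

```python
# Keyword arguments per bilingual pack, mirroring the table above.
# "es" is assumed to be the string value of `Language.ES`.
BILINGUAL_SETTINGS = {
    "English / Mandarin": {"language": "cmn_en"},
    "English / Malay": {"language": "en_ms"},
    "English / Spanish": {"language": "es", "domain": "bilingual-en"},
    "English / Tamil": {"language": "en_ta"},
}


def bilingual_kwargs(pair):
    """Return a copy of the settings kwargs for a bilingual pair from the table."""
    try:
        return dict(BILINGUAL_SETTINGS[pair])
    except KeyError:
        raise ValueError(f"unsupported bilingual pair: {pair!r}") from None
```

Only English / Spanish needs the `domain` setting; the other pairs are selected by language code alone.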

## Usage Examples

Examples are included in the Pipecat project:

* Using Speechmatics STT service -> [voice-speechmatics.py](https://github.com/pipecat-ai/pipecat/blob/main/examples/voice/voice-speechmatics.py)
* Using Speechmatics STT service with VAD -> [voice-speechmatics-vad.py](https://github.com/pipecat-ai/pipecat/blob/main/examples/voice/voice-speechmatics-vad.py)
* Transcribing with Speechmatics STT -> [transcription-speechmatics.py](https://github.com/pipecat-ai/pipecat/blob/main/examples/transcription/transcription-speechmatics.py)

Sample projects:

* Guess Who -> [Guess Who](https://github.com/sam-s10s/pipecat-guess-who)
* Guess Who Board Game -> [Guess Who Board Game](https://github.com/sam-s10s/pipecat-guess-who-irl)

### Basic Configuration

Initialize the `SpeechmaticsSTTService` and use it in a pipeline:

```python theme={null}
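from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.transcriptions.language import Language

# Configure service
stt = SpeechmaticsSTTService(
    api_key="your-api-key",
    settings=SpeechmaticsSTTService.Settings(
        language=Language.FR,
    )
)

# Use in pipeline (transport, context_aggregator, llm, and tts are assumed
# to be created elsewhere in your application)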
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])
```

### With Diarization

This example enables diarization and only sends transcripts to the LLM when the first speaker (`S1`) speaks. Words from other speakers are transcribed but only sent when `S1` speaks. With the `TurnDetectionMode.ADAPTIVE` or `TurnDetectionMode.SMART_TURN` options, speaker diarization is used to determine when a speaker is speaking. You need to disable the VAD options on the selected transport for this to work correctly (see [voice-speechmatics-vad.py](https://github.com/pipecat-ai/pipecat/blob/main/examples/voice/voice-speechmatics-vad.py) for an example).

Initialize the `SpeechmaticsSTTService` and use it in a pipeline:

```python theme={null}
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.transcriptions.language import Language

# Configure service
stt = SpeechmaticsSTTService(
    api_key="your-api-key",
    settings=SpeechmaticsSTTService.Settings(
        language=Language.EN,
        turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
        focus_speakers=["S1"],
        speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
        speaker_passive_format="<PASSIVE><{speaker_id}>{text}</{speaker_id}></PASSIVE>",
    )
)

# Use in pipeline (transport, context_aggregator, llm, and tts are assumed
# to be created elsewhere in your application)
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])
```

## Additional Notes

* **Connection Management**: Automatically handles WebSocket connections and reconnections
* **Sample Rate**: The default sample rate is `16000` Hz, with audio in `pcm_s16le` format
* **VAD Integration**: Optionally supports Speechmatics' built-in VAD and end of utterance detection

<Tip>
  The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
  `Settings` / `settings=` instead. See the [Service Settings
  guide](/pipecat/fundamentals/service-settings) for migration details.
</Tip>

## Event Handlers

In addition to the standard [service connection events](/api-reference/server/events/service-events) (`on_connected`, `on_disconnected`, `on_connection_error`), Speechmatics provides:

| Event                | Description                            |
| -------------------- | -------------------------------------- |
| `on_speakers_result` | Speaker identification result received |

```python theme={null}
@stt.event_handler("on_speakers_result")
async def on_speakers_result(service, message):
    print(f"Speaker result: {message}")
```
