# Gladia

> Speech-to-text service implementation using Gladia's API

## Overview

`GladiaSTTService` provides real-time speech recognition over Gladia's WebSocket API. It supports 99+ languages, custom vocabulary, translation, sentiment analysis, and audio pre-processing.

<CardGroup cols={2}>
  <Card title="Gladia STT API Reference" icon="code" href="https://reference-server.pipecat.ai/en/latest/api/pipecat.services.gladia.html">
    Pipecat's API methods for Gladia STT integration
  </Card>

  <Card title="Example Implementation" icon="play" href="https://github.com/pipecat-ai/pipecat/blob/main/examples/voice/voice-gladia.py">
    Complete example with interruption handling
  </Card>

  <Card title="Gladia Documentation" icon="book" href="https://docs.gladia.io/api-reference/live-flow">
    Official Gladia documentation and features
  </Card>

  <Card title="Gladia Platform" icon="microphone" href="https://www.gladia.io/">
    Access multilingual transcription and API keys
  </Card>
</CardGroup>

## Installation

To use Gladia services, install the required dependency:

```bash theme={null}
uv add "pipecat-ai[gladia]"
```

## Prerequisites

### Gladia Account Setup

Before using Gladia STT services, you need:

1. **Gladia Account**: Sign up at [Gladia](https://www.gladia.io/)
2. **API Key**: Generate an API key from your account dashboard
3. **Region Selection**: Choose your preferred region (EU-West or US-West)

### Required Environment Variables

* `GLADIA_API_KEY`: Your Gladia API key for authentication
* `GLADIA_REGION`: Your preferred region (optional, defaults to "eu-west")
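
A minimal sketch of reading these variables before constructing the service. The helper name `load_gladia_env` is hypothetical and not part of Pipecat; it only illustrates the required key and the `"eu-west"` fallback described above.

```python
import os

# Regions listed in this page's Configuration section.
VALID_REGIONS = ("us-west", "eu-west")


def load_gladia_env(environ=os.environ):
    """Return (api_key, region) from the environment.

    Hypothetical helper for illustration only; it is not a Pipecat API.
    """
    api_key = environ.get("GLADIA_API_KEY")
    if not api_key:
        raise RuntimeError("GLADIA_API_KEY is not set")

    # GLADIA_REGION is optional and falls back to "eu-west".
    region = environ.get("GLADIA_REGION", "eu-west")
    if region not in VALID_REGIONS:
        raise ValueError(f"Unsupported GLADIA_REGION: {region!r}")

    return api_key, region
```

The returned values can then be passed as `api_key=` and `region=` to `GladiaSTTService`.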

## Configuration

### GladiaSTTService

<ParamField path="api_key" type="str" required>
  Gladia API key for authentication.
</ParamField>

<ParamField path="region" type="Literal['us-west', 'eu-west']" default="None">
  Region used to process audio. Defaults to `"eu-west"` when `None`.
</ParamField>

<ParamField path="url" type="str" default="https://api.gladia.io/v2/live">
  Gladia API URL for session initialization.
</ParamField>

<ParamField path="encoding" type="str" default="wav/pcm">
  Audio encoding format. Init-only -- not part of runtime-updatable settings.
</ParamField>

<ParamField path="bit_depth" type="int" default="16">
  Audio bit depth. Init-only -- not part of runtime-updatable settings.
</ParamField>

<ParamField path="channels" type="int" default="1">
  Number of audio channels. Init-only -- not part of runtime-updatable settings.
</ParamField>

<ParamField path="sample_rate" type="int" default="None">
  Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
  rate.
</ParamField>

<ParamField path="model" type="str" default="None" deprecated>
  Model to use for transcription. *Deprecated in v0.0.105. Use
  `settings=GladiaSTTService.Settings(...)` instead.*
</ParamField>

<ParamField path="params" type="GladiaInputParams" default="None" deprecated>
  Additional configuration parameters. *Deprecated in v0.0.105. Use
  `settings=GladiaSTTService.Settings(...)` instead.*
</ParamField>

<ParamField path="settings" type="GladiaSTTService.Settings" default="None">
  Runtime-configurable settings for the STT service. See [Settings](#settings)
  below.
</ParamField>

<ParamField path="max_buffer_size" type="int" default="20971520">
  Maximum size of the audio buffer in bytes (default 20 MiB, i.e. 20 × 1024 × 1024).
</ParamField>
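
To illustrate what a size-capped audio buffer like the one governed by `max_buffer_size` might look like, here is a standalone sketch. The class `BoundedAudioBuffer` is hypothetical, and the policy of dropping the oldest bytes on overflow is an assumption, not a confirmed description of the service's internals.

```python
class BoundedAudioBuffer:
    """Byte buffer that never grows beyond max_size.

    Illustrative only: dropping the oldest audio first is an assumed policy.
    """

    def __init__(self, max_size=20 * 1024 * 1024):
        self.max_size = max_size
        self._data = bytearray()

    def append(self, chunk: bytes) -> None:
        """Add a chunk, then trim the oldest bytes if the cap is exceeded."""
        self._data.extend(chunk)
        overflow = len(self._data) - self.max_size
        if overflow > 0:
            del self._data[:overflow]

    def snapshot(self) -> bytes:
        """Return the buffered audio, e.g. for re-sending after a reconnect."""
        return bytes(self._data)

    def __len__(self) -> int:
        return len(self._data)
```

With a cap like this, a long disconnect costs the oldest audio rather than unbounded memory.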

<ParamField path="should_interrupt" type="bool" default="True">
  Whether the bot should be interrupted when Gladia VAD detects user speech.
</ParamField>

<ParamField path="ttfs_p99_latency" type="float" default="1.49">
  P99 latency from speech end to final transcript in seconds. Override for your
  deployment. See [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark).
</ParamField>

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `GladiaSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter                              | Type                       | Default | Description                                                                           |
| -------------------------------------- | -------------------------- | ------- | ------------------------------------------------------------------------------------- |
| `model`                                | `str`                      | `None`  | STT model identifier. *(Inherited from base STT settings.)*                           |
| `language`                             | `Language \| str`          | `None`  | Language for speech recognition. *(Inherited from base STT settings.)*                |
| `language_config`                      | `LanguageConfig`           | `None`  | Detailed language configuration with code switching support.                          |
| `custom_metadata`                      | `Dict[str, Any]`           | `None`  | Additional metadata to include with requests.                                         |
| `endpointing`                          | `float`                    | `None`  | Silence duration in seconds to mark end of speech.                                    |
| `maximum_duration_without_endpointing` | `int`                      | `5`     | Maximum utterance duration (seconds) without silence.                                 |
| `pre_processing`                       | `PreProcessingConfig`      | `None`  | Audio pre-processing options (audio enhancer, speech threshold).                      |
| `realtime_processing`                  | `RealtimeProcessingConfig` | `None`  | Real-time processing features (custom vocabulary, translation, NER, sentiment).       |
| `messages_config`                      | `MessagesConfig`           | `None`  | WebSocket message filtering options.                                                  |
| `enable_vad`                           | `bool`                     | `False` | Enable Gladia VAD for end-of-utterance detection. Do not combine with another VAD.    |

## Usage

### Basic Setup

```python theme={null}
import os

from pipecat.services.gladia.stt import GladiaSTTService

stt = GladiaSTTService(
    api_key=os.getenv("GLADIA_API_KEY"),
)
```

### With Language Configuration

```python theme={null}
import os

from pipecat.services.gladia.stt import GladiaSTTService
from pipecat.services.gladia.config import LanguageConfig

stt = GladiaSTTService(
    api_key=os.getenv("GLADIA_API_KEY"),
    region="us-west",
    settings=GladiaSTTService.Settings(
        model="solaria-1",
        language_config=LanguageConfig(
            languages=["en", "es"],
            code_switching=True,
        ),
    ),
)
```

### With Real-time Processing

```python theme={null}
import os

from pipecat.services.gladia.stt import GladiaSTTService
from pipecat.services.gladia.config import (
    RealtimeProcessingConfig,
    CustomVocabularyConfig,
    CustomVocabularyItem,
    TranslationConfig,
)

stt = GladiaSTTService(
    api_key=os.getenv("GLADIA_API_KEY"),
    settings=GladiaSTTService.Settings(
        realtime_processing=RealtimeProcessingConfig(
            custom_vocabulary=True,
            custom_vocabulary_config=CustomVocabularyConfig(
                vocabulary=[
                    CustomVocabularyItem(value="Pipecat", intensity=0.8),
                    "Gladia",
                ],
            ),
            translation=True,
            translation_config=TranslationConfig(
                target_languages=["fr", "de"],
                model="enhanced",
            ),
        ),
    ),
)
```

## Notes

* **Session-based connection**: Gladia uses a two-step connection process: first an HTTP POST to initialize a session, then a WebSocket connection to the returned session URL. The session URL and ID are managed automatically.
* **Audio buffering**: The service buffers audio data locally and sends it when connected. If the connection drops and reconnects, buffered audio is automatically re-sent to minimize transcript gaps.
* **Keepalive**: Empty audio chunks are sent periodically to keep the Gladia connection alive (keepalive interval: 5s, timeout: 20s).
* **Built-in VAD**: Set `enable_vad=True` in Settings to use Gladia's server-side VAD, which emits `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame`. When using this, do not enable another VAD in your pipeline.
* **Translation**: Gladia supports real-time translation to multiple target languages. Translation results are pushed as `TranslationFrame`s.
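
The two-step connection described in the first note can be sketched with the standard library alone. This is not the service's actual code: the `X-Gladia-Key` header, the `region` query parameter, and the `id` / `url` response fields are assumptions to verify against Gladia's live API reference, and both function names are hypothetical. The request is built but never sent.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_session_request(api_key, region="eu-west", sample_rate=16000,
                          url="https://api.gladia.io/v2/live"):
    """Step 1: build (without sending) the HTTP POST that opens a session."""
    payload = {
        "encoding": "wav/pcm",   # defaults taken from the Configuration section
        "bit_depth": 16,
        "channels": 1,
        "sample_rate": sample_rate,
    }
    return urllib.request.Request(
        f"{url}?{urlencode({'region': region})}",  # assumed query parameter
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-Gladia-Key": api_key,               # assumed header name
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_session_response(body):
    """Step 2 input: pull the session ID and WebSocket URL from the reply."""
    data = json.loads(body)
    return data["id"], data["url"]                 # assumed field names
```

A WebSocket client would then connect to the returned URL and stream audio. `GladiaSTTService` handles both steps for you, so this is only useful for understanding or debugging the handshake.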

<Tip>
  The `GladiaInputParams` / `params=` pattern is deprecated as of v0.0.105. Use
  `Settings` / `settings=` instead. See the [Service Settings
  guide](/pipecat/fundamentals/service-settings) for migration details.
</Tip>

## Event Handlers

Gladia STT supports the standard [service connection events](/api-reference/server/events/service-events):

| Event             | Description                        |
| ----------------- | ---------------------------------- |
| `on_connected`    | Connected to Gladia WebSocket      |
| `on_disconnected` | Disconnected from Gladia WebSocket |

```python theme={null}
@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Gladia")
```
