# xAI

> Speech-to-text service implementation using xAI's real-time WebSocket API

## Overview

`XAISTTService` provides real-time speech-to-text transcription using xAI's WebSocket STT API with support for interim results, configurable endpointing, multichannel audio, and speaker diarization.

The service streams raw audio (PCM, μ-law, or A-law) to xAI's endpoint and emits interim and final transcription frames based on the server's `is_final` and `speech_final` flags. The connection is persistent: audio is streamed continuously and the server automatically detects utterance boundaries.
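
One plausible reading of how the two flags combine is sketched below. This is an illustration only, not the service's actual implementation: the `classify_transcript` helper, its return labels, and the dictionary message shape are assumptions. Only the `is_final` and `speech_final` flag names come from the description above.

```python
def classify_transcript(message: dict) -> str:
    """Label a transcript message using its two boolean flags.

    Hypothetical helper: assumes the message is a dict that carries
    `is_final` and `speech_final` keys, defaulting to False when absent.
    """
    is_final = bool(message.get("is_final", False))
    speech_final = bool(message.get("speech_final", False))

    if not is_final:
        # Text may still change as more audio arrives.
        return "interim"
    if speech_final:
        # Text is stable and the server detected the end of the utterance.
        return "utterance_end"
    # Text is stable, but the speaker may still be mid-utterance.
    return "final"
```

Under this reading, an application would display `"interim"` results as provisional text and only act on results labeled `"final"` or `"utterance_end"`.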

<CardGroup cols={2}>
  <Card title="xAI STT API Reference" icon="code" href="https://reference-server.pipecat.ai/en/latest/api/pipecat.services.xai.stt.html">
    Pipecat's API methods for xAI STT integration
  </Card>

  <Card title="Example Implementation" icon="play" href="https://github.com/pipecat-ai/pipecat/blob/main/examples/transcription/transcription-xai.py">
    Complete transcription example with xAI STT
  </Card>

  <Card title="Voice Agent Example" icon="microphone" href="https://github.com/pipecat-ai/pipecat/blob/main/examples/voice/voice-xai.py">
    Full voice agent with xAI STT, LLM, and TTS
  </Card>

  <Card title="xAI Documentation" icon="book" href="https://docs.x.ai/developers/rest-api-reference/inference/voice">
    Official xAI voice API documentation
  </Card>
</CardGroup>

## Installation

To use xAI STT services, install the required dependencies:

```bash theme={null}
uv add "pipecat-ai[xai]"
```

## Prerequisites

### xAI Account Setup

Before using xAI STT services, you need:

1. **xAI Account**: Sign up at [xAI](https://x.ai/)
2. **API Key**: Generate an API key from your account dashboard
3. **Language Selection**: Choose from 16 supported languages

### Required Environment Variables

* `XAI_API_KEY`: Your xAI API key for authentication
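
Because `os.getenv` returns `None` when a variable is missing, it can help to fail fast before constructing the service. The sketch below shows one way to do that; `require_api_key` is a hypothetical helper, not part of Pipecat.

```python
import os


def require_api_key(name: str = "XAI_API_KEY") -> str:
    """Return the named environment variable or raise if it is unset or blank.

    Hypothetical helper for illustration; Pipecat does not provide it.
    """
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the pipeline")
    return value
```

The returned string can then be passed as `api_key` when creating `XAISTTService`.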

## Configuration

### XAISTTService

<ParamField path="api_key" type="str" required>
  xAI API key for authentication (used as Bearer token for the WebSocket
  handshake).
</ParamField>

<ParamField path="ws_url" type="str" default="wss://api.x.ai/v1/stt">
  WebSocket endpoint URL for xAI STT.
</ParamField>

<ParamField path="sample_rate" type="int" default="16000">
  Audio sample rate in Hz. Supported values: 8000, 16000, 22050, 24000, 44100,
  48000.
</ParamField>

<ParamField path="encoding" type="str" default="pcm">
  Audio encoding format. One of `"pcm"` (signed 16-bit LE), `"mulaw"`, or
  `"alaw"`.
</ParamField>

<ParamField path="settings" type="XAISTTService.Settings" default="None">
  Runtime-configurable settings for the STT service. See [Settings](#settings)
  below.
</ParamField>

<ParamField path="ttfs_p99_latency" type="float" default="XAI_TTFS_P99">
  P99 latency from speech end to final transcript in seconds. Override for your
  deployment. See
  [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark).
</ParamField>

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `XAISTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter         | Type              | Default       | Description                                                                                                                           |
| ----------------- | ----------------- | ------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| `model`           | `str`             | `None`        | Not applicable for xAI STT. *(Inherited from base STT settings.)*                                                                     |
| `language`        | `Language \| str` | `Language.EN` | Recognition language. Supports: AR, BN, DE, EN, ES, FR, HI, ID, IT, JA, KO, PT, RU, TR, VI, ZH. *(Inherited from base STT settings.)* |
| `interim_results` | `bool`            | `True`        | When True, partial transcripts are emitted approximately every 500ms.                                                                 |
| `endpointing`     | `int \| None`     | `None`        | Silence duration in milliseconds that triggers a speech-final event. Range 0-5000. Server default is 10ms.                            |
| `multichannel`    | `bool \| None`    | `None`        | When True, transcribes each interleaved channel independently. Requires `channels` >= 2.                                              |
| `channels`        | `int \| None`     | `None`        | Number of interleaved channels (2-8). Required when `multichannel` is True.                                                           |
| `diarize`         | `bool \| None`    | `None`        | When True, the server attaches a `speaker` field to each word identifying the detected speaker.                                       |
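
The constraints in the table can be checked before a connection is attempted. The following is a minimal sketch that restates the documented ranges in plain Python; `validate_stt_settings` is a hypothetical helper, and whether the service performs equivalent checks itself is not confirmed here.

```python
from typing import Optional


def validate_stt_settings(
    endpointing: Optional[int] = None,
    multichannel: Optional[bool] = None,
    channels: Optional[int] = None,
) -> list[str]:
    """Return human-readable problems; an empty list means the values look valid.

    Hypothetical helper that mirrors the ranges listed in the settings table.
    """
    problems = []
    if endpointing is not None and not 0 <= endpointing <= 5000:
        problems.append("endpointing must be between 0 and 5000 ms")
    if channels is not None and not 2 <= channels <= 8:
        problems.append("channels must be between 2 and 8")
    if multichannel and channels is None:
        problems.append("channels is required when multichannel is True")
    return problems
```

For example, `validate_stt_settings(multichannel=True)` reports the missing `channels` value, while `validate_stt_settings(endpointing=1000)` returns an empty list.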

## Usage

### Basic Setup

```python theme={null}
import os
from pipecat.services.xai.stt import XAISTTService

stt = XAISTTService(
    api_key=os.getenv("XAI_API_KEY"),
)
```

### With Custom Settings

```python theme={null}
import os
from pipecat.services.xai.stt import XAISTTService
from pipecat.transcriptions.language import Language

stt = XAISTTService(
    api_key=os.getenv("XAI_API_KEY"),
    sample_rate=24000,
    settings=XAISTTService.Settings(
        language=Language.ES,
        interim_results=True,
        endpointing=1000,
        diarize=True,
    ),
)
```
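
With `diarize=True`, each word carries a `speaker` field, so consecutive words can be merged into speaker turns. The sketch below assumes each word is a dictionary with a `text` key alongside `speaker`; the `text` key name and the `group_words_by_speaker` helper are assumptions for illustration, so check the actual payload before relying on them.

```python
from itertools import groupby


def group_words_by_speaker(words: list[dict]) -> list[tuple]:
    """Merge consecutive words from the same speaker into (speaker, text) turns.

    Hypothetical helper: assumes each word dict has `text` and `speaker` keys.
    """
    turns = []
    # groupby only merges adjacent items, which preserves the turn order.
    for speaker, run in groupby(words, key=lambda w: w.get("speaker")):
        turns.append((speaker, " ".join(w["text"] for w in run)))
    return turns
```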

### With Multichannel Audio

```python theme={null}
import os
from pipecat.services.xai.stt import XAISTTService

stt = XAISTTService(
    api_key=os.getenv("XAI_API_KEY"),
    settings=XAISTTService.Settings(
        multichannel=True,
        channels=2,
    ),
)
```
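
Multichannel mode expects interleaved audio, meaning one sample from each channel in turn. As a rough illustration of that layout for signed 16-bit little-endian PCM, the sketch below combines separate mono buffers into one interleaved buffer. `interleave_pcm16` is a hypothetical helper; how audio reaches the service in your pipeline is outside the scope of this sketch.

```python
import struct


def interleave_pcm16(channel_buffers: list[bytes]) -> bytes:
    """Interleave mono 16-bit LE PCM buffers into a single multichannel buffer.

    Hypothetical helper. Output is truncated to the shortest input so that
    every frame contains exactly one sample per channel.
    """
    if not channel_buffers:
        return b""
    # Decode each buffer into a tuple of signed 16-bit samples,
    # ignoring a trailing odd byte if one is present.
    per_channel = [
        struct.unpack(f"<{len(buf) // 2}h", buf[: len(buf) // 2 * 2])
        for buf in channel_buffers
    ]
    frames = min(len(samples) for samples in per_channel)
    interleaved = []
    for i in range(frames):
        for samples in per_channel:
            interleaved.append(samples[i])
    return struct.pack(f"<{len(interleaved)}h", *interleaved)
```

With two inputs, the output alternates left and right samples, which corresponds to `channels=2` in the settings above.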

## Notes

* **Connection management**: The WebSocket connection is persistent and automatically reconnects if it drops mid-session. Audio is streamed continuously and the server emits `transcript.partial` events with `is_final` and `speech_final` flags to mark utterance boundaries.
* **Language support**: xAI STT accepts two-letter language codes. When a language is set, the server applies Inverse Text Normalization (for example, rendering spoken numbers as digits) for improved accuracy.
* **Audio encoding**: Supports PCM (signed 16-bit LE), μ-law, and A-law encoding formats. PCM is recommended for best quality.
* **Settings updates**: Changing settings requires reconnecting to the WebSocket. The service automatically handles disconnect and reconnect when settings are updated via `STTUpdateSettingsFrame`.

## Event Handlers

xAI STT supports the standard [service connection events](/api-reference/server/events/service-events):

| Event             | Description                     |
| ----------------- | ------------------------------- |
| `on_connected`    | Connected to xAI WebSocket      |
| `on_disconnected` | Disconnected from xAI WebSocket |

```python theme={null}
@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to xAI STT")

@stt.event_handler("on_disconnected")
async def on_disconnected(service):
    print("Disconnected from xAI STT")
```
