# Grok Realtime

> Real-time speech-to-speech service implementation using xAI's Grok Voice Agent API

## Overview

`GrokRealtimeLLMService` provides real-time, multimodal conversation using xAI's Grok Voice Agent API. It supports low-latency speech-to-speech interaction with integrated LLM processing, function calling, and server-side turn detection.

<CardGroup cols={2}>
  <Card title="Grok Realtime API Reference" icon="code" href="https://reference-server.pipecat.ai/en/latest/api/pipecat.services.xai.realtime.llm.html">
    Pipecat's API methods for Grok Realtime integration
  </Card>

  <Card title="Example Implementation" icon="play" href="https://github.com/pipecat-ai/pipecat/blob/main/examples/realtime/realtime-grok.py">
    Complete Grok Realtime conversation example
  </Card>

  <Card title="Grok Voice Documentation" icon="book" href="https://docs.x.ai/docs/guides/voice/agent">
    Official xAI Grok Voice Agent API documentation
  </Card>

  <Card title="xAI Console" icon="external-link" href="https://console.x.ai/">
    Access Grok models and manage API keys
  </Card>
</CardGroup>

## Installation

To use Grok Realtime services, install the required dependencies:

```bash theme={null}
uv add "pipecat-ai[grok]"
```

## Prerequisites

### xAI Account Setup

Before using Grok Realtime services, you need:

1. **xAI Account**: Sign up at [xAI Console](https://console.x.ai/)
2. **API Key**: Generate a Grok API key from your account dashboard
3. **Model Access**: Ensure access to Grok Voice Agent models
4. **Usage Limits**: Configure appropriate usage limits and billing

### Required Environment Variables

* `XAI_API_KEY`: Your xAI API key for authentication
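
A minimal sketch of reading the key at startup and failing early when it is missing. The helper name `require_env` is hypothetical and not part of Pipecat; it only wraps the standard library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Hypothetical usage: api_key = require_env("XAI_API_KEY")
```

Failing at startup tends to give a clearer error than passing `None` through to the service and waiting for the connection to be rejected.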

### Key Features

* **Real-time Speech-to-Speech**: Direct audio processing with low latency
* **Multilingual Support**: Conversations in multiple languages
* **Voice Activity Detection**: Server-side VAD for automatic speech detection
* **Function Calling**: Seamless support for external functions and tool integration
* **Multiple Voice Options**: Various voice personalities available
* **WebSocket Support**: Real-time bidirectional audio streaming

## Configuration

### GrokRealtimeLLMService

<ParamField path="api_key" type="str" required>
  xAI API key for authentication.
</ParamField>

<ParamField path="base_url" type="str" default="wss://api.x.ai/v1/realtime">
  WebSocket base URL for the Grok Realtime API. Override for custom deployments.
</ParamField>

<ParamField path="session_properties" type="SessionProperties" default="None" deprecated>
  Configuration properties for the realtime session. If `None`, uses default
  `SessionProperties` with voice `"Ara"` and server-side VAD enabled. See
  [SessionProperties](#sessionproperties) below.

  *Deprecated in v0.0.105. Use `settings=GrokRealtimeLLMService.Settings(session_properties=...)` instead.*
</ParamField>

<ParamField path="settings" type="GrokRealtimeLLMService.Settings" default="None">
  Runtime-configurable settings. See [Settings](#settings) below.
</ParamField>

<ParamField path="start_audio_paused" type="bool" default="False">
  Whether to start with audio input paused.
</ParamField>

### Settings

Runtime-configurable settings are passed to the constructor as `settings=GrokRealtimeLLMService.Settings(...)` and can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter            | Type                | Default     | Description                                                     |
| -------------------- | ------------------- | ----------- | --------------------------------------------------------------- |
| `model`              | `str`               | `NOT_GIVEN` | Model identifier. *(Inherited from base settings.)*             |
| `system_instruction` | `str`               | `NOT_GIVEN` | System instruction/prompt. *(Inherited from base settings.)*    |
| `session_properties` | `SessionProperties` | `NOT_GIVEN` | Session-level configuration (voice, audio config, tools, etc.). |

<Note>
  `NOT_GIVEN` values are omitted, letting the service use its own defaults. Only
  parameters that are explicitly set are included.
</Note>
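
The omission behavior can be pictured with a small standalone sketch. It does not use Pipecat's classes: the `NOT_GIVEN` sentinel, the `apply_delta` helper, and the dictionary layout are stand-ins chosen for illustration, assuming that an update overwrites only the fields that were explicitly set.

```python
from typing import Any, Dict

# Hypothetical stand-in for a "not given" sentinel; Pipecat ships its own.
NOT_GIVEN = object()


def apply_delta(current: Dict[str, Any], delta: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `current` updated with only the explicitly set delta fields."""
    merged = dict(current)
    for key, value in delta.items():
        if value is not NOT_GIVEN:  # skip fields the caller never set
            merged[key] = value
    return merged


current = {"model": "model-a", "system_instruction": "Be brief."}
delta = {"model": NOT_GIVEN, "system_instruction": "Answer in Spanish."}
updated = apply_delta(current, delta)
```

A dedicated sentinel keeps `None` available as a real value, which matters for fields such as `turn_detection`, where `None` means manual turn detection.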

### SessionProperties

| Parameter        | Type                                         | Default                            | Description                                                                           |
| ---------------- | -------------------------------------------- | ---------------------------------- | ------------------------------------------------------------------------------------- |
| `instructions`   | `str`                                        | `None`                             | System instructions for the assistant.                                                |
| `voice`          | `Literal["Ara", "Rex", "Sal", "Eve", "Leo"]` | `"Ara"`                            | Voice the model uses to respond.                                                      |
| `turn_detection` | `TurnDetection`                              | `TurnDetection(type="server_vad")` | Turn detection configuration. Set to `None` for manual turn detection.                |
| `audio`          | `AudioConfiguration`                         | `None`                             | Configuration for input and output audio formats.                                     |
| `tools`          | `List[GrokTool]`                             | `None`                             | Available tools: `web_search`, `x_search`, `file_search`, or custom `function` tools. |

### AudioConfiguration

The `audio` field in `SessionProperties` accepts an `AudioConfiguration` with `input` and `output` sub-configurations:

**AudioInput** (`audio.input`):

| Parameter | Type          | Default | Description                                                                                                               |
| --------- | ------------- | ------- | ------------------------------------------------------------------------------------------------------------------------- |
| `format`  | `AudioFormat` | `None`  | Input audio format. Supports `PCMAudioFormat` (configurable rate), `PCMUAudioFormat` (8kHz), or `PCMAAudioFormat` (8kHz). |

**AudioOutput** (`audio.output`):

| Parameter | Type          | Default | Description                                        |
| --------- | ------------- | ------- | -------------------------------------------------- |
| `format`  | `AudioFormat` | `None`  | Output audio format. Same format options as input. |

Grok PCM audio supports the following sample rates: 8000, 16000, 21050, 24000, 32000, 44100, and 48000 Hz.
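
A minimal sketch of checking a pipeline sample rate against this list before building the session configuration. The `validate_pcm_rate` helper and the `SUPPORTED_PCM_RATES` constant are hypothetical names, not part of Pipecat; the values are copied from the list above.

```python
# Sample rates copied from the documentation above (Hz).
SUPPORTED_PCM_RATES = (8000, 16000, 21050, 24000, 32000, 44100, 48000)


def validate_pcm_rate(rate: int) -> int:
    """Return `rate` unchanged if it is in the supported list, otherwise raise."""
    if rate not in SUPPORTED_PCM_RATES:
        supported = ", ".join(str(r) for r in SUPPORTED_PCM_RATES)
        raise ValueError(f"Unsupported PCM sample rate {rate} Hz; expected one of: {supported}")
    return rate


# Hypothetical usage: PCMAudioFormat(rate=validate_pcm_rate(16000))
```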

### Built-in Tools

Grok provides several built-in tools in addition to custom function tools:

| Tool             | Type          | Description                                                        |
| ---------------- | ------------- | ------------------------------------------------------------------ |
| `WebSearchTool`  | `web_search`  | Search the web for current information                             |
| `XSearchTool`    | `x_search`    | Search X (Twitter) for posts. Supports `allowed_x_handles` filter. |
| `FileSearchTool` | `file_search` | Search uploaded document collections by `vector_store_ids`         |

## Usage

### Basic Setup

```python theme={null}
import os
from pipecat.services.xai.realtime.llm import GrokRealtimeLLMService

llm = GrokRealtimeLLMService(
    api_key=os.getenv("XAI_API_KEY"),
)
```

### With Session Configuration

```python theme={null}
import os
from pipecat.services.xai.realtime.llm import GrokRealtimeLLMService
from pipecat.services.xai.realtime.events import (
    SessionProperties,
    TurnDetection,
    AudioConfiguration,
    AudioInput,
    AudioOutput,
    PCMAudioFormat,
)

session_properties = SessionProperties(
    instructions="You are a helpful assistant.",
    voice="Rex",
    turn_detection=TurnDetection(type="server_vad"),
    audio=AudioConfiguration(
        input=AudioInput(format=PCMAudioFormat(rate=16000)),
        output=AudioOutput(format=PCMAudioFormat(rate=16000)),
    ),
)

llm = GrokRealtimeLLMService(
    api_key=os.getenv("XAI_API_KEY"),
    settings=GrokRealtimeLLMService.Settings(
        session_properties=session_properties,
    ),
)
```

### With Built-in Tools

```python theme={null}
import os
from pipecat.services.xai.realtime.llm import GrokRealtimeLLMService
from pipecat.services.xai.realtime.events import (
    SessionProperties,
    WebSearchTool,
    XSearchTool,
)

llm = GrokRealtimeLLMService(
    api_key=os.getenv("XAI_API_KEY"),
    settings=GrokRealtimeLLMService.Settings(
        session_properties=SessionProperties(
            instructions="You are a helpful assistant with access to web search.",
            voice="Ara",
            tools=[
                WebSearchTool(),
                XSearchTool(allowed_x_handles=["@elonmusk"]),
            ],
        ),
    ),
)
```

### Updating Settings at Runtime

```python theme={null}
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.xai.realtime.llm import GrokRealtimeLLMSettings
from pipecat.services.xai.realtime.events import SessionProperties

# Assumes `task` is an existing PipelineTask and this runs inside an async function.
await task.queue_frame(
    LLMUpdateSettingsFrame(
        delta=GrokRealtimeLLMSettings(
            session_properties=SessionProperties(
                instructions="Now speak in Spanish.",
                voice="Eve",
            ),
        )
    )
)
```

<Tip>
  The `session_properties` constructor parameter is deprecated as of v0.0.105.
  Pass `settings=GrokRealtimeLLMService.Settings(session_properties=...)`
  instead. See the
  [Service Settings guide](/pipecat/fundamentals/service-settings) for migration
  details.
</Tip>

## Notes

* **Audio format auto-configuration**: If no audio format is specified in `session_properties`, the service automatically configures PCM input/output using the pipeline's sample rates.
* **Server-side VAD**: Enabled by default, in which case the server handles speech detection and turn management automatically. Set `turn_detection` to `None` to manage turns manually.
* **Audio before setup**: Audio is not sent to Grok until the conversation setup is complete, preventing sample rate mismatches.
* **Available voices**: Ara (default), Rex, Sal, Eve, and Leo.
* **G.711 support**: PCMU and PCMA formats are supported at a fixed 8000 Hz rate, useful for telephony integrations.
* **System instruction precedence**: The `system_instruction` from service settings takes precedence over an initial system message in the LLM context. A warning is logged when both are set.
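
The precedence rule in the last note can be sketched in plain Python. This is an illustration of the described behavior under stated assumptions, not Pipecat's implementation: the `resolve_instructions` helper, the logger name, and the message dictionary layout are hypothetical.

```python
import logging
from typing import Dict, List, Optional

logger = logging.getLogger("grok_realtime_sketch")


def resolve_instructions(
    system_instruction: Optional[str],
    context_messages: List[Dict[str, str]],
) -> Optional[str]:
    """Prefer the settings value; fall back to an initial system message in the context."""
    context_instruction = None
    if context_messages and context_messages[0].get("role") == "system":
        context_instruction = context_messages[0].get("content")

    if system_instruction and context_instruction:
        # Both sources are set: warn and keep the value from settings.
        logger.warning(
            "Both system_instruction and a context system message are set; "
            "using system_instruction."
        )
    return system_instruction or context_instruction
```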

## Event Handlers

| Event                          | Description                                                   |
| ------------------------------ | ------------------------------------------------------------- |
| `on_conversation_item_created` | Called when a new conversation item is created in the session |
| `on_conversation_item_updated` | Called when a conversation item is updated or completed       |

```python theme={null}
@llm.event_handler("on_conversation_item_created")
async def on_item_created(service, item_id, item):
    print(f"New conversation item: {item_id}")

@llm.event_handler("on_conversation_item_updated")
async def on_item_updated(service, item_id, item):
    print(f"Conversation item updated: {item_id}")
```
