# Inworld Realtime

> Real-time speech-to-speech service implementation using Inworld's Realtime API

## Overview

`InworldRealtimeLLMService` provides real-time, multimodal conversation capabilities using Inworld's Realtime API. Under the hood it runs a cascaded STT → LLM → TTS pipeline with built-in semantic voice activity detection (VAD) for turn management, giving low-latency speech-to-speech interactions with integrated LLM processing and function calling.

<CardGroup cols={2}>
  <Card title="Inworld Realtime API Reference" icon="code" href="https://reference-server.pipecat.ai/en/latest/api/pipecat.services.inworld.realtime.llm.html">
    Pipecat's API methods for Inworld Realtime integration
  </Card>

  <Card title="Example Implementation" icon="play" href="https://github.com/pipecat-ai/pipecat/blob/main/examples/realtime/realtime-inworld.py">
    Complete Inworld Realtime conversation example
  </Card>

  <Card title="Inworld Realtime Documentation" icon="book" href="https://docs.inworld.ai/api-reference/realtimeAPI/realtime/realtime-websocket">
    Official Inworld Realtime API documentation
  </Card>

  <Card title="Inworld Console" icon="external-link" href="https://studio.inworld.ai/">
    Access Inworld models and manage API keys
  </Card>
</CardGroup>

## Installation

To use Inworld Realtime services, install the required dependencies:

```bash
uv add "pipecat-ai[inworld]"
```

## Prerequisites

### Inworld Account Setup

Before using Inworld Realtime services, you need:

1. **Inworld Account**: Sign up at [Inworld Studio](https://studio.inworld.ai/)
2. **API Key**: Generate an Inworld API key from your account dashboard
3. **Model Access**: Ensure access to Inworld Realtime models
4. **Usage Limits**: Configure appropriate usage limits and billing

### Required Environment Variables

* `INWORLD_API_KEY`: Your Inworld API key for authentication
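
Because `os.getenv` returns `None` when a variable is unset, it can help to fail fast before constructing the service. The following is a minimal sketch; `require_env` is a hypothetical helper written for this page, not part of Pipecat.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; not a Pipecat API.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

Calling `require_env("INWORLD_API_KEY")` at startup and passing the result as `api_key` should surface a missing key as an explicit error rather than a later authentication failure.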

### Key Features

* **Real-time Speech-to-Speech**: Direct audio processing with low latency
* **Cascade Pipeline**: Integrated STT → LLM → TTS processing
* **Semantic VAD**: Advanced semantic voice activity detection for natural turn-taking
* **Multilingual Support**: Support for multiple languages via STT model selection
* **Function Calling**: Seamless support for external functions and tool integration
* **Multiple Voice Options**: Various voice personalities available
* **WebSocket Support**: Real-time bidirectional audio streaming
* **Streaming Transcription**: Real-time user speech transcription

## Configuration

### InworldRealtimeLLMService

<ParamField path="api_key" type="str" required>
  Inworld API key for authentication.
</ParamField>

<ParamField path="llm_model" type="str" default="openai/gpt-4.1-mini">
  LLM model to use (e.g. "openai/gpt-4.1-nano"). Can be any supported model or
  an [Inworld Router](https://docs.inworld.ai/router/introduction). Shorthand
  for `session_properties.model`.
</ParamField>

<ParamField path="voice" type="str" default="Clive">
  Voice ID for TTS output (e.g. "Sarah", "Clive"). Shorthand for
  `session_properties.audio.output.voice`.
</ParamField>

<ParamField path="tts_model" type="str" default="inworld-tts-1.5-max">
  TTS model to use (e.g. "inworld-tts-1.5-max"). Shorthand for
  `session_properties.audio.output.model`.
</ParamField>

<ParamField path="stt_model" type="str" default="assemblyai/u3-rt-pro">
  STT model for input transcription (e.g.
  "assemblyai/universal-streaming-multilingual"). Shorthand for
  `session_properties.audio.input.transcription.model`.
</ParamField>

<ParamField path="base_url" type="str" default="wss://api.inworld.ai/api/v1/realtime/session">
  WebSocket base URL for the Inworld Realtime API. Override for custom
  deployments.
</ParamField>

<ParamField path="auth_type" type="Literal['basic', 'bearer']" default="basic">
  Authentication type. `"basic"` for server-side API key auth, `"bearer"` for
  client-side JWT auth.
</ParamField>

<ParamField path="settings" type="InworldRealtimeLLMService.Settings" default="None">
  Runtime-configurable settings. See [Settings](#settings) below.
</ParamField>

<ParamField path="start_audio_paused" type="bool" default="False">
  Whether to start with audio input paused.
</ParamField>
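
The shorthand parameters above each stand for one field path inside `session_properties`. The sketch below only illustrates where each shortcut lands, using the documented defaults. The function name `expand_shortcuts` and the nested dictionary layout are assumptions made for this example; they are not Pipecat's internal representation or the wire format.

```python
from typing import Any, Dict


def expand_shortcuts(
    llm_model: str = "openai/gpt-4.1-mini",
    voice: str = "Clive",
    tts_model: str = "inworld-tts-1.5-max",
    stt_model: str = "assemblyai/u3-rt-pro",
) -> Dict[str, Any]:
    """Place each constructor shortcut at the session path it stands for.

    Illustrative only: the nested dict mirrors the documented field paths.
    """
    return {
        "model": llm_model,  # session_properties.model
        "audio": {
            "input": {
                # session_properties.audio.input.transcription.model
                "transcription": {"model": stt_model},
            },
            "output": {
                "model": tts_model,  # session_properties.audio.output.model
                "voice": voice,  # session_properties.audio.output.voice
            },
        },
    }


session = expand_shortcuts(voice="Sarah")
print(session["audio"]["output"]["voice"])  # Sarah
print(session["model"])  # openai/gpt-4.1-mini
```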

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `InworldRealtimeLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter            | Type                | Default     | Description                                                            |
| -------------------- | ------------------- | ----------- | ---------------------------------------------------------------------- |
| `model`              | `str`               | `NOT_GIVEN` | Model identifier. *(Inherited from base settings.)*                    |
| `system_instruction` | `str`               | `NOT_GIVEN` | System instruction/prompt. *(Inherited from base settings.)*           |
| `temperature`        | `float`             | `NOT_GIVEN` | Temperature for response generation. *(Inherited from base settings.)* |
| `session_properties` | `SessionProperties` | `NOT_GIVEN` | Session-level configuration (voice, audio config, tools, etc.).        |

<Note>
  `NOT_GIVEN` values are omitted, letting the service use its own defaults. Only
  parameters that are explicitly set are included.
</Note>
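
To make the `NOT_GIVEN` behaviour concrete, here is a small stand-alone sketch of the pattern. The sentinel class, `ExampleSettings`, and `given_fields` are all stand-ins defined for this example; Pipecat ships its own `NOT_GIVEN` and settings classes, and their internals may differ.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


class _NotGiven:
    """Stand-in sentinel for illustration only."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


@dataclass
class ExampleSettings:
    """Hypothetical settings object mirroring three rows of the table above."""

    model: Any = NOT_GIVEN
    system_instruction: Any = NOT_GIVEN
    temperature: Any = NOT_GIVEN


def given_fields(settings: ExampleSettings) -> Dict[str, Any]:
    """Return only the fields that were explicitly set."""
    return {
        f.name: getattr(settings, f.name)
        for f in fields(settings)
        if getattr(settings, f.name) is not NOT_GIVEN
    }


print(given_fields(ExampleSettings(temperature=0.2)))  # {'temperature': 0.2}
```

Using a dedicated sentinel rather than `None` keeps "not set" distinguishable from an explicit `None` value.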

### SessionProperties

| Parameter           | Type                 | Default             | Description                                       |
| ------------------- | -------------------- | ------------------- | ------------------------------------------------- |
| `model`             | `str`                | `None`              | LLM model to use (e.g. "openai/gpt-4.1-nano").    |
| `instructions`      | `str`                | `None`              | System instructions for the assistant.            |
| `temperature`       | `float`              | `None`              | Temperature for response generation.              |
| `output_modalities` | `List[str]`          | `["audio", "text"]` | Output modalities for the assistant.              |
| `audio`             | `AudioConfiguration` | `None`              | Configuration for input and output audio formats. |
| `tools`             | `List[FunctionTool]` | `None`              | Available custom function tools.                  |

### AudioConfiguration

The `audio` field in `SessionProperties` accepts an `AudioConfiguration` with `input` and `output` sub-configurations:

**AudioInput** (`audio.input`):

| Parameter        | Type                 | Default | Description                                                                                                               |
| ---------------- | -------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------- |
| `format`         | `AudioFormat`        | `None`  | Input audio format. Supports `PCMAudioFormat` (configurable rate), `PCMUAudioFormat` (8kHz), or `PCMAAudioFormat` (8kHz). |
| `transcription`  | `InputTranscription` | `None`  | Configuration for input audio transcription. Includes `model` field for STT model selection.                              |
| `turn_detection` | `TurnDetection`      | `None`  | Turn detection configuration. Supports `"semantic_vad"` and `"server_vad"` types.                                         |

**AudioOutput** (`audio.output`):

| Parameter | Type          | Default | Description                                        |
| --------- | ------------- | ------- | -------------------------------------------------- |
| `format`  | `AudioFormat` | `None`  | Output audio format. Same format options as input. |
| `model`   | `str`         | `None`  | TTS model to use (e.g. "inworld-tts-1.5-max").     |
| `voice`   | `str`         | `None`  | Voice ID (e.g. "Sarah", "Clive").                  |

Inworld PCM audio supports sample rates: 8000, 16000, 24000, 32000, 44100, and 48000 Hz.
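
A rate check before building the audio configuration can catch mismatches early. The sketch below encodes the rates listed above plus the fixed 8 kHz rate for the G.711 formats. The `validate_rate` helper and the codec labels `"pcm"`, `"pcmu"`, and `"pcma"` are assumptions for this example, not Pipecat identifiers.

```python
SUPPORTED_PCM_RATES = frozenset({8000, 16000, 24000, 32000, 44100, 48000})
G711_RATE = 8000  # PCMU and PCMA are fixed at 8 kHz


def validate_rate(codec: str, rate: int) -> int:
    """Return `rate` if it is valid for `codec`, otherwise raise ValueError.

    `codec` is a hypothetical label: "pcm", "pcmu", or "pcma".
    """
    codec = codec.lower()
    if codec == "pcm":
        if rate not in SUPPORTED_PCM_RATES:
            raise ValueError(f"Unsupported PCM sample rate: {rate}")
        return rate
    if codec in ("pcmu", "pcma"):
        if rate != G711_RATE:
            raise ValueError(f"{codec} requires {G711_RATE} Hz, got {rate}")
        return rate
    raise ValueError(f"Unknown codec: {codec!r}")


print(validate_rate("pcm", 24000))  # 24000
```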

### TurnDetection

| Parameter            | Type                                    | Default          | Description                                                                         |
| -------------------- | --------------------------------------- | ---------------- | ----------------------------------------------------------------------------------- |
| `type`               | `Literal["server_vad", "semantic_vad"]` | `"semantic_vad"` | Detection type: `"semantic_vad"` (semantic) or `"server_vad"` (standard VAD).       |
| `eagerness`          | `str`                                   | `None`           | How eagerly to detect end of turn. Options: "low", "medium", "high".                |
| `create_response`    | `bool`                                  | `None`           | Whether to automatically create a response on turn end.                             |
| `interrupt_response` | `bool`                                  | `None`           | Whether user speech interrupts the current response.                                |

## Usage

### Basic Setup

```python
import os
from pipecat.services.inworld.realtime.llm import InworldRealtimeLLMService

llm = InworldRealtimeLLMService(
    api_key=os.getenv("INWORLD_API_KEY"),
    llm_model="xai/grok-4-1-fast-non-reasoning",
    voice="Sarah",
    settings=InworldRealtimeLLMService.Settings(
        system_instruction=(
            "You are a helpful and friendly AI assistant powered by Inworld. "
            "Keep your responses concise and conversational since this is a "
            "voice interaction."
        ),
    ),
)
```

### With Model and Voice Configuration

```python
import os
from pipecat.services.inworld.realtime.llm import InworldRealtimeLLMService

llm = InworldRealtimeLLMService(
    api_key=os.getenv("INWORLD_API_KEY"),
    llm_model="openai/gpt-4.1-nano",
    voice="Sarah",
    tts_model="inworld-tts-1.5-max",
    stt_model="assemblyai/universal-streaming-multilingual",
)
```

### With Full Session Configuration

```python
import os
from pipecat.services.inworld.realtime.llm import InworldRealtimeLLMService
from pipecat.services.inworld.realtime.events import (
    SessionProperties,
    TurnDetection,
    AudioConfiguration,
    AudioInput,
    AudioOutput,
    PCMAudioFormat,
    InputTranscription,
)

session_properties = SessionProperties(
    model="openai/gpt-4.1-nano",
    instructions="You are a helpful assistant.",
    temperature=0.7,
    audio=AudioConfiguration(
        input=AudioInput(
            format=PCMAudioFormat(rate=24000),
            transcription=InputTranscription(
                model="assemblyai/u3-rt-pro"
            ),
            turn_detection=TurnDetection(
                type="semantic_vad",
                eagerness="low",
                create_response=True,
                interrupt_response=True,
            ),
        ),
        output=AudioOutput(
            format=PCMAudioFormat(rate=24000),
            model="inworld-tts-1.5-max",
            voice="Sarah",
        ),
    ),
)

llm = InworldRealtimeLLMService(
    api_key=os.getenv("INWORLD_API_KEY"),
    settings=InworldRealtimeLLMService.Settings(
        session_properties=session_properties,
    ),
)
```

### Updating Settings at Runtime

For partial updates, prefer the top-level fields (`model`, `system_instruction`,
`temperature`). They are synced into `session_properties` automatically, so you
don't need to resend a full `SessionProperties` config:

```python
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.inworld.realtime.llm import InworldRealtimeLLMSettings

# `task` is assumed to be your running PipelineTask.
await task.queue_frame(
    LLMUpdateSettingsFrame(
        delta=InworldRealtimeLLMSettings(
            system_instruction="Now speak in Spanish.",
        )
    )
)
```

To change nested fields like `voice`, send a complete `SessionProperties` (it
replaces the stored config wholesale):

```python
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.inworld.realtime.llm import InworldRealtimeLLMSettings
from pipecat.services.inworld.realtime.events import (
    SessionProperties,
    AudioConfiguration,
    AudioOutput,
    PCMAudioFormat,
)

# `task` is assumed to be your running PipelineTask.
await task.queue_frame(
    LLMUpdateSettingsFrame(
        delta=InworldRealtimeLLMSettings(
            session_properties=SessionProperties(
                model="openai/gpt-4.1-nano",
                instructions="Now speak in Spanish.",
                audio=AudioConfiguration(
                    output=AudioOutput(
                        format=PCMAudioFormat(rate=24000),
                        voice="Sarah",
                    ),
                ),
            ),
        )
    )
)
```
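
The difference between the two update paths can be shown with a toy model. This is a sketch of the semantics described above, not Pipecat's implementation: `ToySession`, `apply_top_level`, and `apply_session_properties` are hypothetical names, and the real service may fill omitted fields with its own defaults rather than `None`.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ToySession:
    """Hypothetical, flattened stand-in for a stored session config."""

    model: Optional[str] = None
    instructions: Optional[str] = None
    voice: Optional[str] = None


def apply_top_level(
    stored: ToySession,
    *,
    system_instruction: Optional[str] = None,
    model: Optional[str] = None,
) -> ToySession:
    """Sync top-level fields into the stored session; other fields survive."""
    changes = {}
    if system_instruction is not None:
        changes["instructions"] = system_instruction
    if model is not None:
        changes["model"] = model
    return replace(stored, **changes)


def apply_session_properties(stored: ToySession, new: ToySession) -> ToySession:
    """A new session object replaces the stored one wholesale."""
    return new


stored = ToySession(model="openai/gpt-4.1-nano", instructions="Be brief.", voice="Sarah")

# Partial update: only the instructions change, the voice is preserved.
partial = apply_top_level(stored, system_instruction="Now speak in Spanish.")
print(partial.voice)  # Sarah

# Wholesale replacement: fields missing from the new object are no longer set.
replaced = apply_session_properties(
    stored, ToySession(instructions="Now speak in Spanish.")
)
print(replaced.voice)  # None
```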

## Notes

* **Audio format auto-configuration**: If audio format is not specified in `session_properties`, the service automatically configures PCM input/output using the pipeline's sample rates (defaults to 24000 Hz).
* **Semantic VAD by default**: The service uses semantic VAD (`"semantic_vad"`) by default for more natural turn detection. When VAD is enabled, the server handles speech detection and turn management automatically.
* **Cascade architecture**: The service operates as an integrated STT → LLM → TTS pipeline on the server side, simplifying client-side implementation.
* **Audio before setup**: Audio is not sent to Inworld until the conversation setup is complete, preventing sample rate mismatches.
* **G.711 support**: PCMU and PCMA formats are supported at a fixed 8000 Hz rate, useful for telephony integrations.
* **System instruction precedence**: The `system_instruction` from service settings takes precedence over an initial system message in the LLM context. A warning is logged when both are set.
* **Settings replacement**: When you provide `session_properties` in `settings`, it **replaces** all defaults wholesale, so supply a complete `SessionProperties` configuration in that case. For simpler configuration, use the constructor shortcuts (`llm_model`, `voice`, `tts_model`, `stt_model`).

## Event Handlers

| Event                          | Description                                                   |
| ------------------------------ | ------------------------------------------------------------- |
| `on_conversation_item_created` | Called when a new conversation item is created in the session |
| `on_conversation_item_updated` | Called when a conversation item is updated or completed       |

```python
@llm.event_handler("on_conversation_item_created")
async def on_item_created(service, item_id, item):
    print(f"New conversation item: {item_id}")

@llm.event_handler("on_conversation_item_updated")
async def on_item_updated(service, item_id, item):
    print(f"Conversation item updated: {item_id}")
```
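
Handlers with the signature shown above could also feed a small in-memory log of conversation items. The sketch below is self-contained and drives the handlers by hand. `ItemLog` is a hypothetical class, the `str` type for `item_id` is assumed, and the dictionary payloads are made-up placeholders, since the real `item` object's shape is not described here.

```python
import asyncio
from typing import Any, Dict, List


class ItemLog:
    """Collects conversation items keyed by item_id, in arrival order."""

    def __init__(self) -> None:
        self.items: Dict[str, Any] = {}
        self.order: List[str] = []

    async def on_created(self, service: Any, item_id: str, item: Any) -> None:
        if item_id not in self.items:
            self.order.append(item_id)
        self.items[item_id] = item

    async def on_updated(self, service: Any, item_id: str, item: Any) -> None:
        # An update may arrive for an item never seen as created; record it anyway.
        if item_id not in self.items:
            self.order.append(item_id)
        self.items[item_id] = item


async def demo() -> ItemLog:
    # Stand-in calls; in a real pipeline the service invokes the handlers.
    log = ItemLog()
    await log.on_created(None, "item_1", {"status": "in_progress"})
    await log.on_updated(None, "item_1", {"status": "completed"})
    return log


log = asyncio.run(demo())
print(log.order)  # ['item_1']
print(log.items["item_1"])  # {'status': 'completed'}
```

Registration would presumably follow the decorator form shown above, for example `llm.event_handler("on_conversation_item_created")(log.on_created)`; treat that as an assumption and check it against your Pipecat version.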
