> ## Documentation Index
> Fetch the complete documentation index at: https://docs.pipecat.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# OpenAI Responses

> Large Language Model services using OpenAI's Responses API

## Overview

Pipecat provides two variants of the OpenAI Responses API LLM service:

* **`OpenAIResponsesLLMService`** (WebSocket-based, recommended): Maintains a persistent WebSocket connection for lower-latency inference and automatically uses `previous_response_id` to send only incremental context when possible.
* **`OpenAIResponsesHttpLLMService`** (HTTP-based): Uses server-sent events (SSE) via HTTP streaming. Each request opens a new connection. Use this when WebSocket is not available or preferred.

Both variants support streaming text responses, function calling, usage metrics, and out-of-band inference, and work with the universal `LLMContext` and `LLMContextAggregatorPair`.

<Note>
  The Responses API is a newer OpenAI API designed for conversational AI
  applications. It differs from the Chat Completions API in its request/response
  structure and streaming format. See [OpenAI Responses API
  documentation](https://platform.openai.com/docs/api-reference/responses) for
  more details.
</Note>

### WebSocket vs HTTP

**Use WebSocket (`OpenAIResponsesLLMService`)** when:

* You need the lowest possible latency for real-time conversations
* Your workflow involves frequent tool/function calls
* You want automatic incremental context optimization without server-side storage

**Use HTTP (`OpenAIResponsesHttpLLMService`)** when:

* WebSocket connections are blocked by your infrastructure
* You prefer stateless request/response patterns
* You don't need the incremental context optimization

The WebSocket variant's `previous_response_id` optimization works with `store=False` (the default) using a connection-local in-memory cache—no conversations are stored on OpenAI's servers. The HTTP variant does not offer this optimization by default, as it would require `store=True` (30-day OpenAI-side conversation storage).

<CardGroup cols={2}>
  <Card title="OpenAI Responses API Reference" icon="code" href="https://reference-server.pipecat.ai/en/latest/api/pipecat.services.openai.responses.llm.html">
    Pipecat's API methods for OpenAI Responses integration
  </Card>

  <Card title="Example Implementation" icon="play" href="https://github.com/pipecat-ai/pipecat/blob/main/examples/voice/voice-openai-responses.py">
    Interruptible conversation example
  </Card>

  <Card title="OpenAI Documentation" icon="book" href="https://platform.openai.com/docs/api-reference/responses">
    Official OpenAI Responses API documentation
  </Card>

  <Card title="OpenAI Platform" icon="microphone" href="https://platform.openai.com/api-keys">
    Access models and manage API keys
  </Card>
</CardGroup>

## Installation

To use OpenAI services, install the required dependencies:

```bash theme={null}
uv add "pipecat-ai[openai]"
```

## Prerequisites

### OpenAI Account Setup

Before using OpenAI Responses LLM services, you need:

1. **OpenAI Account**: Sign up at [OpenAI Platform](https://platform.openai.com/)
2. **API Key**: Generate an API key from your account dashboard
3. **Model Selection**: Choose from available models (GPT-4.1, GPT-4o, GPT-4o-mini, etc.)
4. **Usage Limits**: Set up billing and usage limits as needed

### Required Environment Variables

* `OPENAI_API_KEY`: Your OpenAI API key for authentication

## Configuration

### Common Parameters

These parameters are available for both `OpenAIResponsesLLMService` and `OpenAIResponsesHttpLLMService`:

<ParamField path="api_key" type="str" default="None">
  OpenAI API key. If `None`, uses the `OPENAI_API_KEY` environment variable.
</ParamField>

<ParamField path="base_url" type="str" default="None">
  Custom base URL for the OpenAI API. Override for proxied or self-hosted
  deployments.
</ParamField>

<ParamField path="organization" type="str" default="None">
  OpenAI organization ID.
</ParamField>

<ParamField path="project" type="str" default="None">
  OpenAI project ID.
</ParamField>

<ParamField path="default_headers" type="Mapping[str, str]" default="None">
  Additional HTTP headers to include in every request.
</ParamField>

<ParamField path="service_tier" type="str" default="None">
  Service tier to use (e.g., "auto", "flex", "priority").
</ParamField>

<ParamField path="settings" type="OpenAIResponsesLLMSettings" default="None">
  Runtime-configurable model settings. See [Settings](#settings) below.
</ParamField>

### WebSocket-Specific Parameters

The following parameter is only available for `OpenAIResponsesLLMService` (WebSocket variant):

<ParamField path="ws_url" type="str" default="wss://api.openai.com/v1/responses">
  WebSocket endpoint URL. Override for custom deployments or proxies.
</ParamField>

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `OpenAIResponsesLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter               | Type    | Default     | Description                                                                                         |
| ----------------------- | ------- | ----------- | --------------------------------------------------------------------------------------------------- |
| `model`                 | `str`   | `"gpt-4.1"` | OpenAI model identifier. *(Inherited from base settings.)*                                          |
| `system_instruction`    | `str`   | `None`      | System instruction/prompt for the model. *(Inherited from base settings.)*                          |
| `temperature`           | `float` | `NOT_GIVEN` | Sampling temperature (0.0 to 2.0). Lower values are more focused, higher values are more creative.  |
| `top_p`                 | `float` | `NOT_GIVEN` | Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output.                                |
| `frequency_penalty`     | `float` | `None`      | Penalty for frequent tokens (-2.0 to 2.0). Positive values discourage repetition.                   |
| `presence_penalty`      | `float` | `None`      | Penalty for new topics (-2.0 to 2.0). Positive values encourage the model to talk about new topics. |
| `seed`                  | `int`   | `None`      | Random seed for deterministic outputs.                                                              |
| `max_completion_tokens` | `int`   | `NOT_GIVEN` | Maximum completion tokens to generate.                                                              |

<Note>
  `NOT_GIVEN` values are omitted from the API request entirely, letting the
  OpenAI API use its own defaults. This is different from `None`, which would be
  sent explicitly.
</Note>

## Usage

### Basic Setup

**WebSocket variant (recommended):**

```python theme={null}
from pipecat.services.openai.responses.llm import OpenAIResponsesLLMService

llm = OpenAIResponsesLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAIResponsesLLMService.Settings(
        model="gpt-4.1",
        system_instruction="You are a helpful assistant.",
    ),
)
```

**HTTP variant:**

```python theme={null}
from pipecat.services.openai.responses.llm import OpenAIResponsesHttpLLMService

llm = OpenAIResponsesHttpLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAIResponsesHttpLLMService.Settings(
        model="gpt-4.1",
        system_instruction="You are a helpful assistant.",
    ),
)
```

### With Custom Settings

```python theme={null}
from pipecat.services.openai.responses.llm import OpenAIResponsesLLMService

llm = OpenAIResponsesLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAIResponsesLLMService.Settings(
        model="gpt-4.1",
        temperature=0.7,
        max_completion_tokens=1000,
        frequency_penalty=0.5,
    ),
)
```

<Note>
  Both `OpenAIResponsesLLMService.Settings` and
  `OpenAIResponsesHttpLLMService.Settings` use the same
  `OpenAIResponsesLLMSettings` class, so settings are identical between
  variants.
</Note>

### Updating Settings at Runtime

Model settings can be changed mid-conversation using `LLMUpdateSettingsFrame`:

```python theme={null}
from pipecat.frames.frames import LLMUpdateSettingsFrame

await task.queue_frame(
    LLMUpdateSettingsFrame(
        delta=OpenAIResponsesLLMService.Settings(
            temperature=0.3,
            max_completion_tokens=500,
        )
    )
)
```

### Out-of-Band Inference

Run a one-shot inference without pushing frames through the pipeline:

```python theme={null}
from pipecat.processors.aggregators.llm_context import LLMContext

context = LLMContext()
context.add_user_message("What is the capital of France?")

response = await llm.run_inference(
    context=context,
    max_tokens=100,
    system_instruction="You are a helpful geography assistant.",
)
print(response)  # "The capital of France is Paris."
```

## Notes

* **WebSocket is the new default**: As of Pipecat version with PR #4141, `OpenAIResponsesLLMService` uses WebSocket transport by default. If you need the HTTP streaming behavior, use `OpenAIResponsesHttpLLMService` instead. Both have identical constructor args and settings.
* **Persistent WebSocket connection**: The WebSocket variant maintains a persistent connection to `wss://api.openai.com/v1/responses` and automatically reconnects on connection loss. Connection lifetime is limited to 60 minutes server-side, after which automatic reconnection occurs.
* **Incremental context optimization**: The WebSocket variant uses `previous_response_id` to send only incremental context when the conversation prefix hasn't changed, reducing latency and costs. This works with `store=False` (no server-side storage) via a connection-local cache.
* **Responses API vs Chat Completions API**: The Responses API has a different request/response structure compared to the Chat Completions API. Use `OpenAILLMService` for the Chat Completions API and `OpenAIResponsesLLMService` or `OpenAIResponsesHttpLLMService` for the Responses API.
* **Universal LLM Context**: Both services work with the universal `LLMContext` and `LLMContextAggregatorPair`, making it easy to switch between different LLM providers.
* **Function calling**: Supports OpenAI's tool/function calling format. Register function handlers on the pipeline task to handle tool calls automatically.
* **Usage metrics**: Automatically tracks token usage, including cached tokens and reasoning tokens.
* **Service tiers**: Supports OpenAI's service tier system for prioritizing requests.

## Event Handlers

Both `OpenAIResponsesLLMService` and `OpenAIResponsesHttpLLMService` support the following event handlers, inherited from [LLMService](/api-reference/server/events/service-events):

| Event                       | Description                                                             |
| --------------------------- | ----------------------------------------------------------------------- |
| `on_completion_timeout`     | Called when an LLM completion request times out                         |
| `on_function_calls_started` | Called when function calls are received and execution is about to start |
| `on_connection_error`       | Called when a WebSocket connection error occurs                         |

```python theme={null}
@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
    print("LLM completion timed out")

@llm.event_handler("on_connection_error")
async def on_connection_error(service, error):
    print(f"LLM connection error: {error}")
```
