> ## Documentation Index > Fetch the complete documentation index at: https://docs.pipecat.ai/llms.txt > Use this file to discover all available pages before exploring further. # OpenAI Responses > Large Language Model services using OpenAI's Responses API ## Overview Pipecat provides two variants of the OpenAI Responses API LLM service: * **`OpenAIResponsesLLMService`** (WebSocket-based, recommended): Maintains a persistent WebSocket connection for lower-latency inference and automatically uses `previous_response_id` to send only incremental context when possible. * **`OpenAIResponsesHttpLLMService`** (HTTP-based): Uses server-sent events (SSE) via HTTP streaming. Each request opens a new connection. Use this when WebSocket is not available or preferred. Both variants support streaming text responses, function calling, usage metrics, and out-of-band inference, and work with the universal `LLMContext` and `LLMContextAggregatorPair`. The Responses API is a newer OpenAI API designed for conversational AI applications. It differs from the Chat Completions API in its request/response structure and streaming format. See [OpenAI Responses API documentation](https://platform.openai.com/docs/api-reference/responses) for more details. ### WebSocket vs HTTP **Use WebSocket (`OpenAIResponsesLLMService`)** when: * You need the lowest possible latency for real-time conversations * Your workflow involves frequent tool/function calls * You want automatic incremental context optimization without server-side storage **Use HTTP (`OpenAIResponsesHttpLLMService`)** when: * WebSocket connections are blocked by your infrastructure * You prefer stateless request/response patterns * You don't need the incremental context optimization The WebSocket variant's `previous_response_id` optimization works with `store=False` (the default) using a connection-local in-memory cache—no conversations are stored on OpenAI's servers. The HTTP variant does not offer this optimization by default, as it would require `store=True` (30-day OpenAI-side conversation storage). Pipecat's API methods for OpenAI Responses integration Interruptible conversation example Official OpenAI Responses API documentation Access models and manage API keys ## Installation To use OpenAI services, install the required dependencies: ```bash theme={null} uv add "pipecat-ai[openai]" ``` ## Prerequisites ### OpenAI Account Setup Before using OpenAI Responses LLM services, you need: 1. **OpenAI Account**: Sign up at [OpenAI Platform](https://platform.openai.com/) 2. **API Key**: Generate an API key from your account dashboard 3. **Model Selection**: Choose from available models (GPT-4.1, GPT-4o, GPT-4o-mini, etc.) 4. **Usage Limits**: Set up billing and usage limits as needed ### Required Environment Variables * `OPENAI_API_KEY`: Your OpenAI API key for authentication ## Configuration ### Common Parameters These parameters are available for both `OpenAIResponsesLLMService` and `OpenAIResponsesHttpLLMService`: OpenAI API key. If `None`, uses the `OPENAI_API_KEY` environment variable. Custom base URL for the OpenAI API. Override for proxied or self-hosted deployments. OpenAI organization ID. OpenAI project ID. Additional HTTP headers to include in every request. Service tier to use (e.g., "auto", "flex", "priority"). Runtime-configurable model settings. See [Settings](#settings) below. ### WebSocket-Specific Parameters The following parameter is only available for `OpenAIResponsesLLMService` (WebSocket variant): WebSocket endpoint URL. Override for custom deployments or proxies. ### Settings Runtime-configurable settings passed via the `settings` constructor argument using `OpenAIResponsesLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details. | Parameter | Type | Default | Description | | ----------------------- | ------- | ----------- | --------------------------------------------------------------------------------------------------- | | `model` | `str` | `"gpt-4.1"` | OpenAI model identifier. *(Inherited from base settings.)* | | `system_instruction` | `str` | `None` | System instruction/prompt for the model. *(Inherited from base settings.)* | | `temperature` | `float` | `NOT_GIVEN` | Sampling temperature (0.0 to 2.0). Lower values are more focused, higher values are more creative. | | `top_p` | `float` | `NOT_GIVEN` | Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output. | | `frequency_penalty` | `float` | `None` | Penalty for frequent tokens (-2.0 to 2.0). Positive values discourage repetition. | | `presence_penalty` | `float` | `None` | Penalty for new topics (-2.0 to 2.0). Positive values encourage the model to talk about new topics. | | `seed` | `int` | `None` | Random seed for deterministic outputs. | | `max_completion_tokens` | `int` | `NOT_GIVEN` | Maximum completion tokens to generate. | `NOT_GIVEN` values are omitted from the API request entirely, letting the OpenAI API use its own defaults. This is different from `None`, which would be sent explicitly. ## Usage ### Basic Setup **WebSocket variant (recommended):** ```python theme={null} from pipecat.services.openai.responses.llm import OpenAIResponsesLLMService llm = OpenAIResponsesLLMService( api_key=os.getenv("OPENAI_API_KEY"), settings=OpenAIResponsesLLMService.Settings( model="gpt-4.1", system_instruction="You are a helpful assistant.", ), ) ``` **HTTP variant:** ```python theme={null} from pipecat.services.openai.responses.llm import OpenAIResponsesHttpLLMService llm = OpenAIResponsesHttpLLMService( api_key=os.getenv("OPENAI_API_KEY"), settings=OpenAIResponsesHttpLLMService.Settings( model="gpt-4.1", system_instruction="You are a helpful assistant.", ), ) ``` ### With Custom Settings ```python theme={null} from pipecat.services.openai.responses.llm import OpenAIResponsesLLMService llm = OpenAIResponsesLLMService( api_key=os.getenv("OPENAI_API_KEY"), settings=OpenAIResponsesLLMService.Settings( model="gpt-4.1", temperature=0.7, max_completion_tokens=1000, frequency_penalty=0.5, ), ) ``` Both `OpenAIResponsesLLMService.Settings` and `OpenAIResponsesHttpLLMService.Settings` use the same `OpenAIResponsesLLMSettings` class, so settings are identical between variants. ### Updating Settings at Runtime Model settings can be changed mid-conversation using `LLMUpdateSettingsFrame`: ```python theme={null} from pipecat.frames.frames import LLMUpdateSettingsFrame await worker.queue_frame( LLMUpdateSettingsFrame( delta=OpenAIResponsesLLMService.Settings( temperature=0.3, max_completion_tokens=500, ) ) ) ``` ### Out-of-Band Inference Run a one-shot inference without pushing frames through the pipeline: ```python theme={null} from pipecat.processors.aggregators.llm_context import LLMContext context = LLMContext() context.add_user_message("What is the capital of France?") response = await llm.run_inference( context=context, max_tokens=100, system_instruction="You are a helpful geography assistant.", ) print(response) # "The capital of France is Paris." ``` ## Notes * **WebSocket is the new default**: As of Pipecat version with PR #4141, `OpenAIResponsesLLMService` uses WebSocket transport by default. If you need the HTTP streaming behavior, use `OpenAIResponsesHttpLLMService` instead. Both have identical constructor args and settings. * **Persistent WebSocket connection**: The WebSocket variant maintains a persistent connection to `wss://api.openai.com/v1/responses` and automatically reconnects on connection loss. Connection lifetime is limited to 60 minutes server-side, after which automatic reconnection occurs. * **Incremental context optimization**: The WebSocket variant uses `previous_response_id` to send only incremental context when the conversation prefix hasn't changed, reducing latency and costs. This works with `store=False` (no server-side storage) via a connection-local cache. * **Responses API vs Chat Completions API**: The Responses API has a different request/response structure compared to the Chat Completions API. Use `OpenAILLMService` for the Chat Completions API and `OpenAIResponsesLLMService` or `OpenAIResponsesHttpLLMService` for the Responses API. * **Universal LLM Context**: Both services work with the universal `LLMContext` and `LLMContextAggregatorPair`, making it easy to switch between different LLM providers. * **Function calling**: Supports OpenAI's tool/function calling format. Register function handlers on the pipeline worker to handle tool calls automatically. * **Usage metrics**: Automatically tracks token usage, including cached tokens and reasoning tokens. * **Service tiers**: Supports OpenAI's service tier system for prioritizing requests. ## Event Handlers Both `OpenAIResponsesLLMService` and `OpenAIResponsesHttpLLMService` support the following event handlers, inherited from [LLMService](/api-reference/server/events/service-events): | Event | Description | | --------------------------- | ----------------------------------------------------------------------- | | `on_completion_timeout` | Called when an LLM completion request times out | | `on_function_calls_started` | Called when function calls are received and execution is about to start | | `on_connection_error` | Called when a WebSocket connection error occurs | ```python theme={null} @llm.event_handler("on_completion_timeout") async def on_completion_timeout(service): print("LLM completion timed out") @llm.event_handler("on_connection_error") async def on_connection_error(service, error): print(f"LLM connection error: {error}") ```