# OpenAI

> Speech-to-text service implementations using OpenAI's Speech-to-Text APIs

## Overview

Pipecat provides two STT service implementations for OpenAI:

* **`OpenAISTTService`** (HTTP) -- VAD-segmented speech recognition using OpenAI's transcription API, supporting GPT-4o transcription and Whisper models.
* **`OpenAIRealtimeSTTService`** (WebSocket) -- Real-time streaming speech-to-text using OpenAI's Realtime API transcription sessions, with support for local VAD and server-side VAD modes.

<CardGroup cols={2}>
  <Card title="OpenAI STT API Reference" icon="code" href="https://reference-server.pipecat.ai/en/latest/api/pipecat.services.openai.stt.html">
    Pipecat's API methods for OpenAI STT integration
  </Card>

  <Card title="Example Implementation" icon="play" href="https://github.com/pipecat-ai/pipecat/blob/main/examples/voice/voice-openai.py">
    Complete example with OpenAI ecosystem integration
  </Card>

  <Card title="OpenAI Documentation" icon="book" href="https://platform.openai.com/docs/api-reference/audio/createTranscription">
    Official OpenAI transcription documentation and features
  </Card>

  <Card title="OpenAI Platform" icon="microphone" href="https://platform.openai.com/api-keys">
    Access API keys and transcription models
  </Card>
</CardGroup>

## Installation

To use OpenAI services, install the required dependency:

```bash theme={null}
uv add "pipecat-ai[openai]"
```

## Prerequisites

### OpenAI Account Setup

Before using OpenAI STT services, you need:

1. **OpenAI Account**: Sign up at [OpenAI Platform](https://platform.openai.com/)
2. **API Key**: Generate an API key from your account dashboard
3. **Model Access**: Ensure your account has access to the GPT-4o transcription and Whisper models

### Required Environment Variables

* `OPENAI_API_KEY`: Your OpenAI API key for authentication
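
As a minimal sketch, the key can be read once at startup so that a missing variable fails early with a clear message. The helper name `require_openai_api_key` is hypothetical and not part of Pipecat; it only wraps `os.environ`.

```python
import os


def require_openai_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper for illustration; not part of Pipecat.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before starting the bot")
    return key
```

The returned value can then be passed as `api_key=` to either service described below.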

## OpenAISTTService

Uses VAD-based audio segmentation with HTTP transcription requests. Records speech segments detected by local VAD and sends them to OpenAI's transcription API.

<ParamField path="model" type="str" default="gpt-4o-transcribe" deprecated>
  Transcription model to use. Options include `"gpt-4o-transcribe"`,
  `"gpt-4o-mini-transcribe"`, and `"whisper-1"`. *Deprecated in v0.0.105. Use
  `settings=OpenAISTTService.Settings(...)` instead.*
</ParamField>

<ParamField path="api_key" type="str" default="None">
  OpenAI API key. Falls back to the `OPENAI_API_KEY` environment variable.
</ParamField>

<ParamField path="base_url" type="str" default="None">
  API base URL. Override for custom or proxied deployments.
</ParamField>

<ParamField path="language" type="Language" default="Language.EN" deprecated>
  Language of the audio input. *Deprecated in v0.0.105. Use
  `settings=OpenAISTTService.Settings(...)` instead.*
</ParamField>

<ParamField path="prompt" type="str" default="None" deprecated>
  Optional text to guide the model's style or continue a previous segment.
  *Deprecated in v0.0.105. Use `settings=OpenAISTTService.Settings(...)`
  instead.*
</ParamField>

<ParamField path="temperature" type="float" default="None" deprecated>
  Sampling temperature between 0 and 1. Lower values produce more deterministic
  results. *Deprecated in v0.0.105. Use
  `settings=OpenAISTTService.Settings(...)` instead.*
</ParamField>

<ParamField path="settings" type="OpenAISTTService.Settings" default="None">
  Runtime-configurable settings for the STT service. See [Settings](#settings)
  below.
</ParamField>

<ParamField path="ttfs_p99_latency" type="float" default="OPENAI_TTFS_P99">
  P99 latency from speech end to final transcript in seconds. Override for your
  deployment.
</ParamField>

<ParamField path="push_empty_transcripts" type="bool" default="False">
  If true, allow empty `TranscriptionFrame` frames to be pushed downstream
  instead of discarding them. This is intended for situations where VAD fires
  even though the user did not speak. In these cases, it is useful to know that
  nothing was transcribed so that the agent can resume speaking, instead of
  waiting longer for a transcription.
</ParamField>
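
The `ttfs_p99_latency` parameter above expects a value in seconds. As a rough sketch, assuming you have logged your own speech-end-to-final-transcript latencies, a nearest-rank P99 could be computed as follows. The `p99` helper is hypothetical and not a Pipecat API.

```python
def p99(latencies_s: list[float]) -> float:
    """Return the 99th percentile (nearest-rank method) of latency samples."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    n = len(ordered)
    # 1-based nearest rank: ceil(0.99 * n), computed with integer arithmetic
    rank = (99 * n + 99) // 100
    return ordered[rank - 1]
```

The result can be passed as `ttfs_p99_latency=` when constructing the service. With few samples the estimate is dominated by the slowest observations, so treat it as a starting point.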

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `OpenAISTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter     | Type              | Default               | Description                                                              |
| ------------- | ----------------- | --------------------- | ------------------------------------------------------------------------ |
| `model`       | `str`             | `"gpt-4o-transcribe"` | Transcription model to use. *(Inherited from base STT settings.)*        |
| `language`    | `Language \| str` | `Language.EN`         | Language of the audio input. *(Inherited from base STT settings.)*       |
| `prompt`      | `str`             | `None`                | Optional text to guide the model's style or continue a previous segment. |
| `temperature` | `float`           | `None`                | Sampling temperature between 0 and 1.                                    |

### Usage

```python theme={null}
import os

from pipecat.services.openai.stt import OpenAISTTService

stt = OpenAISTTService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAISTTService.Settings(
        model="gpt-4o-transcribe",
    ),
)
```

### Notes

* **Segmented transcription**: Processes complete audio segments (after VAD detects silence) via HTTP. Only produces final transcriptions, not interim results.
* **No connection events**: Uses per-request HTTP calls, so it does not emit WebSocket connection events.
* **Multilingual support**: Whisper and GPT-4o transcription models support many languages. The default is `Language.EN` (English). Set `language=None` in settings to enable automatic language detection, which will transcribe whatever language the user speaks.

## OpenAIRealtimeSTTService

Real-time streaming speech-to-text using OpenAI's Realtime API WebSocket transcription sessions. Audio is streamed continuously over a WebSocket connection for lower latency compared to HTTP-based transcription.

<ParamField path="api_key" type="str" required>
  OpenAI API key for authentication.
</ParamField>

<ParamField path="model" type="str" default="gpt-4o-transcribe" deprecated>
  Transcription model. Supported values are `"gpt-4o-transcribe"` and
  `"gpt-4o-mini-transcribe"`. *Deprecated in v0.0.105. Use
  `settings=OpenAIRealtimeSTTService.Settings(...)` instead.*
</ParamField>

<ParamField path="base_url" type="str" default="wss://api.openai.com/v1/realtime">
  WebSocket base URL for the Realtime API.
</ParamField>

<ParamField path="language" type="Language" default="Language.EN" deprecated>
  Language of the audio input. *Deprecated in v0.0.105. Use
  `settings=OpenAIRealtimeSTTService.Settings(...)` instead.*
</ParamField>

<ParamField path="prompt" type="str" default="None" deprecated>
  Optional prompt text to guide transcription style or provide keyword hints.
  *Deprecated in v0.0.105. Use `settings=OpenAIRealtimeSTTService.Settings(...)`
  instead.*
</ParamField>

<ParamField path="settings" type="OpenAIRealtimeSTTService.Settings" default="None">
  Runtime-configurable settings for the Realtime STT service. See
  [Settings](#settings-2) below.
</ParamField>

<ParamField path="turn_detection" type="dict | Literal[False] | None" default="False">
  Server-side VAD configuration. Defaults to `False` (disabled), which relies on a local VAD processor in the pipeline. Pass `None` to use the server defaults (`server_vad`), or a dict with custom settings (e.g. `{"type": "server_vad", "threshold": 0.5}`).
</ParamField>

<ParamField path="noise_reduction" type="str" default="None">
  Noise reduction mode. `"near_field"` for close microphones, `"far_field"` for
  distant microphones, or `None` to disable.

  <Note>
    **Deprecated in v0.0.106.** Use
    `settings=OpenAIRealtimeSTTService.Settings(noise_reduction=...)` instead.
  </Note>
</ParamField>

<ParamField path="should_interrupt" type="bool" default="True">
  Whether to interrupt bot output when speech is detected by server-side VAD.
  Only applies when turn detection is enabled.
</ParamField>

<ParamField path="ttfs_p99_latency" type="float" default="OPENAI_REALTIME_TTFS_P99">
  P99 latency from speech end to final transcript in seconds. Override for your
  deployment.
</ParamField>

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `OpenAIRealtimeSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter         | Type              | Default               | Description                                                         |
| ----------------- | ----------------- | --------------------- | ------------------------------------------------------------------- |
| `model`           | `str`             | `"gpt-4o-transcribe"` | Transcription model to use. *(Inherited from base STT settings.)*   |
| `language`        | `Language \| str` | `Language.EN`         | Language of the audio input. *(Inherited from base STT settings.)*  |
| `prompt`          | `str`             | `None`                | Optional prompt text to guide transcription style or keyword hints. |
| `noise_reduction` | `str`             | `None`                | Noise reduction: `"near_field"`, `"far_field"`, or `None` (off).    |

### Usage

#### With Local VAD

```python theme={null}
import os

from pipecat.services.openai.stt import OpenAIRealtimeSTTService

# Local VAD mode (default) - use with a VAD processor in the pipeline
stt = OpenAIRealtimeSTTService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAIRealtimeSTTService.Settings(
        model="gpt-4o-transcribe",
        noise_reduction="near_field",
    ),
)
```

#### With Server-Side VAD

```python theme={null}
import os

from pipecat.services.openai.stt import OpenAIRealtimeSTTService

# Server-side VAD mode - do NOT use a separate VAD processor
stt = OpenAIRealtimeSTTService(
    api_key=os.getenv("OPENAI_API_KEY"),
    turn_detection=None,  # Enable server-side VAD
    settings=OpenAIRealtimeSTTService.Settings(
        model="gpt-4o-transcribe",
    ),
)
```
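
The three accepted forms of `turn_detection` (`False`, `None`, or a dict) are easy to mix up. The sketch below maps readable mode names to the corresponding value. The function `turn_detection_for` and its mode labels are hypothetical, introduced only for illustration; the dict keys mirror the `{"type": "server_vad", "threshold": 0.5}` example from the parameter description.

```python
from typing import Any, Optional, Union

TurnDetection = Union[dict, bool, None]


def turn_detection_for(mode: str, threshold: Optional[float] = None) -> TurnDetection:
    """Map a readable mode name to a `turn_detection` value.

    Mode names are hypothetical labels used only in this sketch.
    """
    if mode == "local":
        return False  # server-side VAD disabled; a local VAD processor is expected
    if mode == "server_default":
        return None  # server-side VAD with server defaults (server_vad)
    if mode == "server_custom":
        config: dict[str, Any] = {"type": "server_vad"}
        if threshold is not None:
            config["threshold"] = threshold
        return config
    raise ValueError(f"unknown mode: {mode!r}")
```

The returned value would be passed as `turn_detection=` to `OpenAIRealtimeSTTService`. Making the mode explicit avoids confusing `None` (server defaults) with `False` (disabled).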

### Notes

* **Local VAD vs Server-side VAD**: Defaults to local VAD mode (`turn_detection=False`), where a local VAD processor in the pipeline controls when audio is committed for transcription. Set `turn_detection=None` to use server-side VAD instead; in that mode, do not also run a separate VAD processor in the pipeline.
* **Automatic resampling**: Automatically resamples audio to 24 kHz as required by the Realtime API, regardless of the pipeline's sample rate.
* **Interim transcriptions**: Produces interim transcriptions via delta events for real-time feedback.
* **Multilingual support**: GPT-4o transcription models support many languages. The default is `Language.EN` (English). Set `language=None` in settings to enable automatic language detection, which will transcribe whatever language the user speaks.
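
To illustrate what resampling to 24 kHz means in terms of sample counts, here is a naive linear-interpolation sketch in pure Python. It is not Pipecat's resampler: the service handles resampling internally, and production resamplers apply proper filtering. The function name `resample_linear` is hypothetical.

```python
def resample_linear(samples: list[int], src_rate: int, dst_rate: int = 24000) -> list[int]:
    """Naively resample mono PCM samples using linear interpolation."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    out_len = len(samples) * dst_rate // src_rate
    out = []
    for i in range(out_len):
        pos = i * src_rate / dst_rate  # fractional index into the source
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - lo
        out.append(round(samples[lo] * (1 - frac) + samples[hi] * frac))
    return out
```

For example, a 10 ms chunk at 16 kHz (160 samples) becomes 240 samples at 24 kHz.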

### Event Handlers

Supports the standard [service connection events](/api-reference/server/events/service-events):

| Event             | Description                                 |
| ----------------- | ------------------------------------------- |
| `on_connected`    | Connected to OpenAI Realtime WebSocket      |
| `on_disconnected` | Disconnected from OpenAI Realtime WebSocket |

```python theme={null}
@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to OpenAI Realtime STT")
```

<Tip>
  The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
  `Settings` / `settings=` instead. See the [Service Settings
  guide](/pipecat/fundamentals/service-settings) for migration details.
</Tip>
