Overview

OpenAISTTService provides speech-to-text capabilities using OpenAI’s hosted Whisper API. It offers high-accuracy transcription with minimal setup requirements.

Installation

To use OpenAISTTService, install the required dependencies:

pip install "pipecat-ai[openai]"

You’ll also need to set your OpenAI API key in the OPENAI_API_KEY environment variable.

You can obtain an OpenAI API key from the OpenAI platform.

Configuration

Constructor Parameters

model (str, default: "whisper-1")

Whisper model to use. Currently only “whisper-1” is available.

api_key (str)

Your OpenAI API key. If not provided, the service reads it from the OPENAI_API_KEY environment variable.

base_url (str)

Custom base URL for OpenAI API requests.
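The api_key fallback described above can be sketched as follows. This is a minimal illustration that assumes the fallback is a plain environment lookup; resolve_api_key is a hypothetical helper, not part of pipecat's API.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: prefer an explicit key, else fall back to OPENAI_API_KEY."""
    key = api_key or os.environ.get("OPENAI_API_KEY")
    if not key:
        # Fail early with a clear message instead of sending an unauthenticated request
        raise ValueError("No OpenAI API key: pass api_key or set OPENAI_API_KEY")
    return key
```

An explicitly passed key always takes precedence, so tests or multi-tenant setups can override the environment without modifying it.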

Input

The service processes audio data with the following requirements:

  • PCM audio format
  • 16-bit depth
  • Single channel (mono)
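If your audio source produces floating-point samples, they need converting to the 16-bit mono PCM described above before reaching the service. The sketch below assumes samples in the range [-1.0, 1.0] and little-endian byte order (an assumption; check what your transport expects). float_to_pcm16 and pcm16_duration are hypothetical helpers.

```python
import struct
from typing import Sequence


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Convert mono float samples in [-1.0, 1.0] to 16-bit little-endian PCM."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip out-of-range values
        ints.append(int(round(s * 32767)))  # scale to the int16 range
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm16_duration(pcm: bytes, sample_rate: int) -> float:
    """Duration in seconds of mono 16-bit PCM (2 bytes per sample)."""
    if len(pcm) % 2:
        raise ValueError("16-bit PCM must contain an even number of bytes")
    return len(pcm) / 2 / sample_rate
```

Scaling by 32767 rather than 32768 keeps +1.0 and -1.0 symmetric and avoids overflowing the positive end of the int16 range.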

Output Frames

The service produces two types of frames during transcription:

TranscriptionFrame

Generated for final transcriptions, containing:

  • text (string): Transcribed text
  • user_id (string): User identifier
  • timestamp (string): ISO 8601 formatted timestamp
  • language (Language): Detected language, if available

ErrorFrame

Generated when transcription errors occur, containing error details.
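To show how downstream code might consume these two frame types, the sketch below uses simplified stand-in dataclasses that mirror the fields listed above. They are hypothetical and are not pipecat's frame classes: the real TranscriptionFrame carries a Language value rather than a plain string, and the ErrorFrame field name error is an assumption here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Union


@dataclass
class TranscriptionFrame:
    """Hypothetical stand-in mirroring the documented fields."""
    text: str
    user_id: str
    timestamp: str                  # ISO 8601 string
    language: Optional[str] = None  # real frames use a Language value


@dataclass
class ErrorFrame:
    """Hypothetical stand-in; the field name `error` is an assumption."""
    error: str


def iso8601_now() -> str:
    """Current UTC time as an ISO 8601 string, e.g. for the timestamp field."""
    return datetime.now(timezone.utc).isoformat()


def handle_frame(frame: Union[TranscriptionFrame, ErrorFrame]) -> str:
    """Turn either frame type into a log line."""
    if isinstance(frame, TranscriptionFrame):
        lang = frame.language or "unknown"  # language may be unavailable
        return f"[{frame.timestamp}] {frame.user_id} ({lang}): {frame.text}"
    if isinstance(frame, ErrorFrame):
        return f"STT error: {frame.error}"
    raise TypeError(f"unexpected frame type: {type(frame).__name__}")
```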

Methods

Set Model

await service.set_model("whisper-1")

See the STT base class methods for additional functionality.

Usage Example

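import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.openai import OpenAISTTService

# Configure the service (reads the key from the environment)
stt = OpenAISTTService(
    model="whisper-1",
    api_key=os.getenv("OPENAI_API_KEY"),
)

# Use in a pipeline; `transport` and `llm` are assumed to be
# created elsewhere in your application
pipeline = Pipeline([
    transport.input(),  # audio in
    stt,                # speech -> TranscriptionFrame
    llm,
    # ... remaining processors
])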

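Frame Flow

Audio frames from the transport enter the service, which emits a TranscriptionFrame for each final transcription, or an ErrorFrame if transcription fails.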

Metrics Support

The service collects the following metrics:

  • Time to First Byte (TTFB)
  • Processing duration
  • API response time
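TTFB and processing duration can be measured with a monotonic clock around an API call, as in the sketch below. It illustrates the general idea only and is not the service's internal metrics code; fake_transcription_stream is a hypothetical stand-in for a real API response.

```python
import asyncio
import time
from typing import AsyncIterator, Callable, List, Optional, Tuple


async def fake_transcription_stream() -> AsyncIterator[str]:
    """Hypothetical stand-in for an API response that arrives in chunks."""
    for chunk in ["Hello", " world"]:
        await asyncio.sleep(0.01)  # simulate network latency
        yield chunk


async def measure(
    make_stream: Callable[[], AsyncIterator[str]],
) -> Tuple[Optional[float], float, List[str]]:
    """Return (ttfb_seconds, total_seconds, chunks) for one request."""
    start = time.perf_counter()
    ttfb: Optional[float] = None
    chunks: List[str] = []
    async for chunk in make_stream():
        if ttfb is None:
            ttfb = time.perf_counter() - start  # time to the first chunk
        chunks.append(chunk)
    total = time.perf_counter() - start  # full processing duration
    return ttfb, total, chunks


if __name__ == "__main__":
    ttfb, total, chunks = asyncio.run(measure(fake_transcription_stream))
    print(f"TTFB={ttfb:.3f}s total={total:.3f}s text={''.join(chunks)!r}")
```

time.perf_counter is used because it is monotonic, so the measurements are unaffected by wall-clock adjustments.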

Notes

  • Requires a valid OpenAI API key
  • Uses OpenAI’s hosted Whisper model
  • Handles API rate limiting
  • Automatic error handling
  • Thread-safe processing

Error Handling

The service handles common API errors including:

  • Authentication errors
  • Rate limiting
  • Invalid audio format
  • Network connectivity issues
  • API timeouts

Errors are propagated through ErrorFrames with descriptive messages.
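The error categories above roughly correspond to standard HTTP status codes and Python exception types. The sketch below maps them to descriptive messages of the kind an ErrorFrame might carry. The mapping (in particular treating HTTP 400 as a possible invalid-audio error) and describe_stt_error are assumptions for illustration, not the service's actual code.

```python
from typing import Optional

# Assumed mapping from HTTP status codes to the categories listed above.
_STATUS_MESSAGES = {
    400: "Invalid request, possibly an unsupported or malformed audio format",
    401: "Authentication error: check the OpenAI API key",
    429: "Rate limited: retry after a delay",
}


def describe_stt_error(
    status: Optional[int] = None, exc: Optional[BaseException] = None
) -> str:
    """Return a descriptive message for a failed transcription request."""
    # Transport-level failures first: no HTTP status was received.
    if isinstance(exc, TimeoutError):
        return "API timeout"
    if isinstance(exc, ConnectionError):
        return "Network connectivity issue"
    if status is not None:
        return _STATUS_MESSAGES.get(status, f"Unexpected API error (HTTP {status})")
    return "Unknown transcription error"
```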