Overview

AWSTranscribeSTTService provides real-time speech-to-text capabilities using Amazon Transcribe’s WebSocket API. It supports interim results, adjustable quality levels, and can handle continuous audio streams.

Installation

To use AWSTranscribeSTTService, install the required dependencies:

pip install "pipecat-ai[aws]"

You’ll also need to set up your AWS credentials as environment variables:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_SESSION_TOKEN (if using temporary credentials)
  • AWS_REGION (defaults to “us-east-1”)

You can obtain AWS credentials by setting up an IAM user with access to Amazon Transcribe in your AWS account.
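The environment-variable fallback for credentials can be pictured with a small, stdlib-only sketch. The helper resolve_aws_credentials and the AwsCredentials type are hypothetical and not part of Pipecat. The sketch assumes that a value passed explicitly (using the constructor parameter names described under Configuration) takes precedence over the matching environment variable, and that the region falls back to "us-east-1".

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AwsCredentials:
    access_key_id: Optional[str]
    secret_access_key: Optional[str]
    session_token: Optional[str]
    region: str


def resolve_aws_credentials(
    api_key: Optional[str] = None,
    aws_access_key_id: Optional[str] = None,
    aws_session_token: Optional[str] = None,
    region: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> AwsCredentials:
    """Hypothetical helper: explicit arguments win over environment variables."""
    env = os.environ if env is None else env
    return AwsCredentials(
        access_key_id=aws_access_key_id or env.get("AWS_ACCESS_KEY_ID"),
        # api_key carries the *secret* access key, matching the constructor parameter
        secret_access_key=api_key or env.get("AWS_SECRET_ACCESS_KEY"),
        # Only needed for temporary credentials
        session_token=aws_session_token or env.get("AWS_SESSION_TOKEN"),
        region=region or env.get("AWS_REGION", "us-east-1"),
    )


fake_env = {"AWS_ACCESS_KEY_ID": "AKIAEXAMPLE", "AWS_SECRET_ACCESS_KEY": "secret"}
creds = resolve_aws_credentials(region="us-west-2", env=fake_env)
print(creds.access_key_id, creds.region, creds.session_token)  # AKIAEXAMPLE us-west-2 None
```

Passing env explicitly keeps the helper testable without touching the real process environment.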

Configuration

Constructor Parameters

api_key
str

Your AWS secret access key; can also be set with the AWS_SECRET_ACCESS_KEY environment variable

aws_access_key_id
str

Your AWS access key ID; can also be set with the AWS_ACCESS_KEY_ID environment variable

aws_session_token
str

Your AWS session token for temporary credentials; can also be set with the AWS_SESSION_TOKEN environment variable

region
str
default:"us-east-1"

AWS region to use for Transcribe service

sample_rate
int
default:"16000"

Audio sample rate in Hz. Only 8000 Hz and 16000 Hz are supported; other rates are converted to 16000 Hz.

language
Language
default:"Language.EN"

Language for transcription
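The sample_rate rule (8000 Hz or 16000 Hz, anything else ending up at 16 kHz) amounts to a simple check. This is a minimal sketch; normalize_sample_rate and the constant names are hypothetical, and it only chooses the rate. Any audio that was actually recorded at another rate would still need to be resampled.

```python
import logging

logger = logging.getLogger(__name__)

# Rates accepted by the service, per the sample_rate description
SUPPORTED_SAMPLE_RATES = (8000, 16000)
FALLBACK_SAMPLE_RATE = 16000


def normalize_sample_rate(sample_rate: int) -> int:
    """Hypothetical check: keep a supported rate, otherwise warn and use 16000 Hz."""
    if sample_rate in SUPPORTED_SAMPLE_RATES:
        return sample_rate
    logger.warning(
        "Sample rate %d Hz is not supported; using %d Hz instead",
        sample_rate,
        FALLBACK_SAMPLE_RATE,
    )
    return FALLBACK_SAMPLE_RATE


print(normalize_sample_rate(8000))   # 8000
print(normalize_sample_rate(44100))  # 16000 (after logging a warning)
```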

Default Settings

{
    "sample_rate": 16000,
    "language": Language.EN,
    "media_encoding": "linear16",  # AWS expects raw PCM
    "number_of_channels": 1,
    "show_speaker_label": False,
    "enable_channel_identification": False
}
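To show how these settings relate to an Amazon Transcribe streaming session, here is a sketch that turns them into WebSocket query parameters. The parameter names (language-code, media-encoding, sample-rate, number-of-channels, show-speaker-label, enable-channel-identification) come from the Amazon Transcribe streaming WebSocket API. The function settings_to_query is hypothetical, and which parameters Pipecat actually sends is an assumption. The URL signing (SigV4) that a real connection needs is omitted, and the language is given as a plain AWS code string instead of Pipecat's Language enum.

```python
from urllib.parse import urlencode

# The defaults listed above, with the language already mapped to an AWS code
default_settings = {
    "sample_rate": 16000,
    "language": "en-US",
    "media_encoding": "linear16",
    "number_of_channels": 1,
    "show_speaker_label": False,
    "enable_channel_identification": False,
}


def settings_to_query(settings: dict) -> str:
    """Hypothetical mapping from service settings to Transcribe query parameters."""
    # "linear16" is raw 16-bit PCM, which Transcribe names "pcm"
    encoding = settings["media_encoding"]
    if encoding == "linear16":
        encoding = "pcm"
    params = {
        "language-code": settings["language"],
        "media-encoding": encoding,
        "sample-rate": settings["sample_rate"],
        "number-of-channels": settings["number_of_channels"],
        # Booleans are sent as lowercase strings
        "show-speaker-label": str(settings["show_speaker_label"]).lower(),
        "enable-channel-identification": str(
            settings["enable_channel_identification"]
        ).lower(),
    }
    return urlencode(params)


print(settings_to_query(default_settings))
```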

Input

The service processes InputAudioRawFrame instances containing:

  • Raw PCM audio data
  • 16-bit depth
  • 8kHz or 16kHz sample rate (will convert to 16kHz if another rate is provided)
  • Single channel (mono)
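For reference, this is what "raw PCM, 16-bit, mono" means at the byte level. The sketch converts float samples in [-1.0, 1.0] to signed 16-bit little-endian PCM, which is the PCM layout Amazon Transcribe accepts, and computes the size of a fixed-duration chunk. The helpers floats_to_pcm16 and chunk_bytes are hypothetical, and the 20 ms chunk size is only an illustrative choice.

```python
import array
import sys

SAMPLE_RATE = 16000
CHANNELS = 1          # mono only
BYTES_PER_SAMPLE = 2  # 16-bit depth


def floats_to_pcm16(samples: list[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian signed 16-bit PCM."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    pcm = array.array("h", (int(round(s * 32767)) for s in clipped))
    if sys.byteorder == "big":
        pcm.byteswap()  # make the byte order little-endian on big-endian hosts
    return pcm.tobytes()


def chunk_bytes(duration_ms: int, sample_rate: int = SAMPLE_RATE) -> int:
    """Number of bytes in a mono 16-bit chunk of the given duration."""
    return sample_rate * duration_ms // 1000 * BYTES_PER_SAMPLE * CHANNELS


audio = floats_to_pcm16([0.0, 0.5, -1.0, 2.0])  # 2.0 is clipped to 1.0
print(len(audio))       # 8 (4 samples x 2 bytes)
print(chunk_bytes(20))  # 640 (20 ms at 16 kHz)
```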

Output Frames

The service produces two types of frames during transcription:

TranscriptionFrame

Generated for final transcriptions, containing:

text
string

Transcribed text

user_id
string

User identifier

timestamp
string

ISO 8601 formatted timestamp

language
Language

Language used for transcription

InterimTranscriptionFrame

Generated while speech is still in progress. It contains the same fields as TranscriptionFrame, but the text is a preliminary result that may still change.
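The split between final and interim frames follows the IsPartial flag that Amazon Transcribe sets on each streaming result. The sketch below assumes the TranscriptEvent JSON payload has already been decoded from the event-stream message. The classes are hypothetical stand-ins with the fields listed above, not Pipecat's own frame classes, and event_to_frame is illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class _TranscriptFields:
    text: str
    user_id: str
    timestamp: str  # ISO 8601
    language: str


class TranscriptionFrame(_TranscriptFields):
    """Stand-in for a final transcription frame."""


class InterimTranscriptionFrame(_TranscriptFields):
    """Stand-in for an interim (preliminary) transcription frame."""


def event_to_frame(event: dict, user_id: str, language: str) -> Optional[_TranscriptFields]:
    """Turn one decoded TranscriptEvent payload into a final or interim frame."""
    results = event.get("Transcript", {}).get("Results", [])
    if not results or not results[0].get("Alternatives"):
        return None  # nothing recognized yet
    result = results[0]
    text = result["Alternatives"][0].get("Transcript", "")
    if not text:
        return None
    timestamp = datetime.now(timezone.utc).isoformat()
    # IsPartial=True means the segment may still change, so it becomes an interim frame
    frame_cls = InterimTranscriptionFrame if result.get("IsPartial") else TranscriptionFrame
    return frame_cls(text=text, user_id=user_id, timestamp=timestamp, language=language)


partial = {"Transcript": {"Results": [{"IsPartial": True, "Alternatives": [{"Transcript": "hello wor"}]}]}}
frame = event_to_frame(partial, user_id="user-1", language="en-US")
print(type(frame).__name__, frame.text)  # InterimTranscriptionFrame hello wor
```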

Methods

See the STT base class methods for additional functionality.

Language Setting

await service.set_language(Language.FR)
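In the Amazon Transcribe streaming API the language code is chosen when a stream starts, so applying a new language plausibly means closing the current stream and opening a new one. The asyncio sketch below illustrates that pattern. The LanguageSwitchingSession class and its _connect/_disconnect methods are hypothetical; they record events instead of opening a WebSocket, and whether the service restarts its connection in exactly this way is an assumption.

```python
import asyncio


class LanguageSwitchingSession:
    """Hypothetical stand-in that records connection events instead of opening a WebSocket."""

    def __init__(self, language: str = "en-US") -> None:
        self.settings = {"language": language}
        self.events: list[str] = []

    async def _connect(self) -> None:
        # A real implementation would open a new signed WebSocket URL here
        self.events.append(f"connect:{self.settings['language']}")

    async def _disconnect(self) -> None:
        self.events.append("disconnect")

    async def set_language(self, language: str) -> None:
        # The language belongs to the stream's setup, so restart the stream to apply it
        self.settings["language"] = language
        await self._disconnect()
        await self._connect()


async def main() -> list[str]:
    session = LanguageSwitchingSession()
    await session._connect()
    await session.set_language("fr-FR")
    return session.events


print(asyncio.run(main()))  # ['connect:en-US', 'disconnect', 'connect:fr-FR']
```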

Usage Example

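from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.aws.stt import AWSTranscribeSTTService
from pipecat.transcriptions.language import Language

# Configure service using environment variables for credentials
stt = AWSTranscribeSTTService(
    region="us-west-2",
    sample_rate=16000,
    language=Language.EN,
)

# Or provide credentials directly
stt = AWSTranscribeSTTService(
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    api_key="YOUR_SECRET_ACCESS_KEY",  # api_key is the AWS secret access key
    region="us-west-2",
    sample_rate=16000,
    language=Language.EN,
)

# Use in pipeline (transport and llm are created elsewhere in your application)
pipeline = Pipeline([
    transport.input(),
    stt,
    llm,
    # ... further processors, e.g. a TTS service and transport.output()
])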

Language Support

AWS Transcribe STT supports the following languages:

Language Code    Description            Service Code
Language.EN      English (US)           en-US
Language.ES      Spanish                es-US
Language.FR      French                 fr-FR
Language.DE      German                 de-DE
Language.IT      Italian                it-IT
Language.PT      Portuguese (Brazil)    pt-BR
Language.JA      Japanese               ja-JP
Language.KO      Korean                 ko-KR
Language.ZH      Chinese (Mandarin)     zh-CN

AWS Transcribe supports additional languages and regional variants. See the AWS Transcribe documentation for a complete list.
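The language table amounts to a lookup from language names to Transcribe codes. This sketch uses plain strings such as "FR" in place of Pipecat's Language enum, and to_aws_language_code is a hypothetical helper rather than Pipecat's own conversion function. It returns None for languages the table does not list.

```python
from typing import Optional

# Mapping taken from the language table; keys stand in for Language enum member names
AWS_LANGUAGE_CODES = {
    "EN": "en-US",
    "ES": "es-US",
    "FR": "fr-FR",
    "DE": "de-DE",
    "IT": "it-IT",
    "PT": "pt-BR",
    "JA": "ja-JP",
    "KO": "ko-KR",
    "ZH": "zh-CN",
}


def to_aws_language_code(language: str) -> Optional[str]:
    """Return the Transcribe code for a language name such as "FR", or None if unmapped."""
    return AWS_LANGUAGE_CODES.get(language.upper())


print(to_aws_language_code("fr"))  # fr-FR
print(to_aws_language_code("NL"))  # None
```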


Metrics Support

The service supports the following metrics:

  • Time to First Byte (TTFB)
  • Processing duration
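The two metrics can be read as two intervals per utterance. The sketch below assumes that TTFB runs from the start of an utterance to the first (interim or final) result and that processing duration runs to the final result. The exact start and stop points the service uses are not stated here, and UtteranceTimer is hypothetical. The clock is injectable so the example is deterministic.

```python
import time
from typing import Callable, Optional


class UtteranceTimer:
    """Hypothetical timer: TTFB = start -> first result, processing = start -> final result."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self._start: Optional[float] = None
        self.ttfb: Optional[float] = None
        self.processing: Optional[float] = None

    def start(self) -> None:
        self._start = self._clock()
        self.ttfb = None
        self.processing = None

    def on_result(self, is_final: bool) -> None:
        if self._start is None:
            return  # no utterance in progress
        elapsed = self._clock() - self._start
        if self.ttfb is None:
            self.ttfb = elapsed  # first interim or final result
        if is_final:
            self.processing = elapsed
            self._start = None  # wait for the next utterance


ticks = iter([0.0, 0.25, 0.9])  # fake clock readings in seconds
timer = UtteranceTimer(clock=lambda: next(ticks))
timer.start()
timer.on_result(is_final=False)
timer.on_result(is_final=True)
print(timer.ttfb, timer.processing)  # 0.25 0.9
```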

Notes

  • Requires valid AWS credentials with access to Amazon Transcribe
  • Supports real-time transcription with interim results
  • Handles WebSocket connection management and reconnection
  • Only supports mono audio (single channel)
  • Automatically handles audio format conversion to PCM
  • Manages connection lifecycle (start, stop, cancel)
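Reconnection is commonly handled with retries and exponential backoff. The service's actual reconnection policy is not described in this section, so this is a generic sketch of the pattern rather than its implementation. The function connect_with_retry and its defaults are hypothetical, and connect stands for any coroutine function that opens the WebSocket.

```python
import asyncio
from typing import Awaitable, Callable


async def connect_with_retry(
    connect: Callable[[], Awaitable[None]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
) -> int:
    """Call `connect` until it succeeds, doubling the delay after each failure.

    Returns the number of attempts used; re-raises the last error if all attempts fail.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            await connect()
            return attempt
        except OSError:  # includes ConnectionError
            if attempt == max_attempts:
                raise
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


async def demo() -> int:
    failures = [ConnectionError("closed"), ConnectionError("closed")]

    async def flaky_connect() -> None:
        if failures:
            raise failures.pop(0)

    return await connect_with_retry(flaky_connect, base_delay=0.01)


print(asyncio.run(demo()))  # 3 (two failures, then success)
```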