Overview

AWSTranscribeSTTService provides real-time speech recognition using Amazon Transcribe’s WebSocket streaming API with support for interim results, multiple languages, and configurable audio processing parameters.

Installation

To use AWS Transcribe services, install the required dependency:
pip install "pipecat-ai[aws]"
You’ll also need to set up your AWS credentials as environment variables:
  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_SESSION_TOKEN (if using temporary credentials)
  • AWS_REGION (defaults to “us-east-1”)
Get your AWS credentials by setting up an IAM user with Amazon Transcribe access in your AWS Console.
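As a minimal sketch of the environment-variable setup above, the helper below gathers the four variables into keyword arguments. The function name aws_credentials_from_env is hypothetical and not part of pipecat. The keyword names follow the constructor call in the Basic Configuration example, and the fallback region mirrors the default listed above.

```python
import os
from typing import Mapping, Optional


def aws_credentials_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect AWS credential settings from environment variables.

    Hypothetical helper for illustration; not part of pipecat.
    """
    env = os.environ if env is None else env

    # The access key ID and secret access key are always required.
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing AWS environment variables: {', '.join(missing)}")

    return {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "api_key": env["AWS_SECRET_ACCESS_KEY"],
        # Only present when temporary credentials are in use.
        "aws_session_token": env.get("AWS_SESSION_TOKEN"),
        # Falls back to the documented default region.
        "region": env.get("AWS_REGION", "us-east-1"),
    }
```

Failing early on a missing variable tends to give a clearer error than a rejected connection later on. The returned dictionary is shaped so that it could be unpacked into the service constructor.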

Frames

Input

  • InputAudioRawFrame - Raw PCM audio data (16-bit, 8kHz or 16kHz, mono)
  • STTUpdateSettingsFrame - Runtime transcription configuration updates
  • STTMuteFrame - Mute audio input for transcription
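To make the audio format above concrete, the following sketch computes how many bytes a chunk of 16-bit mono PCM occupies. The function name pcm_chunk_bytes and the 20 ms chunk duration are assumptions chosen for illustration. They are not values taken from the service.

```python
def pcm_chunk_bytes(
    sample_rate_hz: int,
    chunk_ms: int,
    channels: int = 1,
    sample_width_bytes: int = 2,
) -> int:
    """Return the byte length of a raw PCM chunk of the given duration.

    Defaults match the documented input format: mono, 16-bit samples.
    """
    samples_per_channel = sample_rate_hz * chunk_ms // 1000
    return samples_per_channel * channels * sample_width_bytes


print(pcm_chunk_bytes(16000, 20))  # 640
print(pcm_chunk_bytes(8000, 20))   # 320
```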

Output

  • InterimTranscriptionFrame - Real-time transcription updates
  • TranscriptionFrame - Final transcription results
  • ErrorFrame - Connection or processing errors

Language Support

AWS Transcribe supports multiple languages with regional variants:
Language Code    Description            Service Codes
Language.EN      English (US)           en-US
Language.ES      Spanish                es-US
Language.FR      French                 fr-FR
Language.DE      German                 de-DE
Language.IT      Italian                it-IT
Language.PT      Portuguese (Brazil)    pt-BR
Language.JA      Japanese               ja-JP
Language.KO      Korean                 ko-KR
Language.ZH      Chinese (Mandarin)     zh-CN
Language.PL      Polish                 pl-PL
AWS Transcribe supports additional languages and regional variants. See the AWS documentation for a complete list.
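The mapping in the table can be sketched as a plain lookup. This is an illustration only, not the service's actual implementation: lowercase two-letter keys stand in for the Language members (EN, ES, and so on) so the snippet runs without pipecat, and the function name to_service_code is hypothetical.

```python
from typing import Optional

# Subset of the table above: base language -> Transcribe service code.
SERVICE_CODES = {
    "en": "en-US",
    "es": "es-US",
    "fr": "fr-FR",
    "de": "de-DE",
    "it": "it-IT",
    "pt": "pt-BR",
    "ja": "ja-JP",
    "ko": "ko-KR",
    "zh": "zh-CN",
    "pl": "pl-PL",
}


def to_service_code(language: str) -> Optional[str]:
    """Return the service code for a base language, or None if it is not listed."""
    return SERVICE_CODES.get(language.lower())
```

Returning None for an unlisted language leaves the caller to decide whether to fall back to a default or report an error.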

Usage Example

Basic Configuration

Initialize the AWSTranscribeSTTService and use it in a pipeline:
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.aws.stt import AWSTranscribeSTTService
from pipecat.transcriptions.language import Language

# Configuration with explicit credentials
stt = AWSTranscribeSTTService(
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    api_key="YOUR_SECRET_ACCESS_KEY",  # api_key carries the AWS secret access key
    aws_session_token="YOUR_SESSION_TOKEN",  # If using temporary credentials
    region="us-west-2",
    language=Language.FR,
)

# Use in pipeline (transport, context_aggregator, llm, and tts are assumed to be created elsewhere)
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

Dynamic Configuration

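Update settings at runtime by queueing an STTUpdateSettingsFrame, which the AWSTranscribeSTTService applies when the frame reaches it:
from pipecat.frames.frames import STTUpdateSettingsFrame
from pipecat.transcriptions.language import Language

# `task` is assumed to be the running pipeline task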
await task.queue_frame(STTUpdateSettingsFrame(
    language=Language.FR,
))

Metrics

The service provides:
  • Time to First Byte (TTFB) - Latency from audio input to first transcription
  • Processing Duration - Total time spent processing audio
Learn how to enable Metrics in your Pipeline.
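The difference between the two measurements can be sketched with a small timer. This is not how pipecat collects metrics internally. TranscriptionTimer is a hypothetical class, and treating processing duration as the time from first audio to the final transcription is an assumption made for this sketch.

```python
import time
from typing import Callable, Optional


class TranscriptionTimer:
    """Hypothetical helper: tracks TTFB and processing duration for one utterance."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._start: Optional[float] = None
        self.ttfb: Optional[float] = None
        self.processing: Optional[float] = None

    def audio_started(self) -> None:
        """Mark the moment audio for a new utterance is first received."""
        self._start = self._clock()
        self.ttfb = None
        self.processing = None

    def transcription_received(self, final: bool) -> None:
        """Record a transcription result, interim or final."""
        if self._start is None:
            return  # No utterance in progress; nothing to measure.
        elapsed = self._clock() - self._start
        if self.ttfb is None:
            self.ttfb = elapsed  # First result of any kind sets TTFB.
        if final:
            self.processing = elapsed  # Final result closes the measurement.
```

Injecting the clock keeps the timer testable with deterministic values. With real audio the default time.monotonic avoids jumps caused by wall-clock adjustments.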

Additional Notes

  • Sample Rate: Supports 8kHz and 16kHz sample rates. Other rates are automatically resampled to 16kHz.
  • Connection Management: Handles WebSocket connections with automatic reconnection and proper connection state management.
  • Credentials: Supports both environment variables and explicit credential parameters for flexible deployment.
  • Error Handling: Reports connection and processing errors as ErrorFrame output and attempts to recover the connection.
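The rule in the Sample Rate note can be written as a one-line decision. The sketch below only illustrates that rule. The names effective_sample_rate, SUPPORTED_RATES_HZ, and FALLBACK_RATE_HZ are hypothetical, and no actual resampling is performed here.

```python
SUPPORTED_RATES_HZ = (8000, 16000)
FALLBACK_RATE_HZ = 16000


def effective_sample_rate(requested_hz: int) -> int:
    """Return the rate audio would be sent at under the Sample Rate note.

    Supported rates pass through unchanged; anything else maps to 16 kHz.
    """
    if requested_hz in SUPPORTED_RATES_HZ:
        return requested_hz
    return FALLBACK_RATE_HZ


print(effective_sample_rate(8000))   # 8000
print(effective_sample_rate(44100))  # 16000
```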