AWS Transcribe
Speech-to-text service implementation using Amazon Transcribe’s real-time transcription API
Overview
AWSTranscribeSTTService provides real-time speech-to-text capabilities using Amazon Transcribe’s WebSocket API. It supports interim results, adjustable quality levels, and continuous audio streams.
Installation
To use AWSTranscribeSTTService, install the required dependencies.
You’ll also need to set up your AWS credentials as environment variables:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_SESSION_TOKEN (if using temporary credentials)
- AWS_REGION (defaults to “us-east-1”)
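The sketch below shows one way to read these variables in Python and apply the documented “us-east-1” default for the region. The helper load_aws_credentials and the AwsCredentials container are hypothetical and not part of the service's API; the service can also pick these variables up on its own.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AwsCredentials:
    """Hypothetical container for the settings listed above."""
    access_key_id: str
    secret_access_key: str
    session_token: Optional[str]
    region: str


def load_aws_credentials(env: Mapping[str, str] = os.environ) -> AwsCredentials:
    """Read AWS settings from environment variables (hypothetical helper)."""
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return AwsCredentials(
        access_key_id=env["AWS_ACCESS_KEY_ID"],
        secret_access_key=env["AWS_SECRET_ACCESS_KEY"],
        # Only needed for temporary credentials.
        session_token=env.get("AWS_SESSION_TOKEN") or None,
        # Fall back to the documented default region.
        region=env.get("AWS_REGION") or "us-east-1",
    )
```

Passing a plain dict instead of os.environ makes the helper easy to test without touching the real environment.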
You can obtain AWS credentials by setting up an IAM user with access to Amazon Transcribe in your AWS account.
Configuration
Constructor Parameters
- Your AWS secret access key (can also be set via environment variable)
- Your AWS access key ID (can also be set via environment variable)
- Your AWS session token for temporary credentials (can also be set via environment variable)
- AWS region for the Transcribe service
- Audio sample rate in Hz (only 8000 Hz and 16000 Hz are supported)
- Language for transcription
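To illustrate the sample-rate constraint, the following sketch gathers these parameters into a settings object and rejects unsupported rates. The class name, field names, and the defaults of 16000 Hz and "en-US" are assumptions for illustration only and may not match the real constructor.

```python
from dataclasses import dataclass
from typing import Optional

# Per the parameter description above, only these rates are accepted.
SUPPORTED_SAMPLE_RATES = (8000, 16000)


@dataclass
class TranscribeSettings:
    """Hypothetical mirror of the constructor parameters (names assumed)."""
    secret_access_key: Optional[str] = None  # None: fall back to the environment
    access_key_id: Optional[str] = None
    session_token: Optional[str] = None
    region: str = "us-east-1"
    sample_rate: int = 16000  # assumed default
    language: str = "en-US"  # assumed default

    def __post_init__(self) -> None:
        # Fail fast instead of letting the streaming session reject the rate later.
        if self.sample_rate not in SUPPORTED_SAMPLE_RATES:
            raise ValueError(
                f"sample_rate must be one of {SUPPORTED_SAMPLE_RATES}, got {self.sample_rate}"
            )
```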
Default Settings
Input
The service processes InputAudioRawFrame instances containing:
- Raw PCM audio data
- 16-bit depth
- 8 kHz or 16 kHz sample rate (audio at any other rate is converted to 16 kHz)
- Single channel (mono)
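A rough sketch of these input rules, assuming 8 kHz and 16 kHz audio passes through unchanged and everything else is resampled to 16 kHz. Both function names are hypothetical; chunk_size_bytes computes how many bytes of 16-bit mono PCM cover a given duration.

```python
BYTES_PER_SAMPLE = 2  # 16-bit PCM
CHANNELS = 1  # mono only


def effective_sample_rate(input_rate: int) -> int:
    """Rate sent to Transcribe: 8 kHz and 16 kHz pass through, others become 16 kHz."""
    return input_rate if input_rate in (8000, 16000) else 16000


def chunk_size_bytes(sample_rate: int, duration_ms: int) -> int:
    """Bytes of raw 16-bit mono PCM covering duration_ms at sample_rate."""
    samples = sample_rate * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS
```

For example, 20 ms of audio at 16 kHz is 320 samples, or 640 bytes.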
Output Frames
The service produces two types of frames during transcription:
TranscriptionFrame
Generated for final transcriptions, containing:
- Transcribed text
- User identifier
- ISO 8601 formatted timestamp
- Language used for transcription
InterimTranscriptionFrame
Generated during ongoing speech, containing the same fields as TranscriptionFrame but carrying preliminary results that may still change.
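The following sketch models the two frame types as plain dataclasses and picks one based on a partial/final flag; Amazon Transcribe streaming results carry an IsPartial field that could drive this choice. These classes are local stand-ins, not the library's frame classes, and make_frame is a hypothetical helper.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class _TranscriptFrameBase:
    text: str
    user_id: str
    timestamp: str  # ISO 8601, e.g. "2024-05-01T12:00:00+00:00"
    language: str


@dataclass
class TranscriptionFrame(_TranscriptFrameBase):
    """Stand-in for a final result: the text will not change."""


@dataclass
class InterimTranscriptionFrame(_TranscriptFrameBase):
    """Stand-in for a preliminary result: later results may revise the text."""


def make_frame(text: str, user_id: str, language: str, is_partial: bool) -> _TranscriptFrameBase:
    # Hypothetical mapping from a result's partial flag to a frame type.
    frame_cls = InterimTranscriptionFrame if is_partial else TranscriptionFrame
    timestamp = datetime.now(timezone.utc).isoformat()
    return frame_cls(text=text, user_id=user_id, timestamp=timestamp, language=language)
```

The two classes share a private base rather than inheriting from each other, so a consumer checking for TranscriptionFrame never mistakes an interim result for a final one.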
Methods
See the STT base class methods for additional functionality.
Language Setting
Usage Example
Language Support
AWS Transcribe STT supports the following languages:
Language | Description | Service Code |
---|---|---|
Language.EN | English (US) | en-US |
Language.ES | Spanish (US) | es-US |
Language.FR | French | fr-FR |
Language.DE | German | de-DE |
Language.IT | Italian | it-IT |
Language.PT | Portuguese (Brazil) | pt-BR |
Language.JA | Japanese | ja-JP |
Language.KO | Korean | ko-KR |
Language.ZH | Chinese (Mandarin) | zh-CN |
AWS Transcribe supports additional languages and regional variants. See the AWS Transcribe documentation for a complete list.
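The table can be expressed as a lookup from the Language enum to Amazon Transcribe language codes. The enum below is a minimal local stand-in for the library's Language type (its member values are assumptions), and to_aws_language_code is a hypothetical helper that returns None for unmapped languages.

```python
from enum import Enum
from typing import Optional


class Language(Enum):
    # Minimal stand-in; the library's enum may define more members and other values.
    EN = "en"
    ES = "es"
    FR = "fr"
    DE = "de"
    IT = "it"
    PT = "pt"
    JA = "ja"
    KO = "ko"
    ZH = "zh"


# Service codes taken from the table above.
AWS_LANGUAGE_CODES = {
    Language.EN: "en-US",
    Language.ES: "es-US",
    Language.FR: "fr-FR",
    Language.DE: "de-DE",
    Language.IT: "it-IT",
    Language.PT: "pt-BR",
    Language.JA: "ja-JP",
    Language.KO: "ko-KR",
    Language.ZH: "zh-CN",
}


def to_aws_language_code(language: Language) -> Optional[str]:
    """Return the Amazon Transcribe code for language, or None if unmapped."""
    return AWS_LANGUAGE_CODES.get(language)
```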
Frame Flow
Metrics Support
The service supports the following metrics:
- Time to First Byte (TTFB)
- Processing duration
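A minimal sketch of how these two metrics could be measured, assuming TTFB runs from the first audio chunk of an utterance to the first result (interim or final) and processing duration runs to the final result. The class and its method names are hypothetical; the injectable clock keeps it testable.

```python
import time
from typing import Callable, Optional


class TranscriptionMetrics:
    """Hypothetical per-utterance tracker for TTFB and processing duration."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self._start: Optional[float] = None
        self.ttfb: Optional[float] = None
        self.processing_duration: Optional[float] = None

    def audio_sent(self) -> None:
        # The first chunk of a new utterance starts the timer and clears old values.
        if self._start is None:
            self._start = self._clock()
            self.ttfb = None
            self.processing_duration = None

    def result_received(self, is_final: bool) -> None:
        if self._start is None:
            return  # No utterance in progress.
        elapsed = self._clock() - self._start
        if self.ttfb is None:
            self.ttfb = elapsed  # First result, interim or final.
        if is_final:
            self.processing_duration = elapsed
            self._start = None  # The next audio chunk starts a new utterance.
```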
Notes
- Requires valid AWS credentials with access to Amazon Transcribe
- Supports real-time transcription with interim results
- Handles WebSocket connection management and reconnection
- Only supports mono audio (single channel)
- Automatically handles audio format conversion to PCM
- Manages connection lifecycle (start, stop, cancel)
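Reconnection is commonly implemented with exponential backoff. The sketch below is a generic asyncio version under that assumption; the service's actual retry policy is not documented here, and connect_with_retry and backoff_delays are hypothetical names. The connect argument stands in for whatever coroutine opens the WebSocket stream.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Delays of base, 2*base, 4*base, ..., each capped at cap seconds."""
    return [min(cap, base * 2**attempt) for attempt in range(retries)]


async def connect_with_retry(
    connect: Callable[[], Awaitable[T]], retries: int = 5, base: float = 0.5
) -> T:
    """Await connect(), retrying on OSError; re-raise once retries are exhausted."""
    delays = backoff_delays(retries, base)
    for attempt in range(retries + 1):
        try:
            return await connect()
        except OSError:  # ConnectionError is a subclass of OSError
            if attempt == retries:
                raise
            await asyncio.sleep(delays[attempt])
    raise RuntimeError("unreachable: the loop always returns or raises")
```

Capping the delay keeps a long outage from pushing waits into minutes, while the doubling avoids hammering the endpoint during brief network drops.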