Speech-to-text service implementation using Amazon Transcribe’s real-time transcription API

AWSTranscribeSTTService provides real-time speech recognition using Amazon Transcribe’s WebSocket streaming API, with support for interim results, multiple languages, and configurable audio processing parameters.
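
Interim results let an application show text while the speaker is still talking. In the usual streaming model, each interim update for a segment replaces the previous one, and a final result commits the segment. The stdlib-only sketch below models that bookkeeping on the consumer side. TranscriptEvent and TranscriptAccumulator are hypothetical stand-ins for the interim and final transcription frames. They are not Pipecat classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TranscriptEvent:
    """Hypothetical stand-in for an interim or final transcription frame."""
    text: str
    is_final: bool


@dataclass
class TranscriptAccumulator:
    """Tracks committed (final) segments plus the latest interim hypothesis."""
    committed: List[str] = field(default_factory=list)
    pending: str = ""

    def handle(self, event: TranscriptEvent) -> None:
        if event.is_final:
            # A final result commits the segment and clears the interim text.
            self.committed.append(event.text)
            self.pending = ""
        else:
            # An interim result replaces, rather than extends, the previous one.
            self.pending = event.text

    def display(self) -> str:
        """Committed segments followed by the current interim hypothesis, if any."""
        parts = self.committed + ([self.pending] if self.pending else [])
        return " ".join(parts)


acc = TranscriptAccumulator()
for ev in [
    TranscriptEvent("hel", False),
    TranscriptEvent("hello wor", False),
    TranscriptEvent("hello world", True),
    TranscriptEvent("how", False),
]:
    acc.handle(ev)
    print(acc.display())
```

The design choice is to overwrite interim text instead of appending it. Appending would duplicate words, because each interim update carries the whole hypothesis for the segment so far.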

Set up your AWS credentials as environment variables:

- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_SESSION_TOKEN (if using temporary credentials)
- AWS_REGION (defaults to “us-east-1”)

Input frames:

- InputAudioRawFrame - Raw PCM audio data (16-bit, 8 kHz or 16 kHz, mono)
- STTUpdateSettingsFrame - Runtime transcription configuration updates
- STTMuteFrame - Mute audio input for transcription

Output frames:

- InterimTranscriptionFrame - Real-time transcription updates
- TranscriptionFrame - Final transcription results
- ErrorFrame - Connection or processing errors

Supported languages:

Language Code | Description | Service Code |
---|---|---|
Language.EN | English (US) | en-US |
Language.ES | Spanish | es-US |
Language.FR | French | fr-FR |
Language.DE | German | de-DE |
Language.IT | Italian | it-IT |
Language.PT | Portuguese (Brazil) | pt-BR |
Language.JA | Japanese | ja-JP |
Language.KO | Korean | ko-KR |
Language.ZH | Chinese (Mandarin) | zh-CN |
Language.PL | Polish | pl-PL |
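
The table amounts to a lookup from Pipecat Language values to Amazon Transcribe language codes. The self-contained sketch below uses a plain Enum as a stand-in for Pipecat's Language. This is an assumption: the real enum has many more members, which is why the forward lookup guards against a missing entry. The helper names to_transcribe_code and from_transcribe_code are hypothetical.

```python
from enum import Enum
from typing import Dict, Optional


class Language(Enum):
    """Stand-in for Pipecat's Language enum, limited to the rows in the table."""
    EN = "en"
    ES = "es"
    FR = "fr"
    DE = "de"
    IT = "it"
    PT = "pt"
    JA = "ja"
    KO = "ko"
    ZH = "zh"
    PL = "pl"


# Language -> Amazon Transcribe language code, copied from the table.
TRANSCRIBE_CODES: Dict[Language, str] = {
    Language.EN: "en-US",
    Language.ES: "es-US",
    Language.FR: "fr-FR",
    Language.DE: "de-DE",
    Language.IT: "it-IT",
    Language.PT: "pt-BR",
    Language.JA: "ja-JP",
    Language.KO: "ko-KR",
    Language.ZH: "zh-CN",
    Language.PL: "pl-PL",
}

# Reverse lookup: Transcribe code -> Language.
LANGUAGE_BY_CODE: Dict[str, Language] = {code: lang for lang, code in TRANSCRIBE_CODES.items()}


def to_transcribe_code(language: Language) -> str:
    """Return the Transcribe code for a Language, or raise if the table has no row for it."""
    try:
        return TRANSCRIBE_CODES[language]
    except KeyError:
        raise ValueError(f"{language} has no Amazon Transcribe mapping") from None


def from_transcribe_code(code: str) -> Optional[Language]:
    """Return the Language for a Transcribe code, or None if the table lacks it."""
    return LANGUAGE_BY_CODE.get(code)


print(to_transcribe_code(Language.PT))  # pt-BR
print(from_transcribe_code("en-GB"))    # None
```

Per the table, Language.ES maps to US Spanish (es-US) and Language.PT maps to Brazilian Portuguese (pt-BR), not to the European variants.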

Initialize the AWSTranscribeSTTService and use it in a pipeline:
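
The stdlib-only sketch below covers the initialization inputs implied above. It reads the credentials from the environment variables listed earlier, defaults the region to us-east-1, and accepts only 16-bit mono PCM at 8 kHz or 16 kHz. TranscribeSettings, load_settings, and chunk_bytes are hypothetical names, not Pipecat's API. In Pipecat, equivalent values would be supplied to AWSTranscribeSTTService, which is typically placed in the pipeline after the transport input.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

SUPPORTED_SAMPLE_RATES = (8000, 16000)  # from the input-frame description
BYTES_PER_SAMPLE = 2  # 16-bit PCM
CHANNELS = 1  # mono


@dataclass(frozen=True)
class TranscribeSettings:
    """Hypothetical container for the values the service needs at startup."""
    access_key_id: str
    secret_access_key: str
    session_token: Optional[str]
    region: str
    sample_rate: int
    language_code: str

    def chunk_bytes(self, duration_ms: int) -> int:
        """Size in bytes of a PCM chunk covering duration_ms of audio."""
        return self.sample_rate * BYTES_PER_SAMPLE * CHANNELS * duration_ms // 1000


def load_settings(
    env: Mapping[str, str] = os.environ,
    sample_rate: int = 16000,
    language_code: str = "en-US",
) -> TranscribeSettings:
    """Build settings from environment variables, mirroring the list above."""
    missing = [k for k in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY") if not env.get(k)]
    if missing:
        raise ValueError(f"missing AWS credentials: {', '.join(missing)}")
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"sample_rate must be one of {SUPPORTED_SAMPLE_RATES}, got {sample_rate}")
    return TranscribeSettings(
        access_key_id=env["AWS_ACCESS_KEY_ID"],
        secret_access_key=env["AWS_SECRET_ACCESS_KEY"],
        session_token=env.get("AWS_SESSION_TOKEN"),  # only set for temporary credentials
        region=env.get("AWS_REGION", "us-east-1"),
        sample_rate=sample_rate,
        language_code=language_code,
    )


settings = load_settings({"AWS_ACCESS_KEY_ID": "AKIAEXAMPLE", "AWS_SECRET_ACCESS_KEY": "example-secret"})
print(settings.region)           # us-east-1
print(settings.chunk_bytes(20))  # 640
```

The chunk size is plain arithmetic: 16,000 samples/s × 2 bytes × 1 channel × 0.02 s = 640 bytes per 20 ms chunk. At 8 kHz the same chunk is 320 bytes.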

Update settings at runtime by pushing an STTUpdateSettingsFrame for the AWSTranscribeSTTService:
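
Amazon Transcribe sets streaming parameters such as the language code when a stream is opened. An update that changes them therefore plausibly requires reopening the stream, and the sketch below assumes that. It merges an update into the current settings, validates it against the language table and the supported sample rates, and reports whether a reconnect would be needed. apply_settings_update and the settings keys, including the interim_results toggle, are hypothetical. They are not the fields Pipecat's STTUpdateSettingsFrame actually carries.

```python
from typing import Any, Dict, Tuple

# Amazon Transcribe codes from the language table above.
SUPPORTED_LANGUAGE_CODES = {
    "en-US", "es-US", "fr-FR", "de-DE", "it-IT",
    "pt-BR", "ja-JP", "ko-KR", "zh-CN", "pl-PL",
}
SUPPORTED_SAMPLE_RATES = {8000, 16000}
# Assumed to be fixed for the lifetime of one streaming session.
STREAM_KEYS = {"language_code", "sample_rate"}
# interim_results is a hypothetical client-side toggle in this sketch.
ALLOWED_KEYS = STREAM_KEYS | {"interim_results"}


def apply_settings_update(
    current: Dict[str, Any], update: Dict[str, Any]
) -> Tuple[Dict[str, Any], bool]:
    """Merge `update` into `current`; return (new_settings, needs_reconnect).

    `current` is never mutated, so a rejected update leaves the old settings intact.
    """
    unknown = set(update) - ALLOWED_KEYS
    if unknown:
        raise KeyError(f"unsupported settings: {sorted(unknown)}")
    if "language_code" in update and update["language_code"] not in SUPPORTED_LANGUAGE_CODES:
        raise ValueError(f"unsupported language code: {update['language_code']}")
    if "sample_rate" in update and update["sample_rate"] not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {update['sample_rate']}")
    merged = {**current, **update}
    # Only a change to a stream-level value forces a new streaming session.
    needs_reconnect = any(merged.get(key) != current.get(key) for key in STREAM_KEYS)
    return merged, needs_reconnect


current = {"language_code": "en-US", "sample_rate": 16000, "interim_results": True}
switched, reconnect = apply_settings_update(current, {"language_code": "es-US"})
print(switched["language_code"], reconnect)  # es-US True
```

Returning a new dictionary instead of editing in place keeps validation atomic. A bad update raises before anything changes, and re-sending an unchanged language code does not trigger a reconnect.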