AssemblyAI
Speech-to-text service implementation using AssemblyAI’s real-time transcription API
Overview
AssemblyAISTTService provides real-time speech-to-text capabilities using AssemblyAI’s WebSocket API. It supports streaming transcription with both interim and final results.
Installation
To use AssemblyAISTTService, install the required dependencies:
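A typical install looks like the following; the `assemblyai` extra name is an assumption, so check the package documentation for your Pipecat version if it differs:

```shell
# Install Pipecat with the AssemblyAI extra (extra name assumed here)
pip install "pipecat-ai[assemblyai]"
```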
You’ll also need to set your AssemblyAI API key as the ASSEMBLYAI_API_KEY environment variable. You can obtain an AssemblyAI API key by signing up at AssemblyAI.
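For example, in your shell (the value below is a placeholder):

```shell
# Make the key available to the current shell session
export ASSEMBLYAI_API_KEY="your-assemblyai-api-key"
```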
Configuration
Constructor Parameters
- Your AssemblyAI API key
- Audio sample rate in Hz
- Audio encoding format
- Transcription language (currently only English is supported for real-time)
Input
The service processes raw audio data with the following requirements:
- PCM audio format
- 16-bit depth
- 16kHz sample rate (default)
- Single channel (mono)
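The requirements above imply a fixed byte rate, which is useful when sizing audio chunks for streaming. This self-contained snippet works it out:

```python
# Derive the byte rate implied by the input requirements above:
# 16-bit (2-byte) samples, mono, 16 kHz.
SAMPLE_RATE_HZ = 16000
BYTES_PER_SAMPLE = 2  # 16-bit depth
CHANNELS = 1          # mono

bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
chunk_20ms = bytes_per_second // 50  # a common streaming chunk size

print(bytes_per_second)  # 32000 bytes per second
print(chunk_20ms)        # 640 bytes per 20 ms chunk
```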
Output Frames
The service produces two types of frames during transcription:
TranscriptionFrame
Generated for final transcriptions, containing:
- Transcribed text
- User identifier
- ISO 8601 formatted timestamp
- Transcription language
InterimTranscriptionFrame
Generated during ongoing speech, containing the same fields as TranscriptionFrame but with preliminary results.
ErrorFrame
Generated when transcription errors occur, containing error details.
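Downstream processors typically branch on the frame type. The stand-in classes below only mirror the fields documented above for illustration; in a real pipeline you would import the frame types from Pipecat instead.

```python
from dataclasses import dataclass

# Hypothetical stand-ins mirroring the documented fields; use Pipecat's
# own frame classes in a real pipeline.
@dataclass
class TranscriptionFrame:
    text: str
    user_id: str
    timestamp: str  # ISO 8601
    language: str

@dataclass
class InterimTranscriptionFrame(TranscriptionFrame):
    pass

def handle_frame(frame) -> str:
    # Check interim first, since it subclasses the final frame here.
    if isinstance(frame, InterimTranscriptionFrame):
        return f"[interim] {frame.text}"
    if isinstance(frame, TranscriptionFrame):
        return f"[final] {frame.text}"
    return "[ignored]"

final = TranscriptionFrame("hello world", "user-1", "2024-01-01T00:00:00Z", "en")
print(handle_frame(final))  # [final] hello world
```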
Methods
See the STT base class methods for additional functionality.
Language Setting
Language Support
AssemblyAI STT supports the following language for real-time transcription:
| Language Code | Description | Service Codes |
| --- | --- | --- |
| Language.EN | English | en |
Usage Example
Frame Flow
Raw audio frames flow into the service, which streams them to AssemblyAI over the WebSocket connection; InterimTranscriptionFrame and TranscriptionFrame outputs (and ErrorFrame on failure) flow downstream to the rest of the pipeline.
Metrics Support
The service collects processing metrics:
- Time to First Byte (TTFB)
- Processing duration
- Connection status
Notes
- Currently supports English-only real-time transcription
- Handles WebSocket connection management
- Provides both interim and final transcriptions
- Thread-safe processing with proper event loop handling
- Automatic error handling and reporting
- Manages connection lifecycle