Speech-to-text service implementation using AssemblyAI’s real-time transcription API
AssemblyAISTTService provides real-time speech recognition using AssemblyAI’s WebSocket API with support for interim results, end-of-turn detection, and configurable audio processing parameters.
The service needs an AssemblyAI API key, which is read from the environment variable ASSEMBLYAI_API_KEY.
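As a minimal sketch of reading that key safely, the helper below fails early with a clear message when the variable is missing. The function name `load_assemblyai_api_key` and the optional `env` parameter are hypothetical, introduced only for illustration; they are not part of Pipecat or AssemblyAI.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "ASSEMBLYAI_API_KEY"


def load_assemblyai_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or raise if it is absent.

    `env` is a hypothetical injection point so the lookup can be exercised
    without touching the real process environment.
    """
    source = os.environ if env is None else env
    key = source.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set; export it before starting the service")
    return key
```

Failing at startup rather than at the first WebSocket connection tends to make a missing key easier to diagnose.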
The service accepts these input frames:

InputAudioRawFrame - Raw PCM audio data (16-bit, 16 kHz, mono)
UserStartedSpeakingFrame - VAD start signal (triggers TTFB metrics)
UserStoppedSpeakingFrame - VAD stop signal (triggers a forced endpoint if enabled)
STTUpdateSettingsFrame - Runtime transcription configuration updates
STTMuteFrame - Mute audio input for transcription

It produces these output frames:

InterimTranscriptionFrame - Real-time transcription updates
TranscriptionFrame - Final transcription results
TranslationFrame - Translated text (if translation is enabled)
ErrorFrame - Connection or processing errors

Create an instance of AssemblyAISTTService and use it in a pipeline:
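The original example is not preserved here, so the following is only a hedged sketch of what such a setup might look like. It assumes the import paths `pipecat.services.assemblyai.stt` and `pipecat.pipeline.pipeline`, which can differ between Pipecat releases, and it assumes the constructor accepts an `api_key` keyword. The helpers `order_processors` and `build_pipeline`, and the `audio_input` and `downstream` arguments, are hypothetical names used for illustration.

```python
import os
from typing import Any, List, Sequence


def order_processors(audio_input: Any, stt: Any, downstream: Sequence[Any]) -> List[Any]:
    """Place the STT service directly after the audio source.

    Audio frames must reach the STT service before anything that consumes
    transcription frames, so the order is: input, STT, then the rest.
    """
    return [audio_input, stt, *downstream]


def build_pipeline(audio_input: Any, downstream: Sequence[Any]) -> Any:
    """Build a pipeline containing an AssemblyAISTTService (assumed API)."""
    # Imported lazily so this module loads even where Pipecat is not installed.
    # Both import paths are assumptions; check them against your Pipecat version.
    from pipecat.pipeline.pipeline import Pipeline
    from pipecat.services.assemblyai.stt import AssemblyAISTTService

    stt = AssemblyAISTTService(api_key=os.environ["ASSEMBLYAI_API_KEY"])
    return Pipeline(order_processors(audio_input, stt, downstream))
```

In a typical bot, `audio_input` would be the transport's input processor and `downstream` would hold the context aggregator, LLM, TTS, and transport output.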
To change transcription settings at runtime, push an STTUpdateSettingsFrame for the AssemblyAISTTService:
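The original snippet is missing, so this is a sketch under stated assumptions rather than the documented usage. It assumes `STTUpdateSettingsFrame` is importable from `pipecat.frames.frames` and takes a `settings` mapping, and that `task` is a pipeline task exposing an async `queue_frame` method. The helper names and the `language` key in the usage note are hypothetical; the set of settings the service actually honors is not listed in this section.

```python
from typing import Any, Dict


def build_settings_update(**changes: Any) -> Dict[str, Any]:
    """Keep only the settings that were actually supplied (drops None values)."""
    return {key: value for key, value in changes.items() if value is not None}


async def push_settings_update(task: Any, **changes: Any) -> None:
    """Queue an STTUpdateSettingsFrame on `task` if there is anything to change."""
    # Imported lazily so this module loads even where Pipecat is not installed.
    # The import path and the `settings=` keyword are assumptions to verify
    # against the installed Pipecat version.
    from pipecat.frames.frames import STTUpdateSettingsFrame

    settings = build_settings_update(**changes)
    if settings:
        await task.queue_frame(STTUpdateSettingsFrame(settings=settings))
```

A call might look like `await push_settings_update(task, language="en")`, where `language` stands in for whichever setting name the service supports.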