Deepgram
Speech-to-text service implementation using Deepgram’s real-time transcription API
Overview
DeepgramSTTService provides real-time speech recognition using Deepgram’s WebSocket API with support for interim results, language detection, and voice activity detection (VAD).
API Reference
Complete API documentation and method details
Deepgram Docs
Official Deepgram documentation and features
Example Code
Working example with interruption handling
Installation
To use DeepgramSTTService, install the required dependencies:
You’ll also need to set up your Deepgram API key as an environment variable: DEEPGRAM_API_KEY.
Get your API key from the Deepgram Console.
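A minimal sketch of reading the key at startup and failing early when it is missing. The helper name `load_deepgram_api_key` is hypothetical and not part of any library; only the `DEEPGRAM_API_KEY` variable name comes from the text above.

```python
import os


def load_deepgram_api_key(env=None):
    """Return the Deepgram API key from the environment or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("DEEPGRAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError("DEEPGRAM_API_KEY is not set")
    return key
```

Calling `load_deepgram_api_key()` once during application startup surfaces a missing key before any connection is attempted.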
Frames
Input
- InputAudioRawFrame - Raw PCM audio data (16-bit, 16 kHz, mono)
- STTUpdateSettingsFrame - Runtime transcription configuration updates
- STTMuteFrame - Mute audio input for transcription
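The raw PCM format listed above determines the size of each audio chunk. As a rough illustration, the sketch below computes how many bytes a chunk of 16-bit, 16 kHz, mono PCM occupies for a given duration. The function name and the 20 ms chunk length are assumptions chosen for the example, not values taken from the service.

```python
def pcm_chunk_bytes(duration_ms, sample_rate=16000, channels=1, sample_width=2):
    """Bytes of raw PCM audio for a chunk of the given duration.

    sample_width is in bytes per sample (2 bytes = 16-bit).
    """
    samples = sample_rate * duration_ms // 1000
    return samples * channels * sample_width


print(pcm_chunk_bytes(20))    # 640 bytes for 20 ms
print(pcm_chunk_bytes(1000))  # 32000 bytes for one second
```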
Output
- InterimTranscriptionFrame - Real-time transcription updates
- TranscriptionFrame - Final transcription results
- ErrorFrame - Connection or processing errors
Language Support
Deepgram STT supports the following languages and regional variants:
Language Code | Description | Service Codes |
---|---|---|
Language.BG | Bulgarian | bg |
Language.CA | Catalan | ca |
Language.ZH | Chinese (Mandarin, Simplified) | zh, zh-CN, zh-Hans |
Language.ZH_TW | Chinese (Mandarin, Traditional) | zh-TW, zh-Hant |
Language.ZH_HK | Chinese (Cantonese, Traditional) | zh-HK |
Language.CS | Czech | cs |
Language.DA | Danish | da, da-DK |
Language.NL | Dutch | nl |
Language.NL_BE | Dutch (Flemish) | nl-BE |
Language.EN | English | en |
Language.EN_US | English (US) | en-US |
Language.EN_AU | English (Australia) | en-AU |
Language.EN_GB | English (UK) | en-GB |
Language.EN_NZ | English (New Zealand) | en-NZ |
Language.EN_IN | English (India) | en-IN |
Language.ET | Estonian | et |
Language.FI | Finnish | fi |
Language.FR | French | fr |
Language.FR_CA | French (Canada) | fr-CA |
Language.DE | German | de |
Language.DE_CH | German (Switzerland) | de-CH |
Language.EL | Greek | el |
Language.HI | Hindi | hi |
Language.HU | Hungarian | hu |
Language.ID | Indonesian | id |
Language.IT | Italian | it |
Language.JA | Japanese | ja |
Language.KO | Korean | ko, ko-KR |
Language.LV | Latvian | lv |
Language.LT | Lithuanian | lt |
Language.MS | Malay | ms |
Language.NO | Norwegian | no |
Language.PL | Polish | pl |
Language.PT | Portuguese | pt |
Language.PT_BR | Portuguese (Brazil) | pt-BR |
Language.PT_PT | Portuguese (Portugal) | pt-PT |
Language.RO | Romanian | ro |
Language.RU | Russian | ru |
Language.SK | Slovak | sk |
Language.ES | Spanish | es, es-419 |
Language.SV | Swedish | sv, sv-SE |
Language.TH | Thai | th, th-TH |
Language.TR | Turkish | tr |
Language.UK | Ukrainian | uk |
Language.VI | Vietnamese | vi |
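
To illustrate how the table reads, here is a small sketch that resolves a language tag to a service code, falling back from a regional variant to its base language. The dictionary covers only a few rows from the table, and the `resolve_service_code` helper is hypothetical; it does not reflect how the service performs this mapping internally.

```python
# A handful of rows copied from the table above (illustrative subset only).
SERVICE_CODES = {
    "en": "en",
    "en-US": "en-US",
    "en-GB": "en-GB",
    "fr": "fr",
    "fr-CA": "fr-CA",
    "pt-BR": "pt-BR",
    "zh": "zh",
}


def resolve_service_code(tag):
    """Return the service code for a tag, trying the base language as a fallback."""
    if tag in SERVICE_CODES:
        return SERVICE_CODES[tag]
    base = tag.split("-")[0]
    return SERVICE_CODES.get(base)  # None when the language is not listed


print(resolve_service_code("fr-CA"))  # fr-CA
print(resolve_service_code("fr-BE"))  # fr (falls back to the base language)
print(resolve_service_code("xx"))     # None
```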
Usage Example
Metrics
The service provides:
- Time to First Byte (TTFB) - Latency from audio input to first transcription
- Processing Duration - Total time spent processing audio
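As an illustration of what the first metric measures, the sketch below times the gap between the first audio chunk sent and the first transcription received. The `TTFBTimer` class is hypothetical and is not how the service records its metrics; it only expresses the definition in code.

```python
import time


class TTFBTimer:
    """Measures seconds from the first audio input to the first transcription."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable clock, useful for testing
        self._start = None
        self.ttfb = None

    def on_audio(self):
        # Start timing at the first audio chunk only.
        if self._start is None:
            self._start = self._clock()

    def on_transcription(self):
        # Record the latency once, at the first transcription after audio began.
        if self._start is not None and self.ttfb is None:
            self.ttfb = self._clock() - self._start
```

Call `on_audio()` whenever audio is sent and `on_transcription()` whenever a result arrives; `ttfb` stays `None` until both have happened.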
Additional Notes
- Connection Management: Automatically handles WebSocket connections and reconnections
- VAD Integration: Supports Deepgram’s built-in VAD, though we recommend using local VAD services like Silero for better performance
- Sample Rate: Can be configured per service, but we recommend setting it globally in PipelineParams for consistency across services