Overview

DeepgramSTTService provides real-time speech recognition using Deepgram’s WebSocket API with support for interim results, language detection, and voice activity detection (VAD).

Installation

To use DeepgramSTTService, install the required dependencies:

pip install "pipecat-ai[deepgram]"

You’ll also need to set up your Deepgram API key as an environment variable: DEEPGRAM_API_KEY.

Get your API key from the Deepgram Console.
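
One way to pick up the key at startup is sketched below. The helper name load_deepgram_api_key is hypothetical (it is not part of Pipecat or the Deepgram SDK). It only reads DEEPGRAM_API_KEY and fails early with a clear message if the variable is unset.

```python
import os


def load_deepgram_api_key(env_var: str = "DEEPGRAM_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing.

    Hypothetical helper for illustration; not a Pipecat or Deepgram function.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before starting the bot")
    return key
```

The returned value can then be passed as api_key when constructing DeepgramSTTService.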

Frames

Input

  • InputAudioRawFrame - Raw PCM audio data (16-bit, 16 kHz, mono)
  • STTUpdateSettingsFrame - Runtime transcription configuration updates
  • STTMuteFrame - Mute audio input for transcription
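
For reference, the audio format listed for InputAudioRawFrame (16-bit, 16 kHz, mono) fixes how many bytes a chunk of a given duration occupies. The sketch below works through that arithmetic. pcm_chunk_bytes is an illustrative helper, not a Pipecat function, and the constants assume the format stated above.

```python
SAMPLE_RATE_HZ = 16_000   # 16 kHz
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 1              # mono


def pcm_chunk_bytes(duration_ms: int) -> int:
    """Size in bytes of `duration_ms` of raw PCM audio in the format above."""
    samples = SAMPLE_RATE_HZ * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


print(pcm_chunk_bytes(20))    # 640 bytes for a 20 ms chunk
print(pcm_chunk_bytes(1000))  # 32000 bytes for one second
```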

Output

  • InterimTranscriptionFrame - Real-time transcription updates
  • TranscriptionFrame - Final transcription results
  • ErrorFrame - Connection or processing errors
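
A downstream consumer may want to treat the two transcription frame types differently: interim text is provisional and can be overwritten, while final text is committed. The sketch below illustrates that pattern with stand-in dataclasses. InterimText, FinalText, and TranscriptBuffer are hypothetical placeholders that carry only a text field, so the example runs without Pipecat installed. It does not reproduce Pipecat's own frame classes.

```python
from dataclasses import dataclass, field


@dataclass
class InterimText:  # stand-in for InterimTranscriptionFrame (assumption)
    text: str


@dataclass
class FinalText:  # stand-in for TranscriptionFrame (assumption)
    text: str


@dataclass
class TranscriptBuffer:
    """Keeps committed final results plus the latest provisional text."""

    committed: list = field(default_factory=list)
    pending: str = ""

    def handle(self, frame) -> None:
        if isinstance(frame, InterimText):
            # Interim results are provisional: keep only the newest one.
            self.pending = frame.text
        elif isinstance(frame, FinalText):
            # Final results are committed and replace any pending text.
            self.committed.append(frame.text)
            self.pending = ""
```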

Language Support

Deepgram STT supports the following languages and regional variants:

Language Code   Description                       Service Codes
Language.BG     Bulgarian                         bg
Language.CA     Catalan                           ca
Language.ZH     Chinese (Mandarin, Simplified)    zh, zh-CN, zh-Hans
Language.ZH_TW  Chinese (Mandarin, Traditional)   zh-TW, zh-Hant
Language.ZH_HK  Chinese (Cantonese, Traditional)  zh-HK
Language.CS     Czech                             cs
Language.DA     Danish                            da, da-DK
Language.NL     Dutch                             nl
Language.NL_BE  Dutch (Flemish)                   nl-BE
Language.EN     English                           en
Language.EN_US  English (US)                      en-US
Language.EN_AU  English (Australia)               en-AU
Language.EN_GB  English (UK)                      en-GB
Language.EN_NZ  English (New Zealand)             en-NZ
Language.EN_IN  English (India)                   en-IN
Language.ET     Estonian                          et
Language.FI     Finnish                           fi
Language.FR     French                            fr
Language.FR_CA  French (Canada)                   fr-CA
Language.DE     German                            de
Language.DE_CH  German (Switzerland)              de-CH
Language.EL     Greek                             el
Language.HI     Hindi                             hi
Language.HU     Hungarian                         hu
Language.ID     Indonesian                        id
Language.IT     Italian                           it
Language.JA     Japanese                          ja
Language.KO     Korean                            ko, ko-KR
Language.LV     Latvian                           lv
Language.LT     Lithuanian                        lt
Language.MS     Malay                             ms
Language.NO     Norwegian                         no
Language.PL     Polish                            pl
Language.PT     Portuguese                        pt
Language.PT_BR  Portuguese (Brazil)               pt-BR
Language.PT_PT  Portuguese (Portugal)             pt-PT
Language.RO     Romanian                          ro
Language.RU     Russian                           ru
Language.SK     Slovak                            sk
Language.ES     Spanish                           es, es-419
Language.SV     Swedish                           sv, sv-SE
Language.TH     Thai                              th, th-TH
Language.TR     Turkish                           tr
Language.UK     Ukrainian                         uk
Language.VI     Vietnamese                        vi

Usage Example

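import os

from deepgram import LiveOptions
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.transcriptions.language import Language

# Configure the service; the key is read from the DEEPGRAM_API_KEY environment variable
stt = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    live_options=LiveOptions(
        model="nova-3-general",
        language="en-US",
        smart_format=True,
    ),
)

# Use in a pipeline (transport, llm, and tts are assumed to be created elsewhere)
pipeline = Pipeline([
    transport.input(),
    stt,
    llm,
    tts,
    transport.output(),
])

# Change language dynamically (must be called from inside an async function)
await stt.set_language(Language.FR)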

Metrics

The service provides:

  • Time to First Byte (TTFB) - Latency from audio input to first transcription
  • Processing Duration - Total time spent processing audio

Additional Notes

  • Connection Management: Automatically handles WebSocket connections and reconnections
  • VAD Integration: Supports Deepgram’s built-in VAD, though we recommend using local VAD services like Silero for better performance
  • Sample Rate: Can be configured per service, but we recommend setting it globally in PipelineParams for consistency across services