Overview

DeepgramSTTService provides real-time speech-to-text capabilities using Deepgram’s WebSocket API. It supports interim results, language detection, and voice activity detection (VAD).

Installation

To use DeepgramSTTService, install the required dependencies:

pip install pipecat-ai[deepgram]

You’ll also need to set up your Deepgram API key as an environment variable: DEEPGRAM_API_KEY.

You can obtain a Deepgram API key by signing up at Deepgram.
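
One way to pick up the key at startup is to read DEEPGRAM_API_KEY from the environment and fail early if it is missing. The helper name load_deepgram_api_key is hypothetical and is not part of Pipecat or the Deepgram SDK; this is a minimal sketch using only the standard library.

```python
import os


def load_deepgram_api_key(env=os.environ):
    """Return the Deepgram API key from the environment (hypothetical helper).

    Raises RuntimeError with a clear message when the variable is unset or blank.
    """
    key = env.get("DEEPGRAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError("DEEPGRAM_API_KEY is not set")
    return key
```

The returned value can then be passed as the api_key constructor parameter described below.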

Configuration

Constructor Parameters

api_key
str
required

Your Deepgram API key

url
str
default: ""

Custom Deepgram API endpoint URL

live_options
LiveOptions

Custom transcription options

Default Options

LiveOptions(
    encoding="linear16",
    language=Language.EN,
    model="nova-2-general",
    sample_rate=16000,
    channels=1,
    interim_results=True,
    smart_format=True,
    punctuate=True,
    profanity_filter=True,
    vad_events=False
)
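
This page does not spell out how a custom live_options value combines with these defaults. A common pattern is to overlay the caller's settings on the defaults so that unspecified fields keep their default values. The sketch below illustrates that pattern with plain dictionaries; the function merge_options and the dictionary form of the options are assumptions for illustration, not Pipecat's implementation.

```python
# Defaults copied from the listing above, expressed as a plain dict.
# Language.EN is written as its service code "en" (see the language table).
DEFAULT_OPTIONS = {
    "encoding": "linear16",
    "language": "en",
    "model": "nova-2-general",
    "sample_rate": 16000,
    "channels": 1,
    "interim_results": True,
    "smart_format": True,
    "punctuate": True,
    "profanity_filter": True,
    "vad_events": False,
}


def merge_options(overrides):
    """Overlay caller overrides on the defaults without mutating either (hypothetical)."""
    merged = dict(DEFAULT_OPTIONS)
    merged.update(overrides)
    return merged


# Only language and vad_events change; every other field keeps its default.
options = merge_options({"language": "fr", "vad_events": True})
```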

Input

The service processes InputAudioRawFrame instances containing:

  • Raw PCM audio data
  • 16-bit depth
  • 16kHz sample rate
  • Single channel (mono)
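
To make the format concrete, the sketch below computes how many bytes a given duration of 16-bit, 16 kHz, mono PCM occupies and generates a short test tone in that format. The 20 ms chunk duration, the helper names, and the little-endian byte order are assumptions chosen for illustration.

```python
import math
import struct

SAMPLE_RATE = 16000      # samples per second
BYTES_PER_SAMPLE = 2     # 16-bit depth
CHANNELS = 1             # mono


def chunk_size_bytes(duration_ms):
    """Number of bytes in duration_ms of 16-bit, 16 kHz, mono PCM."""
    samples = SAMPLE_RATE * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def sine_pcm(duration_ms, freq_hz=440.0, amplitude=0.5):
    """Generate a test tone as little-endian signed 16-bit PCM bytes."""
    n = SAMPLE_RATE * duration_ms // 1000
    samples = (
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    )
    return struct.pack(f"<{n}h", *samples)


audio = sine_pcm(20)  # 20 ms of audio: 320 samples, 640 bytes
```

At these settings one second of audio is 32,000 bytes, which is useful when sizing buffers or checking that a transport delivers audio in the expected format.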

Output Frames

The service produces two types of frames during transcription:

TranscriptionFrame

Generated for final transcriptions, containing:

text
string

Transcribed text

user_id
string

User identifier

timestamp
string

ISO 8601 formatted timestamp

language
Language

Detected language (if available)
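
Because timestamp is an ISO 8601 string, it can be parsed with the standard library when frames need to be ordered or timed. The sample value below is made up for illustration; the exact form the service emits is not shown on this page.

```python
from datetime import datetime, timezone

# Made-up sample value; real frames carry whatever the service emits.
sample_timestamp = "2024-05-01T12:30:45.250+00:00"

parsed = datetime.fromisoformat(sample_timestamp)

# Normalize to UTC so timestamps from different frames compare cleanly.
parsed_utc = parsed.astimezone(timezone.utc)

# Milliseconds elapsed since a reference instant, e.g. the start of a session.
reference = datetime(2024, 5, 1, 12, 30, 45, tzinfo=timezone.utc)
elapsed_ms = (parsed_utc - reference).total_seconds() * 1000
```

Note that datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 onward; on older versions, replace "Z" with "+00:00" before parsing.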

InterimTranscriptionFrame

Generated during ongoing speech. It contains the same fields as TranscriptionFrame, but the text is a preliminary result that may change before the final transcription arrives.
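
A consumer typically treats the two frame types differently: each interim result replaces the previous one, while a final result is committed. The sketch below shows that logic with stand-in classes. FakeFrame and TranscriptAssembler are hypothetical, and the is_final flag stands in for checking whether a frame is a TranscriptionFrame or an InterimTranscriptionFrame.

```python
from dataclasses import dataclass, field


@dataclass
class FakeFrame:
    """Stand-in for a transcription frame (not the Pipecat class)."""
    text: str
    is_final: bool


@dataclass
class TranscriptAssembler:
    """Keeps committed final text plus the latest interim hypothesis."""
    finals: list = field(default_factory=list)
    interim: str = ""

    def handle(self, frame):
        if frame.is_final:
            self.finals.append(frame.text)
            self.interim = ""          # a final result supersedes interim text
        else:
            self.interim = frame.text  # each interim replaces the previous one

    def display_text(self):
        """Committed text followed by the current interim hypothesis, if any."""
        parts = self.finals + ([self.interim] if self.interim else [])
        return " ".join(parts)


assembler = TranscriptAssembler()
for frame in [
    FakeFrame("hello", False),
    FakeFrame("hello wor", False),
    FakeFrame("hello world", True),
    FakeFrame("how are", False),
]:
    assembler.handle(frame)
```

After this sequence, the committed text is "hello world" and the pending interim text is "how are".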

Methods

See the STT base class methods for additional functionality.

Language Setting

await service.set_language(Language.FR)

Model Selection

await service.set_model("nova-2-general")
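
Both calls are awaited, so they must run inside a coroutine. The sketch below uses a stand-in class, FakeSTTService, purely to show the calling pattern; it is not the real service, and the string "fr" stands in for Language.FR.

```python
import asyncio


class FakeSTTService:
    """Stand-in exposing the two method names shown above (not the real service)."""

    def __init__(self):
        self.language = "en"
        self.model = "nova-2-general"

    async def set_language(self, language):
        self.language = language

    async def set_model(self, model):
        self.model = model


async def switch_to_french(service):
    # Awaiting each call ensures the change is applied before moving on.
    await service.set_language("fr")
    await service.set_model("nova-2-general")
    return service.language, service.model


result = asyncio.run(switch_to_french(FakeSTTService()))
```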

Usage Example

import os

from deepgram import LiveOptions
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.deepgram import DeepgramSTTService

# Configure service (reads the key from the DEEPGRAM_API_KEY environment variable)
stt_service = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    live_options=LiveOptions(
        model="nova-2-general",
        language="en-US",
        smart_format=True,
        vad_events=True,
    ),
)

# Use in pipeline
# transport and text_handler are assumed to be created elsewhere in your application
pipeline = Pipeline([
    transport.input(),   # Produces InputAudioRawFrame
    stt_service,         # Processes audio → produces transcription frames
    text_handler,        # Consumes transcription frames
])

Language Support

Deepgram STT supports the following languages and regional variants:

Language Code   Description                       Service Codes
Language.BG     Bulgarian                         bg
Language.CA     Catalan                           ca
Language.ZH     Chinese (Mandarin, Simplified)    zh, zh-CN, zh-Hans
Language.ZH_TW  Chinese (Mandarin, Traditional)   zh-TW, zh-Hant
Language.ZH_HK  Chinese (Cantonese, Traditional)  zh-HK
Language.CS     Czech                             cs
Language.DA     Danish                            da, da-DK
Language.NL     Dutch                             nl
Language.NL_BE  Dutch (Flemish)                   nl-BE
Language.EN     English                           en
Language.EN_US  English (US)                      en-US
Language.EN_AU  English (Australia)               en-AU
Language.EN_GB  English (UK)                      en-GB
Language.EN_NZ  English (New Zealand)             en-NZ
Language.EN_IN  English (India)                   en-IN
Language.ET     Estonian                          et
Language.FI     Finnish                           fi
Language.FR     French                            fr
Language.FR_CA  French (Canada)                   fr-CA
Language.DE     German                            de
Language.DE_CH  German (Switzerland)              de-CH
Language.EL     Greek                             el
Language.HI     Hindi                             hi
Language.HU     Hungarian                         hu
Language.ID     Indonesian                        id
Language.IT     Italian                           it
Language.JA     Japanese                          ja
Language.KO     Korean                            ko, ko-KR
Language.LV     Latvian                           lv
Language.LT     Lithuanian                        lt
Language.MS     Malay                             ms
Language.NO     Norwegian                         no
Language.PL     Polish                            pl
Language.PT     Portuguese                        pt
Language.PT_BR  Portuguese (Brazil)               pt-BR
Language.PT_PT  Portuguese (Portugal)             pt-PT
Language.RO     Romanian                          ro
Language.RU     Russian                           ru
Language.SK     Slovak                            sk
Language.ES     Spanish                           es, es-419
Language.SV     Swedish                           sv, sv-SE
Language.TH     Thai                              th, th-TH
Language.TR     Turkish                           tr
Language.UK     Ukrainian                         uk
Language.VI     Vietnamese                        vi
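
Since several service codes can map to one Language member, a reverse lookup is handy when a code arrives from configuration or user input. The sketch below builds one from a subset of the table above. The dictionary uses member names as plain strings, and language_for_code is a hypothetical helper, not a Pipecat function.

```python
# Subset of the table above: Language member name -> accepted service codes.
SERVICE_CODES = {
    "ZH": ["zh", "zh-CN", "zh-Hans"],
    "ZH_TW": ["zh-TW", "zh-Hant"],
    "EN": ["en"],
    "EN_US": ["en-US"],
    "ES": ["es", "es-419"],
    "KO": ["ko", "ko-KR"],
}

# Reverse index, lower-cased so lookups are case-insensitive.
CODE_TO_LANGUAGE = {
    code.lower(): name
    for name, codes in SERVICE_CODES.items()
    for code in codes
}


def language_for_code(code):
    """Return the Language member name for a service code, or None if unlisted."""
    return CODE_TO_LANGUAGE.get(code.strip().lower())
```

For example, language_for_code("zh-Hans") returns "ZH", and an unlisted code returns None so the caller can fall back to a default.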

Special Features

  • Supports multilingual transcription (Spanish + English) using the multi language code
  • Provides multiple regional variants for major languages
  • Supports traditional and simplified Chinese scripts

Usage Example

# Configure service with specific language
stt_service = DeepgramSTTService(
    api_key="your-api-key",
    live_options=LiveOptions(
        language="en-US",  # Specific regional variant
        model="nova-2-general"
    )
)

Note: Language support may vary by model. Check Deepgram’s documentation for model-specific language availability.

Metrics Support

The service supports metrics collection when VAD is enabled:

  • Time to First Byte (TTFB)
  • Processing duration
  • Speech detection events
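
This page does not define how these metrics are computed. As a rough illustration of the idea behind TTFB, the sketch below measures the delay between a start event (for example, a VAD speech-start event) and the first result that follows. TTFBTimer is a hypothetical class, and the injected clock exists only to make the demo deterministic.

```python
import time


class TTFBTimer:
    """Measures the delay between a start event and the first result (illustrative)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._start = None
        self.ttfb = None

    def start(self):
        """Call when speech starts (for example, on a VAD speech-start event)."""
        self._start = self._clock()
        self.ttfb = None

    def first_result(self):
        """Call on each result; only the first one after start() is recorded."""
        if self._start is not None and self.ttfb is None:
            self.ttfb = self._clock() - self._start
        return self.ttfb


# Deterministic demo with a fake clock instead of real time.
ticks = iter([10.0, 10.25, 10.9])
timer = TTFBTimer(clock=lambda: next(ticks))
timer.start()
timer.first_result()   # records the elapsed time since start()
timer.first_result()   # ignored; TTFB is already recorded
```

Tying the start of the measurement to a speech-start event is one plausible reason a timer like this would depend on VAD being enabled.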

Notes

  • Requires a valid Deepgram API key
  • Supports real-time transcription
  • Handles WebSocket connection management
  • Provides language detection
  • Supports model switching
  • Includes VAD capabilities
  • Manages connection lifecycle