Overview

Deepgram provides two STT service implementations:
  • DeepgramSTTService for real-time speech recognition using Deepgram’s standard WebSocket API with support for interim results, language detection, and voice activity detection (VAD).
  • DeepgramFluxSTTService for advanced conversational AI with Flux capabilities including intelligent turn detection, eager end-of-turn events, and enhanced speech processing for improved response timing.
Since Deepgram Flux provides its own user turn start and end detection, you should use ExternalUserTurnStrategies to let Flux handle turn management. See User Turn Strategies for configuration details.

Installation

To use Deepgram services, install the required dependencies:
pip install "pipecat-ai[deepgram]"

Prerequisites

Deepgram Account Setup

Before using Deepgram STT services, you need:
  1. Deepgram Account: Sign up at Deepgram Console
  2. API Key: Generate an API key from your console dashboard
  3. Model Selection: Choose from available transcription models and features

Required Environment Variables

  • DEEPGRAM_API_KEY: Your Deepgram API key for authentication
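
To verify the key is available at runtime, you can check it before constructing a service. This assumes the variable is exported in your shell or loaded from a .env file beforehand:

import os

api_key = os.getenv("DEEPGRAM_API_KEY")
if not api_key:
    raise RuntimeError("DEEPGRAM_API_KEY is not set")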

Configuration

DeepgramSTTService

  • api_key (str, required): Deepgram API key for authentication.
  • base_url (str, default ""): Custom Deepgram API base URL. Leave empty for the default endpoint.
  • sample_rate (int, default None): Audio sample rate in Hz. When None, uses the value from live_options or the pipeline’s configured sample rate.
  • live_options (LiveOptions, default None): Deepgram LiveOptions for detailed configuration. When provided, these settings are merged with the defaults. See Deepgram LiveOptions for available options.
  • addons (Dict, default None): Additional Deepgram features to enable.
  • ttfs_p99_latency (float, default DEEPGRAM_TTFS_P99): P99 latency from speech end to final transcript, in seconds. Override for your deployment.
The default LiveOptions are:
  • encoding ("linear16"): Audio encoding format.
  • language (Language.EN): Recognition language.
  • model ("nova-3-general"): Deepgram model to use.
  • channels (1): Number of audio channels.
  • interim_results (True): Stream partial recognition results.
  • smart_format (False): Apply smart formatting.
  • punctuate (True): Add punctuation to transcripts.
  • profanity_filter (True): Filter profanity from transcripts.
  • vad_events (False): Enable Deepgram’s built-in VAD events (deprecated).
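
Because provided LiveOptions are merged with these defaults, you only need to set the fields you want to change. A minimal sketch (values here are illustrative, not recommendations):

import os

from deepgram import LiveOptions
from pipecat.services.deepgram import DeepgramSTTService

# Override only two options; unspecified fields keep the defaults above
# (model, language, punctuation, and so on).
stt = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    live_options=LiveOptions(
        smart_format=True,
        profanity_filter=False,
    ),
)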

DeepgramFluxSTTService

  • api_key (str, required): Deepgram API key for authentication.
  • url (str, default "wss://api.deepgram.com/v2/listen"): WebSocket URL for the Deepgram Flux API.
  • sample_rate (int, default None): Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
  • model (str, default "flux-general-en"): Deepgram Flux model to use for transcription.
  • flux_encoding (str, default "linear16"): Audio encoding format required by the Flux API. Must be "linear16".
  • params (InputParams, default None): Configuration parameters for the Flux API. See Flux InputParams below.
  • should_interrupt (bool, default True): Whether the bot should be interrupted when Flux detects user speech.
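
For example, to let the bot finish its current response even when Flux detects user speech, disable interruption at construction time. A minimal sketch using only the parameters above:

import os

from pipecat.services.deepgram.flux import DeepgramFluxSTTService

# Default Flux endpoint and model; the bot keeps speaking through user speech.
stt = DeepgramFluxSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    should_interrupt=False,
)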

Flux InputParams

Parameters passed via the params constructor argument for DeepgramFluxSTTService.
  • eager_eot_threshold (float, default None): EagerEndOfTurn threshold. Lower values trigger faster responses with more LLM calls; higher values are more conservative. None disables EagerEndOfTurn.
  • eot_threshold (float, default None): End-of-turn confidence threshold; Deepgram’s default is 0.7. Lower values end turns faster.
  • eot_timeout_ms (int, default None): Time in ms after speech to finish a turn regardless of confidence; Deepgram’s default is 5000.
  • keyterm (list, default []): Key terms to boost recognition accuracy for specialized terminology.
  • mip_opt_out (bool, default None): Opt out of Deepgram’s Model Improvement Program.
  • tag (list, default []): Tags for request identification during usage reporting.
  • min_confidence (float, default None): Minimum average confidence required to produce a TranscriptionFrame.
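
A sketch combining several of these parameters; the values are illustrative, not tuned recommendations:

import os

from pipecat.services.deepgram.flux import DeepgramFluxSTTService

stt = DeepgramFluxSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    params=DeepgramFluxSTTService.InputParams(
        eot_threshold=0.7,       # end-of-turn confidence threshold
        eot_timeout_ms=5000,     # finish the turn 5s after speech regardless of confidence
        min_confidence=0.6,      # skip low-confidence transcripts
        keyterm=["Pipecat"],     # boost recognition of product names
        tag=["my-voice-agent"],  # tag requests for usage reporting
    ),
)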

Usage

Basic DeepgramSTTService

import os

from pipecat.services.deepgram import DeepgramSTTService

stt = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
)

With Custom LiveOptions

import os

from deepgram import LiveOptions
from pipecat.services.deepgram import DeepgramSTTService

stt = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    live_options=LiveOptions(
        model="nova-3-general",
        language="es",
        punctuate=True,
        smart_format=True,
    ),
)

DeepgramFluxSTTService

import os

from pipecat.services.deepgram.flux import DeepgramFluxSTTService

stt = DeepgramFluxSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
)

Flux with EagerEndOfTurn

import os

from pipecat.services.deepgram.flux import DeepgramFluxSTTService

stt = DeepgramFluxSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    params=DeepgramFluxSTTService.InputParams(
        eager_eot_threshold=0.5,
        eot_threshold=0.8,
        keyterm=["Pipecat", "Deepgram"],
    ),
)

Notes

  • Finalize on VAD stop: When the pipeline’s VAD detects the user has stopped speaking, DeepgramSTTService sends a finalize request to Deepgram for faster final transcript delivery.
  • Flux turn management: DeepgramFluxSTTService provides its own turn detection via StartOfTurn/EndOfTurn events and broadcasts UserStartedSpeakingFrame/UserStoppedSpeakingFrame directly. Use ExternalUserTurnStrategies to avoid conflicting VAD-based turn management.
  • EagerEndOfTurn: In Flux, enabling eager_eot_threshold provides faster response times by predicting end-of-turn before it is confirmed. EagerEndOfTurn transcripts are pushed as InterimTranscriptionFrames. If the user resumes speaking, a TurnResumed event is fired.
  • Deprecated vad_events: The vad_events option in standard DeepgramSTTService is deprecated. Use Silero VAD instead.
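
To illustrate the finalize-on-VAD-stop behavior, the sketch below pairs DeepgramSTTService with Silero VAD on a Daily transport. The transport choice and parameter names (DailyParams, audio_in_enabled, vad_analyzer) follow common Pipecat examples and are assumptions here; they may differ between Pipecat versions:

import os

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.transports.services.daily import DailyParams, DailyTransport

# Silero VAD runs on the transport; when it reports the user stopped
# speaking, DeepgramSTTService sends a finalize request to Deepgram.
transport = DailyTransport(
    os.getenv("DAILY_ROOM_URL"),  # room URL (assumed environment variable)
    None,                         # token
    "stt-bot",                    # bot name
    DailyParams(
        audio_in_enabled=True,
        vad_analyzer=SileroVADAnalyzer(),
    ),
)

stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

# Minimal pipeline: transport audio in -> STT; downstream processors omitted.
pipeline = Pipeline([transport.input(), stt])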

Event Handlers

In addition to the standard service connection events (on_connected, on_disconnected, on_connection_error), Deepgram STT provides:

DeepgramSTTService

  • on_speech_started: Speech detected in the audio stream.
  • on_utterance_end: End of utterance detected by Deepgram.

@stt.event_handler("on_speech_started")
async def on_speech_started(service):
    print("User started speaking")

@stt.event_handler("on_utterance_end")
async def on_utterance_end(service):
    print("Utterance ended")

DeepgramFluxSTTService

Deepgram Flux provides turn-level events for more granular conversation tracking:
  • on_start_of_turn: Start of a new turn detected.
  • on_turn_resumed: A previously paused turn has resumed.
  • on_end_of_turn: End of turn detected.
  • on_eager_end_of_turn: Early end-of-turn prediction.
  • on_update: Transcript updated.

@stt.event_handler("on_start_of_turn")
async def on_start_of_turn(service, transcript):
    print(f"Turn started: {transcript}")

@stt.event_handler("on_end_of_turn")
async def on_end_of_turn(service, transcript):
    print(f"Turn ended: {transcript}")

@stt.event_handler("on_eager_end_of_turn")
async def on_eager_end_of_turn(service, transcript):
    print(f"Early end-of-turn prediction: {transcript}")
Turn events receive (service, transcript) where transcript is the current transcript text. The on_turn_resumed event receives only (service).
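
The remaining Flux events follow the same pattern. Per the note above, on_turn_resumed receives only the service; on_update is assumed here to carry the updated transcript like the other transcript events:

@stt.event_handler("on_turn_resumed")
async def on_turn_resumed(service):
    print("User resumed speaking")

@stt.event_handler("on_update")
async def on_update(service, transcript):  # transcript argument assumed
    print(f"Transcript update: {transcript}")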