Overview

Deepgram provides three STT service implementations:
  • DeepgramSTTService for real-time speech recognition using Deepgram’s standard WebSocket API with support for interim results, language detection, and voice activity detection (VAD)
  • DeepgramFluxSTTService for advanced conversational AI with Flux capabilities including intelligent turn detection, eager end-of-turn events, and enhanced speech processing for improved response timing
  • DeepgramSageMakerSTTService for real-time speech recognition using Deepgram models deployed on AWS SageMaker endpoints via HTTP/2 bidirectional streaming

Installation

To use Deepgram STT services, install the required dependencies:
pip install "pipecat-ai[deepgram]"
For the SageMaker variant, install both the Deepgram and SageMaker dependencies:
pip install "pipecat-ai[deepgram,sagemaker]"

Prerequisites

Deepgram Account Setup

Before using DeepgramSTTService or DeepgramFluxSTTService, you need:
  1. Deepgram Account: Sign up at Deepgram Console
  2. API Key: Generate an API key from your console dashboard
  3. Model Selection: Choose from available transcription models and features

Required Environment Variables

  • DEEPGRAM_API_KEY: Your Deepgram API key for authentication

AWS SageMaker Setup

Before using DeepgramSageMakerSTTService, you need:
  1. AWS Account: With credentials configured (via environment variables, AWS CLI, or instance metadata)
  2. SageMaker Endpoint: A deployed SageMaker endpoint with a Deepgram model
  3. Deepgram SDK: The Deepgram SDK may be needed for certain advanced configurations
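As a sketch, AWS credentials can be supplied through environment variables (the values below are placeholders; `aws configure` or instance roles work equally well):

```shell
# Placeholder values -- substitute your own credentials and region.
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-east-2"
```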

DeepgramSTTService

api_key
str
required
Deepgram API key for authentication.
base_url
str
default:""
Custom Deepgram API base URL. Leave empty for the default endpoint.
encoding
str
default:"linear16"
Audio encoding format.
channels
int
default:"1"
Number of audio channels.
multichannel
bool
default:"False"
Transcribe each audio channel independently.
sample_rate
int
default:"None"
Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
callback
str
default:"None"
Callback URL for async transcription delivery.
callback_method
str
default:"None"
HTTP method for the callback ("GET" or "POST").
tag
Any
default:"None"
Custom billing tag.
mip_opt_out
bool
default:"None"
Opt out of the Deepgram Model Improvement Program.
live_options
LiveOptions
default:"None"
deprecated
Legacy configuration options. Deprecated in v0.0.105. Use settings=DeepgramSTTService.Settings(...) for runtime-updatable fields and direct constructor parameters for connection-level config instead.
settings
DeepgramSTTService.Settings
default:"None"
Runtime-configurable settings for the STT service. See Settings below.
addons
Dict
default:"None"
Additional Deepgram features to enable.
should_interrupt
bool
default:"True"
deprecated
Whether to interrupt the bot when Deepgram VAD detects user speech. Deprecated in v0.0.99. Will be removed along with vad_events support.
ttfs_p99_latency
float
default:"DEEPGRAM_TTFS_P99"
P99 latency from speech end to final transcript in seconds. Override for your deployment.

Settings

Runtime-configurable settings passed via the settings constructor argument using DeepgramSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
  • model (str, default "nova-3-general"): Deepgram model to use. (Inherited from base STT settings.)
  • language (Language | str, default Language.EN): Recognition language. (Inherited from base STT settings.)
  • detect_entities (bool, default False): Enable named entity detection.
  • diarize (bool, default False): Enable speaker diarization.
  • dictation (bool, default False): Enable dictation mode (converts commands to punctuation).
  • endpointing (int | bool, default None): Endpointing sensitivity in ms, or False to disable.
  • interim_results (bool, default True): Stream partial recognition results.
  • keyterm (str | list, default None): Keyterms to boost recognition accuracy.
  • keywords (str | list, default None): Keywords to boost (str or list of str).
  • numerals (bool, default False): Convert spoken numbers to numerals.
  • profanity_filter (bool, default True): Filter profanity from transcripts.
  • punctuate (bool, default True): Add punctuation to transcripts.
  • redact (str | list, default None): Redact sensitive information.
  • replace (str | list, default None): Word replacement rules.
  • search (str | list, default None): Search terms to highlight.
  • smart_format (bool, default False): Apply smart formatting to transcripts.
  • utterance_end_ms (int, default None): Silence duration in ms before an utterance-end event.
  • vad_events (bool, default False): Enable Deepgram’s built-in VAD events (deprecated).

Usage

import os

from pipecat.services.deepgram.stt import DeepgramSTTService

stt = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
)

With Custom Settings

import os

from pipecat.services.deepgram.stt import DeepgramSTTService

stt = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    settings=DeepgramSTTService.Settings(
        model="nova-3-general",
        language="es",
        punctuate=True,
        smart_format=True,
    ),
)

Notes

  • Finalize on VAD stop: When the pipeline’s VAD detects the user has stopped speaking, the service sends a finalize request to Deepgram for faster final transcript delivery.
  • Deprecated vad_events: The vad_events setting is deprecated. Use Silero VAD instead.

Event Handlers

Supports the standard service connection events (on_connected, on_disconnected, on_connection_error), plus:
EventDescription
on_speech_startedSpeech detected in the audio stream
on_utterance_endEnd of utterance detected by Deepgram
@stt.event_handler("on_speech_started")
async def on_speech_started(service):
    print("User started speaking")

@stt.event_handler("on_utterance_end")
async def on_utterance_end(service):
    print("Utterance ended")

DeepgramFluxSTTService

Since Deepgram Flux provides its own user turn start and end detection, you should use ExternalUserTurnStrategies to let Flux handle turn management. See User Turn Strategies for configuration details.
api_key
str
required
Deepgram API key for authentication.
url
str
default:"wss://api.deepgram.com/v2/listen"
WebSocket URL for the Deepgram Flux API.
sample_rate
int
default:"None"
Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
model
str
default:"flux-general-en"
deprecated
Deepgram Flux model to use for transcription. Deprecated in v0.0.105. Use settings=DeepgramFluxSTTService.Settings(...) instead.
mip_opt_out
bool
default:"None"
Opt out of the Deepgram Model Improvement Program.
flux_encoding
str
default:"linear16"
Audio encoding format required by the Flux API. Must be "linear16".
tag
list
default:"None"
Tags to label requests for identification during usage reporting.
params
InputParams
default:"None"
deprecated
Legacy configuration options. Deprecated in v0.0.105. Use settings=DeepgramFluxSTTService.Settings(...) instead.
settings
DeepgramFluxSTTService.Settings
default:"None"
Configuration settings for the Flux API. See Settings below.
should_interrupt
bool
default:"True"
Whether the bot should be interrupted when Flux detects user speech.

Settings

Runtime-configurable settings passed via the settings constructor argument using DeepgramFluxSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
  • model (str, default "flux-general-en"): Deepgram Flux model to use. (Inherited from base STT settings.)
  • language (Language | str, default None): Recognition language. (Inherited from base STT settings.)
  • eager_eot_threshold (float, default None): EagerEndOfTurn threshold. Lower values trigger faster responses with more LLM calls; higher values are more conservative. None disables EagerEndOfTurn. On-the-fly: ✓
  • eot_threshold (float, default None): End-of-turn confidence threshold (default 0.7). Lower values yield faster turn endings. On-the-fly: ✓
  • eot_timeout_ms (int, default None): Time in ms after speech to finish a turn regardless of confidence (default 5000). On-the-fly: ✓
  • keyterm (list, default []): Key terms to boost recognition accuracy for specialized terminology. On-the-fly: ✓
  • min_confidence (float, default None): Minimum average confidence required to produce a TranscriptionFrame.
Parameters marked “On-the-fly: ✓” can be updated mid-stream using STTUpdateSettingsFrame without requiring a WebSocket reconnect.

Usage

import os

from pipecat.services.deepgram.flux import DeepgramFluxSTTService

stt = DeepgramFluxSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
)

With EagerEndOfTurn

import os

from pipecat.services.deepgram.flux import DeepgramFluxSTTService

stt = DeepgramFluxSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    settings=DeepgramFluxSTTService.Settings(
        eager_eot_threshold=0.5,
        eot_threshold=0.8,
        keyterm=["Pipecat", "Deepgram"],
    ),
)

Updating Settings Mid-Stream

The keyterm, eot_threshold, eager_eot_threshold, and eot_timeout_ms settings can be updated on-the-fly using STTUpdateSettingsFrame:
from pipecat.frames.frames import STTUpdateSettingsFrame
from pipecat.services.deepgram.flux import DeepgramFluxSTTService

# During pipeline execution, update settings without reconnecting
await task.queue_frame(
    STTUpdateSettingsFrame(
        delta=DeepgramFluxSTTService.Settings(
            eot_threshold=0.8,
            keyterm=["Pipecat", "Deepgram"],
        )
    )
)
This sends a Configure message to Deepgram over the existing WebSocket connection, allowing you to adjust turn detection behavior and key terms without interrupting the conversation.

Notes

  • Turn management: Flux provides its own turn detection via StartOfTurn/EndOfTurn events and broadcasts UserStartedSpeakingFrame/UserStoppedSpeakingFrame directly. Use ExternalUserTurnStrategies to avoid conflicting VAD-based turn management.
  • On-the-fly configuration: Supports updating keyterm, eot_threshold, eager_eot_threshold, and eot_timeout_ms mid-stream via STTUpdateSettingsFrame. These updates are sent as Configure messages over the existing WebSocket connection without requiring a reconnect.
  • EagerEndOfTurn: Enabling eager_eot_threshold provides faster response times by predicting end-of-turn before it is confirmed. EagerEndOfTurn transcripts are pushed as InterimTranscriptionFrames. If the user resumes speaking, a TurnResumed event is fired.

Event Handlers

Supports the standard service connection events (on_connected, on_disconnected, on_connection_error), plus turn-level events for more granular conversation tracking:
EventDescription
on_start_of_turnStart of a new turn detected
on_turn_resumedA previously paused turn has resumed
on_end_of_turnEnd of turn detected
on_eager_end_of_turnEarly end-of-turn prediction
on_updateTranscript updated
@stt.event_handler("on_start_of_turn")
async def on_start_of_turn(service, transcript):
    print(f"Turn started: {transcript}")

@stt.event_handler("on_end_of_turn")
async def on_end_of_turn(service, transcript):
    print(f"Turn ended: {transcript}")

@stt.event_handler("on_eager_end_of_turn")
async def on_eager_end_of_turn(service, transcript):
    print(f"Early end-of-turn prediction: {transcript}")
Turn events receive (service, transcript) where transcript is the current transcript text. The on_turn_resumed event receives only (service).

DeepgramSageMakerSTTService

endpoint_name
str
required
Name of the SageMaker endpoint with Deepgram model deployed.
region
str
required
AWS region where the SageMaker endpoint is deployed (e.g., "us-east-2").
encoding
str
default:"linear16"
Audio encoding format.
channels
int
default:"1"
Number of audio channels.
multichannel
bool
default:"False"
Transcribe each audio channel independently.
sample_rate
int
default:"None"
Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
mip_opt_out
bool
default:"None"
Opt out of the Deepgram Model Improvement Program.
live_options
LiveOptions
default:"None"
deprecated
Legacy configuration options. Deprecated in v0.0.105. Use settings=DeepgramSageMakerSTTService.Settings(...) instead.
settings
DeepgramSageMakerSTTService.Settings
default:"None"
Runtime-configurable settings for the STT service. See Settings below.
ttfs_p99_latency
float
default:"DEEPGRAM_SAGEMAKER_TTFS_P99"
P99 latency from speech end to final transcript in seconds. Override for your deployment.

Settings

Runtime-configurable settings passed via the settings constructor argument using DeepgramSageMakerSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details. The SageMaker service inherits all settings from DeepgramSTTService.Settings. See DeepgramSTTService Settings above for the full list.

Usage

import os

from pipecat.services.deepgram.sagemaker.stt import DeepgramSageMakerSTTService

stt = DeepgramSageMakerSTTService(
    endpoint_name=os.getenv("SAGEMAKER_STT_ENDPOINT_NAME"),
    region=os.getenv("AWS_REGION"),
    settings=DeepgramSageMakerSTTService.Settings(
        model="nova-3",
        language="en",
        interim_results=True,
        punctuate=True,
    ),
)

Notes

  • Finalize on VAD stop: Like DeepgramSTTService, the SageMaker service sends a finalize request when the pipeline’s VAD detects the user has stopped speaking.
  • SageMaker deployment: Requires a Deepgram model deployed to an AWS SageMaker endpoint. See the Deepgram SageMaker deployment guide for setup instructions.
  • Keepalive: Automatically sends KeepAlive messages every 5 seconds to maintain the connection during periods of silence.

Event Handlers

Supports the standard service connection events (on_connected, on_disconnected, on_connection_error).
The InputParams / params= / live_options= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.