Overview
Deepgram provides three STT service implementations:
- DeepgramSTTService for real-time speech recognition using Deepgram’s standard WebSocket API, with support for interim results, language detection, and voice activity detection (VAD)
- DeepgramFluxSTTService for advanced conversational AI with Flux capabilities, including intelligent turn detection, eager end-of-turn events, and enhanced speech processing for improved response timing
- DeepgramSageMakerSTTService for real-time speech recognition using Deepgram models deployed on AWS SageMaker endpoints via HTTP/2 bidirectional streaming
- Deepgram STT API Reference: Pipecat’s API methods for standard Deepgram STT
- Deepgram Flux API Reference: Pipecat’s API methods for Deepgram Flux STT
- Standard STT Example: Complete example with standard Deepgram STT
- Flux STT Example: Complete example with Deepgram Flux STT
- SageMaker Example: Complete example with Deepgram on SageMaker
- Deepgram Documentation: Official Deepgram documentation and features
- Deepgram Console: Access API keys and transcription models
Installation
To use the Deepgram STT services, install Pipecat with the deepgram extra: pip install "pipecat-ai[deepgram]"
Prerequisites
Deepgram Account Setup
Before using DeepgramSTTService or DeepgramFluxSTTService, you need:
- Deepgram Account: Sign up at Deepgram Console
- API Key: Generate an API key from your console dashboard
- Model Selection: Choose from available transcription models and features
Required Environment Variables
DEEPGRAM_API_KEY: Your Deepgram API key for authentication
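A common pattern is to read the key from the environment at startup and fail fast with a clear message if it is missing, rather than discovering the problem when the first connection is refused. The helper below, require_deepgram_api_key, is hypothetical and not part of Pipecat; it only uses the standard library.

```python
import os


def require_deepgram_api_key() -> str:
    """Return DEEPGRAM_API_KEY from the environment, failing fast if unset.

    Hypothetical helper for illustration; not part of Pipecat.
    """
    key = os.environ.get("DEEPGRAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DEEPGRAM_API_KEY is not set; create a key in the Deepgram Console "
            "and export it before starting the pipeline."
        )
    return key
```

The returned string is what you would pass to the service as its API key.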
AWS SageMaker Setup
Before using DeepgramSageMakerSTTService, you need:
- AWS Account: With credentials configured (via environment variables, AWS CLI, or instance metadata)
- SageMaker Endpoint: A deployed SageMaker endpoint with a Deepgram model
- Deepgram SDK: The Deepgram SDK may be needed for certain advanced configurations
DeepgramSTTService
Deepgram API key for authentication.
Custom Deepgram API base URL. Leave empty for the default endpoint.
Audio encoding format.
Number of audio channels.
Transcribe each audio channel independently.
Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
Callback URL for async transcription delivery.
HTTP method for the callback ("GET" or "POST").
Custom billing tag.
Opt out of the Deepgram Model Improvement Program.
Legacy configuration options. Deprecated in v0.0.105. Use settings=DeepgramSTTService.Settings(...) for runtime-updatable fields and direct constructor parameters for connection-level config instead.
Runtime-configurable settings for the STT service. See Settings below.
Additional Deepgram features to enable.
Whether to interrupt the bot when Deepgram VAD detects user speech. Deprecated in v0.0.99. Will be removed along with vad_events support.
P99 latency from speech end to final transcript in seconds. Override for your deployment.
Settings
Runtime-configurable settings passed via the settings constructor argument using DeepgramSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | "nova-3-general" | Deepgram model to use. (Inherited from base STT settings.) |
| language | Language or str | Language.EN | Recognition language. (Inherited from base STT settings.) |
| detect_entities | bool | False | Enable named entity detection. |
| diarize | bool | False | Enable speaker diarization. |
| dictation | bool | False | Enable dictation mode (converts spoken commands to punctuation). |
| endpointing | int or bool | None | Endpointing sensitivity in ms, or False to disable. |
| interim_results | bool | True | Stream partial recognition results. |
| keyterm | str or list | None | Keyterms to boost recognition accuracy. |
| keywords | str or list | None | Keywords to boost (str or list of str). |
| numerals | bool | False | Convert spoken numbers to numerals. |
| profanity_filter | bool | True | Filter profanity from transcripts. |
| punctuate | bool | True | Add punctuation to transcripts. |
| redact | str or list | None | Redact sensitive information. |
| replace | str or list | None | Word replacement rules. |
| search | str or list | None | Search terms to highlight. |
| smart_format | bool | False | Apply smart formatting to transcripts. |
| utterance_end_ms | int | None | Silence duration in ms before an utterance-end event. |
| vad_events | bool | False | Enable Deepgram’s built-in VAD events (deprecated). |
Usage
With Custom Settings
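In Pipecat you set these values through DeepgramSTTService.Settings(...). Underneath, Deepgram's live API receives them as query parameters on its listen endpoint, using the same names as the table above. The sketch below shows how such a mapping could be encoded; build_listen_url is a hypothetical helper, not how Pipecat builds its connection, and skipping None values is an assumption made so that Deepgram's server-side defaults apply.

```python
from urllib.parse import urlencode

# Deepgram's public live-transcription endpoint.
DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(settings: dict, base_url: str = DEEPGRAM_LISTEN_URL) -> str:
    """Hypothetical helper: encode a settings mapping as live-API query params.

    - None values are skipped so Deepgram's server-side defaults apply.
    - Booleans are sent as "true"/"false".
    - Lists (for example keyterm) become repeated query parameters.
    """
    params: list[tuple[str, str]] = []
    for name, value in settings.items():
        if value is None:
            continue
        items = value if isinstance(value, (list, tuple)) else [value]
        for item in items:
            if isinstance(item, bool):
                item = "true" if item else "false"
            params.append((name, str(item)))
    return f"{base_url}?{urlencode(params)}"


url = build_listen_url(
    {
        "model": "nova-3-general",
        "language": "en",
        "interim_results": True,
        "smart_format": True,
        "endpointing": 300,
        "keyterm": ["Pipecat", "Deepgram"],
        "diarize": None,  # omitted from the URL
    }
)
print(url)
# wss://api.deepgram.com/v1/listen?model=nova-3-general&language=en&interim_results=true&smart_format=true&endpointing=300&keyterm=Pipecat&keyterm=Deepgram
```

Encoding lists as repeated parameters is what lets a single keyterm setting carry several boosted terms.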
Notes
- Finalize on VAD stop: When the pipeline’s VAD detects the user has stopped speaking, the service sends a finalize request to Deepgram for faster final transcript delivery.
- Deprecated vad_events: The vad_events setting is deprecated. Use Silero VAD instead.
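The finalize-on-VAD-stop note corresponds to a feature of Deepgram's live API: a {"type": "Finalize"} control message sent over the open connection asks Deepgram to flush buffered audio into a final transcript. The sketch below shows the idea with a stand-in send coroutine; FinalizeOnVADStop and its on_vad_start/on_vad_stop hooks are hypothetical names, not Pipecat classes.

```python
import asyncio
import json
from typing import Awaitable, Callable

# Control message understood by Deepgram's live WebSocket API.
FINALIZE_MESSAGE = json.dumps({"type": "Finalize"})


class FinalizeOnVADStop:
    """Hypothetical sketch: request a final transcript when local VAD
    reports that the user stopped speaking."""

    def __init__(self, send: Callable[[str], Awaitable[None]]):
        self._send = send  # e.g. the WebSocket's send coroutine
        self._speaking = False

    async def on_vad_start(self) -> None:
        self._speaking = True

    async def on_vad_stop(self) -> None:
        # Finalize once per utterance, and only if speech was actually seen.
        if self._speaking:
            self._speaking = False
            await self._send(FINALIZE_MESSAGE)


async def demo_finalize() -> list[str]:
    sent: list[str] = []

    async def fake_send(message: str) -> None:
        sent.append(message)

    finalizer = FinalizeOnVADStop(fake_send)
    await finalizer.on_vad_start()
    await finalizer.on_vad_stop()
    await finalizer.on_vad_stop()  # a duplicate stop is ignored
    return sent


print(asyncio.run(demo_finalize()))  # ['{"type": "Finalize"}']
```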
Event Handlers
Supports the standard service connection events (on_connected, on_disconnected, on_connection_error), plus:
| Event | Description |
|---|---|
on_speech_started | Speech detected in the audio stream |
on_utterance_end | End of utterance detected by Deepgram |
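Pipecat services register handlers for these events with an event_handler decorator (for example @stt.event_handler("on_speech_started")). So that the example runs without Pipecat or a Deepgram connection, the sketch below uses MiniEventEmitter, a hypothetical stand-in that mimics the decorator-registration pattern; it is not Pipecat's implementation, and the exact arguments Pipecat passes to each handler may differ.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable

Handler = Callable[..., Awaitable[None]]


class MiniEventEmitter:
    """Hypothetical stand-in mimicking decorator-based event registration."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def event_handler(self, event_name: str) -> Callable[[Handler], Handler]:
        def decorator(func: Handler) -> Handler:
            self._handlers[event_name].append(func)
            return func

        return decorator

    async def emit(self, event_name: str, *args: Any) -> None:
        # Handlers receive the emitter (the "service") first, then any event args.
        for handler in self._handlers[event_name]:
            await handler(self, *args)


stt = MiniEventEmitter()
events: list[str] = []


@stt.event_handler("on_speech_started")
async def on_speech_started(service, *args):
    events.append("speech_started")


@stt.event_handler("on_utterance_end")
async def on_utterance_end(service, *args):
    events.append("utterance_end")


async def simulate() -> None:
    await stt.emit("on_speech_started")
    await stt.emit("on_utterance_end")


asyncio.run(simulate())
print(events)  # ['speech_started', 'utterance_end']
```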
DeepgramFluxSTTService
Since Deepgram Flux provides its own user turn start and end detection, you should use ExternalUserTurnStrategies to let Flux handle turn management. See User Turn Strategies for configuration details.
Deepgram API key for authentication.
WebSocket URL for the Deepgram Flux API.
Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
Deepgram Flux model to use for transcription. Deprecated in v0.0.105. Use settings=DeepgramFluxSTTService.Settings(...) instead.
Opt out of the Deepgram Model Improvement Program.
Audio encoding format required by the Flux API. Must be "linear16".
Tags to label requests for identification during usage reporting.
Legacy configuration options. Deprecated in v0.0.105. Use settings=DeepgramFluxSTTService.Settings(...) instead.
Configuration settings for the Flux API. See Settings below.
Whether the bot should be interrupted when Flux detects user speech.
Settings
Runtime-configurable settings passed via the settings constructor argument using DeepgramFluxSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description | On-the-fly |
|---|---|---|---|---|
| model | str | "flux-general-en" | Deepgram Flux model to use. (Inherited from base STT settings.) | |
| language | Language or str | None | Recognition language. (Inherited from base STT settings.) | |
| eager_eot_threshold | float | None | EagerEndOfTurn threshold. Lower values trigger faster responses with more LLM calls; higher values are more conservative. None disables EagerEndOfTurn. | ✓ |
| eot_threshold | float | None | End-of-turn confidence threshold; when None, the API default of 0.7 applies. Lower values end turns sooner. | ✓ |
| eot_timeout_ms | int | None | Time in ms after speech to finish a turn regardless of confidence; when None, the API default of 5000 applies. | ✓ |
| keyterm | list | [] | Key terms to boost recognition accuracy for specialized terminology. | ✓ |
| min_confidence | float | None | Minimum average confidence required to produce a TranscriptionFrame. | |
Parameters marked with ✓ in the “On-the-fly” column can be updated mid-stream using STTUpdateSettingsFrame without requiring a WebSocket reconnect.
Usage
With EagerEndOfTurn
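The three turn settings relate as the table describes: eot_threshold and eot_timeout_ms decide when a turn ends, while eager_eot_threshold, when set, allows an earlier speculative signal. The toy function below illustrates only that relationship, assuming an end-of-turn confidence score and the time since speech are available; it is not Deepgram's algorithm, and FluxTurnSettings and toy_turn_event are hypothetical names. The fallbacks 0.7 and 5000 mirror the defaults quoted in the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FluxTurnSettings:
    # Mirrors the table; None means "use the documented default"
    # (or, for eager_eot_threshold, "EagerEndOfTurn disabled").
    eot_threshold: Optional[float] = None
    eager_eot_threshold: Optional[float] = None
    eot_timeout_ms: Optional[int] = None


def toy_turn_event(
    settings: FluxTurnSettings,
    eot_confidence: float,
    ms_since_speech: int,
    eager_already_sent: bool = False,
) -> Optional[str]:
    """Toy illustration only; not how Flux actually decides turn boundaries."""
    eot = settings.eot_threshold if settings.eot_threshold is not None else 0.7
    timeout = settings.eot_timeout_ms if settings.eot_timeout_ms is not None else 5000

    # A confident prediction, or enough time after speech, ends the turn outright.
    if eot_confidence >= eot or ms_since_speech >= timeout:
        return "EndOfTurn"
    # With EagerEndOfTurn enabled, a lower bar yields one early, speculative signal.
    eager = settings.eager_eot_threshold
    if eager is not None and not eager_already_sent and eot_confidence >= eager:
        return "EagerEndOfTurn"
    return None


settings = FluxTurnSettings(eager_eot_threshold=0.5)
print(toy_turn_event(settings, eot_confidence=0.55, ms_since_speech=300))  # EagerEndOfTurn
print(toy_turn_event(settings, eot_confidence=0.75, ms_since_speech=600))  # EndOfTurn
print(toy_turn_event(settings, eot_confidence=0.20, ms_since_speech=5200))  # EndOfTurn
```

Lowering eager_eot_threshold in this toy makes the EagerEndOfTurn branch fire on weaker evidence, which matches the table's trade-off of faster responses against more LLM calls.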
Updating Settings Mid-Stream
The keyterm, eot_threshold, eager_eot_threshold, and eot_timeout_ms settings can be updated on-the-fly using STTUpdateSettingsFrame. The update is sent as a Configure message to Deepgram over the existing WebSocket connection, allowing you to adjust turn detection behavior and key terms without interrupting the conversation.
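Only the four fields marked ✓ in the settings table are documented as updatable over the open connection. A helper like the hypothetical split_flux_update below could separate a requested update into fields that can be applied on-the-fly and fields that cannot (such as model). It sketches only that bookkeeping; it does not build the Configure message, whose wire format is not described here.

```python
from typing import Any

# Fields marked ✓ in the "On-the-fly" column of the Flux settings table.
FLUX_ON_THE_FLY_FIELDS = frozenset(
    {"keyterm", "eot_threshold", "eager_eot_threshold", "eot_timeout_ms"}
)


def split_flux_update(update: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Hypothetical helper: split an update into (on_the_fly, other) parts."""
    on_the_fly = {k: v for k, v in update.items() if k in FLUX_ON_THE_FLY_FIELDS}
    other = {k: v for k, v in update.items() if k not in FLUX_ON_THE_FLY_FIELDS}
    return on_the_fly, other


live, not_live = split_flux_update(
    {"eot_threshold": 0.8, "keyterm": ["Pipecat"], "model": "flux-general-en"}
)
print(live)      # {'eot_threshold': 0.8, 'keyterm': ['Pipecat']}
print(not_live)  # {'model': 'flux-general-en'}
```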
Notes
- Turn management: Flux provides its own turn detection via StartOfTurn/EndOfTurn events and broadcasts UserStartedSpeakingFrame/UserStoppedSpeakingFrame directly. Use ExternalUserTurnStrategies to avoid conflicting VAD-based turn management.
- On-the-fly configuration: Supports updating keyterm, eot_threshold, eager_eot_threshold, and eot_timeout_ms mid-stream via STTUpdateSettingsFrame. These updates are sent as Configure messages over the existing WebSocket connection without requiring a reconnect.
- EagerEndOfTurn: Enabling eager_eot_threshold provides faster response times by predicting end-of-turn before it is confirmed. EagerEndOfTurn transcripts are pushed as InterimTranscriptionFrames. If the user resumes speaking, a TurnResumed event is fired.
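The EagerEndOfTurn behavior suggests a speculative pattern: start preparing a reply on EagerEndOfTurn, discard it on TurnResumed, and use it once EndOfTurn confirms the turn. The sketch below models only that bookkeeping in plain Python; SpeculativeReply and the draft_reply callable are hypothetical, and in a real pipeline the draft would be an in-flight LLM request that needs cancelling rather than a finished string.

```python
from typing import Callable, Optional


class SpeculativeReply:
    """Hypothetical sketch of reacting to Flux turn events."""

    def __init__(self, draft_reply: Callable[[str], str]):
        self._draft_reply = draft_reply
        self._draft_for: Optional[str] = None  # transcript the draft was built from
        self._draft: Optional[str] = None

    def on_eager_end_of_turn(self, transcript: str) -> None:
        # Start work early; it may be thrown away.
        self._draft_for = transcript
        self._draft = self._draft_reply(transcript)

    def on_turn_resumed(self) -> None:
        # The user kept talking, so the speculative draft is stale.
        self._draft_for = None
        self._draft = None

    def on_end_of_turn(self, transcript: str) -> str:
        # Reuse the draft only if the final transcript matches what it was built from.
        if self._draft is not None and self._draft_for == transcript:
            reply = self._draft
        else:
            reply = self._draft_reply(transcript)
        self._draft_for = None
        self._draft = None
        return reply


calls: list[str] = []


def fake_llm(text: str) -> str:
    calls.append(text)
    return f"reply to: {text}"


bot = SpeculativeReply(fake_llm)
bot.on_eager_end_of_turn("book a table")
bot.on_turn_resumed()  # user continues speaking
bot.on_eager_end_of_turn("book a table for two")
print(bot.on_end_of_turn("book a table for two"))  # reply to: book a table for two
print(len(calls))  # 2
```

The confirmed turn reuses the second draft, so the only extra cost of EagerEndOfTurn here is the first, discarded call.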
Event Handlers
Supports the standard service connection events (on_connected, on_disconnected, on_connection_error), plus turn-level events for more granular conversation tracking:
| Event | Description |
|---|---|
on_start_of_turn | Start of a new turn detected |
on_turn_resumed | A previously paused turn has resumed |
on_end_of_turn | End of turn detected |
on_eager_end_of_turn | Early end-of-turn prediction |
on_update | Transcript updated |
Turn-level event handlers receive (service, transcript), where transcript is the current transcript text. The exception is on_turn_resumed, which receives only (service).
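Because on_turn_resumed carries no transcript while the other turn events do, handler signatures differ. The sketch below writes handlers against those documented signatures and drives them with a placeholder service object so it runs standalone; TurnLog and the replay sequence are hypothetical, and in Pipecat these functions would typically be registered with the service's event_handler decorator instead of being called directly.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class TurnLog:
    """Hypothetical accumulator for Flux turn events."""

    latest: str = ""
    finished: list[str] = field(default_factory=list)
    resumes: int = 0


log = TurnLog()


# Handlers follow the documented signatures: (service, transcript),
# except on_turn_resumed, which receives only (service).
async def on_start_of_turn(service, transcript):
    log.latest = transcript


async def on_update(service, transcript):
    log.latest = transcript


async def on_turn_resumed(service):
    log.resumes += 1


async def on_end_of_turn(service, transcript):
    log.finished.append(transcript)
    log.latest = ""


async def replay() -> None:
    service = object()  # placeholder for the DeepgramFluxSTTService instance
    await on_start_of_turn(service, "")
    await on_update(service, "what's the")
    await on_turn_resumed(service)
    await on_update(service, "what's the weather")
    await on_end_of_turn(service, "what's the weather")


asyncio.run(replay())
print(log.finished, log.resumes)  # ["what's the weather"] 1
```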
DeepgramSageMakerSTTService
Name of the SageMaker endpoint where the Deepgram model is deployed.
AWS region where the SageMaker endpoint is deployed (e.g., "us-east-2").
Audio encoding format.
Number of audio channels.
Transcribe each audio channel independently.
Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
Opt out of the Deepgram Model Improvement Program.
Legacy configuration options. Deprecated in v0.0.105. Use settings=DeepgramSageMakerSTTService.Settings(...) instead.
Runtime-configurable settings for the STT service. See Settings below.
P99 latency from speech end to final transcript in seconds. Override for your deployment.
Settings
Runtime-configurable settings passed via the settings constructor argument using DeepgramSageMakerSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
The SageMaker service inherits all settings from DeepgramSTTService.Settings. See DeepgramSTTService Settings above for the full list.
Usage
Notes
- Finalize on VAD stop: Like DeepgramSTTService, the SageMaker service sends a finalize request when the pipeline’s VAD detects the user has stopped speaking.
- SageMaker deployment: Requires a Deepgram model deployed to an AWS SageMaker endpoint. See the Deepgram SageMaker deployment guide for setup instructions.
- Keepalive: Automatically sends KeepAlive messages every 5 seconds to maintain the connection during periods of silence.
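A periodic keepalive like this is typically a background task that sends a message on a fixed interval until it is cancelled. The asyncio sketch below shows that pattern; keepalive_loop and the send callable are hypothetical, and the {"type": "KeepAlive"} payload follows Deepgram's WebSocket API, which is an assumption here since the SageMaker service streams over HTTP/2 and may frame its messages differently.

```python
import asyncio
import json
from typing import Awaitable, Callable

KEEPALIVE_MESSAGE = json.dumps({"type": "KeepAlive"})  # assumed payload


async def keepalive_loop(
    send: Callable[[str], Awaitable[None]], interval_s: float = 5.0
) -> None:
    """Send a KeepAlive message every interval_s seconds until cancelled."""
    while True:
        await asyncio.sleep(interval_s)
        await send(KEEPALIVE_MESSAGE)


async def demo_keepalive() -> int:
    sent: list[str] = []

    async def fake_send(message: str) -> None:
        sent.append(message)

    # A short interval keeps the demo fast; the service itself uses 5 seconds.
    task = asyncio.create_task(keepalive_loop(fake_send, interval_s=0.01))
    await asyncio.sleep(0.1)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # cancellation is the normal way to stop the loop
    return len(sent)


count = asyncio.run(demo_keepalive())
print(f"sent {count} keepalive messages")
```

Running the loop as a separate task means audio streaming never blocks on it, and cancelling the task on disconnect stops the keepalives cleanly.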
Event Handlers
Supports the standard service connection events (on_connected, on_disconnected, on_connection_error).