Overview
Deepgram provides two STT service implementations:

- `DeepgramSTTService`: real-time speech recognition using Deepgram's standard WebSocket API, with support for interim results, language detection, and voice activity detection (VAD).
- `DeepgramFluxSTTService`: advanced conversational AI with Flux capabilities, including intelligent turn detection, eager end-of-turn events, and enhanced speech processing for improved response timing.
Since Deepgram Flux provides its own user turn start and end detection, you should use `ExternalUserTurnStrategies` to let Flux handle turn management. See User Turn Strategies for configuration details.

Related resources:

- Deepgram STT API Reference: Pipecat's API methods for standard Deepgram STT
- Deepgram Flux API Reference: Pipecat's API methods for Deepgram Flux STT
- Standard STT Example: Complete example with standard Deepgram STT
- Flux STT Example: Complete example with Deepgram Flux STT
- Deepgram Documentation: Official Deepgram documentation and features
- Deepgram Console: Access API keys and transcription models
Installation
To use Deepgram services, install the required dependencies:
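A typical install, assuming the `deepgram` extra of the `pipecat-ai` package:

```shell
pip install "pipecat-ai[deepgram]"
```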
Prerequisites
Deepgram Account Setup
Before using Deepgram STT services, you need:
- Deepgram Account: Sign up at Deepgram Console
- API Key: Generate an API key from your console dashboard
- Model Selection: Choose from available transcription models and features
Required Environment Variables
- `DEEPGRAM_API_KEY`: Your Deepgram API key for authentication
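For local development the key is usually exported in your shell or loaded from a `.env` file; the value below is a placeholder:

```shell
export DEEPGRAM_API_KEY=your_deepgram_api_key
```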
Configuration
DeepgramSTTService
Constructor parameters:

- Deepgram API key for authentication.
- Custom Deepgram API base URL. Leave empty for the default endpoint.
- Audio sample rate in Hz. When `None`, uses the value from `live_options` or the pipeline's configured sample rate.
- Deepgram `LiveOptions` for detailed configuration. When provided, these settings are merged with the defaults. See Deepgram LiveOptions for available options.
- Additional Deepgram features to enable.
- P99 latency from speech end to final transcript in seconds. Override for your deployment.
The default `LiveOptions` settings are:
| Option | Default | Description |
|---|---|---|
| `encoding` | `"linear16"` | Audio encoding format. |
| `language` | `Language.EN` | Recognition language. |
| `model` | `"nova-3-general"` | Deepgram model to use. |
| `channels` | `1` | Number of audio channels. |
| `interim_results` | `True` | Stream partial recognition results. |
| `smart_format` | `False` | Apply smart formatting. |
| `punctuate` | `True` | Add punctuation to transcripts. |
| `profanity_filter` | `True` | Filter profanity from transcripts. |
| `vad_events` | `False` | Enable Deepgram's built-in VAD events (deprecated). |
DeepgramFluxSTTService
Constructor parameters:

- Deepgram API key for authentication.
- WebSocket URL for the Deepgram Flux API.
- Audio sample rate in Hz. When `None`, uses the pipeline's configured sample rate.
- Deepgram Flux model to use for transcription.
- Audio encoding format required by the Flux API. Must be `"linear16"`.
- Configuration parameters for the Flux API. See Flux InputParams below.
- Whether the bot should be interrupted when Flux detects user speech.
Flux InputParams
Parameters passed via the `params` constructor argument of `DeepgramFluxSTTService`.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `eager_eot_threshold` | float | None | EagerEndOfTurn threshold. Lower values trigger faster responses with more LLM calls; higher values are more conservative. `None` disables EagerEndOfTurn. |
| `eot_threshold` | float | None | End-of-turn confidence threshold (default 0.7). Lower = faster turn endings. |
| `eot_timeout_ms` | int | None | Time in ms after speech to finish a turn regardless of confidence (default 5000). |
| `keyterm` | list | [] | Key terms to boost recognition accuracy for specialized terminology. |
| `mip_opt_out` | bool | None | Opt out of Deepgram's Model Improvement Program. |
| `tag` | list | [] | Tags for request identification during usage reporting. |
| `min_confidence` | float | None | Minimum average confidence required to produce a `TranscriptionFrame`. |
Usage
Basic DeepgramSTTService
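A minimal sketch, assuming the `pipecat.services.deepgram.stt` import path and an API key read from the environment:

```python
import os

from pipecat.services.deepgram.stt import DeepgramSTTService  # import path assumed

# Defaults from the LiveOptions table above apply (nova-3-general, linear16, interim results).
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

# Place the service between audio input and your context aggregator in the pipeline,
# as with other Pipecat STT services.
```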
With Custom LiveOptions
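A sketch overriding a few of the defaults; `LiveOptions` comes from the `deepgram` SDK, and the specific values are illustrative:

```python
import os

from deepgram import LiveOptions

from pipecat.services.deepgram.stt import DeepgramSTTService  # import path assumed

# Options provided here are merged with the defaults listed above.
stt = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    live_options=LiveOptions(
        model="nova-3-general",
        language="en-US",
        smart_format=True,       # override the default False
        profanity_filter=False,  # override the default True
    ),
)
```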
DeepgramFluxSTTService
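A sketch of basic Flux usage; the import path is an assumption:

```python
import os

from pipecat.services.deepgram.stt import DeepgramFluxSTTService  # import path assumed

# Flux handles turn detection itself; pair it with ExternalUserTurnStrategies
# (see the note in the Overview) instead of VAD-based turn management.
stt = DeepgramFluxSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
```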
Flux with EagerEndOfTurn
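A sketch enabling EagerEndOfTurn via the `params` argument; the nested `InputParams` class location and the threshold values are assumptions based on the table above:

```python
import os

from pipecat.services.deepgram.stt import DeepgramFluxSTTService  # import path assumed

stt = DeepgramFluxSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    params=DeepgramFluxSTTService.InputParams(  # class location assumed
        eager_eot_threshold=0.5,  # illustrative: lower values predict end-of-turn sooner
        eot_threshold=0.7,        # confirmed end-of-turn confidence threshold
    ),
)
```

With eager end-of-turn enabled, EagerEndOfTurn transcripts arrive as `InterimTranscriptionFrame`s and a `TurnResumed` event fires if the user keeps speaking (see Notes below).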
Notes
- Finalize on VAD stop: When the pipeline's VAD detects the user has stopped speaking, `DeepgramSTTService` sends a finalize request to Deepgram for faster final transcript delivery.
- Flux turn management: `DeepgramFluxSTTService` provides its own turn detection via `StartOfTurn`/`EndOfTurn` events and broadcasts `UserStartedSpeakingFrame`/`UserStoppedSpeakingFrame` directly. Use `ExternalUserTurnStrategies` to avoid conflicting VAD-based turn management.
- EagerEndOfTurn: In Flux, enabling `eager_eot_threshold` provides faster response times by predicting end-of-turn before it is confirmed. EagerEndOfTurn transcripts are pushed as `InterimTranscriptionFrame`s. If the user resumes speaking, a `TurnResumed` event is fired.
- Deprecated `vad_events`: The `vad_events` option in the standard `DeepgramSTTService` is deprecated. Use Silero VAD instead.
Event Handlers
In addition to the standard service connection events (`on_connected`, `on_disconnected`, `on_connection_error`), Deepgram STT provides:
DeepgramSTTService
| Event | Description |
|---|---|
| `on_speech_started` | Speech detected in the audio stream |
| `on_utterance_end` | End of utterance detected by Deepgram |
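Assuming the `stt` instance from the usage examples above, handlers can be registered with the standard `event_handler` decorator; the exact payloads aren't listed here, so extra arguments are absorbed defensively:

```python
from loguru import logger

@stt.event_handler("on_speech_started")
async def on_speech_started(service, *args, **kwargs):
    logger.debug("Deepgram detected speech in the audio stream")

@stt.event_handler("on_utterance_end")
async def on_utterance_end(service, *args, **kwargs):
    logger.debug("Deepgram detected end of utterance")
```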
DeepgramFluxSTTService
Deepgram Flux provides turn-level events for more granular conversation tracking:

| Event | Description |
|---|---|
| `on_start_of_turn` | Start of a new turn detected |
| `on_turn_resumed` | A previously paused turn has resumed |
| `on_end_of_turn` | End of turn detected |
| `on_eager_end_of_turn` | Early end-of-turn prediction |
| `on_update` | Transcript updated |
Handlers for these events receive `(service, transcript)`, where `transcript` is the current transcript text. The `on_turn_resumed` handler receives only `(service)`.
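Assuming the Flux `stt` instance from the usage examples, a sketch of registering a couple of these handlers:

```python
from loguru import logger

@stt.event_handler("on_end_of_turn")
async def on_end_of_turn(service, transcript):
    logger.debug(f"End of turn: {transcript}")

@stt.event_handler("on_turn_resumed")
async def on_turn_resumed(service):
    logger.debug("User resumed speaking after a pause")
```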