# Deepgram

> Speech-to-text service implementations using Deepgram's real-time transcription and Flux APIs

## Overview

Pipecat provides four Deepgram STT service implementations:

* `DeepgramSTTService` for real-time speech recognition using Deepgram's standard WebSocket API with support for interim results, language detection, and voice activity detection (VAD)
* `DeepgramFluxSTTService` for advanced conversational AI with Flux capabilities including intelligent turn detection, eager end-of-turn events, multilingual support (with the `flux-general-multi` model), and enhanced speech processing for improved response timing
* `DeepgramSageMakerSTTService` for real-time speech recognition using Deepgram Nova models deployed on AWS SageMaker endpoints via HTTP/2 bidirectional streaming
* `DeepgramFluxSageMakerSTTService` for advanced conversational AI using Deepgram Flux models deployed on AWS SageMaker endpoints with native turn detection and low-latency streaming

## Installation

To use Deepgram STT services, install the required dependencies:

```bash
uv add "pipecat-ai[deepgram]"
```

For the SageMaker variants, install both the Deepgram and SageMaker dependencies:

```bash
uv add "pipecat-ai[deepgram,sagemaker]"
```

## Prerequisites

### Deepgram Account Setup

Before using `DeepgramSTTService` or `DeepgramFluxSTTService`, you need:

1. **Deepgram Account**: Sign up at [Deepgram Console](https://console.deepgram.com/signup)
2. **API Key**: Generate an API key from your console dashboard
3. **Model Selection**: Choose from available transcription models and features

### Required Environment Variables

* `DEEPGRAM_API_KEY`: Your Deepgram API key for authentication

### AWS SageMaker Setup

Before using `DeepgramSageMakerSTTService` or `DeepgramFluxSageMakerSTTService`, you need:

1. **AWS Account**: With credentials configured (via environment variables, AWS CLI, or instance metadata)
2. **SageMaker Endpoint**: A deployed SageMaker endpoint with a [Deepgram model](https://developers.deepgram.com/docs/deploy-amazon-sagemaker) (Nova for the standard service, Flux for advanced turn detection)
3. **Deepgram SDK**: The Deepgram SDK may be needed for certain advanced configurations

## DeepgramSTTService

* Deepgram API key for authentication.
* Custom Deepgram API base URL. Leave empty for the default endpoint. Supports `wss://`, `https://`, `ws://`, `http://`, or a bare hostname (defaults to secure). The specified scheme is preserved, which is useful for air-gapped or private deployments that don't use TLS.
* Audio encoding format.
* Number of audio channels.
* Transcribe each audio channel independently.
* Audio sample rate in Hz. When `None`, uses the pipeline's configured sample rate.
* Callback URL for async transcription delivery.
* HTTP method for the callback (`"GET"` or `"POST"`).
* Custom billing tag.
* Opt out of the Deepgram Model Improvement Program.
* Legacy configuration options. *Deprecated in v0.0.105. Use `settings=DeepgramSTTService.Settings(...)` for runtime-updatable fields and direct constructor parameters for connection-level config instead.*
* Runtime-configurable settings for the STT service. See [Settings](#settings) below.
* Additional Deepgram features to enable.
* P99 latency from speech end to final transcript in seconds. Override for your deployment.

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `DeepgramSTTService.Settings(...)`.
These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter          | Type              | Default            | Description                                                  |
| ------------------ | ----------------- | ------------------ | ------------------------------------------------------------ |
| `model`            | `str`             | `"nova-3-general"` | Deepgram model to use. *(Inherited from base STT settings.)* |
| `language`         | `Language \| str` | `Language.EN`      | Recognition language. *(Inherited from base STT settings.)*  |
| `detect_entities`  | `bool`            | `False`            | Enable named entity detection.                               |
| `diarize`          | `bool`            | `False`            | Enable speaker diarization.                                  |
| `dictation`        | `bool`            | `False`            | Enable dictation mode (converts commands to punctuation).    |
| `endpointing`      | `int \| bool`     | `None`             | Endpointing sensitivity in ms, or `False` to disable.        |
| `interim_results`  | `bool`            | `True`             | Stream partial recognition results.                          |
| `keyterm`          | `str \| list`     | `None`             | Keyterms to boost recognition accuracy.                      |
| `keywords`         | `str \| list`     | `None`             | Keywords to boost (str or list of str).                      |
| `numerals`         | `bool`            | `False`            | Convert spoken numbers to numerals.                          |
| `profanity_filter` | `bool`            | `True`             | Filter profanity from transcripts.                           |
| `punctuate`        | `bool`            | `True`             | Add punctuation to transcripts.                              |
| `redact`           | `str \| list`     | `None`             | Redact sensitive information.                                |
| `replace`          | `str \| list`     | `None`             | Word replacement rules.                                      |
| `search`           | `str \| list`     | `None`             | Search terms to highlight.                                   |
| `smart_format`     | `bool`            | `False`            | Apply smart formatting to transcripts.                       |
| `utterance_end_ms` | `int`             | `None`             | Silence duration in ms before an utterance-end event.        |

### Usage

```python
import os

from pipecat.services.deepgram.stt import DeepgramSTTService

stt = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
)
```

#### With Custom Settings

```python
import os

from pipecat.services.deepgram.stt import DeepgramSTTService

stt = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    settings=DeepgramSTTService.Settings(
        model="nova-3-general",
        language="es",
        punctuate=True,
        smart_format=True,
    ),
)
```

### Notes

* **Finalize on VAD stop**: When the pipeline's VAD detects the user has stopped speaking, the service sends a [finalize](https://developers.deepgram.com/docs/finalize) request to Deepgram for faster final transcript delivery.
* **Multilingual support**: Deepgram Nova models support many languages. The default is `Language.EN` (English). Set `language="multi"` in settings to enable multilingual transcription, which detects and transcribes multiple languages within the same audio stream.
* **Runtime settings updates**: Changing settings via `STTUpdateSettingsFrame` triggers a reconnection with the new parameters. To avoid audio loss, reconnection is deferred until the current user turn ends (i.e., until `UserStoppedSpeakingFrame` is received). Audio frames arriving during the reconnect are buffered and replayed once the new connection is ready.

### Event Handlers

Supports the standard [service connection events](/api-reference/server/events/service-events) (`on_connected`, `on_disconnected`, `on_connection_error`).

## DeepgramFluxSTTService

Deepgram Flux provides its own user turn start and end detection and automatically requests `ExternalUserTurnStrategies` at start, so you don't need to configure turn strategies manually. Pass your own `user_turn_strategies` only to override the service's recommendation. See [User Turn Strategies](/api-reference/server/utilities/turn-management/user-turn-strategies) for more details.

* Deepgram API key for authentication.
* WebSocket URL for the Deepgram Flux API.
* Audio sample rate in Hz. When `None`, uses the pipeline's configured sample rate.
* Deepgram Flux model to use for transcription. *Deprecated in v0.0.105. Use `settings=DeepgramFluxSTTService.Settings(...)` instead.*
* Opt out of the Deepgram Model Improvement Program.
* Audio encoding format required by the Flux API. Must be `"linear16"`.
* Tags to label requests for identification during usage reporting.
* Legacy configuration options. *Deprecated in v0.0.105. Use `settings=DeepgramFluxSTTService.Settings(...)` instead.*
* Configuration settings for the Flux API. See [Settings](#settings-2) below.
* Whether the bot should be interrupted when Flux detects user speech.
* Minimum silence duration in seconds before the watchdog sends silence to prevent dangling turns. The actual threshold is `max(chunk_duration * 2, watchdog_min_timeout)`, adapting automatically to the audio chunk size in use.

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `DeepgramFluxSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter             | Type              | Default             | Description | On-the-fly |
| --------------------- | ----------------- | ------------------- | ----------- | ---------- |
| `model`               | `str`             | `"flux-general-en"` | Deepgram Flux model to use. *(Inherited from base STT settings.)* | |
| `language`            | `Language \| str` | `None`              | Recognition language. *(Inherited from base STT settings.)* | |
| `eager_eot_threshold` | `float`           | `None`              | EagerEndOfTurn threshold. Lower values trigger faster responses with more LLM calls; higher values are more conservative. `None` disables EagerEndOfTurn. | ✓ |
| `eot_threshold`       | `float`           | `None`              | End-of-turn confidence threshold (default 0.7). Lower values end turns faster. | ✓ |
| `eot_timeout_ms`      | `int`             | `None`              | Time in ms after speech to finish a turn regardless of confidence (default 5000). | ✓ |
| `keyterm`             | `list`            | `[]`                | Key terms to boost recognition accuracy for specialized terminology. | ✓ |
| `min_confidence`      | `float`           | `None`              | Minimum average confidence required to produce a `TranscriptionFrame`. | |
| `language_hints`      | `list[Language]`  | `None`              | Languages to bias transcription toward. Only honored by `flux-general-multi`. Empty list clears hints; `None` means auto-detect. | ✓ |

Parameters marked with ✓ in the "On-the-fly" column can be updated mid-stream using `STTUpdateSettingsFrame` without requiring a WebSocket reconnect.

### Usage

```python
import os

from pipecat.services.deepgram.flux.stt import DeepgramFluxSTTService

stt = DeepgramFluxSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
)
```

#### With EagerEndOfTurn

```python
import os

from pipecat.services.deepgram.flux.stt import DeepgramFluxSTTService

stt = DeepgramFluxSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    settings=DeepgramFluxSTTService.Settings(
        eager_eot_threshold=0.5,
        eot_threshold=0.8,
        keyterm=["Pipecat", "Deepgram"],
    ),
)
```

#### Multilingual Support

```python
import os

from pipecat.services.deepgram.flux.stt import DeepgramFluxSTTService
from pipecat.transcriptions.language import Language

# Use flux-general-multi with language hints
stt = DeepgramFluxSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    settings=DeepgramFluxSTTService.Settings(
        model="flux-general-multi",
        language_hints=[Language.EN, Language.ES, Language.FR],
    ),
)
```

#### Updating Settings Mid-Stream

The `keyterm`, `eot_threshold`, `eager_eot_threshold`, `eot_timeout_ms`, and `language_hints` settings can be updated on-the-fly using `STTUpdateSettingsFrame`:

```python
from pipecat.frames.frames import STTUpdateSettingsFrame
from pipecat.services.deepgram.flux.stt import DeepgramFluxSTTService
from pipecat.transcriptions.language import Language

# During pipeline execution, update settings without reconnecting
await worker.queue_frame(
    STTUpdateSettingsFrame(
        delta=DeepgramFluxSTTService.Settings(
            eot_threshold=0.8,
            keyterm=["Pipecat", "Deepgram"],
        )
    )
)

# Detect-then-lock: narrow language hints mid-stream
await worker.queue_frame(
    STTUpdateSettingsFrame(
        delta=DeepgramFluxSTTService.Settings(
            language_hints=[Language.ES],
        )
    )
)
```

This sends a `Configure` message to Deepgram over the existing WebSocket connection, allowing you to adjust turn detection behavior, key terms, and language hints without interrupting the conversation.

### Notes

* **Turn management**: Flux provides its own turn detection via `StartOfTurn`/`EndOfTurn` events and broadcasts `UserStartedSpeakingFrame`/`UserStoppedSpeakingFrame` directly. The service automatically requests `ExternalUserTurnStrategies` at start to avoid conflicting VAD-based turn management.
* **VAD is optional**: When Flux drives turn detection, a VAD (such as `SileroVADAnalyzer`) in your transport is not required for core functionality. Include one if you want useful STT metrics; omit it otherwise.
* **On-the-fly configuration**: Supports updating `keyterm`, `eot_threshold`, `eager_eot_threshold`, `eot_timeout_ms`, and `language_hints` mid-stream via `STTUpdateSettingsFrame`. These updates are sent as `Configure` messages over the existing WebSocket connection without requiring a reconnect.
* **EagerEndOfTurn**: Enabling `eager_eot_threshold` provides faster response times by predicting end-of-turn before it is confirmed. EagerEndOfTurn transcripts are pushed as `InterimTranscriptionFrame`s. If the user resumes speaking, a `TurnResumed` event is fired.
* **Multilingual support**: Use the `flux-general-multi` model with `language_hints` to bias transcription toward specific languages (EN, ES, FR, DE, HI, RU, PT, JA, IT, NL). `TranscriptionFrame.language` reflects the detected language for each turn.
  Omit hints for auto-detection or pass a subset to bias toward expected languages.

### Event Handlers

Supports the standard [service connection events](/api-reference/server/events/service-events) (`on_connected`, `on_disconnected`, `on_connection_error`), plus turn-level events for more granular conversation tracking:

| Event                  | Description                          |
| ---------------------- | ------------------------------------ |
| `on_start_of_turn`     | Start of a new turn detected         |
| `on_turn_resumed`      | A previously paused turn has resumed |
| `on_end_of_turn`       | End of turn detected                 |
| `on_eager_end_of_turn` | Early end-of-turn prediction         |
| `on_update`            | Transcript updated                   |

```python
@stt.event_handler("on_start_of_turn")
async def on_start_of_turn(service, transcript):
    print(f"Turn started: {transcript}")


@stt.event_handler("on_end_of_turn")
async def on_end_of_turn(service, transcript):
    print(f"Turn ended: {transcript}")


@stt.event_handler("on_eager_end_of_turn")
async def on_eager_end_of_turn(service, transcript):
    print(f"Early end-of-turn prediction: {transcript}")
```

Turn events receive `(service, transcript)` where `transcript` is the current transcript text. The `on_turn_resumed` event receives only `(service)`.

## DeepgramSageMakerSTTService

* Name of the SageMaker endpoint with a Deepgram model deployed.
* AWS region where the SageMaker endpoint is deployed (e.g., `"us-east-2"`).
* Audio encoding format.
* Number of audio channels.
* Transcribe each audio channel independently.
* Audio sample rate in Hz. When `None`, uses the pipeline's configured sample rate.
* Opt out of the Deepgram Model Improvement Program.
* Legacy configuration options. *Deprecated in v0.0.105. Use `settings=DeepgramSageMakerSTTService.Settings(...)` instead.*
* Runtime-configurable settings for the STT service. See [Settings](#settings-3) below.
* P99 latency from speech end to final transcript in seconds. Override for your deployment.
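The usage examples on this page pass `os.getenv(...)` results straight to the constructor, and `os.getenv` returns `None` when a variable is unset. The following is a minimal sketch of a fail-fast check for the endpoint name and region described above. `require_env` and `sagemaker_config` are hypothetical helpers (not part of Pipecat), and the variable names follow this page's SageMaker usage example.

```python
import os


def require_env(name: str, env=None) -> str:
    """Return a non-empty environment value or raise a clear error."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


def sagemaker_config(env=None) -> dict:
    """Collect the endpoint name and region as constructor keyword arguments.

    Hypothetical helper: the keys mirror the `endpoint_name` and `region`
    arguments shown in this page's usage example.
    """
    return {
        "endpoint_name": require_env("SAGEMAKER_STT_ENDPOINT_NAME", env),
        "region": require_env("AWS_REGION", env),
    }
```

The resulting dictionary could then be unpacked into the service, for example `DeepgramSageMakerSTTService(**sagemaker_config())`, so that a missing variable fails at startup rather than at connection time.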
### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `DeepgramSageMakerSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

The SageMaker service inherits all settings from `DeepgramSTTService.Settings`. See [DeepgramSTTService Settings](#settings) above for the full list.

### Usage

```python
import os

from pipecat.services.deepgram.sagemaker.stt import DeepgramSageMakerSTTService

stt = DeepgramSageMakerSTTService(
    endpoint_name=os.getenv("SAGEMAKER_STT_ENDPOINT_NAME"),
    region=os.getenv("AWS_REGION"),
    settings=DeepgramSageMakerSTTService.Settings(
        model="nova-3",
        language="en",
        interim_results=True,
        punctuate=True,
    ),
)
```

### Notes

* **Finalize on VAD stop**: Like `DeepgramSTTService`, the SageMaker service sends a [finalize](https://developers.deepgram.com/docs/finalize) request when the pipeline's VAD detects the user has stopped speaking.
* **SageMaker deployment**: Requires a Deepgram model deployed to an AWS SageMaker endpoint. See the [Deepgram SageMaker deployment guide](https://developers.deepgram.com/docs/deploy-amazon-sagemaker) for setup instructions.
* **Keepalive**: Automatically sends KeepAlive messages every 5 seconds to maintain the connection during periods of silence.

### Event Handlers

Supports the standard [service connection events](/api-reference/server/events/service-events) (`on_connected`, `on_disconnected`, `on_connection_error`).

## DeepgramFluxSageMakerSTTService

Deepgram Flux provides its own user turn start and end detection and automatically requests `ExternalUserTurnStrategies` at start, so you don't need to configure turn strategies manually. Pass your own `user_turn_strategies` only to override the service's recommendation. See [User Turn Strategies](/api-reference/server/utilities/turn-management/user-turn-strategies) for more details.
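To make the turn lifecycle concrete, here is a minimal, library-independent sketch of how the Flux event names used on this page (`StartOfTurn`, `EagerEndOfTurn`, `TurnResumed`, `EndOfTurn`) could be folded into a simple turn state. `TurnTracker` and its state names are hypothetical and only illustrate the expected ordering of events; this is not Pipecat or Deepgram code.

```python
from dataclasses import dataclass, field


@dataclass
class TurnTracker:
    """Tracks a user's turn state from a stream of Flux-style event names."""

    state: str = "idle"  # one of: "idle", "speaking", "maybe_done"
    completed_turns: list = field(default_factory=list)

    def on_event(self, event: str, transcript: str = "") -> str:
        if event == "StartOfTurn":
            self.state = "speaking"
        elif event == "EagerEndOfTurn" and self.state == "speaking":
            # Early prediction: the turn may be over, but it is not confirmed.
            self.state = "maybe_done"
        elif event == "TurnResumed" and self.state == "maybe_done":
            # The user kept talking, so the early prediction is discarded.
            self.state = "speaking"
        elif event == "EndOfTurn" and self.state in ("speaking", "maybe_done"):
            # Confirmed end of turn: record the final transcript.
            self.completed_turns.append(transcript)
            self.state = "idle"
        # Events that do not apply to the current state are ignored.
        return self.state
```

In this sketch, work started on `EagerEndOfTurn` (for example a speculative LLM call) would be thrown away whenever the tracker returns from `maybe_done` to `speaking`, and kept only once `EndOfTurn` arrives.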
* Name of the SageMaker endpoint with a Deepgram Flux model deployed (e.g., `"my-deepgram-flux-endpoint"`).
* AWS region where the endpoint is deployed (e.g., `"us-east-2"`).
* Audio encoding format. Must be `"linear16"`.
* Audio sample rate in Hz. When `None`, uses the pipeline's configured sample rate.
* Opt out of the Deepgram Model Improvement Program.
* Tags to label requests for identification during usage reporting.
* Whether to interrupt the bot when Flux detects user speech.
* Minimum silence duration in seconds before the watchdog sends silence to prevent dangling turns. The actual threshold is `max(chunk_duration * 2, watchdog_min_timeout)`, adapting automatically to the audio chunk size in use.
* Runtime-configurable settings. See [Settings](#settings-4) below.

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `DeepgramFluxSageMakerSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

The Flux SageMaker service inherits all settings from `DeepgramFluxSTTService.Settings` with the same on-the-fly configuration support:

| Parameter             | Type              | Default             | Description | On-the-fly |
| --------------------- | ----------------- | ------------------- | ----------- | ---------- |
| `model`               | `str`             | `"flux-general-en"` | Deepgram Flux model to use. *(Inherited from base STT settings.)* | |
| `language`            | `Language \| str` | `None`              | Recognition language. *(Inherited from base STT settings.)* | |
| `eager_eot_threshold` | `float`           | `None`              | EagerEndOfTurn threshold. Lower values trigger faster responses with more LLM calls; higher values are more conservative. `None` disables EagerEndOfTurn. | ✓ |
| `eot_threshold`       | `float`           | `None`              | End-of-turn confidence threshold (default 0.7). Lower values end turns faster. | ✓ |
| `eot_timeout_ms`      | `int`             | `None`              | Time in ms after speech to finish a turn regardless of confidence (default 5000). | ✓ |
| `keyterm`             | `list`            | `[]`                | Key terms to boost recognition accuracy for specialized terminology. | ✓ |
| `min_confidence`      | `float`           | `None`              | Minimum average confidence required to produce a `TranscriptionFrame`. | |
| `language_hints`      | `list[Language]`  | `None`              | Languages to bias transcription toward. Only honored by `flux-general-multi`. Empty list clears hints; `None` means auto-detect. | ✓ |

Parameters marked with ✓ in the "On-the-fly" column can be updated mid-stream using `STTUpdateSettingsFrame` without requiring a session restart. These updates are sent as `Configure` messages to the Flux model over the existing connection.

### Usage

```python
import os

from pipecat.services.deepgram.flux.sagemaker.stt import DeepgramFluxSageMakerSTTService

stt = DeepgramFluxSageMakerSTTService(
    endpoint_name=os.getenv("SAGEMAKER_FLUX_ENDPOINT_NAME"),
    region=os.getenv("AWS_REGION"),
)
```

#### With Custom Settings

```python
import os

from pipecat.services.deepgram.flux.sagemaker.stt import DeepgramFluxSageMakerSTTService

stt = DeepgramFluxSageMakerSTTService(
    endpoint_name=os.getenv("SAGEMAKER_FLUX_ENDPOINT_NAME"),
    region=os.getenv("AWS_REGION"),
    settings=DeepgramFluxSageMakerSTTService.Settings(
        model="flux-general-en",
        eot_threshold=0.7,
        eager_eot_threshold=0.5,
        keyterm=["Pipecat", "AI"],
    ),
)
```

#### Updating Settings Mid-Stream

The `keyterm`, `eot_threshold`, `eager_eot_threshold`, `eot_timeout_ms`, and `language_hints` settings can be updated on-the-fly:

```python
from pipecat.frames.frames import STTUpdateSettingsFrame
from pipecat.services.deepgram.flux.sagemaker.stt import DeepgramFluxSageMakerSTTService
from pipecat.transcriptions.language import Language

# Update settings without reconnecting
await worker.queue_frame(
    STTUpdateSettingsFrame(
        delta=DeepgramFluxSageMakerSTTService.Settings(
            eot_threshold=0.8,
            keyterm=["Pipecat", "Deepgram", "SageMaker"],
            language_hints=[Language.EN],
        )
    )
)
```

### Notes

* **Turn management**: Flux provides native turn detection via `StartOfTurn`/`EndOfTurn` events and broadcasts `UserStartedSpeakingFrame`/`UserStoppedSpeakingFrame` directly. The service automatically requests `ExternalUserTurnStrategies` at start to avoid conflicting VAD-based turn management.
* **VAD is optional**: When Flux drives turn detection, a VAD (such as `SileroVADAnalyzer`) in your transport is not required for core functionality. Include one if you want useful STT metrics; omit it otherwise.
* **On-the-fly configuration**: Supports updating `keyterm`, `eot_threshold`, `eager_eot_threshold`, `eot_timeout_ms`, and `language_hints` mid-stream via `STTUpdateSettingsFrame`. These updates are sent as `Configure` messages over the existing HTTP/2 connection without requiring a reconnect.
* **EagerEndOfTurn**: Enabling `eager_eot_threshold` provides faster response times by predicting end-of-turn before it is confirmed. EagerEndOfTurn transcripts are pushed as `InterimTranscriptionFrame`s. If the user resumes speaking, a `TurnResumed` event is fired.
* **Multilingual support**: Use the `flux-general-multi` model with `language_hints` to bias transcription toward specific languages (EN, ES, FR, DE, HI, RU, PT, JA, IT, NL). `TranscriptionFrame.language` reflects the detected language for each turn. Omit hints for auto-detection or pass a subset to bias toward expected languages.
* **SageMaker deployment**: Requires a Deepgram Flux model deployed to an AWS SageMaker endpoint. Unlike Nova models, Flux provides native turn detection and does not require external VAD.
* **No KeepAlive needed**: The Flux protocol uses a dynamic watchdog mechanism that sends silence when needed to maintain the connection (threshold: `max(chunk_duration * 2, watchdog_min_timeout)`), so manual KeepAlive messages are not required.

### Event Handlers

Supports the standard [service connection events](/api-reference/server/events/service-events) (`on_connected`, `on_disconnected`, `on_connection_error`), plus turn-level events for granular conversation tracking:

| Event                  | Description                          |
| ---------------------- | ------------------------------------ |
| `on_start_of_turn`     | Start of a new turn detected         |
| `on_turn_resumed`      | A previously paused turn has resumed |
| `on_end_of_turn`       | End of turn detected                 |
| `on_eager_end_of_turn` | Early end-of-turn prediction         |
| `on_update`            | Transcript updated                   |

```python
@stt.event_handler("on_start_of_turn")
async def on_start_of_turn(service, transcript):
    print(f"Turn started: {transcript}")


@stt.event_handler("on_end_of_turn")
async def on_end_of_turn(service, transcript):
    print(f"Turn ended: {transcript}")


@stt.event_handler("on_eager_end_of_turn")
async def on_eager_end_of_turn(service, transcript):
    print(f"Early end-of-turn prediction: {transcript}")
```

Turn events receive `(service, transcript)` where `transcript` is the current transcript text. The `on_turn_resumed` event receives only `(service)`.

The `InputParams` / `params=` / `live_options=` pattern is deprecated as of v0.0.105. Use `Settings` / `settings=` instead. See the [Service Settings guide](/pipecat/fundamentals/service-settings) for migration details.
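As a worked example of the watchdog threshold `max(chunk_duration * 2, watchdog_min_timeout)` mentioned in the notes above, the sketch below derives `chunk_duration` from the byte size of a `linear16` chunk (2 bytes per sample). The function names, the 0.5 second minimum, and the chunk sizes are illustrative assumptions, not the service's actual defaults or internals.

```python
BYTES_PER_SAMPLE = 2  # linear16 is 16-bit PCM


def chunk_duration_s(chunk_bytes: int, sample_rate: int, channels: int = 1) -> float:
    """Duration in seconds of one linear16 audio chunk."""
    return chunk_bytes / (sample_rate * BYTES_PER_SAMPLE * channels)


def watchdog_threshold_s(chunk_duration: float, watchdog_min_timeout: float) -> float:
    """Apply the documented formula: max(chunk_duration * 2, watchdog_min_timeout)."""
    return max(chunk_duration * 2, watchdog_min_timeout)


# Assumed minimum of 0.5 s, chosen only for illustration.
ASSUMED_MIN_TIMEOUT = 0.5

# A 640-byte chunk at 16 kHz mono lasts 20 ms, so the minimum dominates.
small_chunk = chunk_duration_s(640, 16000)
small_threshold = watchdog_threshold_s(small_chunk, ASSUMED_MIN_TIMEOUT)

# A 16000-byte chunk at 16 kHz mono lasts 0.5 s, so twice the chunk dominates.
large_chunk = chunk_duration_s(16000, 16000)
large_threshold = watchdog_threshold_s(large_chunk, ASSUMED_MIN_TIMEOUT)
```

With small chunks the configured minimum sets the threshold; with large chunks the threshold grows to two chunk durations, so the watchdog does not fire between two consecutive chunks of real audio.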