Deepgram

Overview

Deepgram’s Aura API provides text-to-speech synthesis with streaming output and low latency. It offers a range of voice models aimed at conversational AI applications.

API Reference - Complete API documentation and method details
Deepgram TTS Docs - Official Deepgram text-to-speech API documentation
Example Code - Working example with Silero VAD

Installation

To use Deepgram services, install the required dependencies:

pip install "pipecat-ai[deepgram]"

You’ll also need to set up your Deepgram API key as an environment variable: DEEPGRAM_API_KEY.

Get your API key from the Deepgram Console.
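
A missing key tends to surface later as an authentication error, so it can help to fail early at startup. The following is a minimal sketch using only the standard library; the helper name get_deepgram_api_key is hypothetical and not part of Pipecat or the Deepgram SDK.

```python
import os


def get_deepgram_api_key(env=None):
    """Return the Deepgram API key, or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    This helper is illustrative only and is not provided by Pipecat.
    """
    env = os.environ if env is None else env
    key = env.get("DEEPGRAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DEEPGRAM_API_KEY is not set; export it before starting the bot."
        )
    return key
```

The returned value can then be passed as the api_key argument when constructing the service.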

Frames

Input

TextFrame - Text content to synthesize into speech
TTSSpeakFrame - Text that should be spoken immediately
TTSUpdateSettingsFrame - Runtime configuration updates
LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries

Output

TTSStartedFrame - Signals start of synthesis
TTSAudioRawFrame - Generated audio data chunks (streaming)
TTSStoppedFrame - Signals completion of synthesis
ErrorFrame - API or processing errors
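
The output frames above suggest a simple ordering for one synthesized utterance: a start signal, audio chunks, then a stop signal. The sketch below checks that ordering on a list of frame class names. It assumes a single utterance with no ErrorFrame in between, and the function name is hypothetical; it works on plain strings, so it does not depend on Pipecat.

```python
def is_well_formed_tts_output(frame_names):
    """Check the order: TTSStartedFrame, zero or more TTSAudioRawFrame, TTSStoppedFrame.

    `frame_names` is a list of frame class names (strings), for example
    collected with type(frame).__name__ while debugging a pipeline.
    Assumption: one utterance, no error frames interleaved.
    """
    if len(frame_names) < 2:
        return False
    if frame_names[0] != "TTSStartedFrame":
        return False
    if frame_names[-1] != "TTSStoppedFrame":
        return False
    # Everything between the start and stop markers must be audio data.
    return all(name == "TTSAudioRawFrame" for name in frame_names[1:-1])
```

A check like this is mainly useful in tests or debug logging, for example to spot audio that arrives after the stop signal.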

Voice Models

Deepgram offers various Aura voice models optimized for different use cases. Here are some highlights:

Voice Model	Description	Language
`aura-2-helena-en`	Natural female voice	English
`aura-2-andromeda-en`	Expressive female voice	English
`aura-helios-en`	Warm male voice	English
`aura-luna-en`	Conversational female voice	English
`aura-stella-en`	Professional female voice	English
`aura-zeus-en`	Authoritative male voice	English

Deepgram regularly adds new voice models. Check the official documentation for the latest available voices.
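
The voice IDs in the table appear to follow a family-name-language pattern (for example, aura-2-helena-en). The sketch below splits an ID on that assumption, which is inferred from the table rather than taken from Deepgram's documentation; the function name is hypothetical.

```python
def parse_voice_id(voice_id):
    """Split a voice ID such as 'aura-2-helena-en' into (family, name, language).

    Assumption: IDs look like '<family>-<name>-<language>', where the family
    starts with 'aura' and may itself contain a dash (e.g. 'aura-2').
    """
    parts = voice_id.split("-")
    if len(parts) < 3 or parts[0] != "aura":
        raise ValueError(f"Unexpected voice ID format: {voice_id!r}")
    language = parts[-1]
    name = parts[-2]
    family = "-".join(parts[:-2])
    return family, name, language
```

This can be handy for grouping voices by model family or language in a settings UI, but IDs that do not match the assumed pattern raise ValueError.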

Supported Sample Rates

8000 Hz - Phone quality
16000 Hz - Standard quality
24000 Hz - High quality (default)
44100 Hz - CD quality
48000 Hz - Professional quality
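
The sample rate determines how much audio data each chunk carries. As a rough sketch, assuming linear16 encoding (16-bit PCM, so 2 bytes per sample) and mono audio, the size of a chunk can be estimated as follows; the function and constant names are hypothetical.

```python
SUPPORTED_SAMPLE_RATES = (8000, 16000, 24000, 44100, 48000)


def pcm_chunk_bytes(sample_rate, duration_ms, channels=1, bytes_per_sample=2):
    """Estimate the byte size of a raw PCM chunk of the given duration.

    Assumptions: linear16 audio (2 bytes per sample) and mono by default.
    """
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"Unsupported sample rate: {sample_rate}")
    # samples = rate * seconds; integer math avoids float rounding
    return sample_rate * duration_ms * channels * bytes_per_sample // 1000
```

For example, pcm_chunk_bytes(24000, 20) gives 960 bytes for 20 ms of audio at the default rate, and one second at 48000 Hz is 96000 bytes, so higher rates cost proportionally more bandwidth.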

Integration with VAD

Deepgram TTS can be used in pipelines that rely on Voice Activity Detection (VAD) to detect when the user starts and stops speaking:

Using Silero VAD (Recommended)

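from pipecat.audio.vad.silero import SileroVADAnalyzer
# Import path for DailyParams may differ between Pipecat versions
from pipecat.transports.services.daily import DailyParams

# Run Silero VAD on incoming audio from the transport
transport_params = DailyParams(
    audio_in_enabled=True,
    audio_out_enabled=True,
    vad_analyzer=SileroVADAnalyzer()
)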

Using Deepgram’s Built-in VAD Events

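import os

from deepgram import LiveOptions
from pipecat.frames.frames import BotInterruptionFrame, StopInterruptionFrame
from pipecat.services.deepgram.stt import DeepgramSTTService

# Enable Deepgram's own VAD events on the STT service
stt = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    live_options=LiveOptions(
        vad_events=True,
        utterance_end_ms="1000"
    )
)

# `task` is the PipelineTask running your pipeline, created elsewhere

# Interrupt the bot when the user starts speaking
@stt.event_handler("on_speech_started")
async def on_speech_started(stt, *args, **kwargs):
    await task.queue_frames([BotInterruptionFrame()])

# Signal that the user has stopped speaking
@stt.event_handler("on_utterance_end")
async def on_utterance_end(stt, *args, **kwargs):
    await task.queue_frames([StopInterruptionFrame()])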

Usage Example

Basic Configuration

Initialize the DeepgramTTSService with your API key and use it in your pipeline:

import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.deepgram.tts import DeepgramTTSService

# Configure service
tts = DeepgramTTSService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    voice="aura-2-andromeda-en",
    sample_rate=24000,
    encoding="linear16"
)

# Use in pipeline (transport, stt, llm, and context_aggregator are created elsewhere)
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

Custom Configuration

# Advanced configuration with custom settings
tts = DeepgramTTSService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    voice="aura-helios-en",        # Male voice
    base_url="https://api.deepgram.com",  # Custom endpoint
    sample_rate=48000,             # High-quality audio
    encoding="linear16"
)

Dynamic Configuration

Update settings at runtime by queuing a TTSUpdateSettingsFrame for the DeepgramTTSService:

from pipecat.frames.frames import TTSUpdateSettingsFrame

await task.queue_frame(
    TTSUpdateSettingsFrame(settings={"voice": "your-new-voice-id"})
)

Metrics

The service reports the following metrics:

Time to First Byte (TTFB) - Latency from text input to first audio
Processing Duration - Total synthesis time
Character Usage - Text processed for billing

Learn how to enable Metrics in your Pipeline.
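
To make the three metrics concrete, the sketch below measures them by hand around a stream of audio chunks. It is not how Pipecat collects metrics internally; measure_synthesis and fake_synthesize are hypothetical names, and the fake synthesizer simply stands in for a real streaming TTS call.

```python
import time


def measure_synthesis(text, synthesize):
    """Time a streaming synthesis call.

    `synthesize(text)` must return an iterable of audio byte chunks.
    Returns TTFB, total processing time, character count, and audio size.
    """
    start = time.perf_counter()
    ttfb = None
    audio_bytes = 0
    for chunk in synthesize(text):
        if ttfb is None:
            # Time from submitting the text to receiving the first audio chunk
            ttfb = time.perf_counter() - start
        audio_bytes += len(chunk)
    return {
        "ttfb_s": ttfb,  # None if no audio was produced
        "processing_s": time.perf_counter() - start,
        "characters": len(text),
        "audio_bytes": audio_bytes,
    }


def fake_synthesize(text):
    """Stand-in for a real TTS stream: one 960-byte silent chunk per word."""
    for _ in text.split():
        yield b"\x00" * 960
```

Calling measure_synthesis("hello there world", fake_synthesize) reports 17 characters and 2880 bytes of audio; the timing values depend on the machine.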

Additional Notes

Streaming Audio: The service streams audio in chunks for low-latency playback
Voice Selection: Choose voices based on your application’s tone and audience
Sample Rate Matching: Ensure the service’s sample rate matches your pipeline’s audio output sample rate
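
The sample rate note above can be turned into a small startup check. This is a minimal sketch with a hypothetical helper name; it only compares two integers and does not inspect a real pipeline or transport.

```python
def check_sample_rates(tts_sample_rate, output_sample_rate):
    """Return None when the rates match, otherwise a warning message.

    Both arguments are sample rates in Hz, e.g. the value passed to the
    TTS service and the one configured for the output transport.
    """
    if tts_sample_rate == output_sample_rate:
        return None
    return (
        f"TTS sample rate ({tts_sample_rate} Hz) does not match the audio "
        f"output sample rate ({output_sample_rate} Hz); audio may need "
        "resampling or may play at the wrong speed."
    )
```

Logging the returned message at startup makes a mismatch visible before any audio is played.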
