Overview

Deepgram’s Aura API provides streaming text-to-speech synthesis with low latency. It offers a range of voice models suited to conversational AI applications.

Installation

To use Deepgram services, install the required dependencies:
pip install "pipecat-ai[deepgram]"
You’ll also need to set up your Deepgram API key as an environment variable: DEEPGRAM_API_KEY.
Get your API key from the Deepgram Console.
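
A missing key usually shows up later as an authentication error, so it can help to check for it at startup. The following is a minimal sketch using only the standard library. The helper name `require_env` is hypothetical and is not part of Pipecat or the Deepgram SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


if __name__ == "__main__":
    # DEEPGRAM_API_KEY is the variable name given in the installation notes.
    try:
        api_key = require_env("DEEPGRAM_API_KEY")
        print("Deepgram API key found")
    except RuntimeError as exc:
        print(exc)
```

The returned value can then be passed as `api_key` when constructing the service.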

Frames

Input

  • TextFrame - Text content to synthesize into speech
  • TTSSpeakFrame - Text that should be spoken immediately
  • TTSUpdateSettingsFrame - Runtime configuration updates
  • LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries

Output

  • TTSStartedFrame - Signals start of synthesis
  • TTSAudioRawFrame - Generated audio data chunks (streaming)
  • TTSStoppedFrame - Signals completion of synthesis
  • ErrorFrame - API or processing errors
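
The output frames above suggest an ordering: a start frame, then audio chunks, then a stop frame. The sketch below models that ordering with plain strings so it runs without Pipecat installed. It assumes one start and one stop per synthesis and no nesting, which is an illustrative simplification rather than a guarantee made by the service. The function name `is_valid_tts_sequence` is hypothetical.

```python
from typing import Iterable


def is_valid_tts_sequence(frame_names: Iterable[str]) -> bool:
    """Check that audio frames only appear between a start and a stop frame."""
    synthesizing = False
    for name in frame_names:
        if name == "TTSStartedFrame":
            if synthesizing:
                return False  # a second start before the previous stop
            synthesizing = True
        elif name == "TTSAudioRawFrame":
            if not synthesizing:
                return False  # audio outside a start/stop pair
        elif name == "TTSStoppedFrame":
            if not synthesizing:
                return False  # stop without a matching start
            synthesizing = False
        # Any other frame name (for example ErrorFrame) is ignored here.
    return not synthesizing  # every start must have been closed


if __name__ == "__main__":
    ok = ["TTSStartedFrame", "TTSAudioRawFrame", "TTSStoppedFrame"]
    print(is_valid_tts_sequence(ok))
```

A check like this can be handy in tests that record the frame types a pipeline emits.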

Voice Models

Deepgram offers various Aura voice models optimized for different use cases. Here are some highlights:
Voice Model          Description                  Language
aura-2-helena-en     Natural female voice         English
aura-2-andromeda-en  Expressive female voice      English
aura-helios-en       Warm male voice              English
aura-luna-en         Conversational female voice  English
aura-stella-en       Professional female voice    English
aura-zeus-en         Authoritative male voice     English
Deepgram regularly adds new voice models. Check the official documentation for the latest available voices.
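
If an application lets users pick a voice, the highlighted models can be kept in a small lookup table. The sketch below copies the names and descriptions listed above. The `VOICES` dictionary and `find_voices` helper are hypothetical, and the list is not exhaustive.

```python
from typing import Dict, List

# Voice names and descriptions copied from the highlights above.
VOICES: Dict[str, str] = {
    "aura-2-helena-en": "Natural female voice",
    "aura-2-andromeda-en": "Expressive female voice",
    "aura-helios-en": "Warm male voice",
    "aura-luna-en": "Conversational female voice",
    "aura-stella-en": "Professional female voice",
    "aura-zeus-en": "Authoritative male voice",
}


def find_voices(keyword: str) -> List[str]:
    """Return voice names whose description contains the keyword as a whole word."""
    keyword = keyword.lower()
    # Match whole words so that "male" does not also match "female".
    return [name for name, desc in VOICES.items() if keyword in desc.lower().split()]


if __name__ == "__main__":
    print(find_voices("male"))
    print(find_voices("conversational"))
```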

Supported Sample Rates

  • 8000 Hz - Phone quality
  • 16000 Hz - Standard quality
  • 24000 Hz - High quality (default)
  • 44100 Hz - CD quality
  • 48000 Hz - Professional quality
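
The sample rate determines how much audio data the pipeline has to move. As a rough sketch, the helpers below validate a rate against the list above and compute the data rate for `linear16` output, which is 16-bit PCM (2 bytes per sample). Mono audio is an assumption here, and the function names are hypothetical.

```python
SUPPORTED_SAMPLE_RATES = (8000, 16000, 24000, 44100, 48000)
BYTES_PER_SAMPLE = 2  # linear16 is 16-bit PCM


def check_sample_rate(rate: int) -> int:
    """Return the rate unchanged if it is in the supported list, else raise."""
    if rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"Unsupported sample rate: {rate}")
    return rate


def pcm_bytes_per_second(rate: int, channels: int = 1) -> int:
    """Bytes of linear16 audio produced per second of speech."""
    return check_sample_rate(rate) * BYTES_PER_SAMPLE * channels


def chunk_duration_ms(num_bytes: int, rate: int, channels: int = 1) -> float:
    """Playback duration in milliseconds of a linear16 chunk of num_bytes."""
    return 1000.0 * num_bytes / pcm_bytes_per_second(rate, channels)


if __name__ == "__main__":
    print(pcm_bytes_per_second(24000))     # 48000 bytes per second
    print(chunk_duration_ms(4800, 24000))  # 100.0 ms
```

Under these assumptions, moving from 24000 Hz to 48000 Hz doubles the bandwidth needed for the same speech.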

Integration with VAD

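Deepgram TTS works alongside Voice Activity Detection (VAD), which is enabled on the transport:
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.transports.services.daily import DailyParams  # import path may differ by Pipecat version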

transport_params = DailyParams(
    audio_in_enabled=True,
    audio_out_enabled=True,
    vad_analyzer=SileroVADAnalyzer()
)

Using Deepgram’s Built-in VAD Events

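Deepgram’s STT service can also emit VAD events. The handlers in this example use them to start and stop bot interruptions; task is the running PipelineTask:
import os

from deepgram import LiveOptions
from pipecat.frames.frames import BotInterruptionFrame, StopInterruptionFrame
from pipecat.services.deepgram.stt import DeepgramSTTService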

stt = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    live_options=LiveOptions(
        vad_events=True,
        utterance_end_ms="1000"
    )
)

@stt.event_handler("on_speech_started")
async def on_speech_started(stt, *args, **kwargs):
    await task.queue_frames([BotInterruptionFrame()])

@stt.event_handler("on_utterance_end")
async def on_utterance_end(stt, *args, **kwargs):
    await task.queue_frames([StopInterruptionFrame()])

Usage Example

Basic Configuration

Initialize the DeepgramTTSService with your API key and use it in your pipeline:
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.deepgram.tts import DeepgramTTSService

# Configure service
tts = DeepgramTTSService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    voice="aura-2-andromeda-en",
    sample_rate=24000,
    encoding="linear16"
)

# Use in pipeline (transport, stt, llm, and context_aggregator are created elsewhere)
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

Custom Configuration

# Advanced configuration with custom settings
tts = DeepgramTTSService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    voice="aura-helios-en",        # Male voice
    base_url="https://api.deepgram.com",  # API endpoint (override if needed)
    sample_rate=48000,             # High-quality audio
    encoding="linear16"
)

Dynamic Configuration

Update settings at runtime by pushing a TTSUpdateSettingsFrame to the DeepgramTTSService:
from pipecat.frames.frames import TTSUpdateSettingsFrame

# TTSUpdateSettingsFrame takes its updates as a settings mapping
await task.queue_frame(
    TTSUpdateSettingsFrame(settings={"voice": "aura-luna-en"})  # Change to a different voice
)

Metrics

The service reports the following metrics:
  • Time to First Byte (TTFB) - Latency from text input to first audio
  • Processing Duration - Total synthesis time
  • Character Usage - Text processed for billing
Learn how to enable Metrics in your Pipeline.
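
To make the first two metrics concrete, the sketch below times a simulated audio stream. `fake_tts_stream` is a stand-in for a real streaming response and `measure_stream` is a hypothetical helper; neither is part of Pipecat, which collects these metrics itself when they are enabled.

```python
import asyncio
import time
from typing import AsyncIterator, Optional, Tuple


async def fake_tts_stream(chunks: int = 3, delay: float = 0.01) -> AsyncIterator[bytes]:
    """Stand-in for a streaming TTS response; yields silent PCM chunks."""
    for _ in range(chunks):
        await asyncio.sleep(delay)
        yield b"\x00" * 320


async def measure_stream(stream: AsyncIterator[bytes]) -> Tuple[Optional[float], float, int]:
    """Return (ttfb_seconds, total_seconds, total_bytes) for an audio stream."""
    start = time.perf_counter()
    ttfb: Optional[float] = None
    total_bytes = 0
    async for chunk in stream:
        if ttfb is None:
            # Time to first byte: delay until the first chunk arrives.
            ttfb = time.perf_counter() - start
        total_bytes += len(chunk)
    # Processing duration: time until the stream is exhausted.
    return ttfb, time.perf_counter() - start, total_bytes


if __name__ == "__main__":
    ttfb, total, size = asyncio.run(measure_stream(fake_tts_stream()))
    print(f"TTFB: {ttfb * 1000:.1f} ms, total: {total * 1000:.1f} ms, bytes: {size}")
```

TTFB is the figure that most affects how responsive a voice agent feels, since playback can begin as soon as the first chunk arrives.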

Additional Notes

  • Streaming Audio: Service streams audio in chunks for low-latency playback
  • Voice Selection: Choose voices based on your application’s tone and audience
  • Sample Rate Matching: Ensure the service’s sample rate matches your pipeline’s audio output sample rate