Overview

Google Cloud Text-to-Speech provides high-quality speech synthesis with two implementations:
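  • GoogleTTSService: Streaming service built on Google's bidirectional streaming synthesis API
  • GoogleHttpTTSService: HTTP-based service that returns each utterance as a single audio block
GoogleTTSService offers the lowest latency and is the recommended option for real-time conversations.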

Installation

To use Google services, install the required dependencies:
pip install "pipecat-ai[google]"
You’ll need to set up Google Cloud credentials through one of these methods:
  • Environment variable: GOOGLE_APPLICATION_CREDENTIALS (path to a service account JSON key file)
  • Service account JSON string, passed as credentials
  • Service account file path, passed as credentials_path
Enable the Cloud Text-to-Speech API in your Google Cloud project, then create a service account with access to it in the Google Cloud Console.
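Before you wire the service into a bot, you can confirm that the dependency was installed into the active environment. The following is a minimal sketch. It assumes the `google` extra pulls in Google's `google-cloud-texttospeech` client, which is importable as `google.cloud.texttospeech`. `google_tts_installed` is a hypothetical helper name, not part of pipecat.

```python
import importlib.util


def google_tts_installed() -> bool:
    """Return True if the google.cloud.texttospeech package can be imported."""
    try:
        # find_spec imports the parent packages ("google", "google.cloud") first
        # and raises ModuleNotFoundError if either of them is missing.
        return importlib.util.find_spec("google.cloud.texttospeech") is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    if google_tts_installed():
        print("Google Cloud Text-to-Speech client library found")
    else:
        print('Missing dependency: pip install "pipecat-ai[google]"')
```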

Frames

Input

  • TextFrame - Text content to synthesize into speech
  • TTSSpeakFrame - Text that should be spoken immediately
  • TTSUpdateSettingsFrame - Runtime configuration updates
  • LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries

Output

  • TTSStartedFrame - Signals start of synthesis
  • TTSAudioRawFrame - Generated audio data (PCM format)
  • TTSStoppedFrame - Signals completion of synthesis
  • ErrorFrame - Google Cloud API or processing errors
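For a single utterance, the output frames above arrive as a started frame, one or more audio frames, and a stopped frame. The sketch below checks that ordering and converts the raw audio into seconds. The dataclasses are hypothetical stand-ins that reuse the frame names; they are not pipecat's classes. The sketch also assumes the audio is 16-bit PCM (2 bytes per sample).

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical stand-ins that mirror the frame names above; not pipecat's classes.
@dataclass
class TTSStartedFrame:
    pass


@dataclass
class TTSAudioRawFrame:
    audio: bytes
    sample_rate: int
    num_channels: int


@dataclass
class TTSStoppedFrame:
    pass


Frame = Union[TTSStartedFrame, TTSAudioRawFrame, TTSStoppedFrame]


def audio_seconds(frames: List[Frame], bytes_per_sample: int = 2) -> float:
    """Check Started -> Audio* -> Stopped ordering and return total audio duration."""
    if not frames or not isinstance(frames[0], TTSStartedFrame):
        raise ValueError("expected TTSStartedFrame first")
    if not isinstance(frames[-1], TTSStoppedFrame):
        raise ValueError("expected TTSStoppedFrame last")
    total = 0.0
    for frame in frames[1:-1]:
        if not isinstance(frame, TTSAudioRawFrame):
            raise ValueError(f"unexpected frame inside utterance: {frame!r}")
        # PCM duration = bytes / (samples per second * channels * bytes per sample)
        total += len(frame.audio) / (frame.sample_rate * frame.num_channels * bytes_per_sample)
    return total
```

For example, two mono frames of 24,000 bytes each at 24 kHz hold 0.5 s of audio apiece, which gives 1.0 s in total.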

Service Comparison

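| Feature       | GoogleTTSService (Streaming) | GoogleHttpTTSService (HTTP) |
|---------------|------------------------------|-----------------------------|
| Streaming     | ✅ Real-time chunks          | ❌ Single audio block       |
| Latency       | 🚀 Ultra-low                 | 📈 Higher                   |
| Voice support | Chirp 3 HD and Journey only  | All Google voices           |
| SSML support  | ❌ Plain text only           | ✅ Full SSML                |
| Customization | ⚠️ Basic                     | ✅ Extensive                |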
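You can turn the comparison into a small decision rule. This sketch is a heuristic only. It recognizes Chirp 3 HD and Journey voices by the fragments that appear in their names (for example `en-US-Chirp3-HD-Charon` or `en-US-Journey-D`), which is an assumption about Google's naming scheme. The function name is hypothetical.

```python
# Voice-name fragments for the families the streaming service accepts (per the table).
STREAMING_FAMILIES = ("-Chirp3-HD-", "-Journey-")


def pick_google_tts_service(voice_id: str, needs_ssml: bool = False) -> str:
    """Return the name of the service class that fits the voice and SSML needs."""
    streaming_voice = any(fragment in voice_id for fragment in STREAMING_FAMILIES)
    if streaming_voice and needs_ssml:
        # Chirp and Journey voices accept plain text only, on either service.
        raise ValueError(f"{voice_id} does not support SSML; send plain text")
    return "GoogleTTSService" if streaming_voice else "GoogleHttpTTSService"
```

The rule prefers the streaming service whenever the voice allows it, because latency usually matters most in conversational bots. It falls back to HTTP for every other voice.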

Language Support

Common languages supported include:
  • Language.EN_US - English (US)
  • Language.EN_GB - English (UK)
  • Language.FR - French
  • Language.DE - German
  • Language.ES - Spanish
  • Language.IT - Italian
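Google voice names begin with a locale such as `en-US`, so you can catch a mismatched voice and language before synthesis. This sketch works on plain language-code strings. It assumes codes like `"en-US"` or `"fr"` correspond to the `Language` values listed above. `voice_matches_language` is a hypothetical helper.

```python
def voice_matches_language(voice_id: str, language_code: str) -> bool:
    """True if the voice's locale prefix (e.g. "en-US") fits the language code."""
    # "en-US-Chirp3-HD-Charon" -> "en-us"
    voice_locale = "-".join(voice_id.split("-")[:2]).lower()
    code = language_code.lower()
    if "-" in code:
        # A regional code such as "en-US" must match the voice locale exactly.
        return voice_locale == code
    # A bare code such as "fr" matches any regional variant (fr-FR, fr-CA, ...).
    return voice_locale.split("-")[0] == code
```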

Credential Setup

Environment Variable Method

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
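A wrong path or the wrong kind of key file is a common setup mistake. The following sketch reads the file named by the environment variable and sanity-checks the standard service account fields (`type`, `project_id`, `private_key`, `client_email`). The helper names are hypothetical, and the checks are a convenience, not a substitute for Google's own validation.

```python
import json
import os
from pathlib import Path
from typing import Optional

# Fields every Google service account key file is expected to contain.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}


def validate_service_account(info: dict) -> dict:
    """Raise ValueError unless info looks like a service account key."""
    missing = REQUIRED_FIELDS - info.keys()
    if missing:
        raise ValueError(f"key is missing fields: {sorted(missing)}")
    if info["type"] != "service_account":
        raise ValueError(f"expected type 'service_account', got {info['type']!r}")
    return info


def load_key_from_env(var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> dict:
    """Read the key file named by the environment variable and validate it."""
    path: Optional[str] = os.environ.get(var)
    if not path:
        raise RuntimeError(f"{var} is not set")
    return validate_service_account(json.loads(Path(path).read_text()))
```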

Direct Credentials

from pipecat.services.google.tts import GoogleTTSService

# Using a credentials JSON string
tts = GoogleTTSService(
    credentials='{"type": "service_account", "project_id": "...", ...}'
)

# Using a credentials file path
tts = GoogleTTSService(
    credentials_path="/path/to/service-account.json"
)
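When several credential sources are available, some precedence has to decide between them. The sketch below assumes the order inline JSON string, then file path, then the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. The service's actual lookup order may differ. `resolve_credentials` is a hypothetical helper that reports which source it would use.

```python
import json
import os
from typing import Optional, Tuple


def resolve_credentials(
    credentials: Optional[str] = None,
    credentials_path: Optional[str] = None,
) -> Tuple[str, Optional[dict]]:
    """Pick a credential source; returns (source_label, parsed_info_or_None)."""
    if credentials:
        # Inline JSON string, e.g. loaded from a secret manager or env var.
        return "json_string", json.loads(credentials)
    if credentials_path:
        with open(credentials_path) as f:
            return "file_path", json.load(f)
    if os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"):
        # Leave it to Google's Application Default Credentials to read the file.
        return "environment", None
    raise RuntimeError("no Google Cloud credentials configured")
```

In this sketch, an inline string wins even when a path is also given, so the file is never opened in that case.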

Usage Example

Initialize GoogleTTSService and use it in a pipeline:
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.google.tts import GoogleTTSService
from pipecat.transcriptions.language import Language
import os

# Configure the streaming service with a Chirp 3 HD voice.
# GOOGLE_TEST_CREDENTIALS holds the service account JSON as a string.
tts = GoogleTTSService(
    credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
    voice_id="en-US-Chirp3-HD-Charon",
    params=GoogleTTSService.InputParams(
        language=Language.EN_US
    )
)

# Use in a pipeline; transport, stt, llm, and context_aggregator
# are assumed to be created elsewhere in your bot.
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

HTTP Service (Full SSML Support)

Initialize GoogleHttpTTSService for more customization options:
from pipecat.services.google.tts import GoogleHttpTTSService
from pipecat.transcriptions.language import Language

# Configure the HTTP service with SSML customization
http_tts = GoogleHttpTTSService(
    credentials_path="/path/to/service-account.json",
    voice_id="en-US-Neural2-A",
    params=GoogleHttpTTSService.InputParams(
        language=Language.EN_US,
        pitch="+2st",              # raise pitch by two semitones
        rate="1.2",                # speaking rate
        volume="loud",             # SSML volume keyword
        emphasis="moderate",       # SSML emphasis level
        google_style="empathetic"  # Google-specific speaking style
    )
)
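These parameters correspond to standard SSML elements (`<prosody>`, `<emphasis>`) plus Google's `<google:style>` extension. The sketch below shows roughly what such markup looks like. The nesting order and the `build_ssml` helper are assumptions for illustration, and the markup the service actually generates may differ.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def build_ssml(
    text: str,
    pitch: Optional[str] = None,
    rate: Optional[str] = None,
    volume: Optional[str] = None,
    emphasis: Optional[str] = None,
    google_style: Optional[str] = None,
) -> str:
    """Wrap text in SSML using the same knobs as GoogleHttpTTSService.InputParams."""
    body = escape(text)  # escape &, <, > so the input text cannot break the markup
    if google_style:
        body = f"<google:style name={quoteattr(google_style)}>{body}</google:style>"
    if emphasis:
        body = f"<emphasis level={quoteattr(emphasis)}>{body}</emphasis>"
    prosody = {"pitch": pitch, "rate": rate, "volume": volume}
    attrs = " ".join(f"{name}={quoteattr(value)}" for name, value in prosody.items() if value)
    if attrs:
        body = f"<prosody {attrs}>{body}</prosody>"
    return f"<speak>{body}</speak>"
```

For example, `build_ssml("Hi & bye", pitch="+2st", rate="1.2")` yields `<speak><prosody pitch="+2st" rate="1.2">Hi &amp; bye</prosody></speak>`.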

Dynamic Configuration

Update settings at runtime by queuing a TTSUpdateSettingsFrame:
from pipecat.frames.frames import TTSUpdateSettingsFrame

# Inside an async function, where task is your PipelineTask
await task.queue_frame(
    TTSUpdateSettingsFrame(voice_id="new-voice-id")
)
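Conceptually, a settings update merges new values into the service's current configuration and leaves all other settings untouched. The sketch below illustrates that merge with a plain dictionary. The key set and `apply_settings` are hypothetical, and this version raises on unknown keys, whereas the real service may handle them differently.

```python
from typing import Any, Dict

# Hypothetical set of keys this illustrative service accepts at runtime.
UPDATABLE_KEYS = {"voice_id", "language", "pitch", "rate", "volume"}


def apply_settings(current: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new settings dict with update applied; current is not modified."""
    unknown = set(update) - UPDATABLE_KEYS
    if unknown:
        raise KeyError(f"unsupported settings: {sorted(unknown)}")
    merged = dict(current)  # copy so earlier snapshots stay valid
    merged.update(update)
    return merged
```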

Metrics

Both services report the following metrics:
  • Time to First Byte (TTFB) - Latency from text input to first audio
  • Processing Duration - Total synthesis time
  • Character Usage - Text processed for billing
Metrics are reported only when enabled in your pipeline; see the Metrics guide for how to turn them on.
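The three metrics come down to simple timestamp arithmetic: TTFB runs from the request to the first audio chunk, processing duration runs from the request to the end of synthesis, and character usage is the length of the input text. The sketch below is a hypothetical tracker, not pipecat's metrics implementation. It counts characters with `len(text)`, and billing may count characters differently.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SynthesisMetrics:
    text: str
    started: float = field(default_factory=time.perf_counter)
    first_audio: Optional[float] = None
    finished: Optional[float] = None

    def on_audio(self, now: Optional[float] = None) -> None:
        # Only the first audio chunk counts toward time to first byte.
        if self.first_audio is None:
            self.first_audio = time.perf_counter() if now is None else now

    def on_stop(self, now: Optional[float] = None) -> None:
        self.finished = time.perf_counter() if now is None else now

    @property
    def ttfb(self) -> Optional[float]:
        return None if self.first_audio is None else self.first_audio - self.started

    @property
    def processing_duration(self) -> Optional[float]:
        return None if self.finished is None else self.finished - self.started

    @property
    def characters(self) -> int:
        return len(self.text)
```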

Additional Notes

  • Voice Compatibility: The streaming service supports only Chirp 3 HD and Journey voices
  • SSML Limitations: Chirp and Journey voices don’t support SSML - use plain text input
  • Credential Management: Credentials can come from the GOOGLE_APPLICATION_CREDENTIALS environment variable, a JSON string, or a file path
  • Regional Voices: Choose a voice whose locale matches the configured language (for example, an en-US voice with Language.EN_US)
  • Streaming Advantage: Use streaming service for conversational AI requiring ultra-low latency
  • HTTP Advantage: Use HTTP service when you need extensive voice customization via SSML
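If text intended for the HTTP service may reach a Chirp or Journey voice, you can reduce it to plain text first. This sketch uses a simple regex tag stripper, not a full SSML parser. It discards the meaning of elements such as `<break>` or `<say-as>`, and the helper name is hypothetical.

```python
import html
import re

_TAG = re.compile(r"<[^>]+>")
_SPACES = re.compile(r"\s+")
_SPACE_BEFORE_PUNCT = re.compile(r"\s+([.,!?;:])")


def ssml_to_plain_text(ssml: str) -> str:
    """Roughly convert SSML to the plain text that Chirp and Journey voices expect."""
    text = _TAG.sub(" ", ssml)                   # replace each tag with a space so words stay apart
    text = html.unescape(text)                   # decode entities such as &amp; after tags are gone
    text = _SPACES.sub(" ", text)                # collapse runs of whitespace
    text = _SPACE_BEFORE_PUNCT.sub(r"\1", text)  # "there ." -> "there."
    return text.strip()
```

Entities are decoded only after tags are removed, so an escaped `&lt;` in the text can never be mistaken for markup.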