Overview

Google Cloud Text-to-Speech provides high-quality speech synthesis with two service implementations: GoogleTTSService (streaming) for the lowest latency, and GoogleHttpTTSService (HTTP-based) for simpler integration. GoogleTTSService is recommended for real-time applications. A third service, GeminiTTSService, uses Gemini's TTS-specific models and is documented under Configuration.

Installation

To use Google services, install the required dependencies:
pip install "pipecat-ai[google]"

Prerequisites

Google Cloud Setup

Before using Google Cloud TTS services, you need:
  1. Google Cloud Account: Sign up at Google Cloud Console
  2. Project Setup: Create a project and enable the Text-to-Speech API
  3. Service Account: Create a service account with TTS permissions
  4. Authentication: Set up credentials via service account key or Application Default Credentials

Required Environment Variables

  • GOOGLE_APPLICATION_CREDENTIALS: Path to your service account key file (recommended)
  • Or use Application Default Credentials for cloud deployments
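
One way to choose between these two options at startup is sketched below. The helper name google_credentials_kwargs is hypothetical and not part of Pipecat; it only assumes the credentials_path constructor argument documented under Configuration. When the variable is unset it returns no arguments, on the assumption that Application Default Credentials then apply.

```python
import os
from pathlib import Path


def google_credentials_kwargs(env=None):
    """Return constructor kwargs for a Google TTS service (hypothetical helper)."""
    env = os.environ if env is None else env
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    if not key_path:
        # Nothing set: pass no credentials and rely on Application Default Credentials.
        return {}
    if not Path(key_path).is_file():
        # Fail early with a clear message instead of at the first synthesis call.
        raise FileNotFoundError(f"Service account key not found: {key_path}")
    return {"credentials_path": key_path}
```

The result can be unpacked into a constructor, for example GoogleTTSService(**google_credentials_kwargs()).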

Configuration

GoogleTTSService

Streaming service optimized for Chirp 3 HD and Journey voices.
credentials
str
default:"None"
JSON string containing Google Cloud service account credentials.
credentials_path
str
default:"None"
Path to Google Cloud service account JSON file.
location
str
default:"None"
Google Cloud location for regional endpoint (e.g., "us-central1").
voice_id
str
default:"en-US-Chirp3-HD-Charon"
Google TTS voice identifier.
voice_cloning_key
str
default:"None"
Voice cloning key for Chirp 3 custom voices.
sample_rate
int
default:"None"
Output audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
params
InputParams
default:"InputParams()"
Language and speaking rate configuration. See GoogleTTSService InputParams below.

GoogleTTSService InputParams

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| language | Language | Language.EN | Language for synthesis. |
| speaking_rate | float | None | Speaking rate in the range [0.25, 2.0]. |
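
This page does not say how a speaking_rate outside [0.25, 2.0] is handled, so a caller that derives the rate from user input may want to clamp it first. A minimal sketch follows; clamp_speaking_rate is a hypothetical helper, not a Pipecat function.

```python
def clamp_speaking_rate(rate, low=0.25, high=2.0):
    """Clamp a speaking rate into the documented [0.25, 2.0] range."""
    return max(low, min(high, float(rate)))
```

The clamped value can then be passed as speaking_rate in InputParams.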

GoogleHttpTTSService

HTTP service with full SSML support for all voice types.
credentials
str
default:"None"
JSON string containing Google Cloud service account credentials.
credentials_path
str
default:"None"
Path to Google Cloud service account JSON file.
location
str
default:"None"
Google Cloud location for regional endpoint.
voice_id
str
default:"en-US-Chirp3-HD-Charon"
Google TTS voice identifier.
sample_rate
int
default:"None"
Output audio sample rate in Hz.
params
InputParams
default:"None"
Voice customization parameters. See GoogleHttpTTSService InputParams below.

GoogleHttpTTSService InputParams

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| pitch | str | None | Voice pitch adjustment (e.g., "+2st", "-50%"). |
| rate | str | None | Speaking rate for SSML prosody (non-Chirp voices, e.g., "slow", "fast", "125%"). |
| speaking_rate | float | None | Speaking rate for AudioConfig (Chirp/Journey voices). Range [0.25, 2.0]. |
| volume | str | None | Volume adjustment (e.g., "loud", "soft", "+6dB"). |
| emphasis | Literal | None | Emphasis level: "strong", "moderate", "reduced", "none". |
| language | Language | Language.EN | Language for synthesis. |
| gender | Literal | None | Voice gender preference: "male", "female", "neutral". |
| google_style | Literal | None | Google-specific voice style: "apologetic", "calm", "empathetic", "firm", "lively". |
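
To make the SSML-related parameters concrete, the sketch below shows how pitch, rate, volume, and emphasis could map onto the standard SSML prosody and emphasis elements. It is an illustration only, not Pipecat's actual implementation: build_ssml is a hypothetical function, and gender and google_style are left out because their markup is not described here.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, pitch=None, rate=None, volume=None, emphasis=None):
    """Wrap text in SSML using only the options that are set (illustrative)."""
    # Escape &, < and > so the text cannot break the markup.
    body = escape(text)
    if emphasis:
        body = f"<emphasis level={quoteattr(emphasis)}>{body}</emphasis>"
    attrs = {"pitch": pitch, "rate": rate, "volume": volume}
    prosody = "".join(
        f" {name}={quoteattr(value)}" for name, value in attrs.items() if value
    )
    if prosody:
        body = f"<prosody{prosody}>{body}</prosody>"
    return f"<speak>{body}</speak>"


print(build_ssml("Hi & bye", pitch="+2st", rate="slow"))
# prints <speak><prosody pitch="+2st" rate="slow">Hi &amp; bye</prosody></speak>
```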

GeminiTTSService

Streaming service using Gemini’s TTS-specific models with natural voice control, prompts for style instructions, and multi-speaker support.
model
str
default:"gemini-2.5-flash-tts"
Gemini TTS model to use. Options: "gemini-2.5-flash-tts", "gemini-2.5-pro-tts".
credentials
str
default:"None"
JSON string containing Google Cloud service account credentials.
credentials_path
str
default:"None"
Path to Google Cloud service account JSON file.
location
str
default:"None"
Google Cloud location for regional endpoint.
voice_id
str
default:"Kore"
Voice name from available Gemini voices (e.g., "Kore", "Charon", "Puck", "Zephyr").
sample_rate
int
default:"None"
Output audio sample rate in Hz. Gemini TTS outputs at 24 kHz; a mismatched rate produces a warning.
params
InputParams
default:"None"
TTS configuration parameters. See GeminiTTSService InputParams below.

GeminiTTSService InputParams

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| language | Language | Language.EN | Language for synthesis. |
| prompt | str | None | Style instructions for how to synthesize the content. |
| multi_speaker | bool | False | Enable multi-speaker support. |
| speaker_configs | list[dict] | None | Speaker configurations for multi-speaker mode. Each dict should have speaker_alias and optionally speaker_id. |
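
The speaker_configs shape described above can be assembled with a small helper like the one sketched here. make_speaker_configs is hypothetical, and passing Gemini voice names such as "Kore" as speaker_id values is an assumption rather than something this page confirms.

```python
def make_speaker_configs(speakers):
    """Build speaker_configs from (alias, speaker_id or None) pairs."""
    configs = []
    seen = set()
    for alias, speaker_id in speakers:
        if not alias:
            raise ValueError("speaker_alias is required")
        if alias in seen:
            raise ValueError(f"duplicate speaker_alias: {alias}")
        seen.add(alias)
        config = {"speaker_alias": alias}
        # speaker_id is optional, so it is included only when provided.
        if speaker_id is not None:
            config["speaker_id"] = speaker_id
        configs.append(config)
    return configs


# Assumed example values: two aliases, one with an explicit voice.
speaker_configs = make_speaker_configs([("Host", "Kore"), ("Guest", None)])
```

The resulting list would be passed as speaker_configs together with multi_speaker=True in GeminiTTSService.InputParams.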

Usage

Basic Setup (Streaming)

from pipecat.services.google import GoogleTTSService

tts = GoogleTTSService(
    credentials_path="/path/to/service-account.json",
    voice_id="en-US-Chirp3-HD-Charon",
)

HTTP Service with SSML

from pipecat.services.google import GoogleHttpTTSService
from pipecat.transcriptions.language import Language

tts = GoogleHttpTTSService(
    credentials_path="/path/to/service-account.json",
    voice_id="en-US-Standard-A",
    params=GoogleHttpTTSService.InputParams(
        language=Language.EN_US,
        rate="110%",
        pitch="+2st",
    ),
)

Gemini TTS with Style Prompt

from pipecat.services.google import GeminiTTSService
from pipecat.transcriptions.language import Language

tts = GeminiTTSService(
    credentials_path="/path/to/service-account.json",
    model="gemini-2.5-flash-tts",
    voice_id="Kore",
    params=GeminiTTSService.InputParams(
        language=Language.EN_US,
        prompt="Say this in a friendly and helpful tone",
    ),
)

Notes

  • Streaming vs HTTP: GoogleTTSService uses the streaming API for low latency and only supports Chirp 3 HD and Journey voices. GoogleHttpTTSService supports all Google voices including Standard and WaveNet, with full SSML support.
  • Chirp/Journey voices and SSML: Chirp and Journey voices do not support SSML. The HTTP service automatically uses plain text input for these voices.
  • Speaking rate: For Chirp and Journey voices, use speaking_rate (float, 0.25-2.0) in InputParams. For other voices, use rate (string) for SSML prosody control.
  • Gemini TTS sample rate: Gemini TTS always outputs at 24 kHz. Setting a different sample rate produces a warning and may cause audio issues.
  • Gemini multi-speaker: Use multi_speaker=True with speaker_configs to generate conversations between multiple voices. Markup text with speaker aliases to control which voice speaks.
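
The speaking-rate note can be turned into a small selection helper, sketched below. rate_params is a hypothetical function, and detecting the voice family from the substrings "chirp" and "journey" in the voice name is an assumption based on the voice identifiers shown on this page.

```python
def rate_params(voice_id, multiplier):
    """Return the InputParams rate field suited to the voice (name-based heuristic)."""
    name = voice_id.lower()
    if "chirp" in name or "journey" in name:
        # Chirp and Journey voices take a float within the documented range.
        if not 0.25 <= multiplier <= 2.0:
            raise ValueError("speaking_rate must be within [0.25, 2.0]")
        return {"speaking_rate": multiplier}
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    # Other voices take an SSML prosody rate string such as "125%".
    return {"rate": f"{round(multiplier * 100)}%"}
```

For example, GoogleHttpTTSService.InputParams(**rate_params("en-US-Standard-A", 1.25)) would set rate="125%", while a Chirp 3 HD voice would receive speaking_rate=1.25.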