Overview
Google Cloud Text-to-Speech provides high-quality speech synthesis with two service implementations: GoogleTTSService (WebSocket-based), which streams audio with the lowest latency, and GoogleHttpTTSService (HTTP-based), which is simpler to integrate. GoogleTTSService is recommended for real-time applications. A third service, GeminiTTSService, uses Gemini’s TTS-specific models and is covered under Configuration.
- Google TTS API Reference: Pipecat’s API methods for Google Cloud TTS integration
- Example Implementation: Complete example with Chirp 3 HD voice
- Google Cloud Documentation: Official Google Cloud Text-to-Speech documentation
- Voice Gallery: Browse available voices and languages
Installation
To use Google services, install the required dependencies with `pip install "pipecat-ai[google]"`.
Prerequisites
Google Cloud Setup
Before using Google Cloud TTS services, you need:
- Google Cloud Account: Sign up at Google Cloud Console
- Project Setup: Create a project and enable the Text-to-Speech API
- Service Account: Create a service account with TTS permissions
- Authentication: Set up credentials via service account key or Application Default Credentials
Required Environment Variables
- GOOGLE_APPLICATION_CREDENTIALS: Path to your service account key file (recommended)
- Or use Application Default Credentials for cloud deployments
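The credential lookup described above can be sketched as a small helper. This is only an illustration: `resolve_credentials_path` is a hypothetical name, not part of Pipecat or the Google client libraries, and it assumes that an unset or empty variable means Application Default Credentials should be used.

```python
import os
from typing import Mapping, Optional


def resolve_credentials_path(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the service account key path, or None to fall back to ADC.

    Hypothetical helper: an unset or blank GOOGLE_APPLICATION_CREDENTIALS
    is treated as "use Application Default Credentials".
    """
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS", "").strip()
    return path or None
```

A caller could pass the returned path to a TTS service when it is set, and omit it otherwise so that the cloud environment's default credentials apply.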
Configuration
GoogleTTSService
Streaming service optimized for Chirp 3 HD and Journey voices.
Parameters:
- JSON string containing Google Cloud service account credentials.
- Path to Google Cloud service account JSON file.
- Google Cloud location for regional endpoint (e.g., "us-central1").
- Google TTS voice identifier.
- Voice cloning key for Chirp 3 custom voices.
- Output audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
- Language and speaking rate configuration. See GoogleTTSService InputParams below.
GoogleTTSService InputParams
| Parameter | Type | Default | Description |
|---|---|---|---|
| language | Language | Language.EN | Language for synthesis. |
| speaking_rate | float | None | Speaking rate in the range [0.25, 2.0]. |
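Because speaking_rate is only valid within [0.25, 2.0], it can help to check the value before building the service parameters. The following is a minimal sketch; `validate_speaking_rate` is a hypothetical helper, and whether Pipecat itself rejects out-of-range values is not stated here.

```python
from typing import Optional

# Bounds taken from the speaking_rate row in the table above.
MIN_SPEAKING_RATE = 0.25
MAX_SPEAKING_RATE = 2.0


def validate_speaking_rate(rate: Optional[float]) -> Optional[float]:
    """Return the rate unchanged if it is None or within range, else raise."""
    if rate is None:
        # None means "leave the service default in place".
        return None
    if not MIN_SPEAKING_RATE <= rate <= MAX_SPEAKING_RATE:
        raise ValueError(
            f"speaking_rate must be within "
            f"[{MIN_SPEAKING_RATE}, {MAX_SPEAKING_RATE}], got {rate}"
        )
    return float(rate)
```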
GoogleHttpTTSService
HTTP service with full SSML support for all voice types.
Parameters:
- JSON string containing Google Cloud service account credentials.
- Path to Google Cloud service account JSON file.
- Google Cloud location for regional endpoint.
- Google TTS voice identifier.
- Output audio sample rate in Hz.
- Voice customization parameters. See GoogleHttpTTSService InputParams below.
GoogleHttpTTSService InputParams
| Parameter | Type | Default | Description |
|---|---|---|---|
| pitch | str | None | Voice pitch adjustment (e.g., "+2st", "-50%"). |
| rate | str | None | Speaking rate for SSML prosody (non-Chirp voices, e.g., "slow", "fast", "125%"). |
| speaking_rate | float | None | Speaking rate for AudioConfig (Chirp/Journey voices). Range [0.25, 2.0]. |
| volume | str | None | Volume adjustment (e.g., "loud", "soft", "+6dB"). |
| emphasis | Literal | None | Emphasis level: "strong", "moderate", "reduced", "none". |
| language | Language | Language.EN | Language for synthesis. |
| gender | Literal | None | Voice gender preference: "male", "female", "neutral". |
| google_style | Literal | None | Google-specific voice style: "apologetic", "calm", "empathetic", "firm", "lively". |
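To make the SSML-oriented parameters more concrete, the sketch below shows how pitch, rate, and volume values could map onto a standard SSML `<prosody>` element. It illustrates SSML itself, not Pipecat's internal implementation; `build_prosody_ssml` is a hypothetical helper.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def build_prosody_ssml(
    text: str,
    pitch: Optional[str] = None,
    rate: Optional[str] = None,
    volume: Optional[str] = None,
) -> str:
    """Wrap text in <speak>, adding <prosody> only for the values provided."""
    attrs = []
    for name, value in (("pitch", pitch), ("rate", rate), ("volume", volume)):
        if value is not None:
            # quoteattr adds the surrounding quotes and escapes the value.
            attrs.append(f"{name}={quoteattr(value)}")
    body = escape(text)  # escape &, < and > in the spoken text
    if attrs:
        body = f"<prosody {' '.join(attrs)}>{body}</prosody>"
    return f"<speak>{body}</speak>"
```

For example, `build_prosody_ssml("Hello & welcome", pitch="+2st", rate="slow")` returns `<speak><prosody pitch="+2st" rate="slow">Hello &amp; welcome</prosody></speak>`.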
GeminiTTSService
Streaming service using Gemini’s TTS-specific models with natural voice control, prompts for style instructions, and multi-speaker support.
Parameters:
- Gemini TTS model to use. Options: "gemini-2.5-flash-tts", "gemini-2.5-pro-tts".
- JSON string containing Google Cloud service account credentials.
- Path to Google Cloud service account JSON file.
- Google Cloud location for regional endpoint.
- Voice name from available Gemini voices (e.g., "Kore", "Charon", "Puck", "Zephyr").
- Output audio sample rate in Hz. Google TTS outputs at 24kHz; mismatched rates will produce a warning.
- TTS configuration parameters. See GeminiTTSService InputParams below.
GeminiTTSService InputParams
| Parameter | Type | Default | Description |
|---|---|---|---|
| language | Language | Language.EN | Language for synthesis. |
| prompt | str | None | Style instructions for how to synthesize the content. |
| multi_speaker | bool | False | Enable multi-speaker support. |
| speaker_configs | list[dict] | None | Speaker configurations for multi-speaker mode. Each dict should have speaker_alias and optionally speaker_id. |
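The shape of speaker_configs can be illustrated with a small validation sketch. `validate_speaker_configs` is a hypothetical helper; rejecting duplicate aliases and using Gemini voice names such as "Kore" for speaker_id are assumptions made for the example, not documented behavior.

```python
from typing import Any, Dict, List


def validate_speaker_configs(configs: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Check that each entry has speaker_alias; keep speaker_id when present."""
    cleaned: List[Dict[str, str]] = []
    seen_aliases = set()
    for index, config in enumerate(configs):
        alias = config.get("speaker_alias")
        if not alias:
            raise ValueError(f"speaker_configs[{index}] is missing 'speaker_alias'")
        if alias in seen_aliases:
            # Assumption: aliases should be unique so markup is unambiguous.
            raise ValueError(f"duplicate speaker_alias: {alias!r}")
        seen_aliases.add(alias)
        entry = {"speaker_alias": alias}
        if config.get("speaker_id"):
            entry["speaker_id"] = config["speaker_id"]  # optional field
        cleaned.append(entry)
    return cleaned


# Example input: two speakers, the second without an explicit speaker_id.
EXAMPLE_SPEAKER_CONFIGS = [
    {"speaker_alias": "Alice", "speaker_id": "Kore"},
    {"speaker_alias": "Bob"},
]
```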
Usage
Basic Setup (Streaming)
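A minimal sketch of a streaming setup follows. The import paths and the keyword names `credentials_path`, `voice_id`, and `params` are assumptions based on the parameter descriptions under Configuration; check them against the API reference for your Pipecat version. The imports sit inside the function so the snippet loads even where Pipecat is not installed.

```python
import os

# A Chirp 3 HD voice, since the streaming service targets Chirp 3 HD and Journey voices.
STREAMING_VOICE_ID = "en-US-Chirp3-HD-Charon"


def build_streaming_tts():
    """Create a GoogleTTSService instance (assumed constructor keywords)."""
    # Assumed import paths; verify against your installed Pipecat version.
    from pipecat.services.google.tts import GoogleTTSService
    from pipecat.transcriptions.language import Language

    return GoogleTTSService(
        credentials_path=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
        voice_id=STREAMING_VOICE_ID,
        params=GoogleTTSService.InputParams(
            language=Language.EN,
            speaking_rate=1.1,  # must stay within [0.25, 2.0]
        ),
    )
```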
HTTP Service with SSML
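The HTTP service can be sketched in the same way, here with a WaveNet voice so that the SSML prosody parameters apply. As before, the import paths and constructor keywords are assumptions to verify against the API reference; the pitch, rate, and volume values are taken from the examples in the InputParams table.

```python
import os

# A non-Chirp voice, so SSML prosody settings are used.
HTTP_VOICE_ID = "en-US-Wavenet-D"


def build_http_tts():
    """Create a GoogleHttpTTSService instance (assumed constructor keywords)."""
    # Assumed import paths; verify against your installed Pipecat version.
    from pipecat.services.google.tts import GoogleHttpTTSService
    from pipecat.transcriptions.language import Language

    return GoogleHttpTTSService(
        credentials_path=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
        voice_id=HTTP_VOICE_ID,
        params=GoogleHttpTTSService.InputParams(
            language=Language.EN,
            pitch="+2st",   # SSML pitch adjustment
            rate="125%",    # SSML prosody rate (non-Chirp voices)
            volume="loud",  # SSML volume adjustment
        ),
    )
```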
Gemini TTS with Style Prompt
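A Gemini TTS setup with a style prompt might look like the sketch below. The `model`, `credentials_path`, `voice_id`, and `params` keyword names and the import paths are assumptions inferred from the Configuration section, and the prompt text is only an example.

```python
import os

# Model and voice names come from the options listed under GeminiTTSService.
GEMINI_MODEL = "gemini-2.5-flash-tts"
GEMINI_VOICE = "Kore"


def build_gemini_tts():
    """Create a GeminiTTSService instance (assumed constructor keywords)."""
    # Assumed import paths; verify against your installed Pipecat version.
    from pipecat.services.google.tts import GeminiTTSService
    from pipecat.transcriptions.language import Language

    return GeminiTTSService(
        model=GEMINI_MODEL,
        credentials_path=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
        voice_id=GEMINI_VOICE,
        params=GeminiTTSService.InputParams(
            language=Language.EN,
            prompt="Speak in a calm, friendly tone.",  # style instruction
        ),
    )
```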
Notes
- Streaming vs HTTP: GoogleTTSService uses the streaming API for low latency and only supports Chirp 3 HD and Journey voices. GoogleHttpTTSService supports all Google voices including Standard and WaveNet, with full SSML support.
- Chirp/Journey voices and SSML: Chirp and Journey voices do not support SSML. The HTTP service automatically uses plain text input for these voices.
- Speaking rate: For Chirp and Journey voices, use speaking_rate (float, 0.25-2.0) in InputParams. For other voices, use rate (string) for SSML prosody control.
- Gemini TTS sample rate: Google TTS always outputs at 24kHz. Setting a different sample rate will produce a warning and may cause audio issues.
- Gemini multi-speaker: Use multi_speaker=True with speaker_configs to generate conversations between multiple voices. Mark up the text with speaker aliases to control which voice speaks.
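The speaking-rate note can be sketched as a helper that picks the right parameter for a given voice. Both functions are hypothetical, and detecting the voice family by looking for "chirp" or "journey" in the voice identifier is an assumption made for illustration.

```python
from typing import Dict, Union


def is_chirp_or_journey(voice_id: str) -> bool:
    """Assumption: the voice family can be read from the voice identifier."""
    lowered = voice_id.lower()
    return "chirp" in lowered or "journey" in lowered


def rate_param_for_voice(voice_id: str, multiplier: float) -> Dict[str, Union[float, str]]:
    """Return the InputParams field to set for a given speed multiplier."""
    if is_chirp_or_journey(voice_id):
        # Chirp and Journey voices take a float speaking_rate in [0.25, 2.0].
        if not 0.25 <= multiplier <= 2.0:
            raise ValueError(f"speaking_rate out of range: {multiplier}")
        return {"speaking_rate": multiplier}
    # Other voices take an SSML prosody rate string such as "125%".
    return {"rate": f"{round(multiplier * 100)}%"}
```

The returned dict can be merged into the keyword arguments used to construct the service's InputParams.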