Overview

Google Cloud Text-to-Speech provides high-quality speech synthesis. This page covers three service implementations: GoogleTTSService, which uses the streaming API for the lowest latency; GoogleHttpTTSService (HTTP-based), for simpler integration and full SSML support; and GeminiTTSService, which uses Gemini’s TTS-specific models. GoogleTTSService is recommended for real-time applications.

Installation

To use Google services, install the required dependencies:
pip install "pipecat-ai[google]"

Prerequisites

Google Cloud Setup

Before using Google Cloud TTS services, you need:
  1. Google Cloud Account: Sign up at Google Cloud Console
  2. Project Setup: Create a project and enable the Text-to-Speech API
  3. Service Account: Create a service account with TTS permissions
  4. Authentication: Set up credentials via service account key or Application Default Credentials

Required Environment Variables

  • GOOGLE_APPLICATION_CREDENTIALS: Path to your service account key file (recommended)
  • Or use Application Default Credentials for cloud deployments
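
The two credential options above can be resolved before a service is constructed. The following is a minimal sketch, assuming a hypothetical helper named google_credential_kwargs (it is not part of Pipecat). It returns a credentials_path keyword argument when GOOGLE_APPLICATION_CREDENTIALS points to an existing file, and an empty dict otherwise so that Application Default Credentials can take over.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def google_credential_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build constructor kwargs for a Google TTS service.

    Hypothetical helper for illustration; not part of Pipecat.
    """
    env = os.environ if env is None else env
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS", "").strip()
    if not key_path:
        # Nothing configured: fall back to Application Default Credentials.
        return {}
    if not Path(key_path).is_file():
        # Fail early with a clear message instead of at first synthesis.
        raise FileNotFoundError(
            f"GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {key_path}"
        )
    return {"credentials_path": key_path}
```

The result could then be unpacked into a constructor call, for example GoogleTTSService(**google_credential_kwargs(), settings=...).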

Configuration

GoogleTTSService

Streaming service optimized for Chirp 3 HD and Journey voices.
  • credentials (str, default: None): JSON string containing Google Cloud service account credentials.
  • credentials_path (str, default: None): Path to Google Cloud service account JSON file.
  • location (str, default: None): Google Cloud location for regional endpoint (e.g., "us-central1").
  • voice_id (str, default: "en-US-Chirp3-HD-Charon", deprecated): Google TTS voice identifier. Deprecated in v0.0.105. Use settings=GoogleTTSService.Settings(voice=...) instead.
  • voice_cloning_key (str, default: None): Voice cloning key for Chirp 3 custom voices.
  • sample_rate (int, default: None): Output audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
  • params (InputParams, default: InputParams(), deprecated): Deprecated in v0.0.105. Use settings=GoogleTTSService.Settings(...) instead.
  • settings (GoogleTTSService.Settings, default: None): Runtime-configurable settings. See GoogleTTSService Settings below.

GoogleTTSService Settings

Runtime-configurable settings passed via the settings constructor argument using GoogleTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
Parameter      Type            Default    Description
model          str             None       Model identifier. (Inherited.)
voice          str             None       Voice identifier. (Inherited.)
language       Language | str  None       Language for synthesis. (Inherited.)
speaking_rate  float           NOT_GIVEN  Speaking rate in the range [0.25, 2.0].
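
Because speaking_rate is limited to [0.25, 2.0], it can help to validate a value before placing it in the settings. This is a minimal sketch with a hypothetical helper, validate_speaking_rate. How the service itself reacts to out-of-range values is not stated here, so the helper simply rejects them.

```python
MIN_SPEAKING_RATE = 0.25  # lower bound from the settings table
MAX_SPEAKING_RATE = 2.0   # upper bound from the settings table


def validate_speaking_rate(rate: float) -> float:
    """Return the rate as a float, or raise if it is outside [0.25, 2.0].

    Hypothetical helper for illustration; not part of Pipecat.
    """
    if not MIN_SPEAKING_RATE <= rate <= MAX_SPEAKING_RATE:
        raise ValueError(
            f"speaking_rate must be within [{MIN_SPEAKING_RATE}, "
            f"{MAX_SPEAKING_RATE}], got {rate}"
        )
    return float(rate)
```

A validated value could then be passed as GoogleTTSService.Settings(speaking_rate=validate_speaking_rate(1.2)).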

GoogleHttpTTSService

HTTP service with full SSML support for all voice types.
  • credentials (str, default: None): JSON string containing Google Cloud service account credentials.
  • credentials_path (str, default: None): Path to Google Cloud service account JSON file.
  • location (str, default: None): Google Cloud location for regional endpoint.
  • voice_id (str, default: "en-US-Chirp3-HD-Charon", deprecated): Google TTS voice identifier. Deprecated in v0.0.105. Use settings=GoogleHttpTTSService.Settings(voice=...) instead.
  • sample_rate (int, default: None): Output audio sample rate in Hz.
  • params (InputParams, default: None, deprecated): Deprecated in v0.0.105. Use settings=GoogleHttpTTSService.Settings(...) instead.
  • settings (GoogleHttpTTSService.Settings, default: None): Runtime-configurable settings. See GoogleHttpTTSService Settings below.

GoogleHttpTTSService Settings

Runtime-configurable settings passed via the settings constructor argument using GoogleHttpTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
Parameter      Type            Default    Description
model          str             None       Model identifier. (Inherited.)
voice          str             None       Voice identifier. (Inherited.)
language       Language | str  None       Language for synthesis. (Inherited.)
pitch          str             NOT_GIVEN  Voice pitch adjustment (e.g., "+2st", "-50%").
rate           str             NOT_GIVEN  Speaking rate for SSML prosody (non-Chirp voices, e.g., "slow", "fast", "125%").
speaking_rate  float           NOT_GIVEN  Speaking rate for AudioConfig (Chirp/Journey voices). Range [0.25, 2.0].
volume         str             NOT_GIVEN  Volume adjustment (e.g., "loud", "soft", "+6dB").
emphasis       Literal         NOT_GIVEN  Emphasis level: "strong", "moderate", "reduced", "none".
gender         Literal         NOT_GIVEN  Voice gender preference: "male", "female", "neutral".
google_style   Literal         NOT_GIVEN  Google-specific voice style: "apologetic", "calm", "empathetic", "firm", "lively".
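
The pitch, rate, volume, and emphasis settings use SSML-style values. To show how such values fit together, here is a rough sketch that renders them as standard SSML prosody and emphasis elements. The build_ssml function is hypothetical and is not Pipecat's actual implementation; the mapping of each setting to an SSML element is an assumption based on standard SSML.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def build_ssml(
    text: str,
    pitch: Optional[str] = None,
    rate: Optional[str] = None,
    volume: Optional[str] = None,
    emphasis: Optional[str] = None,
) -> str:
    """Wrap text in SSML using the given prosody and emphasis values.

    Hypothetical illustration; the real service may build SSML differently.
    """
    # Escape &, <, > so the text cannot break the markup.
    body = escape(text)
    if emphasis is not None:
        body = f"<emphasis level={quoteattr(emphasis)}>{body}</emphasis>"
    # Only emit prosody attributes that were actually provided.
    attrs = [
        f"{name}={quoteattr(value)}"
        for name, value in (("pitch", pitch), ("rate", rate), ("volume", volume))
        if value is not None
    ]
    if attrs:
        body = f"<prosody {' '.join(attrs)}>{body}</prosody>"
    return f"<speak>{body}</speak>"


print(build_ssml("Hi & bye", pitch="+2st", rate="125%"))
# <speak><prosody pitch="+2st" rate="125%">Hi &amp; bye</prosody></speak>
```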

GeminiTTSService

Streaming service using Gemini’s TTS-specific models with natural voice control, prompts for style instructions, and multi-speaker support.
  • model (str, default: "gemini-2.5-flash-tts", deprecated): Gemini TTS model to use. Options: "gemini-2.5-flash-tts", "gemini-2.5-pro-tts". Deprecated in v0.0.105. Use settings=GeminiTTSService.Settings(model=...) instead.
  • credentials (str, default: None): JSON string containing Google Cloud service account credentials.
  • credentials_path (str, default: None): Path to Google Cloud service account JSON file.
  • location (str, default: None): Google Cloud location for regional endpoint.
  • voice_id (str, default: "Kore", deprecated): Voice name from available Gemini voices (e.g., "Kore", "Charon", "Puck", "Zephyr"). Deprecated in v0.0.105. Use settings=GeminiTTSService.Settings(voice=...) instead.
  • sample_rate (int, default: None): Output audio sample rate in Hz. Gemini TTS outputs at 24 kHz; a mismatched rate produces a warning.
  • params (InputParams, default: None, deprecated): Deprecated in v0.0.105. Use settings=GeminiTTSService.Settings(...) instead.
  • settings (GeminiTTSService.Settings, default: None): Runtime-configurable settings. See GeminiTTSService Settings below.
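
Since the output rate is fixed at 24 kHz, a pipeline can check its configured rate before constructing the service. The sketch below is only an illustration of that check: check_gemini_sample_rate is a hypothetical helper, and the warning text is not the message the service emits.

```python
import warnings

GEMINI_TTS_SAMPLE_RATE = 24000  # Hz, the fixed output rate described above


def check_gemini_sample_rate(sample_rate: int) -> bool:
    """Return True if the rate matches 24 kHz; otherwise warn and return False.

    Hypothetical helper for illustration; not part of Pipecat.
    """
    if sample_rate == GEMINI_TTS_SAMPLE_RATE:
        return True
    warnings.warn(
        f"Requested {sample_rate} Hz, but output is {GEMINI_TTS_SAMPLE_RATE} Hz; "
        "audio may need resampling.",
        stacklevel=2,
    )
    return False
```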

GeminiTTSService Settings

Runtime-configurable settings passed via the settings constructor argument using GeminiTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
Parameter        Type            Default    Description
model            str             None       Model identifier. (Inherited.)
voice            str             None       Voice identifier. (Inherited.)
language         Language | str  None       Language for synthesis. (Inherited.)
prompt           str             NOT_GIVEN  Style instructions for how to synthesize the content.
multi_speaker    bool            NOT_GIVEN  Enable multi-speaker support.
speaker_configs  list[dict]      NOT_GIVEN  Speaker configurations for multi-speaker mode. Each dict should have speaker_alias and optionally speaker_id.
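
The speaker_configs value is a list of plain dicts, each with a speaker_alias key and an optional speaker_id key. The following sketch builds such a list with a hypothetical helper, make_speaker_config. The alias names are invented, and using Gemini voice names such as "Kore" and "Puck" as speaker_id values is an assumption.

```python
from typing import Optional


def make_speaker_config(speaker_alias: str, speaker_id: Optional[str] = None) -> dict:
    """Build one speaker config dict with the keys described above.

    Hypothetical helper for illustration; not part of Pipecat.
    """
    if not speaker_alias:
        raise ValueError("speaker_alias is required")
    config = {"speaker_alias": speaker_alias}
    if speaker_id is not None:
        # speaker_id is optional, so only include it when provided.
        config["speaker_id"] = speaker_id
    return config


# Assumption: speaker_id takes a Gemini voice name.
speaker_configs = [
    make_speaker_config("Host", "Kore"),
    make_speaker_config("Guest", "Puck"),
]
```

The list could then be passed as GeminiTTSService.Settings(multi_speaker=True, speaker_configs=speaker_configs).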

Usage

Basic Setup (Streaming)

from pipecat.services.google import GoogleTTSService

tts = GoogleTTSService(
    credentials_path="/path/to/service-account.json",
    settings=GoogleTTSService.Settings(
        voice="en-US-Chirp3-HD-Charon",
    ),
)

HTTP Service with SSML

from pipecat.services.google import GoogleHttpTTSService
from pipecat.transcriptions.language import Language

tts = GoogleHttpTTSService(
    credentials_path="/path/to/service-account.json",
    settings=GoogleHttpTTSService.Settings(
        voice="en-US-Standard-A",
        language=Language.EN_US,
        rate="110%",  # SSML prosody rate (string), for non-Chirp voices
        pitch="+2st",  # raise pitch by two semitones
    ),
)

Gemini TTS with Style Prompt

from pipecat.services.google import GeminiTTSService
from pipecat.transcriptions.language import Language

tts = GeminiTTSService(
    credentials_path="/path/to/service-account.json",
    settings=GeminiTTSService.Settings(
        model="gemini-2.5-flash-tts",
        voice="Kore",
        language=Language.EN_US,
        prompt="Say this in a friendly and helpful tone",
    ),
)
The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.

Notes

  • Streaming vs HTTP: GoogleTTSService uses the streaming API for low latency and only supports Chirp 3 HD and Journey voices. GoogleHttpTTSService supports all Google voices including Standard and WaveNet, with full SSML support.
  • Chirp/Journey voices and SSML: Chirp and Journey voices do not support SSML. The HTTP service automatically uses plain text input for these voices.
  • Speaking rate: For Chirp and Journey voices, use speaking_rate (float, 0.25-2.0) in settings. For other voices, use rate (string) for SSML prosody control.
  • Gemini TTS sample rate: GeminiTTSService always outputs audio at 24 kHz. Setting a different sample_rate produces a warning and may cause audio issues.
  • Gemini multi-speaker: Use multi_speaker=True with speaker_configs to generate conversations between multiple voices. Mark up the text with speaker aliases to control which voice speaks.
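
The first three notes amount to a small decision rule keyed on the voice type. The sketch below encodes it with hypothetical helpers. Detecting the voice family by substring ("chirp3-hd", "chirp", "journey") is an assumption based on voice names such as "en-US-Chirp3-HD-Charon", not a documented API.

```python
def supports_streaming(voice: str) -> bool:
    """True for voices the streaming service supports (Chirp 3 HD, Journey)."""
    name = voice.lower()
    return "chirp3-hd" in name or "journey" in name


def pick_service_name(voice: str) -> str:
    """Name the service class that suits the voice, per the notes above."""
    return "GoogleTTSService" if supports_streaming(voice) else "GoogleHttpTTSService"


def rate_settings(voice: str, multiplier: float) -> dict:
    """Return the rate-related setting that applies to this voice.

    Chirp and Journey voices take speaking_rate (float); other voices take
    rate (string) for SSML prosody, expressed here as a percentage.
    """
    name = voice.lower()
    if "chirp" in name or "journey" in name:
        return {"speaking_rate": multiplier}
    return {"rate": f"{round(multiplier * 100)}%"}


print(pick_service_name("en-US-Standard-A"))    # GoogleHttpTTSService
print(rate_settings("en-US-Standard-A", 1.25))  # {'rate': '125%'}
```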