Overview

GoogleTTSService provides high-quality text-to-speech synthesis using Google Cloud’s Text-to-Speech API. It supports SSML for advanced voice control and multiple languages.

Installation

To use GoogleTTSService, install the required dependencies:

pip install "pipecat-ai[google]"

You’ll also need to provide Google Cloud credentials in one of the following ways:

  • Environment variable: GOOGLE_APPLICATION_CREDENTIALS
  • Direct credentials JSON
  • Credentials file path
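
As a rough sketch of how an application might choose among these three options, the helper below returns keyword arguments for the constructor. `resolve_credentials` is a hypothetical name, not part of Pipecat, and the precedence order it applies is an assumption made for illustration.

```python
import os
from typing import Optional


def resolve_credentials(
    credentials_json: Optional[str] = None,
    credentials_path: Optional[str] = None,
) -> dict:
    """Hypothetical helper: pick one credential source for GoogleTTSService.

    Assumed precedence (illustrative only): explicit JSON string, then an
    explicit file path, then the GOOGLE_APPLICATION_CREDENTIALS variable.
    """
    if credentials_json:
        return {"credentials": credentials_json}
    if credentials_path:
        return {"credentials_path": credentials_path}
    if os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"):
        # Nothing to pass: the Google client libraries read this variable.
        return {}
    raise RuntimeError("No Google Cloud credentials configured")
```

The returned dictionary could then be unpacked into the constructor, for example `GoogleTTSService(**resolve_credentials(credentials_path="path/to/credentials.json"))`.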

Configuration

Constructor Parameters

credentials
str | None

Google Cloud credentials JSON string

credentials_path
str | None

Path to credentials JSON file

voice_id
str
default: "en-US-Neural2-A"

Voice identifier

sample_rate
int
default: 24000

Output audio sample rate in Hz

text_filter
BaseTextFilter
default: None

Modifies text provided to the TTS. Learn more about the available filters.

Input Parameters

class InputParams(BaseModel):
    # Every field is optional; unset prosody fields default to None
    pitch: Optional[str] = None
    rate: Optional[str] = None
    volume: Optional[str] = None
    emphasis: Optional[Literal["strong", "moderate", "reduced", "none"]] = None
    language: Optional[Language] = Language.EN
    gender: Optional[Literal["male", "female", "neutral"]] = None
    google_style: Optional[Literal["apologetic", "calm", "empathetic", "firm", "lively"]] = None
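
To illustrate what the prosody and emphasis fields mean in SSML terms, here is a minimal sketch that wraps text in the standard `<prosody>` and `<emphasis>` elements. `build_ssml` is a hypothetical helper; how GoogleTTSService assembles its SSML internally is not described here and may differ.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def build_ssml(
    text: str,
    pitch: Optional[str] = None,
    rate: Optional[str] = None,
    volume: Optional[str] = None,
    emphasis: Optional[str] = None,
) -> str:
    """Hypothetical helper: wrap text in SSML prosody and emphasis tags."""
    # Escape &, <, and > so the text cannot break the markup.
    body = escape(text)
    if emphasis:
        body = f"<emphasis level={quoteattr(emphasis)}>{body}</emphasis>"
    # Only include the prosody attributes that were actually set.
    attrs = [
        f"{name}={quoteattr(value)}"
        for name, value in (("pitch", pitch), ("rate", rate), ("volume", volume))
        if value
    ]
    if attrs:
        body = f"<prosody {' '.join(attrs)}>{body}</prosody>"
    return f"<speak>{body}</speak>"
```

Calling `build_ssml("Hi")` with no options yields a bare `<speak>` wrapper, so unset fields add no markup.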

Output Frames

Control Frames

TTSStartedFrame
Frame

Signals start of synthesis

TTSStoppedFrame
Frame

Signals completion of synthesis

Audio Frames

TTSAudioRawFrame
Frame

Contains generated audio data:

  • PCM encoded audio
  • Configured sample rate
  • Mono channel
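
Because the audio is raw PCM, its playback duration follows directly from the byte count. The sketch below assumes 16-bit samples (2 bytes each), which this page does not state; `pcm_duration_seconds` is a hypothetical helper, not a Pipecat function.

```python
def pcm_duration_seconds(
    audio: bytes,
    sample_rate: int = 24000,
    num_channels: int = 1,
    sample_width: int = 2,  # assumption: 16-bit PCM samples
) -> float:
    """Hypothetical helper: playback duration of a raw PCM buffer."""
    # One audio frame holds one sample per channel.
    bytes_per_frame = num_channels * sample_width
    return len(audio) / (bytes_per_frame * sample_rate)
```

Under these assumptions, a 48,000-byte mono buffer at the default 24,000 Hz lasts one second.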

Error Frames

ErrorFrame
Frame

Contains error information

Usage Examples

Basic Usage

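# Import paths may differ between Pipecat versions
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.google import GoogleTTSService
from pipecat.transcriptions.language import Language

# Configure service
tts_service = GoogleTTSService(
    credentials_path="path/to/credentials.json",
    voice_id="en-US-Neural2-A",
    params=GoogleTTSService.InputParams(
        language=Language.EN,
        gender="female",
        google_style="empathetic"
    )
)

# Use in pipeline (text_input and audio_output are placeholders
# for your own upstream and downstream processors)
pipeline = Pipeline([
    text_input,
    tts_service,
    audio_output
])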

With SSML Controls

# Configure with voice controls
# (credentials_json is a placeholder for your credentials JSON string)
service = GoogleTTSService(
    credentials=credentials_json,
    params=GoogleTTSService.InputParams(
        pitch="+2st",
        rate="1.2",
        volume="loud",
        emphasis="moderate"
    )
)

Methods

See the TTS base class methods for additional functionality.

Language Support

Google Cloud Text-to-Speech supports the following languages and regional variants:

Language Code     Description               Service Code
Language.BG       Bulgarian                 bg-BG
Language.CA       Catalan                   ca-ES
Language.ZH       Chinese (Mandarin)        cmn-CN
Language.ZH_TW    Chinese (Taiwan)          cmn-TW
Language.CS       Czech                     cs-CZ
Language.DA       Danish                    da-DK
Language.NL       Dutch (Netherlands)       nl-NL
Language.NL_BE    Dutch (Belgium)           nl-BE
Language.EN       English (US)              en-US
Language.EN_US    English (US)              en-US
Language.EN_AU    English (Australia)       en-AU
Language.EN_GB    English (UK)              en-GB
Language.EN_IN    English (India)           en-IN
Language.ET       Estonian                  et-EE
Language.FI       Finnish                   fi-FI
Language.FR       French (France)           fr-FR
Language.FR_CA    French (Canada)           fr-CA
Language.DE       German                    de-DE
Language.EL       Greek                     el-GR
Language.HI       Hindi                     hi-IN
Language.HU       Hungarian                 hu-HU
Language.ID       Indonesian                id-ID
Language.IT       Italian                   it-IT
Language.JA       Japanese                  ja-JP
Language.KO       Korean                    ko-KR
Language.LV       Latvian                   lv-LV
Language.LT       Lithuanian                lt-LT
Language.MS       Malay                     ms-MY
Language.NO       Norwegian                 nb-NO
Language.PL       Polish                    pl-PL
Language.PT       Portuguese (Portugal)     pt-PT
Language.PT_BR    Portuguese (Brazil)       pt-BR
Language.RO       Romanian                  ro-RO
Language.RU       Russian                   ru-RU
Language.SK       Slovak                    sk-SK
Language.ES       Spanish                   es-ES
Language.SV       Swedish                   sv-SE
Language.TH       Thai                      th-TH
Language.TR       Turkish                   tr-TR
Language.UK       Ukrainian                 uk-UA
Language.VI       Vietnamese                vi-VN

Usage Example

# Configure service with specific language and region
service = GoogleTTSService(
    credentials_path="path/to/credentials.json",
    voice_id="en-GB-Neural2-A",  # voice chosen to match the en-GB language code
    params=GoogleTTSService.InputParams(
        language=Language.EN_GB,  # British English
        gender="female"
    )
)

Regional Considerations

  • Each service code includes both language and region (e.g., fr-FR for French in France)
  • Some languages have multiple regional variants (e.g., English has US, UK, Australian, and Indian variants)
  • Voice availability may vary by region
  • Neural voices may not be available for all language/region combinations

Note: Voice selection should match the specified language code for optimal results.
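
A quick sanity check for this pairing can be sketched as follows. It assumes voice IDs begin with the language-region service code, as `en-US-Neural2-A` does; `voice_matches_language` is a hypothetical helper, not part of Pipecat or the Google API.

```python
def voice_matches_language(voice_id: str, service_code: str) -> bool:
    """Hypothetical check: does the voice ID start with the service code?

    Assumes voice IDs follow the "<language>-<region>-<name>" pattern.
    """
    return voice_id.lower().startswith(service_code.lower() + "-")
```

For example, `voice_matches_language("en-US-Neural2-A", "en-GB")` returns `False`, flagging a US voice paired with a British English language setting.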

Notes

  • Supports SSML markup
  • Multiple voice styles
  • Gender selection
  • Prosody control
  • Emphasis levels
  • Regional language variants
  • Metrics collection
  • Chunked audio output
  • Thread-safe processing