
Overview

GoogleSTTService provides real-time speech recognition using Google Cloud’s Speech-to-Text V2 API with support for 125+ languages, multiple models, voice activity detection, and advanced features like automatic punctuation and word-level confidence scores.

Installation

To use Google Cloud Speech services, install the required dependency:
pip install "pipecat-ai[google]"

Prerequisites

Google Cloud Setup

Before using Google Cloud STT services, you need:
  1. Google Cloud Account: Sign up at Google Cloud Console
  2. Project Setup: Create a project and enable the Speech-to-Text API
  3. Service Account: Create a service account with Speech-to-Text permissions
  4. Authentication: Set up credentials via service account key or Application Default Credentials

Required Environment Variables

  • GOOGLE_APPLICATION_CREDENTIALS: Path to your service account key file (recommended)
  • Or rely on Application Default Credentials, e.g. the service account attached to your Cloud Run, GKE, or Compute Engine workload
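The lookup order described under Notes (credentials, then credentials_path, then Application Default Credentials) can be sketched as a small helper. `resolve_auth_source` is hypothetical: it only mirrors the documented priority, is not part of Pipecat, and does not load any credentials itself.

```python
import os
from typing import Optional, Tuple


def resolve_auth_source(
    credentials: Optional[str] = None,
    credentials_path: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Report which credential source would be used, following the documented
    priority: credentials, then credentials_path, then ADC."""
    if credentials:
        # An inline JSON string takes precedence over everything else.
        return ("credentials", None)
    if credentials_path:
        # A key file path is used when no inline JSON string is given.
        return ("credentials_path", credentials_path)
    # Otherwise fall back to Application Default Credentials. ADC itself checks
    # GOOGLE_APPLICATION_CREDENTIALS first (None here means it is unset).
    return ("adc", os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"))
```

ADC checks GOOGLE_APPLICATION_CREDENTIALS first and then falls back to gcloud user credentials or the workload's attached service account, which is why setting that variable is usually enough for local development.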

Configuration

GoogleSTTService

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| credentials | str | None | JSON string containing Google Cloud service account credentials. |
| credentials_path | str | None | Path to a service account credentials JSON file. |
| location | str | "global" | Google Cloud location (e.g., "global", "us-central1"). Non-global locations use regional endpoints. |
| sample_rate | int | None | Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate. |
| params | InputParams | None | Configuration parameters for the STT service. See InputParams below. |
| ttfs_p99_latency | float | GOOGLE_TTFS_P99 | P99 latency, in seconds, from the end of speech to the final transcript. Override it with a value measured for your deployment. |
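Since ttfs_p99_latency should reflect your own deployment, one way to choose a value is to record the delay from end of speech to final transcript over many turns and take the 99th percentile. The sketch below uses the nearest-rank method; `p99_latency` is a hypothetical helper, and how you collect the samples is up to you.

```python
import math
from typing import Sequence


def p99_latency(samples: Sequence[float]) -> float:
    """Nearest-rank 99th percentile of measured latencies, in seconds.

    Each sample is assumed to be the time from end of speech to the final
    transcript, measured in your own deployment.
    """
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    # Nearest-rank: the ceil(0.99 * n)-th smallest value (1-based rank).
    rank = math.ceil(0.99 * len(ordered))
    return ordered[rank - 1]
```

The result would then be passed as `ttfs_p99_latency=p99_latency(samples)`. Nearest-rank always returns a value you actually observed, so it avoids interpolating between samples when you only have a modest number of measurements.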
At least one authentication method is required: pass credentials (a JSON string) or credentials_path (a file path), or configure Application Default Credentials.
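If you pass credentials as a JSON string (for example, injected from a secrets manager), a quick pre-flight check can catch a truncated or wrong-type secret before the service tries to authenticate. This sketch assumes you are using a downloaded service account key; the checked keys are fields such key files contain, and `check_service_account_json` is a hypothetical helper, not a Pipecat or Google API.

```python
import json

# Fields present in a downloaded service account key file (assumption:
# these are the ones worth checking up front).
REQUIRED_KEYS = {"type", "client_email", "private_key", "token_uri"}


def check_service_account_json(credentials: str) -> dict:
    """Parse a service account key JSON string and sanity-check it.

    Returns the parsed dict on success; raises ValueError otherwise.
    """
    info = json.loads(credentials)  # JSONDecodeError is a ValueError subclass
    if not isinstance(info, dict):
        raise ValueError("credentials must be a JSON object")
    missing = REQUIRED_KEYS.difference(info)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if info["type"] != "service_account":
        raise ValueError(f"expected type 'service_account', got {info['type']!r}")
    return info
```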

InputParams

Parameters passed via the params constructor argument.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| languages | Language \| List[Language] | [Language.EN_US] | Single language or list of recognition languages. The first language is primary. |
| model | str | "latest_long" | Speech recognition model to use. |
| use_separate_recognition_per_channel | bool | False | Process each audio channel separately. |
| enable_automatic_punctuation | bool | True | Add punctuation to transcripts. |
| enable_spoken_punctuation | bool | False | Include spoken punctuation in the transcript. |
| enable_spoken_emojis | bool | False | Include spoken emojis in the transcript. |
| profanity_filter | bool | False | Filter profanity from the transcript. |
| enable_word_time_offsets | bool | False | Include timing information for each word. |
| enable_word_confidence | bool | False | Include confidence scores for each word. |
| enable_interim_results | bool | True | Stream partial recognition results. |
| enable_voice_activity_events | bool | False | Detect voice activity in the audio. |
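The languages parameter accepts either a single value or a list, with the first entry treated as primary. A minimal sketch of that normalization, using a hypothetical `Lang` enum as a stand-in for Pipecat's Language and illustrative language codes (the real service applies its own mapping to Google language codes):

```python
from enum import Enum
from typing import List, Sequence, Union


class Lang(str, Enum):
    """Hypothetical stand-in for pipecat's Language enum."""
    EN_US = "en-US"
    ES = "es"
    FR = "fr"


def normalize_languages(languages: Union[Lang, Sequence[Lang]]) -> List[str]:
    """Accept one language or a list and return language codes,
    keeping the primary (first) language at index 0."""
    if isinstance(languages, Lang):
        # A single language becomes a one-element list.
        languages = [languages]
    codes = [lang.value for lang in languages]
    if not codes:
        raise ValueError("at least one language is required")
    return codes
```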

Usage

Basic Setup

import os
from pipecat.services.google import GoogleSTTService

# Read the key file path from the standard environment variable
stt = GoogleSTTService(
    credentials_path=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
)

With Credentials JSON String

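import json
from pipecat.services.google import GoogleSTTService

# credentials_dict is your service account key as a Python dict,
# e.g. loaded from a secrets manager at startup
stt = GoogleSTTService(
    credentials=json.dumps(credentials_dict),
    location="us-central1",
)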
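Setting location to something other than "global" switches to a regional endpoint. The sketch below shows that mapping under two assumptions: that Speech-to-Text V2 regional hosts follow the `<location>-speech.googleapis.com` pattern, and that requests target the default `_` recognizer. Both helper functions are hypothetical.

```python
from typing import Optional


def speech_v2_endpoint(location: str = "global") -> Optional[str]:
    """Return the API host for a location, or None to use the default host.

    Assumption: regional endpoints follow "<location>-speech.googleapis.com".
    """
    if location == "global":
        return None  # let the client library use its default endpoint
    return f"{location}-speech.googleapis.com"


def recognizer_path(project_id: str, location: str = "global") -> str:
    """Build a V2 recognizer resource name using the "_" default recognizer."""
    return f"projects/{project_id}/locations/{location}/recognizers/_"
```

With the google-cloud-speech client, a non-None host would typically be supplied through `ClientOptions(api_endpoint=...)` from `google.api_core.client_options`.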

With Custom Parameters

import os
from pipecat.services.google import GoogleSTTService
from pipecat.transcriptions.language import Language

stt = GoogleSTTService(
    credentials_path=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
    params=GoogleSTTService.InputParams(
        languages=[Language.EN_US, Language.ES],
        model="latest_long",
        enable_automatic_punctuation=True,
        enable_word_time_offsets=True,
        enable_word_confidence=True,
    ),
)

Updating Options at Runtime

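Google STT supports dynamic option updates via the update_options method. Because it is a coroutine, call it from async code such as an event handler:
from pipecat.transcriptions.language import Language

await stt.update_options(
    languages=[Language.FR],
    model="latest_short",
    enable_automatic_punctuation=False,
)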

Notes

  • Streaming time limit: Google Cloud STT limits each streaming connection to 5 minutes. The service automatically reconnects the stream at 4 minutes, so transcription continues without interruption.
  • Multi-language support: Pass a list of Language values to languages for multi-language recognition. The first language is the primary language.
  • Regional endpoints: Use the location parameter to route requests through regional endpoints (e.g., "us-central1", "europe-west1") for data residency requirements. The default "global" endpoint works for most use cases.
  • Stream abort on inactivity: If no audio is sent for ~10 seconds (e.g., when audio frames are blocked by an STTMuteFilter), Google automatically closes the stream. The service recovers by automatically reconnecting.
  • Authentication priority: The service checks for credentials in this order: credentials (JSON string), credentials_path (file), then Application Default Credentials.
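The reconnect-before-the-limit behavior described in the streaming note can be illustrated with a small timer that flags when a stream has been open for 4 minutes, leaving a one-minute margin under Google's 5-minute cap. This is a hypothetical sketch of the idea, not Pipecat's implementation; `StreamRotationTimer` is an illustrative name, and the clock is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable

STREAM_LIMIT_S = 300.0  # Google's per-stream limit (5 minutes)
ROTATE_AFTER_S = 240.0  # reconnect early, at 4 minutes


class StreamRotationTimer:
    """Track how long the current stream has been open and report
    when it should be replaced with a fresh one."""

    def __init__(
        self,
        rotate_after: float = ROTATE_AFTER_S,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if rotate_after >= STREAM_LIMIT_S:
            raise ValueError("must rotate before the 5-minute limit")
        self._rotate_after = rotate_after
        self._clock = clock
        self._started = clock()

    def elapsed(self) -> float:
        # Seconds since the current stream was opened.
        return self._clock() - self._started

    def should_rotate(self) -> bool:
        return self.elapsed() >= self._rotate_after

    def reset(self) -> None:
        # Call right after opening a fresh stream.
        self._started = self._clock()
```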

Event Handlers

Google STT supports the standard service connection events:
| Event | Description |
| --- | --- |
| on_connected | Connected to Google Cloud Speech-to-Text |
| on_disconnected | Disconnected from Google Cloud Speech-to-Text |
@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Google STT")