
# Google

> Speech-to-text service implementation using Google Cloud's Speech-to-Text V2 API

## Overview

`GoogleSTTService` provides real-time speech recognition using Google Cloud's Speech-to-Text V2 API with support for 125+ languages, multiple models, voice activity detection, and advanced features like automatic punctuation and word-level confidence scores.

<CardGroup cols={2}>
  <Card title="Google STT API Reference" icon="code" href="https://reference-server.pipecat.ai/en/latest/api/pipecat.services.google.stt.html">
    Pipecat's API methods for Google Cloud STT integration
  </Card>

  <Card title="Example Implementation" icon="play" href="https://github.com/pipecat-ai/pipecat/blob/main/examples/voice/voice-google.py">
    Complete example with Google Cloud services
  </Card>

  <Card title="Google Cloud Documentation" icon="book" href="https://cloud.google.com/speech-to-text/v2/docs">
    Official Google Cloud Speech-to-Text documentation
  </Card>

  <Card title="Google Cloud Console" icon="microphone" href="https://console.cloud.google.com/iam-admin/serviceaccounts">
    Create service accounts and manage API access
  </Card>
</CardGroup>

## Installation

To use Google Cloud Speech services, install the required dependency:

```bash theme={null}
uv add "pipecat-ai[google]"
```

## Prerequisites

### Google Cloud Setup

Before using Google Cloud STT services, you need:

1. **Google Cloud Account**: Sign up at [Google Cloud Console](https://console.cloud.google.com/)
2. **Project Setup**: Create a project and enable the Speech-to-Text API
3. **Service Account**: Create a service account with Speech-to-Text permissions
4. **Authentication**: Set up credentials via service account key or Application Default Credentials

### Required Environment Variables

* `GOOGLE_APPLICATION_CREDENTIALS`: Path to your service account key file (recommended)
* Alternatively, use Application Default Credentials in cloud deployments, where a key file is typically not needed
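
Before starting a pipeline, it can help to confirm that `GOOGLE_APPLICATION_CREDENTIALS` points to a usable key file. The sketch below is a hypothetical preflight check, not part of Pipecat. `check_service_account_key` is an illustrative name, and the subset of key-file fields it looks for (`type`, `project_id`, `private_key`, `client_email`) is an assumption based on the standard Google service account key format. It returns `None` when the variable is unset, so other Application Default Credentials sources can still be tried.

```python theme={null}
import json
import os
from pathlib import Path
from typing import Optional

# Assumed subset of the fields found in a Google Cloud service account key file.
REQUIRED_KEY_FIELDS = {"type", "project_id", "private_key", "client_email"}


def check_service_account_key(
    env_var: str = "GOOGLE_APPLICATION_CREDENTIALS",
) -> Optional[str]:
    """Return the key file path if env_var names a plausible service account key.

    Returns None when the variable is unset, leaving other Application Default
    Credentials sources to be tried. Raises ValueError for a bad key file.
    """
    path = os.environ.get(env_var)
    if not path:
        return None
    key_file = Path(path)
    if not key_file.is_file():
        raise ValueError(f"{env_var} points to a missing file: {path}")
    data = json.loads(key_file.read_text())
    if data.get("type") != "service_account":
        raise ValueError(f"{path} is not a service account key")
    missing = REQUIRED_KEY_FIELDS - set(data)
    if missing:
        raise ValueError(f"{path} is missing fields: {sorted(missing)}")
    return str(key_file)
```

Calling it once at startup surfaces a typo in the path as a clear error instead of a failed connection later.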

## Configuration

### GoogleSTTService

<ParamField path="credentials" type="str" default="None">
  JSON string containing Google Cloud service account credentials.
</ParamField>

<ParamField path="credentials_path" type="str" default="None">
  Path to service account credentials JSON file.
</ParamField>

<ParamField path="location" type="str" default="global">
  Google Cloud location (e.g., `"global"`, `"us-central1"`). Non-global
  locations use regional endpoints.
</ParamField>

<ParamField path="sample_rate" type="int" default="None">
  Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
  rate.
</ParamField>

<ParamField path="params" type="GoogleSTTService.InputParams" default="None" deprecated>
  Configuration parameters for the STT service. *Deprecated in v0.0.105. Use
  `settings=GoogleSTTService.Settings(...)` instead.*
</ParamField>

<ParamField path="settings" type="GoogleSTTService.Settings" default="None">
  Runtime-configurable settings for the STT service. See [Settings](#settings)
  below.
</ParamField>

<ParamField path="ttfs_p99_latency" type="float" default="GOOGLE_TTFS_P99">
  P99 latency from speech end to final transcript in seconds. Override for your
  deployment.
</ParamField>

<Note>
  Provide `credentials` (JSON string) or `credentials_path` (file path), or
  configure Application Default Credentials. The service needs at least one of
  these to authenticate.
</Note>
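
The authentication priority listed under Notes (`credentials`, then `credentials_path`, then Application Default Credentials) can be expressed as a small pure-Python function. This is a hypothetical sketch for reasoning about which source will be used, not Pipecat's code; `pick_auth_source` is an illustrative name, and the JSON validation step is an added assumption.

```python theme={null}
import json
from typing import Optional


def pick_auth_source(
    credentials: Optional[str] = None,
    credentials_path: Optional[str] = None,
) -> str:
    """Return which credential source would be used, in documented priority order."""
    if credentials:
        # Assumption: fail fast on a malformed JSON string rather than at connect time.
        json.loads(credentials)
        return "credentials"
    if credentials_path:
        return "credentials_path"
    # Nothing passed explicitly: fall back to Application Default Credentials.
    return "application_default_credentials"
```

Note that passing both arguments is not an error in this sketch; the JSON string simply wins.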

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `GoogleSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter                              | Type                         | Default            | Description                                                                  |
| -------------------------------------- | ---------------------------- | ------------------ | ---------------------------------------------------------------------------- |
| `model`                                | `str`                        | `"latest_long"`    | Speech recognition model to use. *(Inherited from base STT settings.)*       |
| `language`                             | `Language \| str`            | `None`             | Language for speech recognition. *(Inherited from base STT settings.)*       |
| `languages`                            | `Language \| List[Language]` | `[Language.EN_US]` | Single language or list of recognition languages. First language is primary. |
| `use_separate_recognition_per_channel` | `bool`                       | `False`            | Process each audio channel separately.                                       |
| `enable_automatic_punctuation`         | `bool`                       | `True`             | Add punctuation to transcripts.                                              |
| `enable_spoken_punctuation`            | `bool`                       | `False`            | Replace spoken punctuation (e.g., "question mark") with symbols.             |
| `enable_spoken_emojis`                 | `bool`                       | `False`            | Replace spoken emojis with the corresponding Unicode symbols.                |
| `profanity_filter`                     | `bool`                       | `False`            | Mask profanity, replacing all but each word's first letter with asterisks.   |
| `enable_word_time_offsets`             | `bool`                       | `False`            | Include timing information for each word.                                    |
| `enable_word_confidence`               | `bool`                       | `False`            | Include confidence scores for each word.                                     |
| `enable_interim_results`               | `bool`                       | `True`             | Stream partial recognition results.                                          |
| `enable_voice_activity_events`         | `bool`                       | `False`            | Receive speech begin/end events from Google's voice activity detection.      |
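
The `languages` setting accepts either a single `Language` or a list, with the first entry treated as the primary language. A minimal sketch of that normalization follows. It uses a small stand-in `Language` enum (hypothetical, with illustrative codes) instead of `pipecat.transcriptions.language.Language`, and `normalize_languages` is not a Pipecat function.

```python theme={null}
from enum import Enum
from typing import List, Tuple, Union


class Language(str, Enum):
    # Hypothetical stand-in for pipecat.transcriptions.language.Language.
    EN_US = "en-US"
    ES = "es"
    FR = "fr"


def normalize_languages(
    languages: Union[Language, List[Language]],
) -> Tuple[Language, List[Language]]:
    """Return (primary_language, all_languages) from a single value or a list."""
    langs = [languages] if isinstance(languages, Language) else list(languages)
    if not langs:
        raise ValueError("at least one language is required")
    # The first entry is the primary recognition language.
    return langs[0], langs
```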

## Usage

### Basic Setup

```python theme={null}
import os

from pipecat.services.google.stt import GoogleSTTService

stt = GoogleSTTService(
    credentials_path=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
)
```

### With Credentials JSON String

```python theme={null}
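import json

from pipecat.services.google.stt import GoogleSTTService

# credentials_dict holds your service account key as a dict, e.g. fetched from
# a secrets manager. Loading it from a local file is only for illustration.
with open("service-account.json") as f:
    credentials_dict = json.load(f)

stt = GoogleSTTService(
    credentials=json.dumps(credentials_dict),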
    location="us-central1",
)
```
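
The example above passes `location="us-central1"`. Assuming Google's Speech-to-Text V2 naming scheme, where regional endpoints take the form `{location}-speech.googleapis.com` and the default recognizer is addressed as `projects/{project}/locations/{location}/recognizers/_`, the routing can be sketched as follows. Pipecat builds these internally; the helper names here are hypothetical and only show where requests would go.

```python theme={null}
def speech_v2_endpoint(location: str = "global") -> str:
    """Hostname for the Speech-to-Text V2 API in the given location (assumed scheme)."""
    if location == "global":
        return "speech.googleapis.com"
    # Non-global locations use a region-prefixed hostname.
    return f"{location}-speech.googleapis.com"


def recognizer_name(project_id: str, location: str = "global") -> str:
    """Resource name of the default ('_') recognizer in a location (assumed scheme)."""
    return f"projects/{project_id}/locations/{location}/recognizers/_"
```

For data residency, the endpoint and the recognizer location must agree; deriving both from the single `location` value keeps them consistent.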

### With Custom Parameters

```python theme={null}
import os

from pipecat.services.google.stt import GoogleSTTService
from pipecat.transcriptions.language import Language

stt = GoogleSTTService(
    credentials_path=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
    settings=GoogleSTTService.Settings(
        languages=[Language.EN_US, Language.ES],
        model="latest_long",
        enable_automatic_punctuation=True,
        enable_word_time_offsets=True,
        enable_word_confidence=True,
    ),
)
```

### Updating Settings at Runtime

Google STT supports dynamic settings updates via `STTUpdateSettingsFrame`:

```python theme={null}
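from pipecat.frames.frames import STTUpdateSettingsFrame
from pipecat.services.google.stt import GoogleSTTService
from pipecat.transcriptions.language import Language

# `task` is the PipelineTask that runs the pipeline containing `stt`
await task.queue_frame(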
    STTUpdateSettingsFrame(
        delta=GoogleSTTService.Settings(
            languages=[Language.FR],
            model="latest_short",
            enable_automatic_punctuation=False,
        )
    )
)
```
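
The frame's `delta` argument suggests partial-update semantics: fields you set change, and the rest presumably keep their current values. The sketch below illustrates that merge with a hypothetical frozen dataclass. It is not Pipecat's implementation; `SttSettings` and `apply_delta` are illustrative names.

```python theme={null}
from dataclasses import dataclass, fields, replace
from typing import List, Optional


@dataclass(frozen=True)
class SttSettings:
    # Hypothetical subset of GoogleSTTService.Settings; None means "not set".
    model: Optional[str] = None
    languages: Optional[List[str]] = None
    enable_automatic_punctuation: Optional[bool] = None


def apply_delta(current: SttSettings, delta: SttSettings) -> SttSettings:
    """Return a copy of current where every field set in delta overrides it."""
    changes = {
        f.name: getattr(delta, f.name)
        for f in fields(delta)
        if getattr(delta, f.name) is not None
    }
    return replace(current, **changes)
```

Using `None` as the "unset" marker keeps the sketch short, but it means a delta cannot reset a field to `None`; a dedicated sentinel object would avoid that.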

## Notes

* **Streaming time limit**: Google Cloud STT limits each streaming connection to 5 minutes. The service automatically reconnects the stream at 4 minutes, ahead of that limit, so transcription continues without interruption.
* **Multi-language support**: Pass a list of `Language` values to `languages` for multi-language recognition. The first language is the primary language.
* **Regional endpoints**: Use the `location` parameter to route requests through regional endpoints (e.g., `"us-central1"`, `"europe-west1"`) for data residency requirements. The default `"global"` endpoint works for most use cases.
* **Stream abort on inactivity**: Google closes the stream if it receives no audio for \~10 seconds (e.g., when audio frames are blocked). The service recovers by reconnecting automatically.
* **Authentication priority**: The service checks for credentials in this order: `credentials` (JSON string), `credentials_path` (file), then Application Default Credentials.
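
The streaming-limit and inactivity notes describe two timers: rotate the stream at 4 minutes (ahead of Google's 5-minute cap) and expect an abort after roughly 10 seconds without audio. A hypothetical `StreamTimer` makes that arithmetic explicit. Pipecat manages this internally; the class, its method names, and the exact thresholds are illustrative assumptions taken from the notes, not its actual code.

```python theme={null}
from dataclasses import dataclass

STREAM_LIMIT_S = 5 * 60  # Google's per-stream cap, per the notes
ROTATE_AFTER_S = 4 * 60  # proactive reconnect point, per the notes
IDLE_ABORT_S = 10.0  # approximate inactivity timeout, per the notes


@dataclass
class StreamTimer:
    started_at: float
    last_audio_at: float

    def should_rotate(self, now: float) -> bool:
        """True once the stream is old enough to reconnect before hitting the cap."""
        return now - self.started_at >= ROTATE_AFTER_S

    def likely_aborted(self, now: float) -> bool:
        """True if no audio has been sent for roughly the idle timeout."""
        return now - self.last_audio_at >= IDLE_ABORT_S

    def on_audio(self, now: float) -> None:
        """Record that an audio chunk was sent at time `now`."""
        self.last_audio_at = now

    def restart(self, now: float) -> None:
        """Reset both timers after opening a fresh stream."""
        self.started_at = now
        self.last_audio_at = now
```

Rotating a full minute before the cap leaves headroom for the new stream to come up while audio keeps flowing.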

<Tip>
  The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
  `Settings` / `settings=` instead. See the [Service Settings
  guide](/pipecat/fundamentals/service-settings) for migration details.
</Tip>

## Event Handlers

Google STT supports the standard [service connection events](/api-reference/server/events/service-events):

| Event             | Description                                   |
| ----------------- | --------------------------------------------- |
| `on_connected`    | Connected to Google Cloud Speech-to-Text      |
| `on_disconnected` | Disconnected from Google Cloud Speech-to-Text |

```python theme={null}
@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Google STT")
```
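
Because the service reconnects on its own (for example, at the 4-minute stream rotation), counting connections can be a useful health signal. The hypothetical `ConnectionMonitor` below is not part of Pipecat, and it assumes `on_disconnected` receives the same single `service` argument as `on_connected`.

```python theme={null}
import time
from typing import Any, Optional


class ConnectionMonitor:
    """Hypothetical helper that counts connections and tracks time spent connected."""

    def __init__(self) -> None:
        self.connect_count = 0
        self.connected_since: Optional[float] = None
        self.total_connected_s = 0.0

    async def on_connected(self, service: Any) -> None:
        self.connect_count += 1
        self.connected_since = time.monotonic()

    async def on_disconnected(self, service: Any) -> None:
        # Accumulate the length of the connection that just ended.
        if self.connected_since is not None:
            self.total_connected_s += time.monotonic() - self.connected_since
            self.connected_since = None

    @property
    def reconnects(self) -> int:
        # The first connection is not a reconnect.
        return max(self.connect_count - 1, 0)
```

To wire it up, register the bound methods with the decorator factory used in the handler example, e.g. `stt.event_handler("on_connected")(monitor.on_connected)`, and likewise for `"on_disconnected"`.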
