> ## Documentation Index
> Fetch the complete documentation index at: https://docs.pipecat.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Google Gemini

> Large Language Model service implementation using Google's Gemini API

## Overview

`GoogleLLMService` provides integration with Google's Gemini models, supporting streaming responses, function calling, and multimodal inputs. It includes specialized context handling for Google's message format while maintaining compatibility with OpenAI-style contexts.

<CardGroup cols={2}>
  <Card title="Gemini LLM API Reference" icon="code" href="https://reference-server.pipecat.ai/en/latest/api/pipecat.services.google.llm.html">
    Pipecat's API methods for Google Gemini integration
  </Card>

  <Card title="Example Implementation" icon="play" href="https://github.com/pipecat-ai/pipecat/blob/main/examples/function-calling/function-calling-google.py">
    Complete example with function calling
  </Card>

  <Card title="Gemini Documentation" icon="book" href="https://ai.google.dev/gemini-api/docs">
    Official Google Gemini API documentation and features
  </Card>

  <Card title="Google AI Studio" icon="microphone" href="https://aistudio.google.com/">
    Access Gemini models and manage API keys
  </Card>
</CardGroup>

## Installation

To use Google Gemini services, install the required dependencies:

```bash theme={null}
uv add "pipecat-ai[google]"
```

## Prerequisites

### Google Gemini Setup

Before using Google Gemini LLM services, you need:

1. **Google Account**: Sign up at [Google AI Studio](https://aistudio.google.com/)
2. **API Key**: Generate a Gemini API key from AI Studio
3. **Model Selection**: Choose from available Gemini models (Gemini 2.5 Flash, Gemini 2.5 Pro, etc.)

### Required Environment Variables

* `GOOGLE_API_KEY`: Your Google Gemini API key for authentication

## Configuration

<ParamField path="api_key" type="str" required>
  Google AI API key for authentication.
</ParamField>

<ParamField path="model" type="str" default="None" deprecated>
  Gemini model name to use (e.g., `"gemini-2.5-flash"`, `"gemini-2.5-pro"`).
  *Deprecated in v0.0.105. Use `settings=GoogleLLMService.Settings(...)`
  instead.*
</ParamField>

<ParamField path="settings" type="GoogleLLMService.Settings" default="None">
  Runtime-configurable model settings. See [Settings](#settings) below.
</ParamField>

<ParamField path="params" type="InputParams" default="None" deprecated>
  Runtime-configurable model settings. See [Settings](#settings) below.
  *Deprecated in v0.0.105. Use `settings=GoogleLLMService.Settings(...)`
  instead.*
</ParamField>

<ParamField path="system_instruction" type="str" default="None" deprecated>
  System instruction/prompt for the model. Sets the overall behavior and
  context. *Deprecated in v0.0.105. Use
  `settings=GoogleLLMService.Settings(system_instruction=...)` instead.*
</ParamField>

<ParamField path="tools" type="List[Dict[str, Any]]" default="None">
  List of available tools/functions for the model to call.
</ParamField>

<ParamField path="tool_config" type="Dict[str, Any]" default="None">
  Configuration for tool usage behavior.
</ParamField>

<ParamField path="http_options" type="HttpOptions" default="None">
  HTTP options for the Google API client.
</ParamField>

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `GoogleLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter            | Type                   | Default     | Description                                                                                        |
| -------------------- | ---------------------- | ----------- | -------------------------------------------------------------------------------------------------- |
| `model`              | `str`                  | `None`      | Gemini model identifier. *(Inherited from base settings.)*                                         |
| `system_instruction` | `str`                  | `None`      | System instruction/prompt for the model. *(Inherited from base settings.)*                         |
| `max_tokens`         | `int`                  | `NOT_GIVEN` | Maximum number of tokens to generate.                                                              |
| `temperature`        | `float`                | `NOT_GIVEN` | Sampling temperature (0.0 to 2.0). Lower values are more focused, higher values are more creative. |
| `top_k`              | `int`                  | `NOT_GIVEN` | Top-k sampling parameter. Limits tokens to the top k most likely.                                  |
| `top_p`              | `float`                | `NOT_GIVEN` | Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output.                               |
| `thinking`           | `GoogleThinkingConfig` | `NOT_GIVEN` | Thinking configuration. See [GoogleThinkingConfig](#googlethinkingconfig) below.                   |

<Note>
  `NOT_GIVEN` values are omitted from the API request, letting the Gemini API
  use its own defaults. If `thinking` is not provided, Pipecat applies
  low-latency thinking defaults for Flash models: Gemini 2.5 Flash uses
  `thinking_budget=0` (disables thinking), while Gemini 3+ Flash uses
  `thinking_level="minimal"`.
</Note>

### GoogleThinkingConfig

Configuration for controlling the model's internal thinking process. Gemini 2.5 and 3 series models support this feature.

| Parameter          | Type   | Default | Description                                                                                                              |
| ------------------ | ------ | ------- | ------------------------------------------------------------------------------------------------------------------------ |
| `thinking_budget`  | `int`  | `None`  | Token budget for thinking (Gemini 2.5 series). -1 for dynamic, 0 to disable, or a specific count (e.g., 128-32768).      |
| `thinking_level`   | `str`  | `None`  | Thinking level for Gemini 3 models. `"low"`, `"high"` for 3 Pro; `"minimal"`, `"low"`, `"medium"`, `"high"` for 3 Flash. |
| `include_thoughts` | `bool` | `None`  | Whether to include thought summaries in the response.                                                                    |

<Note>
  Gemini 2.5 series models use `thinking_budget`, while Gemini 3 models use
  `thinking_level`. Do not mix these parameters across model generations.
</Note>

## Usage

### Basic Setup

```python theme={null}
from pipecat.services.google import GoogleLLMService

llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model="gemini-2.5-flash",
)
```

### With Custom Settings

```python theme={null}
from pipecat.services.google import GoogleLLMService

llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    settings=GoogleLLMService.Settings(
        model="gemini-2.5-pro",
        system_instruction="You are a helpful assistant.",
        temperature=0.7,
        max_tokens=2048,
        top_p=0.9,
    ),
)
```

### With Thinking Configuration

```python theme={null}
# Gemini 2.5 series (using thinking_budget)
llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    settings=GoogleLLMService.Settings(
        model="gemini-2.5-pro",
        max_tokens=8192,
        thinking=GoogleLLMService.GoogleThinkingConfig(
            thinking_budget=4096,
            include_thoughts=True,
        ),
    ),
)

# Gemini 3 series (using thinking_level)
llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    settings=GoogleLLMService.Settings(
        model="gemini-3-flash",
        max_tokens=8192,
        thinking=GoogleLLMService.GoogleThinkingConfig(
            thinking_level="high",
            include_thoughts=True,
        ),
    ),
)
```

### Updating Settings at Runtime

Model settings can be changed mid-conversation using `LLMUpdateSettingsFrame`:

```python theme={null}
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.google.llm import GoogleLLMSettings

await task.queue_frame(
    LLMUpdateSettingsFrame(
        delta=GoogleLLMSettings(
            temperature=0.3,
            max_tokens=1024,
        )
    )
)
```

<Tip>
  The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
  `Settings` / `settings=` instead. See the [Service Settings
  guide](/pipecat/fundamentals/service-settings) for migration details.
</Tip>

## Notes

* **System instruction priority**: The `system_instruction` set via the constructor or `GoogleLLMSettings` takes priority over any system message in the context. If both are set, a warning is logged and the constructor/settings value is used.
* **Thinking defaults**: By default, Pipecat applies low-latency thinking defaults for Flash models to reduce latency. Gemini 2.5 Flash uses `thinking_budget=0` (disables thinking), while Gemini 3+ Flash uses `thinking_level="minimal"`. To override this behavior, explicitly pass a `GoogleThinkingConfig` via `settings`.
* **Multimodal support**: Gemini models natively support image and audio inputs through Google's Content/Part format. Images and audio are automatically converted from OpenAI-style contexts.
* **Grounding with Google Search**: When grounding metadata is present in the response (e.g., from Google Search tool), the service emits `LLMSearchResponseFrame` with search results and source attributions.
* **Context format**: The service automatically converts between OpenAI-style message formats and Google's native Content/Part format, so you can use either.

## Event Handlers

`GoogleLLMService` supports the following event handlers, inherited from [LLMService](/api-reference/server/events/service-events):

| Event                       | Description                                                                 |
| --------------------------- | --------------------------------------------------------------------------- |
| `on_completion_timeout`     | Called when an LLM completion request times out (Google `DeadlineExceeded`) |
| `on_function_calls_started` | Called when function calls are received and execution is about to start     |

```python theme={null}
@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
    print("LLM completion timed out")
```
