# Google Gemini

> Large Language Model service implementation using Google's Gemini API

## Overview

`GoogleLLMService` provides integration with Google's Gemini models, supporting streaming responses, function calling, and multimodal inputs. It includes specialized context handling for Google's message format while maintaining compatibility with OpenAI-style contexts.

## Installation

To use Google Gemini services, install the required dependencies:

```bash theme={null}
uv add "pipecat-ai[google]"
```

## Prerequisites

### Google Gemini Setup

Before using Google Gemini LLM services, you need:

1. **Google Account**: Sign up at [Google AI Studio](https://aistudio.google.com/)
2. **API Key**: Generate a Gemini API key from AI Studio
3. **Model Selection**: Choose from available Gemini models (Gemini 2.5 Flash, Gemini 2.5 Pro, etc.)

### Required Environment Variables

* `GOOGLE_API_KEY`: Your Google Gemini API key for authentication

## Configuration

Constructor parameters:

* Google AI API key for authentication.
* Gemini model name to use (e.g., `"gemini-2.5-flash"`, `"gemini-2.5-pro"`). *Deprecated in v0.0.105. Use `settings=GoogleLLMService.Settings(...)` instead.*
* Runtime-configurable model settings. See [Settings](#settings) below.
* Runtime-configurable model settings. See [Settings](#settings) below. *Deprecated in v0.0.105. Use `settings=GoogleLLMService.Settings(...)` instead.*
* System instruction/prompt for the model. Sets the overall behavior and context. *Deprecated in v0.0.105. Use `settings=GoogleLLMService.Settings(system_instruction=...)` instead.*
* List of available tools/functions for the model to call.
* Configuration for tool usage behavior.
* HTTP options for the Google API client.

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `GoogleLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter            | Type                   | Default     | Description                                                                                        |
| -------------------- | ---------------------- | ----------- | -------------------------------------------------------------------------------------------------- |
| `model`              | `str`                  | `None`      | Gemini model identifier. *(Inherited from base settings.)*                                         |
| `system_instruction` | `str`                  | `None`      | System instruction/prompt for the model. *(Inherited from base settings.)*                         |
| `max_tokens`         | `int`                  | `NOT_GIVEN` | Maximum number of tokens to generate.                                                              |
| `temperature`        | `float`                | `NOT_GIVEN` | Sampling temperature (0.0 to 2.0). Lower values are more focused, higher values are more creative. |
| `top_k`              | `int`                  | `NOT_GIVEN` | Top-k sampling parameter. Limits tokens to the top k most likely.                                  |
| `top_p`              | `float`                | `NOT_GIVEN` | Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output.                               |
| `thinking`           | `GoogleThinkingConfig` | `NOT_GIVEN` | Thinking configuration. See [GoogleThinkingConfig](#googlethinkingconfig) below.                   |

`NOT_GIVEN` values are omitted from the API request, letting the Gemini API use its own defaults.

If `thinking` is not provided, Pipecat applies low-latency thinking defaults for Flash models: Gemini 2.5 Flash uses `thinking_budget=0` (disables thinking), while Gemini 3+ Flash uses `thinking_level="minimal"`.

### GoogleThinkingConfig

Configuration for controlling the model's internal thinking process. Gemini 2.5 and 3 series models support this feature.
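To make the request-building rules from the Settings section above concrete, here is a minimal, hypothetical sketch. It is not Pipecat's implementation: `NOT_GIVEN` is a local stand-in for Pipecat's sentinel, thinking settings are plain dicts rather than `GoogleThinkingConfig`, and the helper names (`model_generation`, `default_thinking`, `build_generation_config`) and the parsing of the generation from the model string are assumptions. It drops `NOT_GIVEN` fields, applies the Flash thinking defaults when `thinking` is unset, and rejects a thinking dict that uses the wrong parameter for the model generation.

```python
# Hypothetical sketch only: not Pipecat's implementation.
import re
from typing import Any, Dict, Optional


class _NotGiven:
    """Local stand-in for a NOT_GIVEN sentinel (assumption)."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


def model_generation(model: str) -> float:
    """Parse the version from names like 'gemini-2.5-flash' (assumed naming)."""
    match = re.match(r"gemini-(\d+(?:\.\d+)?)", model)
    return float(match.group(1)) if match else 0.0


def default_thinking(model: str) -> Optional[Dict[str, Any]]:
    """Low-latency thinking defaults for Flash models, as documented."""
    if "flash" not in model:
        return None  # Non-Flash models: leave thinking to the API defaults.
    generation = model_generation(model)
    if generation == 2.5:
        return {"thinking_budget": 0}  # Disables thinking.
    if generation >= 3:
        return {"thinking_level": "minimal"}
    return None


def build_generation_config(model: str, **settings: Any) -> Dict[str, Any]:
    # Drop NOT_GIVEN fields so the Gemini API falls back to its own defaults.
    config = {k: v for k, v in settings.items() if v is not NOT_GIVEN}
    thinking = config.get("thinking")
    generation = model_generation(model)
    if thinking is None:
        # No explicit thinking config: apply the Flash default, if any.
        default = default_thinking(model)
        if default is not None:
            config["thinking"] = default
    elif generation >= 3 and "thinking_budget" in thinking:
        raise ValueError("Gemini 3 models use thinking_level, not thinking_budget")
    elif generation == 2.5 and "thinking_level" in thinking:
        raise ValueError("Gemini 2.5 models use thinking_budget, not thinking_level")
    return config
```

For example, `build_generation_config("gemini-2.5-flash", temperature=0.7, max_tokens=NOT_GIVEN)` yields `{"temperature": 0.7, "thinking": {"thinking_budget": 0}}`: the unset `max_tokens` is dropped and the Gemini 2.5 Flash default is filled in.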
| Parameter          | Type   | Default | Description                                                                                                                  |
| ------------------ | ------ | ------- | ---------------------------------------------------------------------------------------------------------------------------- |
| `thinking_budget`  | `int`  | `None`  | Token budget for thinking (Gemini 2.5 series). `-1` for dynamic, `0` to disable, or a specific count (e.g., 128-32768).      |
| `thinking_level`   | `str`  | `None`  | Thinking level for Gemini 3 models: `"low"` or `"high"` for 3 Pro; `"minimal"`, `"low"`, `"medium"`, or `"high"` for 3 Flash. |
| `include_thoughts` | `bool` | `None`  | Whether to include thought summaries in the response.                                                                        |

Gemini 2.5 series models use `thinking_budget`, while Gemini 3 models use `thinking_level`. Do not mix these parameters across model generations.

## Usage

### Basic Setup

```python theme={null}
import os

from pipecat.services.google import GoogleLLMService

llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    settings=GoogleLLMService.Settings(model="gemini-2.5-flash"),
)
```

### With Custom Settings

```python theme={null}
import os

from pipecat.services.google import GoogleLLMService

llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    settings=GoogleLLMService.Settings(
        model="gemini-2.5-pro",
        system_instruction="You are a helpful assistant.",
        temperature=0.7,
        max_tokens=2048,
        top_p=0.9,
    ),
)
```

### With Thinking Configuration

```python theme={null}
import os

from pipecat.services.google import GoogleLLMService

# Gemini 2.5 series (using thinking_budget)
llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    settings=GoogleLLMService.Settings(
        model="gemini-2.5-pro",
        max_tokens=8192,
        thinking=GoogleLLMService.GoogleThinkingConfig(
            thinking_budget=4096,
            include_thoughts=True,
        ),
    ),
)

# Gemini 3 series (using thinking_level)
llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    settings=GoogleLLMService.Settings(
        model="gemini-3-flash",
        max_tokens=8192,
        thinking=GoogleLLMService.GoogleThinkingConfig(
            thinking_level="high",
            include_thoughts=True,
        ),
    ),
)
```

### Updating Settings at Runtime

Model settings can be changed mid-conversation
using `LLMUpdateSettingsFrame`:

```python theme={null}
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.google.llm import GoogleLLMSettings

# `worker` stands for the object running your pipeline that accepts frames
# via `queue_frame()` (for example, your pipeline task).
await worker.queue_frame(
    LLMUpdateSettingsFrame(
        delta=GoogleLLMSettings(
            temperature=0.3,
            max_tokens=1024,
        )
    )
)
```

The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use `Settings` / `settings=` instead. See the [Service Settings guide](/pipecat/fundamentals/service-settings) for migration details.

## Notes

* **System instruction priority**: The `system_instruction` set via the constructor or `GoogleLLMSettings` takes priority over any system message in the context. If both are set, a warning is logged and the constructor/settings value is used.
* **Thinking defaults**: By default, Pipecat applies low-latency thinking settings to Flash models. Gemini 2.5 Flash uses `thinking_budget=0` (disables thinking), while Gemini 3+ Flash uses `thinking_level="minimal"`. To override this behavior, explicitly pass a `GoogleThinkingConfig` via `settings`.
* **Multimodal support**: Gemini models natively support image and audio inputs through Google's Content/Part format. Images and audio are automatically converted from OpenAI-style contexts.
* **Grounding with Google Search**: When grounding metadata is present in the response (e.g., from the Google Search tool), the service emits `LLMSearchResponseFrame` with search results and source attributions.
* **Context format**: The service automatically converts between OpenAI-style message formats and Google's native Content/Part format, so you can use either.
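The system-instruction priority and context-format notes above can be illustrated with a simplified, hypothetical converter. This is not Pipecat's adapter: the function name `to_google_contents` and its return shape are assumptions, it handles only plain-text `system`, `user`, and `assistant` messages (no images, audio, or tool calls), and it emits Gemini-style `user`/`model` roles with `parts` lists. A configured system instruction wins over a system message found in the context, and a warning is logged when both are present.

```python
# Hypothetical sketch only: not Pipecat's context adapter.
import logging
from typing import Any, Dict, List, Optional, Tuple

logger = logging.getLogger("gemini_context_sketch")

# Gemini uses "model" where OpenAI-style contexts use "assistant".
_ROLE_MAP = {"user": "user", "assistant": "model"}


def to_google_contents(
    messages: List[Dict[str, Any]],
    configured_system_instruction: Optional[str] = None,
) -> Tuple[Optional[str], List[Dict[str, Any]]]:
    """Split text-only OpenAI-style messages into (system_instruction, contents)."""
    context_system: Optional[str] = None
    contents: List[Dict[str, Any]] = []
    for message in messages:
        role = message["role"]
        if role == "system":
            # Simplification: the last system message in the context wins.
            context_system = message["content"]
            continue
        # Only "user" and "assistant" are handled; other roles raise KeyError.
        contents.append(
            {"role": _ROLE_MAP[role], "parts": [{"text": message["content"]}]}
        )
    if configured_system_instruction is not None:
        if context_system is not None:
            logger.warning(
                "System instruction set in both settings and context; "
                "using the settings value"
            )
        return configured_system_instruction, contents
    return context_system, contents
```

Calling `to_google_contents(messages, "You are a helpful assistant.")` on a context that also contains a system message returns the configured instruction and logs a warning, mirroring the priority rule described in the notes.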
## Event Handlers

`GoogleLLMService` supports the following event handlers, inherited from [LLMService](/api-reference/server/events/service-events):

| Event                       | Description                                                                 |
| --------------------------- | --------------------------------------------------------------------------- |
| `on_completion_timeout`     | Called when an LLM completion request times out (Google `DeadlineExceeded`) |
| `on_function_calls_started` | Called when function calls are received and execution is about to start     |

```python theme={null}
@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
    print("LLM completion timed out")
```