Google Gemini
Large Language Model service implementation using Google’s Gemini API
Overview
`GoogleLLMService` provides integration with Google’s Gemini models, supporting streaming responses, function calling, and multimodal inputs. It includes specialized context handling for Google’s message format while maintaining compatibility with OpenAI-style contexts.
Installation
To use `GoogleLLMService`, install the required dependencies:
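A typical install, assuming Pipecat’s `google` optional extra:

```shell
pip install "pipecat-ai[google]"
```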
You’ll also need to set your Google API key as the `GOOGLE_API_KEY` environment variable:
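```shell
export GOOGLE_API_KEY="your-api-key"
```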
Configuration
Constructor Parameters
- `api_key`: Google API key
- `model`: Model identifier
- `system_instruction`: System instructions for the model. Unlike OpenAI, Google Gemini handles system messages differently. System messages are:
  - Set during client initialization as `system_instruction`
  - Updatable during a conversation, which recreates the client
  - Not included directly in the message context (unlike OpenAI)
- `tools`: List of function definitions for the model to use
- `tool_config`: Configuration for tool usage
- `params`: Model configuration parameters
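A minimal construction sketch; the import path and model identifier are assumptions, so check them against your Pipecat version:

```python
import os

from pipecat.services.google import GoogleLLMService  # assumed import path

llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model="gemini-1.5-flash-latest",  # assumed model identifier
    system_instruction="You are a helpful assistant in a voice call.",
)
```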
Input Parameters
- `extra`: Additional parameters to pass to the model
- `max_tokens`: Maximum number of tokens to generate. Must be greater than or equal to 1
- `temperature`: Controls randomness in the output. Range: [0.0, 2.0]
- `top_k`: Limits sampling to the k most likely tokens. Must be greater than or equal to 0
- `top_p`: Controls diversity via nucleus sampling. Range: [0.0, 1.0]
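A sketch of passing these settings, assuming the service exposes a nested `InputParams` model as other Pipecat LLM services do:

```python
llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    params=GoogleLLMService.InputParams(
        temperature=0.7,
        top_p=0.9,
        max_tokens=1024,
    ),
)
```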
Input Frames
- `OpenAILLMContextFrame`: Contains conversation context
- `LLMMessagesFrame`: Contains conversation messages
- `VisionImageRawFrame`: Contains an image for vision processing
- `LLMUpdateSettingsFrame`: Updates model settings
Output Frames
- `TextFrame`: Contains generated text
- `LLMFullResponseStartFrame`: Signals the start of a response
- `LLMFullResponseEndFrame`: Signals the end of a response
- `LLMSearchResponseFrame`: Contains search results and origins when grounding is used
Context Management
The Google service uses specialized context management to handle conversations and message formatting. This includes managing the conversation history, system prompts, function calls, and converting between OpenAI and Google message formats.
GoogleLLMContext
The base context manager for Google conversations:
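A sketch of the OpenAI-compatible entry point; the `OpenAILLMContext` import path is assumed from Pipecat’s layout, and the service converts the context to Google’s format internally:

```python
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext

# OpenAI-style messages; GoogleLLMContext handles conversion to
# Google's role/parts format behind the scenes.
messages = [{"role": "system", "content": "You are a helpful assistant."}]
context = OpenAILLMContext(messages)
```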
Context Aggregators
Context aggregators handle message format conversion and management. The service provides a method to create paired aggregators:
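As a sketch, using the base class’s `create_context_aggregator()` method:

```python
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)

# Paired processors: user() aggregates user turns into the context,
# assistant() aggregates the model's responses.
user_aggregator = context_aggregator.user()
assistant_aggregator = context_aggregator.assistant()
```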
Usage Example
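A hedged end-to-end sketch; `transport` and `tts` are placeholders for whatever services your pipeline uses:

```python
from pipecat.pipeline.pipeline import Pipeline

context = OpenAILLMContext(
    [{"role": "system", "content": "You are a helpful assistant."}]
)
context_aggregator = llm.create_context_aggregator(context)

pipeline = Pipeline([
    transport.input(),               # placeholder: audio/text in
    context_aggregator.user(),       # add user turns to the context
    llm,                             # GoogleLLMService
    tts,                             # placeholder: text-to-speech
    transport.output(),              # placeholder: audio/text out
    context_aggregator.assistant(),  # add model turns to the context
])
```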
The context management system ensures proper message formatting and history tracking throughout the conversation while handling the conversion between OpenAI and Google message formats automatically.
Methods
See the LLM base class methods for additional functionality.
Function Calling
This service supports function calling (also known as tool calling), which allows the LLM to request information from external services and APIs. For example, you can enable your bot to:
- Check current weather conditions
- Query databases
- Access external APIs
- Perform custom actions
See the Function Calling guide for:
- Detailed implementation instructions
- Provider-specific function definitions
- Handler registration examples
- Control over function call behavior
- Complete usage examples
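As a brief sketch, registering a handler with the base class’s `register_function()`; the weather lookup is hypothetical, and the handler signature follows Pipecat’s convention but should be checked against your version:

```python
async def fetch_weather(function_name, tool_call_id, args, llm, context, result_callback):
    # Call a real weather API here; this result is hypothetical.
    await result_callback({"conditions": "sunny", "temperature_f": 72})

llm.register_function("get_current_weather", fetch_weather)
```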
Search Grounding
`GoogleLLMService` supports Google’s Search Grounding feature, which enables the model to retrieve and reference information from the web when generating responses. This feature enhances the model’s ability to provide up-to-date information and cite sources.
Enabling Search Grounding
To enable Search Grounding, configure the `tools` parameter with a Google search retrieval configuration:
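A sketch using the `google_search_retrieval` tool shape from Google’s API; the threshold value is illustrative:

```python
search_tool = {
    "google_search_retrieval": {
        "dynamic_retrieval_config": {
            "mode": "MODE_DYNAMIC",
            "dynamic_threshold": 0.3,  # illustrative value
        }
    }
}

llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model="gemini-1.5-flash-latest",  # assumed model identifier
    tools=[search_tool],
)
```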
Dynamic Retrieval Configuration
The `dynamic_retrieval_config` controls when Search Grounding is applied:

- `mode`: Retrieval mode. `"MODE_UNSPECIFIED"` (always on) or `"MODE_DYNAMIC"` (the model decides when to retrieve)
- `dynamic_threshold`: Controls the frequency of search retrieval when using `MODE_DYNAMIC`. Range: 0.0 (always retrieve) to 1.0 (never retrieve)
Receiving Search Results
When Search Grounding is used, the service generates an `LLMSearchResponseFrame` containing the search results and source information:
- `search_result`: The full text response generated with search grounding
- `origins`: Information about source websites and referenced content:
  - `site_uri`: Source URL
  - `site_title`: Source title
  - `results`: Text segments from the source with confidence scores
- `rendered_content`: Additional content rendered by the grounding system
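A sketch of consuming the frame downstream; the import locations and the dict shape of each origin are assumptions:

```python
from pipecat.frames.frames import Frame, LLMSearchResponseFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

class SearchResultLogger(FrameProcessor):
    """Logs grounded responses and their sources as they pass through."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, LLMSearchResponseFrame):
            print(f"Grounded response: {frame.search_result}")
            for origin in frame.origins:  # assumed: list of dicts
                print(f"  source: {origin.get('site_title')} ({origin.get('site_uri')})")
        await self.push_frame(frame, direction)
```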
Pipeline Example
Here’s how to integrate Search Grounding in a pipeline:
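Building on the sketches above (`transport` and `tts` remain placeholders):

```python
pipeline = Pipeline([
    transport.input(),
    context_aggregator.user(),
    llm,                    # GoogleLLMService configured with search_tool
    SearchResultLogger(),   # hypothetical processor from the sketch above
    tts,
    transport.output(),
    context_aggregator.assistant(),
])
```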
Usage Examples
Basic Usage
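A minimal sketch, reusing the assumed import paths and model identifier from above:

```python
llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model="gemini-1.5-flash-latest",  # assumed model identifier
    system_instruction="You are a helpful assistant.",
)

context = OpenAILLMContext([{"role": "user", "content": "Say hello."}])
context_aggregator = llm.create_context_aggregator(context)
```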
With Function Calling
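A sketch that passes OpenAI-format tool definitions through the context, relying on the service’s OpenAI compatibility; the schema and `fetch_weather` handler are hypothetical:

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }
]

context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
llm.register_function("get_current_weather", fetch_weather)
```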
Frame Flow
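The typical sequence, reconstructed from the input and output frames listed above:

```
OpenAILLMContextFrame / LLMMessagesFrame
  → GoogleLLMService
  → LLMFullResponseStartFrame
  → TextFrame (streamed chunks)
  → LLMFullResponseEndFrame
```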
Metrics Support
The service collects various metrics:
- Token usage (prompt and completion)
- Processing time
- Time to first byte (TTFB)
Notes
- Supports streaming responses
- Handles function calling
- Provides OpenAI compatibility
- Manages conversation context
- Supports vision inputs
- Includes metrics collection
- Thread-safe processing