Google Gemini

Overview

GoogleLLMService provides integration with Google’s Gemini models, supporting streaming responses, function calling, and multimodal inputs. It includes specialized context handling for Google’s message format while maintaining compatibility with OpenAI-style contexts.

Installation

To use GoogleLLMService, install the required dependencies:

pip install "pipecat-ai[google]"

You’ll also need to set up your Google API key as an environment variable: GOOGLE_API_KEY

Configuration

Constructor Parameters

api_key

str

required

Google API key

model

str

default:"gemini-2.0-flash-001"

Model identifier

system_instruction

Optional[str]

default:"None"

System instructions for the model.

Unlike OpenAI, Google Gemini handles system messages differently. System messages are:

Set during client initialization as system_instruction
Can be updated during a conversation, which will recreate the client
Not directly included in the message context like in OpenAI

tools

Optional[List[Dict[str, Any]]]

default:"None"

List of function definitions for the model to use

tool_config

Optional[Dict[str, Any]]

default:"None"

Configuration for tool usage

params

InputParams

Model configuration parameters

Input Parameters

extra

Optional[Dict[str, Any]]

default:"{}"

Additional parameters to pass to the model

max_tokens

Optional[int]

default:"4096"

Maximum number of tokens to generate. Must be greater than or equal to 1

temperature

Optional[float]

default:"None"

Controls randomness in the output. Range: [0.0, 2.0]

top_k

Optional[int]

default:"None"

Controls diversity via nucleus sampling. Must be greater than or equal to 0

top_p

Optional[float]

default:"None"

Controls diversity via nucleus sampling. Range: [0.0, 1.0]

Input Frames

OpenAILLMContextFrame

Frame

Contains conversation context

LLMMessagesFrame

Frame

Contains conversation messages

VisionImageRawFrame

Frame

Contains image for vision processing

LLMUpdateSettingsFrame

Frame

Updates model settings

Output Frames

TextFrame

Frame

Contains generated text

LLMFullResponseStartFrame

Frame

Signals start of response

LLMFullResponseEndFrame

Frame

Signals end of response

LLMSearchResponseFrame

Frame

Contains search results and origins when grounding is used

Context Management

The Google service uses specialized context management to handle conversations and message formatting. This includes managing the conversation history, system prompts, function calls, and converting between OpenAI and Google message formats.

GoogleLLMContext

The base context manager for Google conversations:

context = GoogleLLMContext(
    messages=[],    # Conversation history
    tools=[],       # Available function calling tools
    tool_choice={}  # How the model should use the provided tools
)

Context Aggregators

Context aggregators handle message format conversion and management. The service provides a method to create paired aggregators:

create_context_aggregator

static method

Creates user and assistant aggregators for handling message formatting.

@staticmethod
def create_context_aggregator(
    context: OpenAILLMContext,
    *,
    assistant_expect_stripped_words: bool = True
) -> GoogleContextAggregatorPair

Parameters

context

OpenAILLMContext

required

The context object containing conversation history and settings

assistant_expect_stripped_words

bool

default:"True"

Controls text preprocessing for assistant responses

Usage Example

# 1. Create the context
context = GoogleLLMContext(
    messages=[],
    system="You are a helpful assistant"
)

# 2. Create aggregators for message handling
aggregators = GoogleLLMService.create_context_aggregator(context)

# 3. Access individual aggregators
user_aggregator = aggregators.user()      # Handles user message formatting
assistant_aggregator = aggregators.assistant()  # Handles assistant responses

# 4. Use in a pipeline
pipeline = Pipeline([
    user_aggregator,
    llm,
    assistant_aggregator
])

The context management system ensures proper message formatting and history tracking throughout the conversation while handling the conversion between OpenAI and Google message formats automatically.

Methods

See the LLM base class methods for additional functionality.

Function Calling

This service supports function calling (also known as tool calling) which allows the LLM to request information from external services and APIs. For example, you can enable your bot to:

Check current weather conditions
Query databases
Access external APIs
Perform custom actions

Function Calling Guide

Learn how to implement function calling with standardized schemas, register handlers, manage context properly, and control execution flow in your conversational AI applications.

Search Grounding

GoogleLLMService supports Google’s Search Grounding feature, which enables the model to retrieve and reference information from the web when generating responses. This feature enhances the model’s ability to provide up-to-date information and cite sources.

Enabling Search Grounding

To enable Search Grounding, configure the tools parameter with a Google search retrieval configuration:

# Configure search grounding
search_tool = {
    "google_search_retrieval": {
        "dynamic_retrieval_config": {
            "mode": "MODE_DYNAMIC",
            "dynamic_threshold": 0,  # 0 = always ground with search
        }
    }
}

# Initialize LLM with search tool
llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model="gemini-1.5-flash-002",  # Use a model that supports grounding
    system_instruction="Your system instruction here",
    tools=[search_tool],
)

Dynamic Retrieval Configuration

The dynamic_retrieval_config controls when Search Grounding is applied:

mode

string

default:"MODE_UNSPECIFIED"

Retrieval mode: “MODE_UNSPECIFIED” (always on), “MODE_DYNAMIC” (model decides)

dynamic_threshold

float

default:"0.3"

Controls frequency of search retrieval when using MODE_DYNAMIC. Range: 0.0 (always retrieve) to 1.0 (never retrieve)

Receiving Search Results

When Search Grounding is used, the service generates a LLMSearchResponseFrame containing the search results and source information:

search_result

str

The full text response generated with search grounding

origins

List[Dict]

Information about source websites and referenced content:

site_uri: Source URL
site_title: Source title
results: Text segments from the source with confidence scores

rendered_content

Optional[str]

Additional content rendered by the grounding system

Pipeline Example

Here’s how to integrate Search Grounding in a pipeline:

# Configure the service
llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model="gemini-1.5-flash-002",
    system_instruction=system_instruction,
    tools=[search_tool],
)


# Create pipeline
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant(),
])

Usage Examples

Basic Usage

# Configure service
llm = GoogleLLMService(
    api_key="your-api-key",
    model="gemini-1.5-flash-latest",
    params=GoogleLLMService.InputParams(
        temperature=0.7,
        max_tokens=1000
    )
)

# Create context and aggregators
context = GoogleLLMContext()
context_aggregator = llm.create_context_aggregator(context)

# Create pipeline
pipeline = Pipeline([
    transport.input(),
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

With Function Calling

# Configure function calling
context = GoogleLLMContext()
context.add_tool({
    "function_declarations": [{
        "name": "get_weather",
        "description": "Get weather information",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            }
        }
    }]
})

Frame Flow

Metrics Support

The service collects various metrics:

Token usage (prompt and completion)
Processing time
Time to first byte (TTFB)

Notes

Supports streaming responses
Handles function calling
Provides OpenAI compatibility
Manages conversation context
Supports vision inputs
Includes metrics collection
Thread-safe processing

API Reference

Services

Utilities

Frameworks

Pipeline

Base Service Classes

Overview

Installation

Configuration

Constructor Parameters

Input Parameters

Input Frames

Output Frames

Context Management

GoogleLLMContext

Context Aggregators

Parameters

Usage Example

Methods

Function Calling

Function Calling Guide

Search Grounding

Enabling Search Grounding

Dynamic Retrieval Configuration

Receiving Search Results

Pipeline Example

Usage Examples

Basic Usage

With Function Calling

Frame Flow

Metrics Support

Notes

API Reference

Services

Utilities

Frameworks

Pipeline

Base Service Classes

​Overview

​Installation

​Configuration

​Constructor Parameters

​Input Parameters

​Input Frames

​Output Frames

​Context Management

​GoogleLLMContext

​Context Aggregators

​Parameters

​Usage Example

​Methods

​Function Calling

Function Calling Guide

​Search Grounding

​Enabling Search Grounding

​Dynamic Retrieval Configuration

​Receiving Search Results

​Pipeline Example

​Usage Examples

​Basic Usage

​With Function Calling

​Frame Flow

​Metrics Support

​Notes

Overview

Installation

Configuration

Constructor Parameters

Input Parameters

Input Frames

Output Frames

Context Management

GoogleLLMContext

Context Aggregators

Parameters

Usage Example

Methods

Function Calling

Search Grounding

Enabling Search Grounding

Dynamic Retrieval Configuration

Receiving Search Results

Pipeline Example

Usage Examples

Basic Usage

With Function Calling

Frame Flow

Metrics Support

Notes