Overview

GoogleVertexLLMService provides access to Google’s language models through Vertex AI while maintaining an OpenAI-compatible interface. It inherits from OpenAILLMService, so it supports the full OpenAI-compatible feature set while connecting to Google’s AI services.

Installation

To use GoogleVertexLLMService, install the required dependencies:

pip install "pipecat-ai[google]"

You’ll also need to set up Google Cloud credentials. You can either:

  • Set the GOOGLE_APPLICATION_CREDENTIALS environment variable pointing to your service account JSON file
  • Provide credentials directly to the service constructor
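For the environment-variable route, a typical setup looks like this (the file path is a placeholder for your own service account file):

```shell
# Point Application Default Credentials at your service account file
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
```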

Configuration

Constructor Parameters

credentials (Optional[str]): JSON string of Google service account credentials

credentials_path (Optional[str]): Path to the Google service account JSON file

model (str, default "google/gemini-2.0-flash-001"): Model identifier

params (InputParams): Vertex AI specific parameters

Input Parameters

Extends the OpenAI input parameters with Vertex AI specific options:

location (str, default "us-east4"): Google Cloud region where the model is deployed

project_id (str, required): Google Cloud project ID

Also inherits all OpenAI-compatible parameters:

frequency_penalty (Optional[float]): Reduces the likelihood of repeating tokens based on their frequency. Range: [-2.0, 2.0]

max_tokens (Optional[int]): Maximum number of tokens to generate. Must be greater than or equal to 1

presence_penalty (Optional[float]): Reduces the likelihood of repeating any token that has already appeared. Range: [-2.0, 2.0]

temperature (Optional[float]): Controls randomness in the output. Range: [0.0, 2.0]

top_p (Optional[float]): Controls diversity via nucleus sampling. Range: [0.0, 1.0]
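To illustrate how the Vertex-specific and inherited OpenAI parameters fit together, here is a minimal sketch using a stand-in dataclass; the real class is GoogleVertexLLMService.InputParams, and the defaults shown follow the values documented above.

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in mirroring the documented fields (the real class is
# GoogleVertexLLMService.InputParams from pipecat).
@dataclass
class VertexInputParams:
    project_id: str                            # required
    location: str = "us-east4"                 # deployment region
    frequency_penalty: Optional[float] = None  # range [-2.0, 2.0]
    max_tokens: Optional[int] = None           # >= 1
    presence_penalty: Optional[float] = None   # range [-2.0, 2.0]
    temperature: Optional[float] = None        # range [0.0, 2.0]
    top_p: Optional[float] = None              # range [0.0, 1.0]

params = VertexInputParams(project_id="my-project", temperature=0.7, max_tokens=512)
```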

Usage Example

from pipecat.services.google.llm_vertex import GoogleVertexLLMService
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineParams, PipelineTask

# Configure service
llm = GoogleVertexLLMService(
    credentials_path="/path/to/service-account.json",
    model="google/gemini-2.0-flash-001",
    params=GoogleVertexLLMService.InputParams(
        project_id="your-google-cloud-project-id",
        location="us-east4"
    )
)

# Create context with system message
context = OpenAILLMContext(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant in a voice conversation. Keep responses concise."
        }
    ]
)

# Create context aggregator for message handling
context_aggregator = llm.create_context_aggregator(context)

# Set up pipeline
pipeline = Pipeline([
    transport.input(),
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

# Create and configure task
task = PipelineTask(
    pipeline,
    params=PipelineParams(
        allow_interruptions=True,
        enable_metrics=True,
        enable_usage_metrics=True,
    ),
)

Authentication

The service supports multiple authentication methods:

  1. Direct credentials string - Pass the JSON credentials as a string to the constructor
  2. Credentials file path - Provide a path to the service account JSON file
  3. Environment variable - Set GOOGLE_APPLICATION_CREDENTIALS to the path of your service account file

The service automatically handles token refresh, with tokens having a 1-hour lifetime.
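The three options can be sketched as follows; the path and project values are placeholders, and only one method is needed per service instance:

```python
import json
import os

# 1. Direct credentials string: serialize the service account JSON yourself.
creds_json = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    # ... remaining service account fields ...
})
# llm = GoogleVertexLLMService(credentials=creds_json, ...)

# 2. Credentials file path.
# llm = GoogleVertexLLMService(credentials_path="/path/to/service-account.json", ...)

# 3. Environment variable (Application Default Credentials).
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account.json"
```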

Methods

See the LLM base class methods for additional functionality.

Function Calling

This service supports function calling (also known as tool calling) through the OpenAI-compatible interface, which allows the LLM to request information from external services and APIs.

Function Calling Guide

Learn how to implement function calling with standardized schemas, register handlers, manage context properly, and control execution flow in your conversational AI applications.
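Because the interface is OpenAI-compatible, tools can be described with standard OpenAI function schemas. The sketch below defines one such schema; the function name and fields are hypothetical, and registering a handler with the service follows the pattern covered in the guide above.

```python
# Hypothetical tool definition in the OpenAI function-calling format.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
            },
            "required": ["location"],
        },
    },
}
```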

Available Models

Model Name                     Description
google/gemini-2.0-flash-001    Fast, efficient text generation model
google/gemini-2.0-pro-001      Comprehensive, high-quality model
google/gemini-1.5-pro-001      Versatile multimodal model
google/gemini-1.5-flash-001    Fast, efficient multimodal model

See Google Vertex AI documentation for a complete list of supported models and their capabilities.

Frame Flow

Inherits the OpenAI LLM Service frame flow; see the OpenAI LLM Service documentation for the frame diagram.

Metrics Support

The service collects standard LLM metrics:

  • Token usage (prompt and completion)
  • Processing duration
  • Time to First Byte (TTFB)
  • Function call metrics

Notes

  • Uses Google Cloud’s Vertex AI API
  • Maintains OpenAI-compatible interface
  • Supports streaming responses
  • Handles function calling
  • Manages conversation context
  • Includes token usage tracking
  • Thread-safe processing
  • Automatic token refresh
  • Requires Google Cloud project setup