Google Vertex AI
LLM service implementation using Google’s Vertex AI with an OpenAI-compatible interface
Overview
`GoogleVertexLLMService` provides access to Google’s language models through Vertex AI while maintaining an OpenAI-compatible interface. It inherits from `OpenAILLMService` and supports all the features of the OpenAI interface while connecting to Google’s AI services.
Installation
To use `GoogleVertexLLMService`, install the required dependencies:
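For example, with pip (assuming Vertex AI support is included in pipecat’s `google` extra):

```bash
pip install "pipecat-ai[google]"
```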
You’ll also need to set up Google Cloud credentials. You can either:

- Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable pointing to your service account JSON file
- Provide credentials directly to the service constructor
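For the environment variable option, for example:

```bash
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
```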
Configuration
Constructor Parameters
- `credentials`: JSON string of Google service account credentials
- `credentials_path`: Path to the Google service account JSON file
- `model`: Model identifier (e.g. `google/gemini-2.0-flash-001`)
- `params`: Vertex AI specific input parameters (see Input Parameters below)
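A minimal constructor sketch using the parameters above (the exact import path may vary by pipecat version):

```python
from pipecat.services.google.llm_vertex import GoogleVertexLLMService

llm = GoogleVertexLLMService(
    credentials_path="/path/to/service-account.json",  # or credentials="<JSON string>"
    model="google/gemini-2.0-flash-001",
)
```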
Input Parameters
Extends the OpenAI input parameters with Vertex AI specific options:
- `location`: Google Cloud region where the model is deployed
- `project_id`: Google Cloud project ID
Also inherits all OpenAI-compatible parameters:
- `frequency_penalty`: Reduces likelihood of repeating tokens based on their frequency. Range: [-2.0, 2.0]
- `max_tokens`: Maximum number of tokens to generate. Must be greater than or equal to 1
- `presence_penalty`: Reduces likelihood of repeating any tokens that have appeared. Range: [-2.0, 2.0]
- `temperature`: Controls randomness in the output. Range: [0.0, 2.0]
- `top_p`: Controls diversity via nucleus sampling. Range: [0.0, 1.0]
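As a sketch, assuming `InputParams` is exposed as a nested class on the service (as in other pipecat LLM services), Vertex-specific and inherited parameters can be combined:

```python
from pipecat.services.google.llm_vertex import GoogleVertexLLMService

params = GoogleVertexLLMService.InputParams(
    project_id="my-gcp-project",  # illustrative project ID
    location="us-east4",          # region where the model is deployed
    temperature=0.7,              # inherited OpenAI-compatible parameters
    max_tokens=1024,
)
```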
Usage Example
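A sketch of the service in a pipeline. The import paths and project ID are illustrative, and `transport` setup is elided:

```python
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.google.llm_vertex import GoogleVertexLLMService

llm = GoogleVertexLLMService(
    credentials_path="/path/to/service-account.json",
    model="google/gemini-2.0-flash-001",
    params=GoogleVertexLLMService.InputParams(
        project_id="my-gcp-project",
        location="us-east4",
    ),
)

context = OpenAILLMContext(
    [{"role": "system", "content": "You are a helpful assistant."}]
)
context_aggregator = llm.create_context_aggregator(context)

pipeline = Pipeline([
    transport.input(),               # incoming frames (transport configured elsewhere)
    context_aggregator.user(),       # aggregate user messages into the context
    llm,                             # generate responses via Vertex AI
    transport.output(),              # outgoing frames
    context_aggregator.assistant(),  # aggregate assistant responses into the context
])
```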
Authentication
The service supports multiple authentication methods:
- Direct credentials string - Pass the JSON credentials as a string to the constructor
- Credentials file path - Provide a path to the service account JSON file
- Environment variable - Set `GOOGLE_APPLICATION_CREDENTIALS` to the path of your service account file
The service automatically handles token refresh, with tokens having a 1-hour lifetime.
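For example, the direct credentials string option might read the key JSON from an environment variable of your choosing (`GCP_SERVICE_ACCOUNT_JSON` is illustrative):

```python
import os

from pipecat.services.google.llm_vertex import GoogleVertexLLMService

llm = GoogleVertexLLMService(
    credentials=os.environ["GCP_SERVICE_ACCOUNT_JSON"],  # JSON string of the service account key
    model="google/gemini-2.0-flash-001",
)
```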
Methods
See the LLM base class methods for additional functionality.
Function Calling
This service supports function calling (also known as tool calling) through the OpenAI-compatible interface, which allows the LLM to request information from external services and APIs.
Function Calling Guide
Learn how to implement function calling with standardized schemas, register handlers, manage context properly, and control execution flow in your conversational AI applications.
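As a sketch of the pattern, using pipecat’s standardized schema classes with a hypothetical weather tool (the tool name, fields, and handler are illustrative, and the handler signature may differ by pipecat version):

```python
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema

# Standardized schema describing the hypothetical tool
weather_function = FunctionSchema(
    name="get_current_weather",
    description="Get the current weather for a location",
    properties={"location": {"type": "string", "description": "City and state"}},
    required=["location"],
)
tools = ToolsSchema(standard_tools=[weather_function])

async def fetch_weather(params):
    # Call a real weather API here, then return the result to the LLM
    await params.result_callback({"conditions": "sunny", "temperature_f": 72})

# llm is the GoogleVertexLLMService instance created earlier
llm.register_function("get_current_weather", fetch_weather)
```

The `tools` schema is then passed to the LLM context alongside the messages so the model can request the function.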
Available Models
| Model Name | Description |
| --- | --- |
| `google/gemini-2.0-flash-001` | Fast, efficient text generation model |
| `google/gemini-2.0-pro-001` | Comprehensive, high-quality model |
| `google/gemini-1.5-pro-001` | Versatile multimodal model |
| `google/gemini-1.5-flash-001` | Fast, efficient multimodal model |
See the Google Vertex AI documentation for a complete list of supported models and their capabilities.
Frame Flow
Inherits the OpenAI LLM Service frame flow.
Metrics Support
The service collects standard LLM metrics:
- Token usage (prompt and completion)
- Processing duration
- Time to First Byte (TTFB)
- Function call metrics
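Metrics are enabled on the pipeline task; a sketch assuming the standard `PipelineParams` flags:

```python
from pipecat.pipeline.task import PipelineParams, PipelineTask

# pipeline is the Pipeline instance from the usage example
task = PipelineTask(
    pipeline,
    params=PipelineParams(
        enable_metrics=True,        # processing duration and TTFB
        enable_usage_metrics=True,  # token usage
    ),
)
```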
Notes
- Uses Google Cloud’s Vertex AI API
- Maintains OpenAI-compatible interface
- Supports streaming responses
- Handles function calling
- Manages conversation context
- Includes token usage tracking
- Thread-safe processing
- Automatic token refresh
- Requires Google Cloud project setup