Overview

The GeminiLiveWebsocketTransport class implements a fully functional Pipecat Transport for real-time communication directly with the Gemini Multimodal Live service. Like all transports, it handles media device management, audio/video streams, and connection state management.
Because the API key is supplied in client-side code, transports of this type are intended primarily for development and testing. For production applications, you will need to build a server component with a server-friendly transport, such as the DailyTransport, so that API keys are handled securely.

Usage

Basic Setup

import { GeminiLiveWebsocketTransport, GeminiLLMServiceOptions } from '@pipecat-ai/gemini-live-websocket-transport';
import { PipecatClient } from '@pipecat-ai/client-js';

const options: GeminiLLMServiceOptions = {
  api_key: 'YOUR_API_KEY',
  initial_messages: [
    // Set up initial system and user messages.
    // Without the user message, the bot will not respond immediately
    // and will wait for the user to speak first.
    {
      role: "model",
      content: "You are a confused jellyfish.",
    },
    { role: "user", content: "Blub blub!" },
  ],
  generation_config: {
    temperature: 0.7,
    max_output_tokens: 1000
  }
};

const transport = new GeminiLiveWebsocketTransport(options);
const pcClient = new PipecatClient({
  // Reuse the transport created above instead of constructing a second one
  transport,
  callbacks: {
    // Event handlers
  },
});
pcClient.connect();
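
The comment in the setup code notes that the bot only speaks first when a user message is included in initial_messages. A minimal sketch of that choice follows. The buildInitialMessages helper and the InitialMessage type are hypothetical names introduced for illustration and are not part of the transport package; the "model" role for the prompt simply mirrors the example above.

```typescript
// Message shape copied from the initial_messages entries used in the setup example.
interface InitialMessage {
  role: string;
  content: string;
}

// Hypothetical helper, not part of the transport package.
// Pass a kickoff message to have the bot respond as soon as it connects;
// omit it to have the bot wait for the user to speak first.
function buildInitialMessages(prompt: string, kickoff?: string): InitialMessage[] {
  // The "model" role for the prompt mirrors the Basic Setup example.
  const messages: InitialMessage[] = [{ role: "model", content: prompt }];
  if (kickoff !== undefined) {
    messages.push({ role: "user", content: kickoff });
  }
  return messages;
}

// Bot responds immediately: prompt plus a user message.
const botSpeaksFirst = buildInitialMessages("You are a confused jellyfish.", "Blub blub!");

// Bot waits for the user: prompt only.
const userSpeaksFirst = buildInitialMessages("You are a confused jellyfish.");
```

Either array could then be passed as the initial_messages field of the options object.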

API Reference

Constructor Options

GeminiLLMServiceOptions

interface GeminiLLMServiceOptions {
  api_key: string; // Required: Your Gemini API key
  initial_messages?: Array<{
    // Optional: Initial conversation context
    content: string;
    role: string;
  }>;
  generation_config?: {
    // Optional: Generation parameters
    candidate_count?: number;
    max_output_tokens?: number;
    temperature?: number;
    top_p?: number;
    top_k?: number;
    presence_penalty?: number;
    frequency_penalty?: number;
    response_modalities?: string;
    speech_config?: {
      voice_config?: {
        prebuilt_voice_config?: {
          voice_name: "Puck" | "Charon" | "Kore" | "Fenrir" | "Aoede";
        };
      };
    };
  };
}
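
The nested speech_config path is easy to mistype, so the following sketch shows one way to build a generation_config fragment that selects a prebuilt voice. The withVoice and isVoiceName helpers are hypothetical and exist only for illustration; the voice names are copied from the interface above, and the default temperature of 0.7 is taken from the Basic Setup example.

```typescript
// Voice names copied from the GeminiLLMServiceOptions interface.
const VOICE_NAMES = ["Puck", "Charon", "Kore", "Fenrir", "Aoede"] as const;
type VoiceName = (typeof VOICE_NAMES)[number];

// Hypothetical type guard; not part of the transport package.
function isVoiceName(value: string): value is VoiceName {
  return (VOICE_NAMES as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical helper: returns a generation_config fragment with the chosen
// voice, throwing early on a name the interface does not list.
function withVoice(voice: string, temperature: number = 0.7) {
  if (!isVoiceName(voice)) {
    throw new Error(`Unsupported voice: ${voice}`);
  }
  return {
    temperature,
    speech_config: {
      voice_config: {
        prebuilt_voice_config: { voice_name: voice },
      },
    },
  };
}

// Example: a config fragment that selects the "Kore" voice.
const generationConfig = withVoice("Kore");
```

The resulting object has the shape expected by the generation_config field, so it could be assigned there directly when building the options.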

TransportConnectionParams

The GeminiLiveWebsocketTransport does not take connection parameters. It connects directly to the Gemini Multimodal Live service using the API key provided as part of the initial configuration.

Events

The GeminiLiveWebsocketTransport implements the various PipecatClient event handlers. See the docs or samples for more information.
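
As a rough illustration of what the callbacks object passed to PipecatClient might contain, the sketch below defines a few handlers against a local stand-in type so it can run without a live connection. The callback names (onConnected, onDisconnected, onUserTranscript, onBotTranscript) and their payload shapes are assumptions here and should be confirmed against the PipecatClient documentation; SketchCallbacks and eventLog are hypothetical names.

```typescript
// Local stand-in for the subset of PipecatClient callbacks used in this sketch.
// Callback names and payload shapes are assumptions; confirm them in the client docs.
interface SketchCallbacks {
  onConnected: () => void;
  onDisconnected: () => void;
  onUserTranscript: (data: { text: string; final: boolean }) => void;
  onBotTranscript: (data: { text: string }) => void;
}

// Collects a readable event log so the handlers can be exercised offline.
const eventLog: string[] = [];

const callbacks: SketchCallbacks = {
  onConnected: () => {
    eventLog.push("connected");
  },
  onDisconnected: () => {
    eventLog.push("disconnected");
  },
  // Only record finalized user transcripts; interim results are skipped.
  onUserTranscript: (data) => {
    if (data.final) {
      eventLog.push(`user: ${data.text}`);
    }
  },
  onBotTranscript: (data) => {
    eventLog.push(`bot: ${data.text}`);
  },
};
```

An object like this would go in the callbacks field of the PipecatClient constructor shown in Basic Setup.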

More Information

Package

@pipecat-ai/gemini-live-websocket-transport