# Gemini Live

> A real-time, multimodal conversational AI service powered by Google's Gemini

## Overview

`GeminiLiveLLMService` enables natural, real-time conversations with Google's Gemini models. It provides built-in audio transcription, voice activity detection, and context management for building interactive AI experiences that process audio, video, and text.

Want to start building? Check out our [Gemini Live Guide](/pipecat/features/gemini-live).

## Installation

To use Gemini Live services, install the required dependencies:

```bash
uv add "pipecat-ai[google]"
```

## Prerequisites

### Google AI Setup

Before using Gemini Live services, you need:

1. **Google Account**: Sign in at [Google AI Studio](https://aistudio.google.com/)
2. **API Key**: Generate a Gemini API key in AI Studio
3. **Model Access**: Ensure your key has access to Gemini Live models
4. **Multimodal Configuration**: Set up the audio, video, and text modalities your application needs

### Required Environment Variables

* `GOOGLE_API_KEY`: Your Google Gemini API key for authentication

### Key Features

* **Multimodal Processing**: Handle audio, video, and text inputs simultaneously
* **Real-time Streaming**: Low-latency audio and video processing
* **Voice Activity Detection**: Automatic speech detection and turn management
* **Function Calling**: Tool integration, including non-blocking (async) tool calls on supported models
* **Context Management**: Conversation history and system instruction handling

## Configuration

### GeminiLiveLLMService

* `api_key` (`str`): Google AI API key for authentication.
* `model` (`str`): Gemini model identifier to use. *Deprecated in v0.0.105. Use `settings=GeminiLiveLLMService.Settings(model=...)` instead.*
* `voice_id` (`str`): TTS voice identifier for audio responses. *Deprecated in v0.0.105. Use `settings=GeminiLiveLLMService.Settings(voice=...)` instead.*
* `system_instruction` (`str`): System prompt for the model. Can also be provided via the LLM context.
* `tools`: Tools available to the model: a `ToolsSchema`, a plain list of direct functions and/or `FunctionSchema` objects, or a list of provider-native tool dicts. Can also be provided via the LLM context.
* `params` (`InputParams`): Runtime-configurable generation and session settings. See [Settings](#settings) below. *Deprecated in v0.0.105. Use `settings=GeminiLiveLLMService.Settings(...)` instead.*
* `settings` (`GeminiLiveLLMService.Settings`): Runtime-configurable settings. See [Settings](#settings) below.
* `start_audio_paused` (`bool`): Whether to start with audio input paused.
* `start_video_paused` (`bool`): Whether to start with video input paused.
* `inference_on_context_initialization` (`bool`): Whether to generate a response when the context is first set. Set to `False` to wait for user input before the model responds.
* `http_options` (`HttpOptions`): HTTP options for the Google API client. Use this to set the API version (e.g. `HttpOptions(api_version="v1alpha")`) or other request options.
* `file_api_base_url` (`str`): Base URL for the Gemini File API.

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `GeminiLiveLLMService.Settings(...)`. These can be updated mid-conversation with `LLMUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | `str` | `NOT_GIVEN` | Model identifier. *(Inherited from base settings.)* |
| `system_instruction` | `str` | `NOT_GIVEN` | System instruction/prompt. *(Inherited from base settings.)* |
| `temperature` | `float` | `NOT_GIVEN` | Sampling temperature (0.0-2.0). *(Inherited from base settings.)* |
| `max_tokens` | `int` | `NOT_GIVEN` | Maximum tokens to generate. *(Inherited from base settings.)* |
| `top_k` | `int` | `NOT_GIVEN` | Top-k sampling parameter. *(Inherited from base settings.)* |
| `top_p` | `float` | `NOT_GIVEN` | Top-p (nucleus) sampling parameter (0.0-1.0). *(Inherited from base settings.)* |
| `frequency_penalty` | `float` | `NOT_GIVEN` | Frequency penalty for generation (0.0-2.0). *(Inherited from base settings.)* |
| `presence_penalty` | `float` | `NOT_GIVEN` | Presence penalty for generation (0.0-2.0). *(Inherited from base settings.)* |
| `voice` | `str` | `NOT_GIVEN` | TTS voice identifier (e.g. `"Charon"`, `"Puck"`). |
| `modalities` | `GeminiModalities` | `NOT_GIVEN` | Response modality: `GeminiModalities.AUDIO` or `GeminiModalities.TEXT`. *Note: TEXT modality may not be supported by recent models.* |
| `language` | `Language \| str` | `NOT_GIVEN` | Language for generation and transcription. |
| `media_resolution` | `GeminiMediaResolution` | `NOT_GIVEN` | Media resolution for video input: `UNSPECIFIED`, `LOW`, `MEDIUM`, or `HIGH`. |
| `vad` | `GeminiVADParams` | `NOT_GIVEN` | Voice activity detection parameters. See [GeminiVADParams](#geminivadparams) below. |
| `context_window_compression` | `ContextWindowCompressionParams \| dict` | `NOT_GIVEN` | Context window compression settings. |
| `thinking` | `ThinkingConfig \| dict` | `NOT_GIVEN` | Thinking/reasoning configuration. Requires a model that supports it. |
| `enable_affective_dialog` | `bool` | `NOT_GIVEN` | Enable affective dialog for expression and tone adaptation. |
| `proactivity` | `ProactivityConfig \| dict` | `NOT_GIVEN` | Proactivity settings for model behavior. |

Fields left as `NOT_GIVEN` are omitted, so the service falls back to its own defaults (for example, `"models/gemini-2.5-flash-native-audio-preview-12-2025"` for `model`, `"Charon"` for `voice`, and `4096` for `max_tokens`). Only parameters that are explicitly set are included.
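To make the `NOT_GIVEN` rule concrete, here is a minimal, self-contained sketch of how omitted fields and service defaults could combine. It does not import Pipecat: `NOT_GIVEN`, `SketchSettings`, `explicit_fields`, and `effective_config` are hypothetical stand-ins, and only the default values come from this page. The real service's merging logic may differ.

```python
from dataclasses import dataclass, fields
from typing import Any


class _NotGiven:
    """Hypothetical sentinel standing in for Pipecat's NOT_GIVEN."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN: Any = _NotGiven()


@dataclass
class SketchSettings:
    # Hypothetical subset of the Settings fields listed in the table.
    model: Any = NOT_GIVEN
    voice: Any = NOT_GIVEN
    temperature: Any = NOT_GIVEN
    max_tokens: Any = NOT_GIVEN


def explicit_fields(settings: SketchSettings) -> dict:
    """Return only the fields that were explicitly set (not NOT_GIVEN)."""
    return {
        f.name: getattr(settings, f.name)
        for f in fields(settings)
        if getattr(settings, f.name) is not NOT_GIVEN
    }


# Fallback values, taken from the defaults this page documents.
SERVICE_DEFAULTS = {
    "model": "models/gemini-2.5-flash-native-audio-preview-12-2025",
    "voice": "Charon",
    "max_tokens": 4096,
}


def effective_config(settings: SketchSettings) -> dict:
    """Overlay explicitly set fields on top of the service defaults."""
    return {**SERVICE_DEFAULTS, **explicit_fields(settings)}


if __name__ == "__main__":
    s = SketchSettings(voice="Puck", temperature=0.7)
    print(explicit_fields(s))  # {'voice': 'Puck', 'temperature': 0.7}
    print(effective_config(s))
```

Using an identity check (`is not NOT_GIVEN`) rather than a truthiness check keeps legitimate falsy values such as `temperature=0.0` from being dropped.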
### GeminiVADParams

Voice activity detection configuration passed via the `vad` Settings field:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `disabled` | `bool` | `None` | Whether to disable server-side VAD. `None` or `False` keeps server-side VAD enabled (the default); `True` disables it, so turn detection must come from a local VAD configured in the user aggregator. |
| `start_sensitivity` | `StartSensitivity` | `None` | Sensitivity for detecting the start of speech. |
| `end_sensitivity` | `EndSensitivity` | `None` | Sensitivity for detecting the end of speech. |
| `prefix_padding_ms` | `int` | `None` | Padding before speech starts, in milliseconds. |
| `silence_duration_ms` | `int` | `None` | Duration of silence, in milliseconds, required to detect the end of speech. |

### ContextWindowCompressionParams

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `enabled` | `bool` | `False` | Whether context window compression is enabled. |
| `trigger_tokens` | `int` | `None` | Token count that triggers compression. `None` uses the default (80% of the context window). |

## Usage

### Basic Setup

```python
import os

from pipecat.services.google.gemini_live import GeminiLiveLLMService

llm = GeminiLiveLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    settings=GeminiLiveLLMService.Settings(
        voice="Charon",
        system_instruction="You are a helpful assistant.",
    ),
)
```

### With Settings

```python
import os

from pipecat.services.google.gemini_live import (
    ContextWindowCompressionParams,
    GeminiLiveLLMService,
    GeminiVADParams,
)

llm = GeminiLiveLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    settings=GeminiLiveLLMService.Settings(
        model="models/gemini-2.5-flash-native-audio-preview-12-2025",
        system_instruction="You are a helpful assistant.",
        voice="Puck",
        temperature=0.7,
        max_tokens=2048,
        language="en-US",
        vad=GeminiVADParams(
            silence_duration_ms=500,  # End of speech after 500 ms of silence
        ),
        # trigger_tokens is left unset, so the default trigger applies
        context_window_compression=ContextWindowCompressionParams(enabled=True),
    ),
)
```

### With Local VAD

```python
import os

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.services.google.gemini_live import (
    GeminiLiveLLMService,
    GeminiVADParams,
)

llm = GeminiLiveLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    settings=GeminiLiveLLMService.Settings(
        voice="Charon",
        vad=GeminiVADParams(disabled=True),  # Disable server-side VAD
    ),
)

# Conversation context shared by the aggregators.
context = LLMContext()

# Configure local VAD in your aggregator. realtime_service_mode=True keeps
# context-writing correct with a realtime service; the local VAD here drives
# turn-taking since server-side VAD is disabled.
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    realtime_service_mode=True,
    user_params=LLMUserAggregatorParams(
        vad_analyzer=SileroVADAnalyzer(),
    ),
)
```

Pass `realtime_service_mode=True` to `LLMContextAggregatorPair` for any realtime (speech-to-speech) service.
See [Realtime (Speech-to-Speech) Services](/api-reference/server/utilities/turn-management/external-turn-management#realtime-speech-to-speech-services) for what it does and how it interacts with local VAD.

### Text-Only Mode

TEXT modality may not be supported by recent Gemini Live models. The service logs a warning if you configure `modalities=GeminiModalities.TEXT`.

```python
import os

from pipecat.services.google.gemini_live import (
    GeminiLiveLLMService,
    GeminiModalities,
)

llm = GeminiLiveLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    settings=GeminiLiveLLMService.Settings(
        system_instruction="You are a helpful assistant.",
        modalities=GeminiModalities.TEXT,
    ),
)
```

### With Thinking Enabled

```python
import os

from google.genai.types import ThinkingConfig

from pipecat.services.google.gemini_live import GeminiLiveLLMService

llm = GeminiLiveLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    settings=GeminiLiveLLMService.Settings(
        model="models/gemini-2.5-flash-native-audio-preview-12-2025",
        system_instruction="You are a helpful assistant.",
        thinking=ThinkingConfig(include_thoughts=True),
    ),
)
```

The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use `Settings` / `settings=` instead. See the [Service Settings guide](/pipecat/fundamentals/service-settings) for migration details.

## Notes

* **Model support**: The service supports both Gemini 2.5 and Gemini 3.x models and automatically detects and handles model-specific behavior.
* **Async tool support**: Functions registered with `cancel_on_interruption=False` use Gemini's NON\_BLOCKING tool mechanism on models that support it (currently Gemini 2.x), so the conversation can continue while the tool runs in the background. The result is delivered via the async-tool mechanism and integrated into the model's next turn. On models that don't support NON\_BLOCKING (Gemini 3.x), the service logs a one-time warning explaining the limitation. Note: an intermittent 1008 error can occasionally occur on Gemini 2.5 during long-running tool calls; the service reconnects automatically when this happens.
* **System instruction precedence**: The `system_instruction` from service settings takes precedence over an initial system message in the LLM context. A warning is logged when both are set.
* **Tools precedence**: By contrast, tools provided in the LLM context override tools passed at init time.
* **VAD modes**: By default, Gemini Live uses server-side VAD to detect when the user starts and stops speaking. To use local VAD (e.g., Silero), set `vad=GeminiVADParams(disabled=True)` and configure a VAD analyzer in your `LLMUserAggregatorParams`. The service then automatically sends activity signals to the Gemini API when local VAD detects speech.
* **Transcription aggregation**: Gemini Live sends user transcriptions in small chunks. The service aggregates them into complete sentences using end-of-sentence detection, falling back to a 0.5-second timeout.
* **Session resumption**: The service automatically resumes sessions on reconnection using session resumption handles. In the rare case where reconnection occurs before a resumption handle is received, conversation history is preserved by reseeding it into the new session.
* **Connection resilience**: The service attempts up to 3 consecutive reconnections before treating a connection failure as fatal.
* **Video frame rate**: Video frames are throttled to a maximum of one per second.
* **Affective dialog and proactivity**: These features require a model that supports them and the `v1alpha` API version, which you can select via `http_options`.
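The transcription aggregation described in the notes can be pictured with the following sketch. It is a hypothetical, simplified model, not Pipecat's implementation: `TranscriptionAggregator`, `add_chunk`, and `poll` are illustrative names, end-of-sentence detection is approximated by a trailing `.`, `?`, or `!`, and time is passed in explicitly rather than driven by a real timer so the behavior is easy to follow.

```python
import re
from typing import List, Optional

# Hypothetical approximation of end-of-sentence detection.
END_OF_SENTENCE = re.compile(r"[.!?]\s*$")
TIMEOUT_S = 0.5  # Fallback timeout described in the notes


class TranscriptionAggregator:
    """Illustrative sketch: joins transcription chunks into sentences."""

    def __init__(self, timeout_s: float = TIMEOUT_S) -> None:
        self.timeout_s = timeout_s
        self._buffer = ""
        self._last_chunk_at: Optional[float] = None

    def add_chunk(self, text: str, now: float) -> List[str]:
        """Add a chunk received at time `now`; return completed sentences."""
        # A stale buffer (timed out) is flushed before the new chunk is added.
        completed = self.poll(now)
        self._buffer += text
        self._last_chunk_at = now
        if END_OF_SENTENCE.search(self._buffer):
            completed.append(self._flush())
        return completed

    def poll(self, now: float) -> List[str]:
        """Flush the buffer if no chunk has arrived within the timeout."""
        if (
            self._buffer
            and self._last_chunk_at is not None
            and now - self._last_chunk_at >= self.timeout_s
        ):
            return [self._flush()]
        return []

    def _flush(self) -> str:
        text = self._buffer.strip()
        self._buffer = ""
        self._last_chunk_at = None
        return text


if __name__ == "__main__":
    agg = TranscriptionAggregator()
    print(agg.add_chunk("What's the", now=0.0))
    print(agg.add_chunk(" weather?", now=0.2))
    print(agg.add_chunk(" I think", now=1.0))
    print(agg.poll(now=1.6))
```

In a live pipeline, something like `poll` would need to run periodically so a trailing fragment (here, "I think") is still emitted when the user stops speaking mid-sentence.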
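The reconnection limit in the notes amounts to a bounded retry loop. The sketch below is an assumption-laden illustration, not the service's code: `reconnect_with_budget` and `FatalConnectionError` are hypothetical names, the budget covers a single reconnection episode, and it deliberately omits backoff and session restoration (such as applying a resumption handle).

```python
from typing import Callable, Optional


class FatalConnectionError(RuntimeError):
    """Hypothetical error raised once the reconnect budget is exhausted."""


def reconnect_with_budget(
    connect: Callable[[], None], max_attempts: int = 3
) -> int:
    """Call `connect` until it succeeds, up to `max_attempts` consecutive tries.

    Returns the 1-based attempt number that succeeded. Raises
    FatalConnectionError if every attempt fails.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            connect()
            return attempt
        except ConnectionError as exc:
            # Remember the failure and try again while budget remains.
            last_error = exc
    raise FatalConnectionError(
        f"gave up after {max_attempts} consecutive reconnection attempts"
    ) from last_error
```

Chaining with `from last_error` keeps the underlying network error visible in the traceback when the failure is finally treated as fatal.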