A real-time, multimodal conversational AI service powered by Google’s Gemini
The GeminiMultimodalLiveLLMService enables natural, real-time conversations with Google’s Gemini model. It includes built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences. It provides:
- Real-time Interaction: Stream audio and video in real time with low-latency responses
- Speech Processing: Built-in speech-to-text and text-to-speech capabilities with multiple voice options
- Voice Activity Detection: Automatic detection of speech start/stop for natural conversations
- Context Management: Intelligent handling of conversation history and system instructions
Parameters for managing the context window:
- enabled: Enable/disable compression (default: False)
- trigger_tokens: Number of tokens that trigger compression (default: None, uses 80% of context window)
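As an illustration of how the two parameters above interact, here is a minimal sketch of resolving the effective compression threshold. `CompressionSettings` and `effective_trigger` are hypothetical names invented for this example, not part of the Pipecat API, and the context window size is passed in by the caller because the documentation above does not state it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompressionSettings:
    """Hypothetical stand-in mirroring the two documented parameters."""

    enabled: bool = False                 # documented default: False
    trigger_tokens: Optional[int] = None  # documented default: None


def effective_trigger(settings: CompressionSettings, context_window: int) -> Optional[int]:
    """Return the token count at which compression would start, or None if disabled.

    Assumption: when trigger_tokens is None, the threshold is 80% of the
    context window, as described in the parameter list.
    """
    if not settings.enabled:
        return None
    if settings.trigger_tokens is not None:
        return settings.trigger_tokens
    # Integer arithmetic avoids floating-point rounding surprises.
    return context_window * 80 // 100


if __name__ == "__main__":
    print(effective_trigger(CompressionSettings(enabled=True), 100_000))
    print(effective_trigger(CompressionSettings(enabled=True, trigger_tokens=5_000), 100_000))
```

Setting trigger_tokens explicitly overrides the 80% default, and leaving enabled at False means no threshold applies at all.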
This service supports function calling (also known as tool calling) which allows the LLM to request information from external services and APIs. For example, you can enable your bot to:
Gemini Multimodal Live supports the following languages:
| Language Code   | Description           | Gemini Code |
| --------------- | --------------------- | ----------- |
| Language.AR     | Arabic                | ar-XA       |
| Language.BN_IN  | Bengali (India)       | bn-IN       |
| Language.CMN_CN | Chinese (Mandarin)    | cmn-CN      |
| Language.DE_DE  | German (Germany)      | de-DE       |
| Language.EN_US  | English (US)          | en-US       |
| Language.EN_AU  | English (Australia)   | en-AU       |
| Language.EN_GB  | English (UK)          | en-GB       |
| Language.EN_IN  | English (India)       | en-IN       |
| Language.ES_ES  | Spanish (Spain)       | es-ES       |
| Language.ES_US  | Spanish (US)          | es-US       |
| Language.FR_FR  | French (France)       | fr-FR       |
| Language.FR_CA  | French (Canada)       | fr-CA       |
| Language.GU_IN  | Gujarati (India)      | gu-IN       |
| Language.HI_IN  | Hindi (India)         | hi-IN       |
| Language.ID_ID  | Indonesian            | id-ID       |
| Language.IT_IT  | Italian (Italy)       | it-IT       |
| Language.JA_JP  | Japanese (Japan)      | ja-JP       |
| Language.KN_IN  | Kannada (India)       | kn-IN       |
| Language.KO_KR  | Korean (Korea)        | ko-KR       |
| Language.ML_IN  | Malayalam (India)     | ml-IN       |
| Language.MR_IN  | Marathi (India)       | mr-IN       |
| Language.NL_NL  | Dutch (Netherlands)   | nl-NL       |
| Language.PL_PL  | Polish (Poland)       | pl-PL       |
| Language.PT_BR  | Portuguese (Brazil)   | pt-BR       |
| Language.RU_RU  | Russian (Russia)      | ru-RU       |
| Language.TA_IN  | Tamil (India)         | ta-IN       |
| Language.TE_IN  | Telugu (India)        | te-IN       |
| Language.TH_TH  | Thai (Thailand)       | th-TH       |
| Language.TR_TR  | Turkish (Turkey)      | tr-TR       |
| Language.VI_VN  | Vietnamese (Vietnam)  | vi-VN       |
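The Gemini code is not always a mechanical rewrite of the Language name (Language.AR maps to ar-XA, for instance), so an explicit lookup is the safer way to reason about the mapping. The sketch below copies a handful of rows from the list above into a plain dictionary. `GEMINI_CODES` and `to_gemini_code` are hypothetical names for illustration only; the service performs its own conversion internally, and this is not that implementation.

```python
from typing import Optional

# A few rows copied from the supported-language list; extend as needed.
GEMINI_CODES = {
    "AR": "ar-XA",
    "CMN_CN": "cmn-CN",
    "DE_DE": "de-DE",
    "EN_US": "en-US",
    "ES_ES": "es-ES",
    "ID_ID": "id-ID",
}


def to_gemini_code(language_name: str) -> Optional[str]:
    """Look up the Gemini code for a name such as 'Language.ES_ES' or 'ES_ES'.

    Returns None for names that are not in the (partial) table above.
    """
    key = language_name.split(".")[-1]  # tolerate the 'Language.' prefix
    return GEMINI_CODES.get(key)


if __name__ == "__main__":
    for name in ("Language.AR", "ES_ES", "Language.XX_YY"):
        print(name, "->", to_gemini_code(name))
```

Returning None for unknown names leaves the fallback policy to the caller, since the list above does not say what happens for unsupported languages.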
You can set the language using the language parameter:
```python
import os

from pipecat.services.gemini_multimodal_live.gemini import (
    GeminiMultimodalLiveLLMService,
    InputParams,
)
from pipecat.transcriptions.language import Language

# Set language during initialization
llm = GeminiMultimodalLiveLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    params=InputParams(language=Language.ES_ES),  # Spanish (Spain)
)
```