Create a real-time AI chatbot using Gemini Multimodal Live and Pipecat
- `transport.input()` and `transport.output()` handle media streaming with Daily
- `context_aggregator` maintains conversation history for natural dialogue
- `rtvi_user_transcription` and `rtvi_bot_transcription` handle speech-to-text
- `talking_animation` controls the bot’s visual state based on speaking activity
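Putting those pieces together, the heart of the bot is a single Pipecat pipeline. The sketch below approximates how bot-gemini.py lines the processors up, assuming the `transport`, `llm`, `context_aggregator`, and RTVI processor instances are constructed as in the example; the exact ordering lives in the example source.

```python
from pipecat.pipeline.pipeline import Pipeline

# Approximate processor order; see bot-gemini.py for the exact lineup.
pipeline = Pipeline([
    transport.input(),               # audio/video coming in from the Daily room
    rtvi_user_transcription,         # relay the user's speech-to-text to the client
    context_aggregator.user(),       # record user turns in the conversation history
    llm,                             # GeminiMultimodalLiveLLMService (speech-to-speech)
    rtvi_bot_transcription,          # relay the bot's transcript to the client
    talking_animation,               # swap avatar frames while the bot speaks
    transport.output(),              # audio/video going back out to the Daily room
    context_aggregator.assistant(),  # record bot turns in the conversation history
])
```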
`GeminiMultimodalLiveLLMService` is a speech-to-speech LLM service that interfaces with the Gemini Multimodal Live API. It provides real-time speech-to-speech conversation.
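A minimal sketch of constructing the service follows. The module path matches recent Pipecat releases, but the voice and transcription parameters are assumptions that may differ across versions:

```python
import os

from pipecat.services.gemini_multimodal_live.gemini import (
    GeminiMultimodalLiveLLMService,
)

# Parameter names below are assumptions; check your Pipecat version for
# the exact options.
llm = GeminiMultimodalLiveLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    voice_id="Puck",             # one of the built-in Gemini Live voices
    transcribe_user_audio=True,  # emit user transcription frames for the RTVI processors
)
```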
`server.py` is a FastAPI server that creates the meeting room where clients and bots interact, manages bot instances, and handles client connections. It’s the orchestrator that brings everything together on the server side.
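A stripped-down sketch of that orchestration is shown below. The `create_room()` helper is a hypothetical stand-in for the example’s Daily REST calls, and the bot’s command-line flags are assumptions:

```python
import subprocess

from fastapi import FastAPI

app = FastAPI()


async def create_room() -> tuple[str, str]:
    """Hypothetical stand-in for the example's Daily REST helper that
    provisions a room URL and an access token."""
    return "https://example.daily.co/demo-room", "demo-token"


@app.post("/connect")
async def connect():
    room_url, token = await create_room()
    # Launch a bot instance for this room; the real server also tracks
    # these processes so it can clean them up later.
    subprocess.Popen(["python", "bot-gemini.py", "-u", room_url, "-t", token])
    return {"room_url": room_url, "token": token}
```

The response hands the client the same room URL and token the bot joined with, which is all it needs to connect.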
`OpenAILLMContext` is used as a common LLM base service for context management. In the future, we may add a specific context manager for Gemini.
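For example, the shared context and its aggregator pair might be created like this, assuming the `llm` service instance from above; the seed message is illustrative:

```python
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext

# Seed the conversation; the Gemini service consumes the same OpenAI-style
# context format.
messages = [{"role": "user", "content": "Say hello and briefly introduce yourself."}]
context = OpenAILLMContext(messages)

# The aggregator pair records user and assistant turns as frames flow
# through the pipeline.
context_aggregator = llm.create_context_aggregator(context)
```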
- `RTVIProcessor`: Handles all client communication events, including transcriptions, speaking states, and performance metrics
- `TalkingAnimation`: Controls the bot’s visual state, switching between static and animated frames based on speaking status
- `RTVIObserver`: Monitors the entire pipeline and automatically collects relevant events to send to the client
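A condensed sketch of those last two pieces: a TalkingAnimation-style processor reacting to bot-speaking frames, and the observer attached to the pipeline task. Frame and class names follow recent Pipecat releases, the sprite loading is elided, and `pipeline` is assumed to be the one built earlier:

```python
from pipecat.frames.frames import (
    BotStartedSpeakingFrame,
    BotStoppedSpeakingFrame,
    Frame,
    OutputImageRawFrame,
    SpriteFrame,
)
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.rtvi import RTVIConfig, RTVIObserver, RTVIProcessor


class TalkingAnimation(FrameProcessor):
    """Show an animated sprite while the bot speaks, a static frame otherwise."""

    def __init__(self, quiet_frame: OutputImageRawFrame, talking_frame: SpriteFrame):
        super().__init__()
        self._quiet_frame = quiet_frame
        self._talking_frame = talking_frame

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, BotStartedSpeakingFrame):
            await self.push_frame(self._talking_frame)
        elif isinstance(frame, BotStoppedSpeakingFrame):
            await self.push_frame(self._quiet_frame)
        await self.push_frame(frame, direction)


# The RTVIProcessor runs inside the pipeline; the RTVIObserver rides on the
# task, watching every frame and forwarding client-relevant events.
rtvi = RTVIProcessor(config=RTVIConfig(config=[]))
task = PipelineTask(
    pipeline,
    params=PipelineParams(allow_interruptions=True),
    observers=[RTVIObserver(rtvi)],
)
```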
On the client side, the configuration must match the server:

- `DailyTransport`: Matches the WebRTC transport used in `bot-gemini.py`
- `connect` endpoint: Matches the `/connect` route in `server.py`
`PipecatClientProvider` is the root component for providing the Pipecat client context to your application. By wrapping your `PipecatClientAudio` and `PipecatClientVideo` components in this provider, they can access the client instance and process the media streams received from the Pipecat server.
From the `simple-chatbot` directory, start the server and the client to test the chatbot, then open http://localhost:5173 in your browser.