This quickstart guide will help you build and deploy your first Pipecat voice AI bot. You’ll create a simple conversational agent that you can talk to in real time, then deploy it to production on Pipecat Cloud.

Two steps: Local Development (5 min) → Production Deployment (5 min)
Pipecat supports many different AI services. You can swap out Deepgram for Azure Speech, OpenAI for Anthropic, or Cartesia for ElevenLabs without changing the rest of your code. See the supported services documentation for all available options.
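For example, swapping TTS providers is typically a one-line change where the service is constructed. Here is a minimal sketch; the import paths and constructor arguments follow recent Pipecat releases but may vary by version, and the voice IDs are hypothetical placeholders:

import os

# Import paths below follow recent Pipecat releases; adjust for your version.
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService

# Quickstart default: Cartesia TTS
tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    voice_id="<your-cartesia-voice-id>",  # hypothetical placeholder
)

# Swapped-in alternative: ElevenLabs TTS. The rest of the pipeline is unchanged.
tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id="<your-elevenlabs-voice-id>",  # hypothetical placeholder
)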
Your bot maintains conversation history using a context object, enabling multi-turn interactions where the bot remembers what was said earlier. The context is initialized with a system message that defines the bot’s personality:
# All messages use the OpenAI message format
messages = [
    {
        "role": "system",
        "content": "You are a friendly AI assistant. Respond naturally and keep your answers conversational.",
    },
]

context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
The context aggregator automatically collects user messages (after speech-to-text) and assistant responses (after text-to-speech), maintaining the conversation flow without manual intervention.
When building web or mobile clients, you can use Pipecat’s client SDKs that communicate with your bot via the RTVI (Real-Time Voice Interaction) protocol. In our quickstart example, we initialize the RTVI processor to handle client-server messaging and events.
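A minimal sketch of that initialization, assuming the RTVIProcessor and RTVIConfig classes from pipecat.processors.frameworks.rtvi (the empty config list is typical in examples, but check the docs for your Pipecat version):

from pipecat.processors.frameworks.rtvi import RTVIConfig, RTVIObserver, RTVIProcessor

# Create the RTVI processor that bridges client events and server-side frames
rtvi = RTVIProcessor(config=RTVIConfig(config=[]))

The same rtvi object is placed in the pipeline below and passed to an RTVIObserver when creating the PipelineTask.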
The core of your bot is a Pipeline that processes data through a series of processors:
# Create the pipeline with the processors
pipeline = Pipeline([
    transport.input(),               # Receive audio from browser
    rtvi,                            # Protocol for client/server messaging and events
    stt,                             # Speech-to-text (Deepgram)
    context_aggregator.user(),       # Add user message to context
    llm,                             # Language model (OpenAI)
    tts,                             # Text-to-speech (Cartesia)
    transport.output(),              # Send audio back to browser
    context_aggregator.assistant(),  # Add bot response to context
])
Data flows through the pipeline as “frames,” which are objects containing audio, text, or other data types. The ordering is crucial: audio must be transcribed before it can be processed by the LLM, and text must be synthesized before it can be played back.

The pipeline is managed by a PipelineTask:
# Create a PipelineTask to manage the pipeline execution
task = PipelineTask(
    pipeline,
    params=PipelineParams(
        enable_metrics=True,
        enable_usage_metrics=True,
    ),
    observers=[RTVIObserver(rtvi)],
)
The task handles pipeline execution, collects metrics, and manages RTVI events through observers.
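To make the frame concept above concrete, here is a minimal sketch of a custom processor that logs every TextFrame passing through the pipeline. The LoggingProcessor name is hypothetical; the sketch assumes Pipecat’s FrameProcessor base class and TextFrame frame type:

from loguru import logger

from pipecat.frames.frames import Frame, TextFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class LoggingProcessor(FrameProcessor):
    """Hypothetical processor that logs text frames and passes everything through."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, TextFrame):
            logger.info(f"Text frame passing through: {frame.text}")

        # Always forward the frame so downstream processors still receive it
        await self.push_frame(frame, direction)

A processor like this could be dropped anywhere in the pipeline list, for example between llm and tts, to observe the text generated by the language model.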
Event handlers manage the bot’s lifecycle and user interactions:
# Event handler for when a client connects
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
    logger.info(f"Client connected")

    # Add a greeting message to the context
    messages.append({"role": "system", "content": "Say hello and briefly introduce yourself."})

    # Prompt the bot to start talking when the client connects
    await task.queue_frames([LLMRunFrame()])


# Event handler for when a client disconnects
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
    logger.info(f"Client disconnected")

    # Cancel the task when the client disconnects
    # This stops the pipeline and all processors, cleaning up resources
    await task.cancel()
When a client connects, the bot adds a greeting instruction to the context and queues an LLMRunFrame to initiate the conversation. When the client disconnects, it cancels the task to clean up resources.
Finally, the pipeline is executed by a PipelineRunner:
# Create a PipelineRunner to run the task
runner = PipelineRunner(handle_sigint=False)

# Finally, run the task using the runner
# This will start the pipeline and begin processing frames
await runner.run(task)
The runner manages the pipeline’s execution lifecycle. Note that handle_sigint=False because the top-level runner (started in main() below) handles system signals.
The bot() function is the entry point invoked by Pipecat’s development runner: it configures a transport for the current environment and then runs the pipeline defined above.

async def bot(runner_args: RunnerArguments):
    """Main bot entry point."""

    # Configure transport parameters for different environments
    transport_params = {
        "daily": lambda: DailyParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
        ),
        "webrtc": lambda: TransportParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
        ),
    }

    transport = await create_transport(runner_args, transport_params)
    await run_bot(transport, runner_args)


if __name__ == "__main__":
    from pipecat.runner.run import main

    main()
This runner automatically handles WebRTC connection setup and management, making it easy to get started with minimal configuration. The same code works for both local development and production deployment.
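For local runs you also need API keys for the services used above; the quickstart reads them from a .env file. A minimal sketch of that setup, assuming the default Deepgram/OpenAI/Cartesia services (the variable names are assumptions if you have swapped providers):

import os

from dotenv import load_dotenv

# Load DEEPGRAM_API_KEY, OPENAI_API_KEY, and CARTESIA_API_KEY from a local .env file
load_dotenv(override=True)

for key in ("DEEPGRAM_API_KEY", "OPENAI_API_KEY", "CARTESIA_API_KEY"):
    if not os.getenv(key):
        raise RuntimeError(f"Missing {key} -- add it to your .env file")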
Production ready: This bot pattern is fully compatible with Pipecat Cloud, meaning you can deploy your bot without any code changes.
Browser permissions: Make sure to allow microphone access when prompted by your browser.
Connection issues: If the WebRTC connection fails, first try a different browser. If that fails, make sure you don’t have a VPN or firewall rules blocking traffic. WebRTC uses UDP to communicate.
Audio issues: Check that your microphone and speakers are working and not muted.
The quickstart gave you a working example, but the Pipecat CLI helps you scaffold production-ready projects with your choice of platform (phone vs web/mobile), transport providers, and AI services—all tailored to your specific use case.