A Pipecat bot is a Python process. Running one locally is straightforward; running the right number of them on demand for real users is a separate problem, and the section that follows is a map for both.
Two questions, not one
It’s helpful to separate two things that often get tangled together:
- Running a bot during development: getting a bot file to start up, accept a connection, and let you talk to it. Pipecat ships its own server for this; you usually don’t need to write one. See Running bots locally.
- Hosting bots in production: deciding what serves session-start requests at scale, how bot processes get spawned, and how the operational surface (capacity, lifecycle, observability) is handled. These are several distinct problems, and the right answer depends on your bot, your traffic, and what you already know how to operate. See Self-hosting.
What you’ll typically build
Most Pipecat deployments end up with some version of these pieces:
- A bot: your bot.py. A Python function (async def bot(runner_args)) that joins a media session, runs your pipeline, and exits when the session ends.
- Something that serves session-start requests: an HTTP service that receives “start a session” requests from your client, sets up the media transport, and causes a bot process to come into existence ready to accept the user. In development this is the built-in runner. In production this might be the runner directly, a hand-rolled dispatcher, a managed agent runtime, or Pipecat Cloud; see Self-hosting for the tradeoffs.
- A media transport: WebRTC, WebSockets, SIP, or whatever the user connects through. Sometimes a service you host, more often a third-party provider (Daily, Twilio, etc.).
The built-in runner and Pipecat Cloud expose the same HTTP shape (POST /start, /sessions/{id}/...), so a bot file that runs locally can run against either with no code changes. An HTTP service that fronts your fleet in production can offer the same shape if you want clients to be portable.
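To make the session-start piece concrete, here is a minimal sketch of a hand-rolled dispatcher that uses only the Python standard library. It is not Pipecat's runner and makes several assumptions: the bot is replaced by a stand-in child process (BOT_COMMAND), the response body ({"sessionId": ...}) is a hypothetical shape chosen for illustration, and media transport setup is omitted entirely. It shows only the core idea: a POST /start request causes one bot process to come into existence for that session.

```python
import json
import subprocess
import sys
import uuid
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Stand-in for launching bot.py: a child process that prints its session id
# and exits. A real deployment would start the bot file and hand it the
# transport details for the session (assumption: not shown here).
BOT_COMMAND = [sys.executable, "-c", "import sys; print('bot running for', sys.argv[1])"]

# session id -> child process, so the service can inspect or reap sessions later.
SESSIONS: dict[str, subprocess.Popen] = {}


def start_session() -> str:
    """Spawn one bot process for a new session and return the session id."""
    session_id = uuid.uuid4().hex
    SESSIONS[session_id] = subprocess.Popen(
        BOT_COMMAND + [session_id], stdout=subprocess.PIPE, text=True
    )
    return session_id


class StartHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/start":
            self.send_error(404, "unknown endpoint")
            return
        # Drain the request body; this sketch ignores its contents.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        # Hypothetical response shape, chosen for illustration only.
        body = json.dumps({"sessionId": start_session()}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep the example quiet


def make_server(port: int = 0) -> ThreadingHTTPServer:
    """Bind the dispatcher to localhost; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), StartHandler)
```

Calling make_server(8080).serve_forever() (the port is arbitrary) would serve requests until interrupted. One process per request is the simplest possible dispatch policy; the production patterns listed later on this page differ mainly in what replaces the subprocess.Popen call and in how capacity is managed around it.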
Where to go from here
If you’re starting fresh:
Running bots locally
The bot entry point, the built-in runner, and the supported transports.
Self-hosting in production
What plays the runner’s role at scale, and how to think about the choices.
VM per session
Dispatcher calls a cloud machines API to spawn a fresh VM per session.
Warm pool with subprocess workers
Pre-allocated resources and a worker pool on a single host, replenished on use.
Managed agent runtime
Hand the bot lifecycle off to a runtime that owns scaling and dispatch.
Telephony in production
Webhook-driven dispatch, SIP gotchas, and where telephony differs.
Fly.io
A worked VM-per-session deployment.
Modal
Containerized FastAPI + Pipecat with optional GPU functions.
Cerebrium
Serverless CPU/GPU infra for Pipecat agents.