Overview - Pipecat

A Pipecat bot is a Python process. Running one locally is straightforward; running the right number of them on demand for real users is a separate problem. This section is a map for both.

Two questions, not one

It’s helpful to separate two things that often get tangled together:

Running a bot during development — getting a bot file to start up, accept a connection, and let you talk to it. Pipecat ships its own server for this; you usually don’t need to write one. See Running bots locally.
Hosting bots in production — deciding what serves session-start requests at scale, how bot processes get spawned, and how the operational surface (capacity, lifecycle, observability) is handled. This is genuinely several distinct problems and the right answer depends on your bot, your traffic, and what you already know how to operate. See Self-hosting.

What you’ll typically build

Most Pipecat deployments end up with some version of these pieces:

A bot — your bot.py. A Python function (async def bot(runner_args)) that joins a media session, runs your pipeline, and exits when the session ends.
Something that serves session-start requests — an HTTP service that receives “start a session” requests from your client, sets up the media transport, and causes a bot process to come into existence ready to accept the user. In development this is the built-in runner. In production this might be the runner directly, a hand-rolled dispatcher, a managed agent runtime, or Pipecat Cloud — see Self-hosting for the tradeoffs.
A media transport — WebRTC, WebSockets, SIP, or whatever the user connects through. Sometimes a service you host, more often a third-party provider (Daily, Twilio, etc.).
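As a rough illustration of how these three pieces relate, here is a minimal sketch that uses only the standard library. It is not Pipecat's runner or API: `FakeRunnerArgs`, `SessionStarter`, and the `transport_url` value are hypothetical names invented for this example, and the bot body is a stub where a real bot would join the transport and run a pipeline. Only the `async def bot(runner_args)` shape comes from the description above.

```python
import asyncio
import uuid
from dataclasses import dataclass


@dataclass
class FakeRunnerArgs:
    """Hypothetical stand-in for the arguments a runner hands to the bot."""
    session_id: str
    transport_url: str  # hypothetical: where the bot would join the media session


async def bot(runner_args: FakeRunnerArgs) -> str:
    """Piece 1, the bot: join a media session, run the pipeline, exit at the end."""
    # A real bot would connect to the transport and run its pipeline here.
    # This stub yields to the event loop once and then reports that it finished.
    await asyncio.sleep(0)
    return f"session {runner_args.session_id} ended"


class SessionStarter:
    """Piece 2, the session-start service, reduced to a single method."""

    def __init__(self) -> None:
        self.tasks: dict[str, asyncio.Task] = {}

    def start(self) -> dict:
        session_id = uuid.uuid4().hex
        # Piece 3, the media transport: a made-up URL stands in for a real
        # WebRTC room, WebSocket endpoint, or SIP leg.
        transport_url = f"wss://media.example.invalid/{session_id}"
        args = FakeRunnerArgs(session_id=session_id, transport_url=transport_url)
        # In production this step might spawn a process, a VM, or a container.
        self.tasks[session_id] = asyncio.create_task(bot(args))
        return {"session_id": session_id, "transport_url": transport_url}


async def demo() -> list[str]:
    starter = SessionStarter()
    replies = [starter.start() for _ in range(2)]
    # Wait for every spawned bot to finish its (stubbed) session.
    results = await asyncio.gather(*(starter.tasks[r["session_id"]] for r in replies))
    return list(results)
```

Calling `asyncio.run(demo())` starts two stub sessions and waits for both to end. The sketch runs bots as tasks in one process only to stay short; the point is the division of responsibility, not the spawning mechanism.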

The development runner deliberately mimics Pipecat Cloud’s session-start API (POST /start, /sessions/{id}/...), so a bot file that runs locally can run against either with no code changes — and an HTTP service that fronts your fleet in production can offer the same shape if you want clients to be portable.
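If you do want your own service to offer the same route shape, the routing itself is small. The sketch below assumes only the two path patterns named above (`POST /start` and `/sessions/{id}/...`). The response fields (`session_id`, `forward_to`, `error`) and the in-memory `sessions` dict are hypothetical and are not the actual response schema of the runner or of Pipecat Cloud.

```python
import re
import uuid

# Matches /sessions/{id}/<anything>, capturing the id and the remainder.
SESSION_PATH = re.compile(r"^/sessions/(?P<session_id>[^/]+)/(?P<rest>.+)$")


def route(method: str, path: str, sessions: dict) -> tuple[int, dict]:
    """Return (status_code, body) for a request, using a hypothetical schema."""
    if method == "POST" and path == "/start":
        session_id = uuid.uuid4().hex
        # A real service would set up the transport and spawn a bot here.
        sessions[session_id] = {"status": "starting"}
        return 200, {"session_id": session_id}

    match = SESSION_PATH.match(path)
    if match:
        session_id = match["session_id"]
        if session_id not in sessions:
            return 404, {"error": "unknown session"}
        # A real service would forward the request to the bot that owns
        # this session; here we only report where it would go.
        return 200, {"session_id": session_id, "forward_to": match["rest"]}

    return 404, {"error": "not found"}
```

A client that only knows these two routes can then talk to whichever backend answers them, which is the portability the paragraph above describes.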

Where to go from here

If you’re starting fresh:

Running bots locally

The bot entry point, the built-in runner, and the supported transports.

Self-hosting in production

What plays the runner’s role at scale, and how to think about the choices.

If you already know the basics and want concrete patterns:

VM per session

Dispatcher calls a cloud machines API to spawn a fresh VM per session.

Warm pool with subprocess workers

Pre-allocated resources and a worker pool on a single host, replenished on use.

Managed agent runtime

Hand the bot lifecycle off to a runtime that owns scaling and dispatch.

Telephony in production

Webhook-driven dispatch, SIP gotchas, and where telephony differs.

If you want a worked example on a specific platform:

Fly.io

A worked VM-per-session deployment.

Modal

Containerized FastAPI + Pipecat with optional GPU functions.

Cerebrium

Serverless CPU/GPU infra for Pipecat agents.

A note on Pipecat Cloud

Pipecat Cloud is Daily’s managed service for running Pipecat bots. It exists because the operational surface described in Self-hosting is large, and many teams would rather not build it themselves, but it’s one option among several. The pages in this section are intended to be useful whether you build the infrastructure yourself, use Pipecat Cloud, or end up with a hybrid. Some patterns described here come directly from how Pipecat Cloud is built. Where that’s true, the page calls it out, usually with the caveat that Pipecat Cloud made a specific choice for reasons (multi-tenant isolation, billing, SLAs) that may not apply to a single-tenant self-hosted deployment.
