A long-lived dispatcher pre-creates the things that take time to set up (typically Daily rooms with tokens) and keeps them in an in-memory pool. When a session-start request arrives, the dispatcher pops a pre-warmed entry out of the pool, spawns a bot subprocess against it, and triggers a background task to replenish the pool back to its target size. The bot subprocess runs to completion in its own process; when it exits, the dispatcher cleans up the room.

This is the shape used in the instant-voice example. See server/src/server.py for a working RoomPool plus BotManager implementation with replenish-on-pop and process-exit cleanup.
When it fits
- You care about session-start latency more than anything else. Popping from a pre-warmed pool and forking a subprocess on a host that already has the bot image loaded is fast.
- Your traffic shape is steady enough that maintaining warm capacity isn’t wasteful.
- Your bot fits comfortably in a subprocess on a single host, and you’re willing to scale across hosts by running multiple instances of the dispatcher.
When it doesn’t
- Your concurrency outgrows a single host. Subprocess fan-out is bounded by the host’s CPU / memory / file descriptors / pipe(2) limits. Run this pattern across N hosts and you’ll need a layer in front to route to the right one, at which point you may want a different shape.
- You need strong per-session isolation (e.g. you’re hosting bots for multiple customers and a crash in one must absolutely not affect another). Subprocesses share a kernel and a host; that’s usually fine but isn’t VM-level isolation.
- Bots have large per-process memory footprints. The pool model is most efficient when one host can hold many bots.
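To get a feel for where the single-host limit sits, one rough approach is to divide each host resource by the per-bot cost and take the smallest result. The sketch below assumes per-bot costs are roughly constant. The function name and every number in it are hypothetical placeholders, not measurements, and pipe(2) limits are not modelled.

```python
def estimate_host_ceiling(mem_bytes, mem_per_bot_bytes,
                          cpu_cores, cores_per_bot,
                          fd_limit, fds_per_bot):
    """Return how many bots fit before the tightest resource runs out."""
    by_memory = mem_bytes / mem_per_bot_bytes
    by_cpu = cpu_cores / cores_per_bot
    by_fds = fd_limit / fds_per_bot
    return int(min(by_memory, by_cpu, by_fds))


GIB = 1024 ** 3
MIB = 1024 ** 2

# Placeholder inputs: measure your own pipeline before trusting any of these.
ceiling = estimate_host_ceiling(
    mem_bytes=16 * GIB, mem_per_bot_bytes=400 * MIB,
    cpu_cores=8, cores_per_bot=0.25,
    fd_limit=1024, fds_per_bot=20,
)
print(ceiling)  # 32: CPU is the binding limit with these inputs
```

If the estimate lands well below your expected concurrency, that is a signal the pattern may not fit on one host.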
How it usually looks
The dispatcher (a single long-running process per host) has two pieces:
- A resource pool. At startup it pre-creates N transport resources (e.g. Daily rooms with tokens). When a /start request arrives, it pops one. After popping, it kicks off a background task to add one more, keeping the pool size stable.
- A worker manager. It spawns the bot as a subprocess with the room/token as arguments, tracks the PID, and cleans up associated resources (deleting the room, freeing any session state) when the process exits.

The bot itself is a standard async def bot(runner_args), with its transport configured from runner_args. It doesn’t know whether the room it joined was pre-allocated or freshly created.
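The resource-pool half can be sketched in a few lines of asyncio. This is a minimal illustration, not the instant-voice implementation: create_room is a hypothetical stand-in for the slow transport setup call, and the RoomPool method names here are assumptions made for the example.

```python
import asyncio
import itertools
from collections import deque
from dataclasses import dataclass


@dataclass
class Room:
    url: str
    token: str


_ids = itertools.count(1)


async def create_room() -> Room:
    """Hypothetical stand-in for the slow transport setup (room plus token)."""
    await asyncio.sleep(0.01)  # simulated provisioning latency
    n = next(_ids)
    return Room(url=f"https://example.invalid/room-{n}", token=f"token-{n}")


class RoomPool:
    """In-memory pool that schedules a refill after each pop."""

    def __init__(self, target_size: int) -> None:
        self.target_size = target_size
        self._rooms = deque()
        self._refills = set()  # hold references so refill tasks are not dropped

    async def fill(self) -> None:
        """Pre-create rooms at startup until the pool reaches its target size."""
        missing = self.target_size - len(self._rooms)
        rooms = await asyncio.gather(*(create_room() for _ in range(missing)))
        self._rooms.extend(rooms)

    def size(self) -> int:
        return len(self._rooms)

    async def pop(self) -> Room:
        """Hand out a warm room and schedule a replacement in the background."""
        if not self._rooms:
            # Pool drained: create on the request path and pay the setup latency.
            return await create_room()
        room = self._rooms.popleft()
        task = asyncio.create_task(self._add_one())
        self._refills.add(task)
        task.add_done_callback(self._refills.discard)
        return room

    async def _add_one(self) -> None:
        self._rooms.append(await create_room())

    async def wait_for_refills(self) -> None:
        """Wait for in-flight refills (useful in tests and on shutdown)."""
        if self._refills:
            await asyncio.gather(*list(self._refills))
```

A /start handler would call await pool.pop() and hand room.url and room.token to the worker manager. Falling back to create_room() when the pool is empty keeps requests working during a burst, at the cost of the latency the pool exists to hide.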
Tradeoffs worth being explicit about
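- Latency vs. idle cost. A warm pool of N pre-created rooms means you pay for those rooms even when nobody’s using them. Set the pool size to match your typical idle-to-bursty ratio: large enough to absorb the session starts that arrive while a replacement room is being created, but no larger.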
- Single-host concurrency ceiling. The pattern works well up to roughly the host’s capacity, then degrades sharply. Know what that ceiling is for your pipeline.
- Replenishment timing. If you only refill after a pop, a sudden burst empties the pool faster than it can refill. A second background task that monitors pool size against a target can help.
- Process lifecycle. Subprocesses crash; bots disconnect; transport sessions time out. The dispatcher needs to handle all three, typically with await proc.wait() in a background task that cleans up the associated transport resources on exit.
- Replicate the pattern, scale horizontally. Running this dispatcher on multiple hosts is straightforward if your dispatch path doesn’t need stickiness: put a load balancer in front and let the pools be per-host.
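The process-lifecycle and replenishment-timing points above can be sketched together. This is an illustration under stated assumptions rather than the instant-voice code: delete_room, run_bot, keep_topped_up, and the max_session_secs parameter are hypothetical names, and the child command is a placeholder for a real bot entry point.

```python
import asyncio
import sys

cleaned_up = []  # stand-in for real side effects, so the sketch is observable


async def delete_room(room_url):
    """Hypothetical cleanup hook; a real one would delete the transport room."""
    cleaned_up.append(room_url)


async def run_bot(room_url, token, bot_cmd=None, max_session_secs=3600.0):
    """Spawn one bot subprocess, wait for it to exit, then always clean up."""
    # Placeholder child process; a real dispatcher would launch its bot script.
    cmd = bot_cmd or [sys.executable, "-c", "pass"]
    proc = await asyncio.create_subprocess_exec(*cmd, room_url, token)
    try:
        # Returns on a clean exit or a crash; the timeout bounds stuck sessions.
        return await asyncio.wait_for(proc.wait(), timeout=max_session_secs)
    except asyncio.TimeoutError:
        if proc.returncode is None:
            proc.kill()
        return await proc.wait()
    finally:
        # Runs on every path, so the room is released however the bot ended.
        await delete_room(room_url)


async def keep_topped_up(pool_size, add_one, target, interval=1.0):
    """Refill toward `target` on a timer, independent of individual pops."""
    while True:
        missing = target - pool_size()
        if missing > 0:
            await asyncio.gather(*(add_one() for _ in range(missing)))
        await asyncio.sleep(interval)
```

The dispatcher would wrap each run_bot call in asyncio.create_task so the request handler returns immediately, and would start keep_topped_up once at startup. If replenish-on-pop and the timer run together, a real implementation should count in-flight creations so the pool does not overshoot its target.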
See also
- instant-voice example: a working implementation with both RoomPool and BotManager.
- VM per session: the opposite tradeoff (no warm capacity, full per-session isolation).