You write the bot file and a thin proxy server. The proxy receives session-start requests from clients and forwards them to an external agent runtime; the runtime owns the bot’s execution environment, scaling, lifecycle, and (typically) the WebRTC offer/answer plumbing. You don’t run a pool, you don’t spawn subprocesses, and you don’t manage VM-level resources: the runtime does. This is the shape used in the aws-agentcore-webrtc example: the proxy is a small FastAPI app that calls bedrock-agentcore to invoke the agent, and the agent is a containerized bot file deployed to AgentCore Runtime.
Pipecat Cloud is architecturally a managed runtime in this family. The difference between running on Pipecat Cloud (PCC) and rolling your own is mostly about how much of the runtime you want to operate vs. consume.
When it fits
- You’d rather not operate dispatch, fleet, or pool infrastructure at all.
- You’re already heavily invested in a cloud platform that has an agent-runtime offering (AWS Bedrock AgentCore, GCP Vertex AI agents, etc.).
- Your bots fit comfortably within the runtime’s constraints (container size, network egress, GPU availability if you need one).
- You’re willing to trade some flexibility for a lower-ops footprint.
When it doesn’t
- The runtime’s feature ceiling or pricing model doesn’t match your bot. Agent runtimes tend to be opinionated about how the bot communicates with the outside world; if your bot needs unusual networking, custom transport, or specific compute shapes, you can hit walls.
- You need portability across cloud providers and don’t want a runtime-shaped lock-in.
- Cold-start behavior of the runtime is unacceptable for your latency requirements. Most runtimes have warm-instance options but they cost extra.
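Whether cold starts are acceptable is easier to judge with numbers. The following is a minimal sketch, assuming you can wrap "start one session" in a callable; `probe_start_latency`, `within_budget`, and `start_session` are hypothetical names for illustration, not part of any runtime's API. It times the first call (which may include cold-start cost if the runtime was scaled to zero) separately from a few follow-up calls.

```python
import time
from statistics import median
from typing import Callable, Dict, List


def probe_start_latency(
    start_session: Callable[[], None], warm_calls: int = 5
) -> Dict[str, float]:
    """Time one first call, then `warm_calls` follow-up calls, in seconds."""

    def timed() -> float:
        t0 = time.perf_counter()
        start_session()
        return time.perf_counter() - t0

    # The first call may pay cold-start cost if no instance was warm.
    first = timed()
    # Follow-up calls are more likely to land on an already-warm instance.
    warm: List[float] = [timed() for _ in range(warm_calls)]
    return {"first_s": first, "warm_median_s": median(warm)}


def within_budget(latency: Dict[str, float], budget_s: float) -> bool:
    """True only if even the first (possibly cold) start fits the budget."""
    return latency["first_s"] <= budget_s
```

Run it against a deployment that has been idle long enough to scale down; otherwise the "first" number only measures a warm start.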
How it usually looks
There are two pieces:
- A proxy server that exposes whatever session-start API your clients expect (commonly POST /start and a per-session proxy under /sessions/{id}/...). It authenticates the request and forwards it to the runtime’s invocation API.
- The bot file, deployed to the runtime as a container. It receives session inputs (e.g. a WebRTC offer) through whatever channel the runtime supplies and runs the pipeline.
In the aws-agentcore-webrtc example, the proxy calls bedrock.invoke_agent_runtime(), and the agent container responds with a WebRTC answer over a streaming response. The bot file inside the container is a standard Pipecat bot.
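The proxy's job can be sketched without any framework or cloud SDK. This is a minimal illustration, not the example's actual code: `handle_start`, `StartResult`, `InvokeRuntime`, and `fake_runtime` are hypothetical names, the bearer-token check stands in for whatever authentication you use, and the runtime call is injected as a plain callable so the sketch runs locally. In a real proxy that callable would wrap the runtime's invocation API, and `handle_start` would sit behind a POST /start route.

```python
import hmac
import json
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical shape for "call the runtime's invocation API":
# takes (session_id, payload_bytes) and returns the runtime's response bytes.
InvokeRuntime = Callable[[str, bytes], bytes]


@dataclass
class StartResult:
    status: int
    body: Dict[str, object]


def handle_start(
    auth_header: Optional[str],
    request_body: Dict[str, object],
    invoke_runtime: InvokeRuntime,
    expected_token: str,
) -> StartResult:
    """Core of a session-start handler: authenticate, then forward."""
    # 1. Authenticate before spending any runtime capacity.
    expected = f"Bearer {expected_token}".encode("utf-8")
    if auth_header is None or not hmac.compare_digest(
        auth_header.encode("utf-8"), expected
    ):
        return StartResult(401, {"error": "unauthorized"})

    # 2. Mint a session id that per-session routes can reuse later.
    session_id = str(uuid.uuid4())

    # 3. Forward the session inputs (here, a WebRTC offer) to the runtime.
    payload = json.dumps({"offer": request_body.get("offer")}).encode("utf-8")
    raw_answer = invoke_runtime(session_id, payload)

    # 4. Relay the runtime's answer back to the client.
    return StartResult(
        200, {"session_id": session_id, "answer": json.loads(raw_answer)}
    )


def fake_runtime(session_id: str, payload: bytes) -> bytes:
    """Local stand-in for the runtime; a real one would run the bot."""
    offer = json.loads(payload)["offer"]
    answer = {"type": "answer", "sdp": f"answer-for:{offer['sdp']}"}
    return json.dumps(answer).encode("utf-8")
```

Injecting the runtime call keeps the proxy logic testable without credentials and makes it easier to swap runtimes later, which softens the lock-in tradeoff somewhat.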
Tradeoffs worth being explicit about
- Low ops vs. lock-in. You’ll have less infrastructure to operate but you’re betting on the runtime’s roadmap and pricing.
- Feature ceiling. Anything the runtime doesn’t natively support — custom transports, unusual networking, specific GPU types, persistent volumes — is friction. Worth confirming the runtime supports your bot’s actual shape, not just the example shape.
- Cold starts. Runtimes typically scale from zero and pay cold-start cost on first invocation. Many offer “min-instance” settings that keep capacity warm at a flat cost.
- Observability. Logs and metrics live in the runtime’s surface, not yours. If you have an existing logging stack, plan for getting the bot’s output into it (or accept living in the runtime’s UI).
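For the observability point, one common approach is to have the bot emit one JSON object per log line on stdout, so that whatever collects the runtime's container output can forward it to your existing stack. The sketch below uses only the standard library and assumes the runtime captures stdout, which you should confirm for your runtime; `JsonLineFormatter`, `make_bot_logger`, and the `session_id` field are hypothetical names chosen for illustration.

```python
import json
import logging
import sys
from typing import Optional, TextIO


class JsonLineFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Present only if the caller passed extra={"session_id": ...}.
            "session_id": getattr(record, "session_id", None),
        }
        return json.dumps(entry)


def make_bot_logger(stream: Optional[TextIO] = None) -> logging.Logger:
    """Build a logger that writes JSON lines to stdout (or a given stream)."""
    logger = logging.getLogger("bot")
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonLineFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep output in this one structured format
    return logger
```

A call such as `make_bot_logger().info("pipeline started", extra={"session_id": "abc"})` then produces a line that a log shipper can parse and filter by session.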
See also
- aws-agentcore-webrtc example: worked example of the pattern on Bedrock AgentCore with WebRTC.
- aws-agentcore-websocket and aws-agentcore-webrtc-kvs: same pattern, other AgentCore transport modes.
- Pipecat Cloud: Daily’s managed runtime, purpose-built for Pipecat.