Example: Modal
Deploy Pipecat applications to Modal
Modal is well-suited for Pipecat deployments because it handles container orchestration, scaling, and cold starts efficiently. This makes it a good choice for production Pipecat bots that need reliable performance.
This guide walks through the Modal example included in the Pipecat repository, which follows the same deployment pattern.
Modal example
View the complete Modal deployment example in our GitHub repository
Set up Modal
- Install the Modal CLI (see the commands below).
- Follow Modal’s official instructions for creating an account and setting up the CLI.
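If you haven’t used Modal before, installing and authenticating the CLI typically looks like this:

```bash
pip install modal
modal setup  # opens a browser window to authenticate with your Modal account
```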
Deploy a self-serve LLM
- Deploy Modal’s OpenAI-compatible LLM service (a deployment sketch follows below). Refer to Modal’s guide and example for Deploying an OpenAI-compatible LLM service with vLLM for more details.
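As a sketch, deploying Modal’s example vLLM service from their examples repository might look like the following; the file path reflects the repo layout at the time of writing and may differ:

```bash
git clone https://github.com/modal-labs/modal-examples.git
cd modal-examples
modal deploy 06_gpu_and_ml/llm-serving/vllm_inference.py
```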
- Take note of the endpoint URL from the previous step, which will look like https://<Modal workspace>--example-vllm-openai-compatible-serve.modal.run. You’ll need this for the bot_vllm.py file in the next section.
The default Modal LLM example uses Llama-3.1 and will shut down after 15 minutes of inactivity. Cold starts take 5-10 minutes. To prepare the service, we recommend visiting the /docs endpoint for your deployed LLM (https://<Modal workspace>--example-vllm-openai-compatible-serve.modal.run/docs) and waiting for it to fully load before connecting your client.
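If you prefer the command line, requesting the same /docs endpoint also spins the service up (using the URL shape assumed above):

```bash
curl https://<Modal workspace>--example-vllm-openai-compatible-serve.modal.run/docs
```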
Deploy FastAPI App and Pipecat pipeline to Modal
- Set up environment variables. Alternatively, you can configure your Modal app to use secrets; a sketch of both approaches follows below.
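Here is a sketch of both approaches. The variable names and the secret name are assumptions; check the example’s README for the exact keys it expects:

```bash
# Option A: export the keys in your shell (variable names are assumptions)
export DAILY_API_KEY=...
export OPENAI_API_KEY=...
export CARTESIA_API_KEY=...

# Option B: store them as a Modal secret (secret name is up to you)
# and reference that secret from the app's functions
modal secret create pipecat-example-secrets \
  DAILY_API_KEY=... OPENAI_API_KEY=... CARTESIA_API_KEY=...
```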
- Update the modal_url in server/src/bot_vllm.py to point to the URL you received from the self-serve LLM deployment in the previous step (a sketch of this change follows below).
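A minimal sketch of that change; the surrounding code, and whether the bot appends a path such as /v1, depend on the actual file:

```python
# server/src/bot_vllm.py (sketch -- only the relevant line is shown)
# Point the pipeline at your deployed vLLM service.
modal_url = "https://<Modal workspace>--example-vllm-openai-compatible-serve.modal.run"
```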
- From within the server directory, test the app locally:
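Assuming the Modal entrypoint in the server directory is app.py (check the example for the actual file name):

```bash
modal serve app.py
```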
- Deploy to production:
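Again assuming app.py as the entrypoint:

```bash
modal deploy app.py
```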
- Note the endpoint URL produced from this deployment (an example of its shape follows below). You’ll need this URL for the client’s app.js configuration mentioned in its README.
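Modal web endpoints follow a <workspace>--<app name>-<function name>.modal.run pattern, so with the names used in this example the URL should look something like this (the exact hostname depends on your workspace):

```
https://<Modal workspace>--pipecat-modal-fastapi-app.modal.run
```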
Launch your bots on Modal
Option 1: Direct Link
Click the URL displayed after running the serve or deploy step to launch an agent; you’ll be redirected to a Daily room where you can talk with the bot. This will use the OpenAI pipeline.
Option 2: Connect via an RTVI Client
Follow the instructions provided in the client folder’s README for building and running a custom client that connects to your Modal endpoint. The provided client includes a dropdown for choosing which bot pipeline to run.
Navigating your LLM, server, and Pipecat logs
On your Modal dashboard, you should have two Apps listed under Live Apps:
- example-vllm-openai-compatible: This App contains the containers and logs used to run your self-hosted LLM. There will be just one App Function listed: serve. Click on this function to view logs for your LLM.
- pipecat-modal: This App contains the containers and logs used to run your connect endpoints and Pipecat pipelines. It will list two App Functions:
  - fastapi_app: This function runs the endpoints that your client interacts with to initiate a new pipeline (/, /connect, /status). Click on this function to see logs for each endpoint hit.
  - bot_runner: This function handles launching and running a bot pipeline. Click on this function to get a list of all pipeline runs and access each run’s logs.
Modal & Pipecat Tips
- In most other Pipecat examples, we use Popen to launch the pipeline process from the /connect endpoint. In this example, we use a Modal function instead. This lets us run the pipelines on a separately defined Modal image and run each pipeline in an isolated container.
- For the FastAPI and most common Pipecat pipeline containers, a default CPU-only debian_slim image should be all that’s required; GPU containers are only needed for self-hosted services.
- To minimize cold starts of the pipeline and reduce latency for users, set min_containers=1 on the Modal Function that launches the pipeline to ensure at least one warm instance of your function is always available; a rough sketch follows below.
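As a rough sketch (not the example’s exact code), a warm bot-runner function might look like this; the image contents and function signature are assumptions:

```python
import modal

app = modal.App("pipecat-modal")

# A CPU-only debian_slim image is enough for most Pipecat pipelines;
# the pip packages listed here are illustrative, not the example's exact list.
bot_image = modal.Image.debian_slim().pip_install("pipecat-ai[daily,openai,cartesia]")

@app.function(image=bot_image, min_containers=1)  # keep one warm instance ready
async def bot_runner(room_url: str, token: str):
    # Build and run the Pipecat pipeline for a single session here.
    ...
```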
Next steps
Explore Modal's LLM Examples
For next steps on running a self-hosted LLM and reducing latency, check out all of Modal’s LLM examples