Deploy Pipecat applications to Modal
Once the self-hosted LLM is deployed, you will configure the `bot_vllm.py` file to use it in the next step. Visit the `/docs` endpoint (`https://<Modal workspace>--example-vllm-openai-compatible-serve.modal.run/docs`) for your deployed LLM and wait for it to fully load before connecting your client.

Update `modal_url` in `server/src/bot_vllm.py` to point to the URL you received from the self-serve LLM deployment in the previous step.
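As a rough sketch, the change in `server/src/bot_vllm.py` might look something like the following; the import path, variable names, and model shown here are assumptions and may differ from the example's actual code:

```python
# Hypothetical sketch, not the exact contents of server/src/bot_vllm.py.
# The import path for OpenAILLMService varies across Pipecat versions.
from pipecat.services.openai.llm import OpenAILLMService

# Replace with the URL from your own vLLM deployment on Modal.
modal_url = "https://<Modal workspace>--example-vllm-openai-compatible-serve.modal.run"

llm = OpenAILLMService(
    api_key="unused",                    # vLLM's OpenAI-compatible server typically ignores the key
    base_url=f"{modal_url}/v1",          # OpenAI-compatible APIs are usually served under /v1
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder; use the model your vLLM app serves
)
```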
From the `server` directory, test the app locally with the Modal CLI's `modal serve` command. Once it is running, connect your client by following the `app.js` configuration mentioned in its README.
After deploying, you will find two Apps in your Modal dashboard:

- `example-vllm-openai-compatible`: This App contains the containers and logs used to run your self-hosted LLM. There will be just one App Function listed: `serve`. Click on this function to view logs for your LLM.
- `pipecat-modal`: This App contains the containers and logs used to run your `connect` endpoints and Pipecat pipelines. It lists two App Functions:
  - `fastapi_app`: This function runs the endpoints that your client interacts with to start a new pipeline (`/`, `/connect`, `/status`). Click on this function to see logs for each endpoint hit (a quick endpoint check is sketched after this list).
  - `bot_runner`: This function handles launching and running a bot pipeline. Click on this function to get a list of all pipeline runs and access each run's logs.
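If you want to confirm those endpoints respond after deployment, a small check along these lines can help. The base URL is a placeholder for your own `fastapi_app` deployment, and the response body of `/connect` depends on how the example is configured, so treat both as assumptions:

```python
# Hypothetical smoke test; not part of the example repository.
# Replace BASE_URL with the URL Modal prints for your fastapi_app function.
import requests

BASE_URL = "https://<Modal workspace>--pipecat-modal-fastapi-app.modal.run"

# Root endpoint as a basic liveness check.
print("GET / ->", requests.get(f"{BASE_URL}/", timeout=10).status_code)

# /connect asks the server to launch a new bot pipeline; the response
# (for example, transport credentials for the client) depends on the example.
response = requests.post(f"{BASE_URL}/connect", timeout=30)
print("POST /connect ->", response.status_code, response.text)
```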
Note that many Pipecat examples use `Popen` to launch the pipeline process from the `/connect` endpoint. In this example, we use a Modal function instead. This allows us to run the pipelines using a separately defined Modal image, as well as run each pipeline in an isolated container.
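A minimal sketch of that pattern is shown below, assuming illustrative names (`bot_image`, `bot_runner`, `fastapi_app`) and a generic transport; the example's actual code is organized differently, so read this only as an outline of spawning the pipeline as a separate Modal Function:

```python
# Illustrative outline only; names and details are assumptions, not the
# example's actual source. The point is the pattern: /connect spawns a Modal
# Function, so each Pipecat pipeline runs in its own container and image.
import modal

app = modal.App("pipecat-modal")

# Image used only by the pipeline containers.
bot_image = modal.Image.debian_slim(python_version="3.12").pip_install("pipecat-ai")
web_image = modal.Image.debian_slim(python_version="3.12").pip_install("fastapi[standard]")


@app.function(image=bot_image)
def bot_runner(room_url: str, token: str):
    # Build and run one Pipecat pipeline for this session.
    ...


@app.function(image=web_image)
@modal.asgi_app()
def fastapi_app():
    from fastapi import FastAPI

    api = FastAPI()

    @api.post("/connect")
    async def connect():
        # In the real example this would first obtain transport credentials
        # (for example a room URL and token) before launching the bot.
        # spawn() starts the pipeline in a separate container and returns
        # immediately, so the endpoint does not block while the bot runs.
        bot_runner.spawn("wss://example.transport/room", "example-token")
        return {"status": "pipeline launched"}

    return api
```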
A `debian_slim` CPU-only image should be all that is required to run the pipelines; GPU containers are only needed for self-hosted services such as the LLM. Set `min_containers=1` on the Modal Function that launches the pipeline to ensure at least one warm instance of your function is always available.
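Applied to the hypothetical `bot_runner` from the sketch above, keeping one warm instance is just an extra argument on the function decorator:

```python
# Keep at least one warm container for the function that launches pipelines,
# so the first /connect after an idle period avoids a cold start.
import modal

app = modal.App("pipecat-modal")
bot_image = modal.Image.debian_slim(python_version="3.12").pip_install("pipecat-ai")


@app.function(image=bot_image, min_containers=1)
def bot_runner(room_url: str, token: str):
    # Run one Pipecat pipeline per invocation.
    ...
```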