Example: Cerebrium
Deploy Pipecat applications to Cerebrium
Cerebrium is a serverless infrastructure platform that makes it easy for companies to build, deploy, and scale AI applications. Cerebrium offers both CPUs and GPUs (H100s, A100s, etc.) with extremely low cold start times, letting us build highly performant applications cost-efficiently.
Install the Cerebrium CLI
To get started, run the following commands:
- Run `pip install cerebrium` to install the Python package.
- Run `cerebrium login` to authenticate yourself.
If you don’t have a Cerebrium account, you can create one and get started with $30 in free credits.
Create a Cerebrium project
- Create a new Cerebrium project (see the command sketch below this list).
- This will create two key files:
  - `main.py` - your application entrypoint
  - `cerebrium.toml` - configuration for build and environment settings
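The CLI scaffolds the project for you. A minimal sketch, where the project name `my-pipecat-agent` is just an example:

```bash
# Scaffold a new Cerebrium project (creates main.py and cerebrium.toml)
cerebrium init my-pipecat-agent
```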
Update your `cerebrium.toml` with the necessary configuration:
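The exact fields depend on your Cerebrium version, so treat this as a sketch of the file's shape rather than a definitive config; the hardware values and the `pipecat-ai` extras here are assumptions:

```toml
[cerebrium.deployment]
name = "my-pipecat-agent"
python_version = "3.11"

[cerebrium.hardware]
compute = "CPU"   # Cerebrium also offers GPUs such as A100s and H100s
cpu = 2
memory = 8.0

[cerebrium.dependencies.pip]
"pipecat-ai[daily,openai,cartesia,silero]" = "latest"
```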
For our application to work, we need API keys from the various platforms. Navigate to the Secrets section in your Cerebrium dashboard and store the following:
- `OPENAI_API_KEY` - we use OpenAI for the LLM. You can get your API key from your OpenAI dashboard.
- `DAILY_TOKEN` - for WebRTC communication. You can get your token from your Daily dashboard.
- `CARTESIA_API_KEY` - for text-to-speech services. You can get your API key from your Cartesia dashboard.
We access these secrets in our code as if they were normal environment variables. You can swap in any LLM or TTS service you wish to use in place of the above.
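For example, reading a secret in `main.py` is just an environment-variable lookup (a minimal sketch):

```python
import os

# Cerebrium exposes Secrets to your app as environment variables
openai_api_key = os.environ["OPENAI_API_KEY"]
```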
Agent setup
We create a basic pipeline setup in our `main.py` that combines our LLM, TTS, and Daily WebRTC transport layer.
First, in our main function, we initialize the Daily transport layer to receive/send the audio/video data from the Daily room we will connect to. We pass in the room_url we would like to join, as well as a token that authenticates the bot when it joins programmatically. We also set the VAD stop time, which is how long a pause must last before our bot responds; in this example we set it to 600 milliseconds (0.6 seconds).
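A sketch of that transport setup; the import paths follow recent Pipecat releases and may differ in your version:

```python
import os

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.transports.services.daily import DailyParams, DailyTransport

transport = DailyTransport(
    room_url,                   # the Daily room to join, passed into main()
    token,                      # lets the bot join the room programmatically
    "Pipecat bot",
    DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        transcription_enabled=True,  # use Daily's built-in STT
        vad_enabled=True,
        vad_analyzer=SileroVADAnalyzer(
            params=VADParams(stop_secs=0.6)  # 600 ms pause before the bot responds
        ),
    ),
)
```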
Next, we connect to our LLM (OpenAI) as well as our TTS model (Cartesia). By setting `transcription_enabled=True` we use the STT from Daily itself. This is where the Pipecat framework helps convert audio data to text and vice versa.
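A sketch of the two services; the model name and voice ID are placeholders, and the import paths vary across Pipecat versions:

```python
import os

from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService

llm = OpenAILLMService(
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-4o",  # placeholder; any OpenAI chat model works
)

tts = CartesiaTTSService(
    api_key=os.environ["CARTESIA_API_KEY"],
    voice_id="<your-cartesia-voice-id>",  # placeholder voice ID
)
```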
We then put this all together as a PipelineTask, which is what Pipecat runs. The makeup of a task is completely customizable, with support for image and vision use cases. Lastly, we add some event handlers for when a user joins or leaves the room.
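Continuing from the transport and services above, the assembled task might look like the following sketch; the context aggregator and the greeting are illustrative choices, not requirements:

```python
from pipecat.frames.frames import EndFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext

context = OpenAILLMContext(
    messages=[{"role": "system", "content": "You are a helpful voice assistant."}]
)
context_aggregator = llm.create_context_aggregator(context)

# Frames flow top to bottom through the processors.
pipeline = Pipeline([
    transport.input(),              # audio in from Daily (STT handled by Daily)
    context_aggregator.user(),      # add user transcripts to the LLM context
    llm,                            # context -> LLM response
    tts,                            # response text -> audio
    transport.output(),             # audio out to the Daily room
    context_aggregator.assistant(), # record the bot's reply in the context
])

task = PipelineTask(pipeline)

@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
    # Greet the user as soon as they join the room
    await task.queue_frame(TTSSpeakFrame("Hello! How can I help you today?"))

@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
    # Shut the pipeline down when the user leaves
    await task.queue_frame(EndFrame())

runner = PipelineRunner()
await runner.run(task)
```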
Deploy bot
Deploy your application to Cerebrium:
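From your project directory, the deploy is a single CLI call:

```bash
cerebrium deploy
```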
You will then see that an endpoint is created for your bot at `POST <BASE_URL>/main`, which you can call with your room_url and token. Let's test it.
Test it out
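Assuming you already have a Daily room URL and token, a test request might look like this; the auth header and payload shape here are assumptions, so check your Cerebrium dashboard for the exact endpoint and token:

```bash
curl -X POST <BASE_URL>/main \
  -H "Authorization: Bearer <CEREBRIUM_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"room_url": "https://yourdomain.daily.co/your-room", "token": "<DAILY_TOKEN>"}'
```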
Future Considerations
Since Cerebrium supports both CPU and GPU workloads, the best way to lower your application's latency is to fetch model weights from the various providers and run the models locally. You can do this for:
- LLM: Run any open-source model using a framework such as vLLM
- TTS: Both PlayHT and Deepgram offer TTS models that can be run locally
- STT: Deepgram offers an STT model that can be run locally
If you run all three models locally, you should see much better performance; we have been able to get ~300 ms voice-to-voice response times.
Examples
- Fastest voice agent: Local only implementation
- RAG voice agent: Create a voice agent that can do RAG using Cerebrium + OpenAI + Pinecone
- Twilio voice agent: Create a voice agent that can receive phone calls via Twilio
- OpenAI Realtime API implementation: Create a voice agent that handles phone calls using the OpenAI Realtime API