Deploy Pipecat applications to Cerebrium
Cerebrium is a serverless infrastructure platform that makes it easy for companies to build, deploy, and scale AI applications. Cerebrium offers both CPUs and GPUs (H100s, A100s, etc.) with extremely low cold start times, allowing us to create highly performant applications in a cost-efficient manner.
To get started, let us run the following commands:
- `pip install cerebrium` to install the Python package.
- `cerebrium login` to authenticate yourself.

If you don't have a Cerebrium account, you can create one and get started with $30 in free credits.
Your project needs two files:

- `main.py` - Your application entrypoint
- `cerebrium.toml` - Configuration for build and environment settings

Update your `cerebrium.toml` with the necessary configuration:
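As a sketch, a minimal `cerebrium.toml` for this project might look like the following (section names follow Cerebrium's config schema, but the hardware sizes and the dependency pin are illustrative; check the Cerebrium docs for the exact options):

```toml
[cerebrium.deployment]
name = "pipecat-agent"
python_version = "3.11"
include = ["main.py", "cerebrium.toml"]

[cerebrium.hardware]
compute = "CPU"
cpu = 4
memory = 16.0

[cerebrium.dependencies.pip]
"pipecat-ai[daily,openai,cartesia,silero]" = "latest"
```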
For our application to work, we need our API keys from the various platforms. Navigate to the Secrets section in your Cerebrium dashboard to store them:

- `OPENAI_API_KEY` - We use OpenAI for the LLM. You can get your API key from here.
- `DAILY_TOKEN` - For WebRTC communication. You can get your token from here.
- `CARTESIA_API_KEY` - For text-to-speech services. You can get your API key from here.

We access these secrets in our code as if they were normal environment variables. You can swap in any LLM or TTS service you wish to use.
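Reading a secret is then just a normal environment-variable lookup. A small helper makes missing keys fail loudly (the function name and fallback behaviour are our own convention, not a Cerebrium API):

```python
import os

def get_secret(name, default=None):
    """Read a secret that Cerebrium exposes as an environment variable."""
    value = os.environ.get(name, default)
    if value is None:
        raise RuntimeError(f"Missing secret: {name}")
    return value

# On Cerebrium, dashboard secrets are injected for you at runtime;
# locally you can export them yourself before running the bot.
os.environ.setdefault("DAILY_TOKEN", "demo-token-for-local-runs")
daily_token = get_secret("DAILY_TOKEN")
```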
We create a basic pipeline setup in our `main.py` that combines our LLM, TTS, and Daily WebRTC transport layer.
First, in our main function, we initialize the Daily transport layer to receive/send audio and video data from the Daily room we will connect to. We pass the room_url we would like to join, as well as a token that authenticates the bot when it joins programmatically. We also set our VAD stop seconds, which is how long we wait for a pause before the bot responds; in this example, we set it to 600 milliseconds (0.6 seconds).
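In code, the transport setup might look like the following sketch (module paths and parameter names follow recent `pipecat-ai` releases and may differ in your installed version; `room_url` and `token` come from the incoming request):

```python
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.transports.services.daily import DailyParams, DailyTransport

transport = DailyTransport(
    room_url,          # the Daily room we join
    token,             # authenticates our programmatic join
    "Pipecat bot",
    DailyParams(
        audio_out_enabled=True,
        transcription_enabled=True,  # use Daily's built-in STT
        vad_analyzer=SileroVADAnalyzer(
            params=VADParams(stop_secs=0.6)  # 600 ms pause before responding
        ),
    ),
)
```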
Next, we connect to our LLM (OpenAI) and our TTS model (Cartesia). By setting `transcription_enabled=True`, we use Daily's own STT. This is where the Pipecat framework helps convert audio data to text and vice versa.
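The service setup is similarly small. A sketch (import paths vary between `pipecat-ai` versions, and the model name and voice ID here are placeholders):

```python
import os

from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService

llm = OpenAILLMService(
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-4o",         # placeholder; use any supported model
)
tts = CartesiaTTSService(
    api_key=os.environ["CARTESIA_API_KEY"],
    voice_id="<voice-id>",  # choose a voice in the Cartesia dashboard
)
```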
We then put this all together as a PipelineTask, which is what Pipecat runs. The makeup of a task is completely customizable and has support for image and vision use cases. Lastly, we add event handlers for when a user joins or leaves the room.
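Putting it together, a simplified sketch of the task and event handlers (a production bot would also place a context aggregator between the transcript and the LLM; handler signatures follow recent `pipecat-ai` releases):

```python
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask

pipeline = Pipeline([
    transport.input(),   # audio in from the Daily room
    llm,                 # transcripts -> responses
    tts,                 # responses -> audio
    transport.output(),  # audio out to the Daily room
])
task = PipelineTask(pipeline)

@transport.event_handler("on_first_participant_joined")
async def on_joined(transport, participant):
    pass  # e.g. queue a greeting here

@transport.event_handler("on_participant_left")
async def on_left(transport, participant, reason):
    await task.queue_frame(EndFrame())  # shut the pipeline down

await PipelineRunner().run(task)
```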
Deploy your application to Cerebrium:
You will then see that an endpoint is created for your bot at `POST <BASE_URL>/main`, which you can call with your room_url and token. Let's test it.
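For example, with curl (the base URL, Cerebrium API key, room URL, and Daily token are all placeholders to substitute with your own values):

```bash
curl -X POST "<BASE_URL>/main" \
  -H "Authorization: Bearer <CEREBRIUM_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"room_url": "https://<your-domain>.daily.co/<room>", "token": "<DAILY_TOKEN>"}'
```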
Since Cerebrium supports both CPU and GPU workloads, the best way to lower your application's latency is to fetch model weights from the various providers and run the models locally. You can do this for the STT, LLM, and TTS models.
If you implement all three models locally, you should see much better performance; we have been able to achieve ~300 ms voice-to-voice responses.