Using Daily as a WebRTC transport for your Pipecat Cloud agents
Daily is a WebRTC platform that provides real-time voice and video capabilities to connect users with voice agents. Pipecat Cloud offers a first-class integration with Daily, making it straightforward to deploy WebRTC-enabled agents without managing complex infrastructure yourself.
When you create a Pipecat Cloud account, you’re automatically provisioned with a Daily API key that’s fully integrated with the platform.
The integrated Daily API key provides:
Zero configuration: No need to separately sign up for Daily or manage API keys
Free voice minutes: Voice minutes for one human participant and one agent are free when using your Pipecat Cloud-provisioned Daily key
Simplified deployment: Create Daily rooms and launch agents with a single command
Built-in compatibility: All Pipecat base images work with Daily out of the box
While 1:1 voice minutes are included, additional Daily features like
recording, transcription, and PSTN/SIP connections are billed according to
Daily’s standard pricing.
When starting a Pipecat Cloud agent, you can specify Daily as the transport by passing the --use-daily flag to the CLI or by setting the appropriate parameters in SDK or REST API calls.
# Start an agent instance with a Daily room
pcc agent start my-agent-name --use-daily

# Start with custom Daily room properties
pcc agent start my-agent-name --use-daily --daily-properties '{"enable_recording": "cloud"}'
This command creates a Daily room and provides a URL you can open in your browser to interact with your agent using voice.
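If you launch agents from a script, the JSON passed to --daily-properties is easy to mis-quote by hand. The following is a minimal sketch of one way to assemble the command in Python. It uses only the flags shown above; `build_start_command` is a hypothetical helper name, not part of the pcc CLI or the Pipecat Cloud SDK, and the sketch prints the command without running it.

```python
import json
import shlex


def build_start_command(agent_name, daily_properties=None):
    """Return the argument list for `pcc agent start` with Daily enabled.

    Hypothetical helper, not part of the pcc CLI or the Pipecat Cloud SDK.
    """
    command = ["pcc", "agent", "start", agent_name, "--use-daily"]
    if daily_properties:
        # Serialize the room properties to the JSON string the flag expects.
        command += ["--daily-properties", json.dumps(daily_properties)]
    return command


command = build_start_command("my-agent-name", {"enable_recording": "cloud"})
# shlex.join quotes each argument for display in a POSIX shell.
print(shlex.join(command))
```

Keeping the command as a list means it can be handed to `subprocess.run(command, check=True)` without going through a shell, so the JSON needs no extra escaping.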
While Pipecat Cloud provides a Daily API key with included 1:1 voice minutes, you can optionally use your own Daily API key if you have specific requirements. When using your own key, all usage will be billed according to your Daily account’s pricing plan rather than being included with your Pipecat Cloud subscription.
When building voice AI applications, choosing the right transport technology is crucial:
WebRTC is purpose-built for real-time audio and video communication between browsers and devices:
Optimized for real-time media streaming over unpredictable networks
Uses UDP, prioritizing speed over guaranteed packet delivery
Provides built-in echo cancellation and noise suppression
Intelligently adapts bitrate based on changing network conditions
Handles NAT traversal for connections across different networks
Includes device management for cameras and microphones
WebSockets work well for server-to-server communication because:
They operate in controlled network environments with stable connections
When server-to-server latency is low, packet retransmission doesn’t add substantial delay
Server environments don’t need device access or media quality enhancements
They’re simpler to implement for pure data transmission
Many server platforms have built-in WebSocket support
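The trade-off between the two lists can be made concrete with rough arithmetic. The sketch below uses a deliberately simplified model that is an assumption of this example, not something taken from Daily or Pipecat: one-way delay is half the round-trip time, and a packet lost on a reliable, ordered stream (such as the TCP connection underneath a WebSocket) arrives about one extra round trip late. The round-trip times are illustrative, not measurements.

```python
def late_arrival_ms(rtt_ms):
    """Estimate when a retransmitted packet arrives, in ms after it was first sent.

    Simplified model (an assumption): one-way delay is rtt / 2, and
    recovering a lost packet costs about one extra round trip.
    """
    return rtt_ms / 2 + rtt_ms


# Illustrative round-trip times, not measurements.
for label, rtt_ms in [("server to server", 2), ("mobile client", 150)]:
    on_time = rtt_ms / 2
    late = late_arrival_ms(rtt_ms)
    print(f"{label}: on time {on_time:.0f} ms, after one loss {late:.0f} ms")
```

Under these assumptions a single loss costs about 2 ms between servers but about 150 ms on the mobile link, and on an ordered stream the audio queued behind the lost packet waits as well. That is the delay a UDP-based media path avoids by not insisting on guaranteed delivery.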
For browser or mobile app voice applications, WebRTC delivers superior
performance across varying network conditions. For more details, see
How to Talk to an LLM with Your Voice.
When connecting to telephony systems like Twilio or implementing server-to-server communication where network conditions are controlled and reliable, WebSockets remain an appropriate choice. However, for any user-facing voice or video application, WebRTC offers significant advantages in quality and reliability.