Pipecat is an open source Python framework for building voice and multimodal AI bots that can see, hear, and speak in real time. The framework orchestrates AI services, network transports, and audio processing to enable ultra-low latency conversations that feel natural and responsive. Build everything from simple voice assistants to complex multimodal applications that combine audio, video, images, and text. Want to dive right in? Check out the Quickstart example to run your first Pipecat application.

Quickstart

Build and run your first Pipecat application

How It Works

Pipecat orchestrates AI services in a pipeline, which is a series of processors that handle real-time audio, text, and video frames with ultra-low latency. Here’s what happens in a typical voice conversation:
  1. Transport receives audio from the user (browser, phone, etc.)
  2. Speech Recognition converts speech to text in real-time
  3. LLM generates intelligent responses based on context
  4. Speech Synthesis converts responses back to natural speech
  5. Transport streams audio back to the user
In most cases, the entire round trip takes 500–800 ms, creating a natural conversation experience for the user. The diagram below shows a typical voice assistant pipeline, where each step happens in real time:

[Diagram: Pipecat Overview]
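The pipeline idea described above can be illustrated with a short, self-contained sketch. Note that this is not Pipecat's actual API: the `Frame`, `Processor`, and service class names below are hypothetical stand-ins used only to show how frames flow through a chain of processors.

```python
# A minimal sketch of the pipeline model: frames pass through a chain
# of processors in order. All names here are illustrative, not Pipecat's API.

from dataclasses import dataclass


@dataclass
class Frame:
    kind: str  # e.g. "audio" or "text"
    data: str


class Processor:
    """A single pipeline stage that transforms one frame into another."""

    def process(self, frame: Frame) -> Frame:
        raise NotImplementedError


class SpeechToText(Processor):
    def process(self, frame: Frame) -> Frame:
        # Pretend transcription: an audio payload becomes text.
        return Frame("text", f"transcript({frame.data})")


class LLMResponder(Processor):
    def process(self, frame: Frame) -> Frame:
        # Pretend LLM: generate a reply from the transcript.
        return Frame("text", f"reply({frame.data})")


class TextToSpeech(Processor):
    def process(self, frame: Frame) -> Frame:
        # Pretend synthesis: the reply text becomes audio again.
        return Frame("audio", f"speech({frame.data})")


class Pipeline:
    def __init__(self, processors: list[Processor]):
        self.processors = processors

    def run(self, frame: Frame) -> Frame:
        # Each processor transforms the frame and passes it downstream.
        for p in self.processors:
            frame = p.process(frame)
        return frame


pipeline = Pipeline([SpeechToText(), LLMResponder(), TextToSpeech()])
result = pipeline.run(Frame("audio", "mic_input"))
print(result.kind, result.data)
```

In the real framework, the transport stages sit at both ends of the chain and the processors run concurrently on streaming frames rather than one frame at a time, which is how the low round-trip latency is achieved.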

Ready to Build?

The best way to understand Pipecat is to build with it. Start with our 5-minute quickstart to create your first voice AI bot.

Get Involved