Pipecat is an open-source Python framework for building voice and multimodal AI agents. It orchestrates AI services, network transports, and audio processing to enable ultra-low-latency conversations that feel natural and responsive.

Quickstart

Want to dive right in? Build and run your first Pipecat application.

What You Can Build

Voice Assistants

Natural, real-time conversations with AI using speech recognition and synthesis

Phone Agents

Connect to your agent via phone for support, intake, and customer service interactions

Multimodal Apps

Applications that combine voice, video, images, and text for rich interactions

Creative Experiences

Storytelling experiences and social companions that engage users

Interactive Games

Voice-controlled games and interactive experiences with real-time AI responses

Conversation Flows

Build structured conversations with Pipecat Flows to complete tasks and improve LLM accuracy

How It Works

Pipecat orchestrates AI services in a pipeline, which is a series of processors that handle real-time audio, text, and video frames with ultra-low latency. Here’s what happens in a typical voice conversation:
  1. Transport receives audio from the user (browser, phone, etc.)
  2. Speech Recognition converts speech to text in real time
  3. LLM generates intelligent responses based on context
  4. Speech Synthesis converts responses back to natural speech
  5. Transport streams audio back to the user
In most cases, the entire round trip takes between 500 and 800 ms, creating a natural conversation experience for the user.
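The five steps above can be sketched in plain Python with asyncio to show how frames move through a chain of processors. This is a minimal illustration under stated assumptions, not Pipecat's API: the frame classes, `Processor`, `run_pipeline`, and the `FakeSTT`/`FakeLLM`/`FakeTTS` stand-ins are all hypothetical, and real services would stream audio chunks rather than whole utterances. Each stage runs as its own task, stages are connected by queues, and an `EndFrame` sentinel propagates downstream to shut the pipeline down.

```python
import asyncio
from dataclasses import dataclass


# Hypothetical frame types for illustration; Pipecat defines its own frame classes.
@dataclass
class AudioFrame:
    data: bytes


@dataclass
class TextFrame:
    text: str


class EndFrame:
    """Sentinel that tells each stage to shut down."""


class Processor:
    """One pipeline stage: read frames from inbox, push results to outbox."""

    async def process(self, frame):
        raise NotImplementedError

    async def run(self, inbox: asyncio.Queue, outbox: asyncio.Queue):
        while True:
            frame = await inbox.get()
            if isinstance(frame, EndFrame):
                await outbox.put(frame)  # propagate shutdown downstream
                return
            result = await self.process(frame)
            if result is not None:  # a stage may drop a frame by returning None
                await outbox.put(result)


class FakeSTT(Processor):
    # Hypothetical stand-in for speech recognition: decodes bytes as UTF-8 text.
    async def process(self, frame):
        if isinstance(frame, AudioFrame):
            return TextFrame(frame.data.decode("utf-8"))
        return frame


class FakeLLM(Processor):
    # Hypothetical stand-in for an LLM: echoes the user's text with a prefix.
    async def process(self, frame):
        if isinstance(frame, TextFrame):
            return TextFrame(f"You said: {frame.text}")
        return frame


class FakeTTS(Processor):
    # Hypothetical stand-in for speech synthesis: encodes text back to bytes.
    async def process(self, frame):
        if isinstance(frame, TextFrame):
            return AudioFrame(frame.text.encode("utf-8"))
        return frame


async def run_pipeline(processors, input_frames):
    """Wire processors together with queues, feed input frames, collect output."""
    queues = [asyncio.Queue() for _ in range(len(processors) + 1)]
    tasks = [
        asyncio.create_task(p.run(queues[i], queues[i + 1]))
        for i, p in enumerate(processors)
    ]
    # Input "transport": push user audio into the first stage, then shut down.
    for frame in input_frames:
        await queues[0].put(frame)
    await queues[0].put(EndFrame())

    # Output "transport": drain the last queue until the sentinel arrives.
    output = []
    while True:
        frame = await queues[-1].get()
        if isinstance(frame, EndFrame):
            break
        output.append(frame)
    await asyncio.gather(*tasks)
    return output


if __name__ == "__main__":
    frames = [AudioFrame(b"hello"), AudioFrame(b"what time is it")]
    result = asyncio.run(run_pipeline([FakeSTT(), FakeLLM(), FakeTTS()], frames))
    for frame in result:
        print(frame.data.decode("utf-8"))
```

Because every stage is a separate task with its own input queue, the STT stage can begin on the second utterance while the LLM stage is still handling the first. That overlap is the general idea behind streaming pipelines; how Pipecat itself schedules processors is not shown here.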

Ready to Build?

Quickstart

Build and run your first Pipecat application

Core Concepts

Learn about pipelines, processors, transports, and context management

Supported Services

Browse the complete list of 100+ AI service integrations

Deploy

Deploy to Pipecat Cloud or self-host on your own infrastructure