Overview - Pipecat

Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. “Multimodal” means you can use any combination of audio, video, images, and/or text in your interactions. And “real-time” means that things are happening quickly enough that it feels conversational—a “back-and-forth” with a bot, not submitting a query and waiting for results.

What You Can Build

Voice Assistants

Natural, real-time conversations with AI using speech recognition and synthesis

Interactive Agents

Personal coaches and meeting assistants that can understand context and provide guidance

Multimodal Apps

Applications that combine voice, video, images, and text for rich interactions

Creative Tools

Storytelling experiences and social companions that engage users

Business Solutions

Customer intake flows and support bots for automated business processes

Complex Flows

Structured conversations using Pipecat Flows for managing complex interactions

How It Works

The flow of interactions in a Pipecat application is typically straightforward:

The bot says something
The user says something
The bot says something
The user says something

This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing.

Real-time Processing

Pipecat’s pipeline architecture handles both simple voice interactions and complex multimodal processing. Let’s look at how data flows through the system:

Send Audio

Transmit and capture streamed audio from the user

Transcribe Speech

Convert speech to text as the user is talking

Process with LLM

Generate responses using a large language model

Convert to Speech

Transform text responses into natural speech

Play Audio

Stream the audio response back to the user

In both cases, Pipecat:

Processes responses as they stream in
Handles multiple input/output modalities concurrently
Manages resource allocation and synchronization
Coordinates parallel processing tasks

This architecture creates fluid, natural interactions without noticeable delays, whether you’re building a simple voice assistant or a complex multimodal application. Pipecat’s pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved.

Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure.

Next Steps

Ready to build your first Pipecat application?

Installation & Setup

Prepare your environment and install required dependencies

Quickstart

Build and run your first Pipecat application

Core Concepts

Learn about pipelines, frames, and real-time processing

Use Cases

Explore example implementations and patterns

Join Our Community

Need help or want to share your project? Join our Discord community where you can connect with other developers and get support from the Pipecat team.

Get Started

​What You Can Build

Voice Assistants

Interactive Agents

Multimodal Apps

Creative Tools

Business Solutions

Complex Flows

​How It Works

​Real-time Processing

​Next Steps

Installation & Setup

Quickstart

Core Concepts

Use Cases

​Join Our Community

What You Can Build

How It Works

Real-time Processing

Next Steps

Join Our Community