This quickstart guide will help you set up and run your first Pipecat application. You’ll create a simple voice AI bot that you can talk to in real-time using your browser.

Prerequisites

Python 3.10+

Pipecat requires Python 3.10 or newer. Check your version with:
python --version
If you need to upgrade Python, we recommend using a version manager like uv or pyenv.

AI Service API Keys

This quickstart uses three AI services working together in a pipeline. You’ll need an API key from each:
  • Deepgram for speech-to-text
  • OpenAI for the language model
  • Cartesia for text-to-speech
Have these API keys ready; you’ll add them to your environment file in the next section.

Setup

1. Clone the quickstart repository

git clone https://github.com/pipecat-ai/pipecat-quickstart.git
cd pipecat-quickstart

2. Set up your environment

Create your environment file:
cp env.example .env
Open the .env file in your text editor and add your API keys:
DEEPGRAM_API_KEY=your_deepgram_api_key
OPENAI_API_KEY=your_openai_api_key
CARTESIA_API_KEY=your_cartesia_api_key
Replace each placeholder with your actual API key from the respective service.
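If you want to confirm the keys are visible before running the bot, a small check script like this can help (a sketch, not part of the quickstart; the key names match the .env file above, and note that bot.py loads .env itself, so a standalone script like this only sees keys already exported in your shell):

```python
import os

# The three keys the quickstart reads via os.getenv()
REQUIRED_KEYS = ["DEEPGRAM_API_KEY", "OPENAI_API_KEY", "CARTESIA_API_KEY"]

def missing_keys(env=None):
    """Return the names of any required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]

if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All API keys found.")
```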

3. Install dependencies

Set up your virtual environment and install dependencies:
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
Using uv? Create a venv and get dependencies with: uv sync.

4. Run your bot

Now you’re ready to run your bot! Start it with:
python bot.py
Using uv? Run with: uv run bot.py
You should see output similar to this:
🚀 WebRTC server starting at http://localhost:7860/client
   Open this URL in your browser to connect!
First run timing: The initial startup may take around 15 seconds as Pipecat downloads required models like the Silero VAD (Voice Activity Detection) model. Subsequent runs will be much faster.

5. Connect and test

Open http://localhost:7860/client in your browser. You’ll see the Voice UI Kit console interface with a connect button in the upper right corner. Click Connect and allow microphone access when prompted. The console will establish a connection to your bot server and you can start having a voice conversation! When you’re finished, click Disconnect or close the browser tab to end the session. You can also stop the bot by pressing Ctrl+C in your terminal.

Understanding the Quickstart Bot

When you speak to your bot, here’s the real-time pipeline that processes your conversation:
  1. Audio Capture: Your browser captures microphone audio and sends it via WebRTC
  2. Voice Activity Detection: Silero VAD detects when you start and stop speaking
  3. Speech Recognition: Deepgram converts your speech to text in real-time
  4. Language Processing: OpenAI’s GPT model generates an intelligent response
  5. Speech Synthesis: Cartesia converts the response text back to natural speech
  6. Audio Playback: The generated audio streams back to your browser
Each step happens with minimal latency, typically completing the full round-trip in under one second.
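The ordering of these steps mirrors how frames flow through a Pipecat pipeline. As a toy illustration only (plain functions and dicts, not Pipecat's actual frame or processor classes):

```python
# Toy model of a frame passing through ordered processors (not Pipecat's API)
def stt_step(frame):
    # Speech recognition: audio in, transcript out
    frame["transcript"] = "hello bot"
    return frame

def llm_step(frame):
    # Language model: transcript in, response text out
    frame["response"] = f"You said: {frame['transcript']}"
    return frame

def tts_step(frame):
    # Speech synthesis: response text in, audio out
    frame["audio_out"] = b"<synthesized audio bytes>"
    return frame

def run_pipeline(frame, processors):
    """Pass a frame through each processor in order."""
    for processor in processors:
        frame = processor(frame)
    return frame

result = run_pipeline({"audio_in": b"<mic audio>"}, [stt_step, llm_step, tts_step])
print(result["response"])  # → You said: hello bot
```

The real pipeline (shown later in this guide) works the same way: each processor consumes the frames it cares about and passes everything else downstream.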

AI Services

Your bot uses three AI services, each configured with API keys from your .env file:
# Create AI Services
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
Pipecat supports many different AI services. You can swap out Deepgram for Azure Speech, OpenAI for Anthropic, or Cartesia for ElevenLabs without changing the rest of your code. See the supported services documentation for all available options.
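For example, swapping Cartesia for ElevenLabs is typically just a matter of constructing a different service (a sketch only; the exact import path and constructor parameters vary by Pipecat version, so check the supported services documentation before using this):

```python
import os

# Hypothetical swap: ElevenLabs TTS in place of Cartesia.
# Import path and arguments are illustrative; verify against your Pipecat version.
from pipecat.services.elevenlabs import ElevenLabsTTSService

tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id="your_elevenlabs_voice_id",  # placeholder: use a real voice ID
)
```

Because every TTS service exposes the same frame interface, the rest of the pipeline is unchanged.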

Context and Messages

Your bot maintains conversation history using a context object, enabling multi-turn interactions where the bot remembers what was said earlier. The context is initialized with a system message that defines the bot’s personality:
# All messages use the OpenAI message format
messages = [
    {
        "role": "system",
        "content": "You are a friendly AI assistant. Respond naturally and keep your answers conversational.",
    },
]

context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
The context aggregator automatically collects user messages (after speech-to-text) and assistant responses (after text-to-speech), maintaining the conversation flow without manual intervention.
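To see what the aggregator effectively maintains, here is a plain-Python illustration of the message list growing across turns (illustrative only, not Pipecat's internals):

```python
def record_turn(messages, user_text, assistant_text):
    """Append one user/assistant exchange in OpenAI message format."""
    messages.append({"role": "user", "content": user_text})
    messages.append({"role": "assistant", "content": assistant_text})
    return messages

history = [{"role": "system", "content": "You are a friendly AI assistant."}]
record_turn(history, "What can you do?", "I can chat with you in real time!")
record_turn(history, "What's the weather?", "I don't have live weather data, sorry.")

print(len(history))  # → 5 (one system message plus two user/assistant turns)
```

Because the full history is sent to the LLM on every turn, the bot can resolve references like "what did I just ask you?" without any extra code.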

RTVI Protocol

When building web or mobile clients, you can use Pipecat’s client SDKs that communicate with your bot via the RTVI (Real-Time Voice Interaction) protocol. In our quickstart example, we initialize the RTVI processor to handle client-server messaging and events:
rtvi = RTVIProcessor(config=RTVIConfig(config=[]))

Pipeline Configuration

The core of your bot is a Pipeline that processes data through a series of processors:
# Create the pipeline with the processors
pipeline = Pipeline([
    transport.input(),              # Receive audio from browser
    rtvi,                           # Protocol for client/server messaging and events
    stt,                            # Speech-to-text (Deepgram)
    context_aggregator.user(),      # Add user message to context
    llm,                            # Language model (OpenAI)
    tts,                            # Text-to-speech (Cartesia)
    transport.output(),             # Send audio back to browser
    context_aggregator.assistant(), # Add bot response to context
])
Data flows through the pipeline as “frames”: objects containing audio, text, or other data types. The ordering is crucial: audio must be transcribed before it can be processed by the LLM, and text must be synthesized before it can be played back. The pipeline is managed by a PipelineTask:
# Create a PipelineTask to manage the pipeline execution
task = PipelineTask(
    pipeline,
    params=PipelineParams(
        enable_metrics=True,
        enable_usage_metrics=True,
    ),
    observers=[RTVIObserver(rtvi)],
)
The task handles pipeline execution, collects metrics, and manages RTVI events through observers.

Event Handlers

Event handlers manage the bot’s lifecycle and user interactions:
# Event handler for when a client connects
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
    logger.info("Client connected")
    # Add a greeting message to the context
    messages.append({"role": "system", "content": "Say hello and briefly introduce yourself."})
    # Get a context frame and queue it for the task
    # This is what prompts the bot to start talking when the client connects
    await task.queue_frames([context_aggregator.user().get_context_frame()])

# Event handler for when a client disconnects
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
    logger.info("Client disconnected")
    # Cancel the task when the client disconnects
    # This stops the pipeline and all processors, cleaning up resources
    await task.cancel()
When a client connects, the bot adds a greeting instruction and queues a context frame to initiate the conversation. When disconnecting, it properly cancels the task to clean up resources.

Running the Pipeline

Finally, the pipeline is executed by a PipelineRunner:
# Create a PipelineRunner to run the task
runner = PipelineRunner(handle_sigint=False)

# Finally, run the task using the runner
# This will start the pipeline and begin processing frames
await runner.run(task)
The runner manages the pipeline’s execution lifecycle. Here, handle_sigint=False is set because the top-level Pipecat runner (shown in the bot entry point below) handles system signals itself.

Bot Entry Point

The quickstart uses Pipecat’s runner system:
async def bot(session_args: SmallWebRTCSessionArguments):
    """Main bot entry point for the bot starter."""

    transport = SmallWebRTCTransport(
        params=TransportParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
        ),
        webrtc_connection=session_args.webrtc_connection,
    )

    await run_bot(transport)

if __name__ == "__main__":
    from pipecat.runner.run import main
    main()
This runner automatically handles WebRTC connection setup and management, making it easy to get started with minimal configuration.
Ready for production? This bot pattern is compatible with Pipecat Cloud, meaning you can deploy your bot without any code changes.

Troubleshooting

  • Browser permissions: Make sure to allow microphone access when prompted by your browser.
  • Connection issues: If the WebRTC connection fails, first try a different browser. If that fails, make sure you don’t have a VPN or firewall rules blocking traffic. WebRTC uses UDP to communicate.
  • Audio issues: Check that your microphone and speakers are working and not muted.

Next Steps

Now that you have your first bot working, here are some ways to keep going:
  • Customize your bot: Edit the system prompt in bot.py to change your agent’s personality and behavior. Try different roles like a helpful assistant, creative writer, or domain expert.
  • Join the community: Connect with other Pipecat developers on Discord to share your projects, get help, and see what others are building.