Overview

VonageFrameSerializer enables integration with the Vonage Video API Audio Connector WebSocket protocol, allowing Pipecat applications to process real-time audio streams from active Vonage video sessions.

  • Vonage Serializer API Reference: Pipecat's API methods for Vonage Audio Connector Streams integration
  • Example Implementation: End-to-end Pipecat example using Vonage Audio Connector
  • Vonage Audio Connector Documentation: Official Vonage Video API Audio Connector documentation
  • Vonage Video API Console: Manage Vonage Video API projects

Installation

The VonageFrameSerializer does not require any additional dependencies beyond the core Pipecat library:
pip install "pipecat-ai"

Prerequisites

Vonage Video API Account Setup

Before using VonageFrameSerializer, you need:
  1. Vonage (TokBox) Account: Sign up at Vonage Video API Console
  2. Vonage Video API Project: Create a project to obtain Project API Key and Project Secret
  3. Existing Vonage Video Session: A routed Vonage session must already exist before the Audio Connector is started. Sessions can be created using the TokBox Playground or the Vonage Video API SDKs.

Required Environment Variables

  • VONAGE_API_KEY: Your Vonage Video API project key
  • VONAGE_API_SECRET: Your Vonage Video API project secret
  • VONAGE_SESSION_ID: The existing routed session ID
  • WS_URI: Public WebSocket endpoint URI of the server application running Pipecat (e.g. via ngrok)
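
The variables above can be read and validated at startup so that a missing value fails fast. This is a minimal sketch using only the standard library; the helper name load_vonage_settings is hypothetical and is not part of Pipecat.

```python
import os

# Names taken from the list of required environment variables above.
REQUIRED_VARS = ("VONAGE_API_KEY", "VONAGE_API_SECRET", "VONAGE_SESSION_ID", "WS_URI")


def load_vonage_settings(env=None):
    """Return the required settings as a dict, or raise if any are missing.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Passing a plain dict as env makes the helper easy to exercise in tests without touching the real environment.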

Required Configuration

  • WebSocket Endpoint (/ws): A WebSocket server application (e.g. FastAPI) running Pipecat that accepts raw PCM audio frames.
  • Audio Connector /connect Request: Triggers Vonage to open a WebSocket connection to your server and begin streaming audio from the active session.
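
The /connect request can be prepared as a URL and JSON body before it is sent with any HTTP client. The sketch below only builds the request and sends nothing. The endpoint URL and the field names (sessionId, token, websocket.uri, websocket.audioRate) are assumptions based on the Vonage Video REST API and should be checked against the official Audio Connector documentation; the function name is hypothetical, and the session token must be generated separately.

```python
import json


def build_connect_request(api_key, session_id, token, ws_uri, audio_rate=16000):
    """Build (url, json_body) for an Audio Connector /connect call.

    Assumption: the endpoint path and body field names follow the Vonage
    Video REST API; verify them against the official documentation.
    """
    if not ws_uri.startswith(("ws://", "wss://")):
        raise ValueError("ws_uri must be a WebSocket URI (ws:// or wss://)")
    url = f"https://api.opentok.com/v2/project/{api_key}/connect"
    body = {
        "sessionId": session_id,
        "token": token,
        # The URI Vonage will dial, e.g. the public address of your /ws endpoint.
        "websocket": {"uri": ws_uri, "audioRate": audio_rate},
    }
    return url, json.dumps(body)
```

The audio_rate passed here would normally match the vonage_sample_rate configured on the serializer, so that both sides agree on the PCM format.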

Key Features

  • Bidirectional Audio: Convert between Pipecat and Vonage Audio Connector formats
  • Real-Time AI Pipelines: Stream live audio into Pipecat and process it through any real-time pipeline configuration supported by the framework
  • Session Control Events: Handle Vonage Audio Connector JSON events
  • Linear PCM Audio: Handle raw 16-bit linear PCM audio streams used by the Vonage Video API Audio Connector
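
To make the 16-bit linear PCM format concrete, the sketch below computes the duration of a raw audio chunk and decodes it into integer samples. It assumes mono, little-endian samples, which is not stated in this page; the helper names are hypothetical and are not part of Pipecat.

```python
import struct

BYTES_PER_SAMPLE = 2  # 16-bit linear PCM


def pcm_duration_ms(chunk: bytes, sample_rate: int, channels: int = 1) -> float:
    """Return the playback duration of a raw PCM chunk in milliseconds."""
    frame_size = BYTES_PER_SAMPLE * channels
    if len(chunk) % frame_size:
        raise ValueError("chunk does not contain a whole number of sample frames")
    return (len(chunk) // frame_size) * 1000 / sample_rate


def decode_samples(chunk: bytes) -> list:
    """Decode signed 16-bit samples (assumed little-endian) into Python ints."""
    count = len(chunk) // BYTES_PER_SAMPLE
    return list(struct.unpack(f"<{count}h", chunk[: count * BYTES_PER_SAMPLE]))
```

For example, a 640-byte mono chunk at 16000 Hz holds 320 samples, which is 20 ms of audio.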

Configuration

  • params (InputParams, default: None): Configuration parameters for audio settings. See InputParams below.

InputParams

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| vonage_sample_rate | int | 16000 | Sample rate used by Vonage (Hz). Common values: 8000, 16000, 24000. |
| sample_rate | int | None | Optional override for pipeline input sample rate. When None, uses the pipeline's configured rate. |
| ignore_rtvi_messages | bool | True | Whether to ignore RTVI protocol messages during serialization. |
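
The interaction between vonage_sample_rate and sample_rate can be sketched as follows. This is an illustration of the documented fallback rule (use the pipeline's rate when sample_rate is None), not the serializer's actual code; SerializerRates and resolve_rates are hypothetical names, and the resampling flag is an inference from the two rates differing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SerializerRates:
    """Hypothetical stand-in for the two rate fields of InputParams."""

    vonage_sample_rate: int = 16000
    sample_rate: Optional[int] = None


def resolve_rates(params: SerializerRates, pipeline_rate: int):
    """Return (pipeline_side_rate, vonage_side_rate, needs_resampling)."""
    # An explicit sample_rate overrides the pipeline's configured rate.
    pipeline_side = params.sample_rate if params.sample_rate is not None else pipeline_rate
    needs_resampling = pipeline_side != params.vonage_sample_rate
    return pipeline_side, params.vonage_sample_rate, needs_resampling
```

With the defaults and a 24000 Hz pipeline, the two sides differ, so audio would need resampling between 24000 Hz and 16000 Hz.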

Usage

Basic Setup

from pipecat.serializers.vonage import VonageFrameSerializer
from pipecat.transports.network.websocket_server import (
    WebSocketServerParams,
    WebSocketServerTransport,
)

# Default parameters: 16000 Hz on the Vonage side.
serializer = VonageFrameSerializer()

transport = WebSocketServerTransport(
    params=WebSocketServerParams(
        audio_out_enabled=True,
        # Vonage expects raw PCM, so do not prepend a WAV header.
        add_wav_header=False,
        serializer=serializer,
    )
)

With Custom Sample Rate

serializer = VonageFrameSerializer(
    params=VonageFrameSerializer.InputParams(
        vonage_sample_rate=8000,
    ),
)

Notes

  • Linear PCM audio: Unlike Twilio and Plivo, Vonage uses raw 16-bit linear PCM audio (not mu-law encoded). Audio data is sent as binary WebSocket messages rather than base64-encoded JSON.
  • No auto hang-up: The Vonage serializer does not include automatic call termination. Session lifecycle is managed through the Vonage Video API.
  • Event handling: The serializer handles Vonage-specific WebSocket events including websocket:connected, websocket:cleared, websocket:notify, and websocket:dtmf.
  • DTMF support: Touch-tone digit events are converted to InputDTMFFrame objects.
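
The split between binary audio messages and JSON events described in these notes can be sketched as a small dispatcher. This is not the serializer's implementation: classify_message is a hypothetical helper, and the JSON keys "event" and "digit" are assumptions about the message shape that should be checked against the Vonage documentation.

```python
import json

# Event names listed in the notes above.
KNOWN_EVENTS = {
    "websocket:connected",
    "websocket:cleared",
    "websocket:notify",
    "websocket:dtmf",
}


def classify_message(message):
    """Return ("audio", bytes), ("dtmf", digit), ("event", name) or ("unknown", None)."""
    # Binary WebSocket messages carry raw 16-bit linear PCM audio.
    if isinstance(message, (bytes, bytearray)):
        return "audio", bytes(message)
    try:
        payload = json.loads(message)
    except (TypeError, ValueError):
        return "unknown", None
    if not isinstance(payload, dict):
        return "unknown", None
    event = payload.get("event")  # assumed key name
    if event == "websocket:dtmf":
        return "dtmf", payload.get("digit")  # assumed key name
    if event in KNOWN_EVENTS:
        return "event", event
    return "unknown", None
```

In a real pipeline the "audio" branch would become an input audio frame and the "dtmf" branch an InputDTMFFrame, which is the work the serializer does for you.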