Overview

ResembleAITTSService provides high-quality text-to-speech synthesis over Resemble AI’s streaming WebSocket API. It returns word-level timestamps and manages audio contexts, so multiple simultaneous synthesis requests can be tracked separately and interrupted cleanly.

Installation

To use Resemble AI services, install the required dependencies:
pip install "pipecat-ai[resemble]"

Prerequisites

Resemble AI Account Setup

Before using Resemble AI TTS services, you need:
  1. Resemble AI Account: Sign up at Resemble AI
  2. API Key: Generate an API key from your account settings
  3. Voice Selection: Choose or create voice UUIDs from your voice library

Required Environment Variables

  • RESEMBLE_API_KEY: Your Resemble AI API key for authentication
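
A minimal sketch of reading the key at startup and failing early when it is missing, rather than discovering the problem on the first WebSocket connection. The helper name `load_resemble_api_key` is hypothetical and not part of Pipecat or Resemble AI.

```python
import os


def load_resemble_api_key(env=os.environ) -> str:
    """Return the Resemble AI API key, or raise if it is not set.

    Illustrative helper only; `env` is injectable so the function can be
    exercised without touching the real process environment.
    """
    api_key = env.get("RESEMBLE_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("RESEMBLE_API_KEY is not set")
    return api_key
```

The returned value can then be passed as `api_key` when constructing the service.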

Configuration

ResembleAITTSService

  • api_key (str, required): Resemble AI API key for authentication.
  • voice_id (str, required): Voice UUID to use for synthesis.
  • url (str, default: "wss://websocket.cluster.resemble.ai/stream"): WebSocket URL for Resemble AI TTS API.
  • precision (str, default: "PCM_16"): PCM bit depth. Options: PCM_32, PCM_24, PCM_16, or MULAW.
  • output_format (str, default: "wav"): Audio output format (wav or mp3).
  • sample_rate (int, default: 22050): Audio sample rate in Hz. Options: 8000, 16000, 22050, 32000, or 44100.
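
The documented options can be checked before the service is constructed. The following is a sketch under that assumption: `validate_tts_settings` and the three constants are hypothetical names introduced here, and the allowed values are copied from the parameter list above. Whether the service itself performs similar validation is not stated in this page.

```python
# Allowed values as listed in the Configuration section.
SUPPORTED_PRECISIONS = {"PCM_32", "PCM_24", "PCM_16", "MULAW"}
SUPPORTED_OUTPUT_FORMATS = {"wav", "mp3"}
SUPPORTED_SAMPLE_RATES = {8000, 16000, 22050, 32000, 44100}


def validate_tts_settings(precision="PCM_16", output_format="wav", sample_rate=22050):
    """Return the settings as a dict, or raise ValueError on an unsupported value.

    Illustrative helper only; not part of Pipecat.
    """
    if precision not in SUPPORTED_PRECISIONS:
        raise ValueError(f"Unsupported precision: {precision!r}")
    if output_format not in SUPPORTED_OUTPUT_FORMATS:
        raise ValueError(f"Unsupported output_format: {output_format!r}")
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"Unsupported sample_rate: {sample_rate!r}")
    return {
        "precision": precision,
        "output_format": output_format,
        "sample_rate": sample_rate,
    }
```

The resulting dict can be unpacked into the constructor alongside `api_key` and `voice_id`, for example `ResembleAITTSService(api_key=..., voice_id=..., **settings)`.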

Usage

Basic Setup

import os

from pipecat.services.resembleai import ResembleAITTSService

tts = ResembleAITTSService(
    api_key=os.getenv("RESEMBLE_API_KEY"),
    voice_id="your-voice-uuid",
)

With Custom Settings

import os

from pipecat.services.resembleai import ResembleAITTSService

tts = ResembleAITTSService(
    api_key=os.getenv("RESEMBLE_API_KEY"),
    voice_id="your-voice-uuid",
    sample_rate=16000,
    precision="PCM_16",
    output_format="wav",
)

Notes

  • Word-level timestamps: Resemble AI provides word-level timing information, enabling synchronized text highlighting and precise interruption handling.
  • Jitter buffering: The service buffers approximately 1 second of audio before starting playback. Resemble AI sends audio in bursts separated by gaps of 300 to 450 ms, and the buffer absorbs those gaps so playback stays continuous.
  • Audio context management: Supports multiple simultaneous synthesis requests with proper context tracking and interruption handling.
  • Default sample rate: Defaults to 22050 Hz. Supported rates are 8000, 16000, 22050, 32000, and 44100 Hz.
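
To make the jitter-buffering note concrete, here is a simplified sketch of the general technique: hold audio back until about one second has accumulated, then pass chunks through as they arrive. This is not the service's actual implementation. The class name `JitterBuffer` is hypothetical, and the byte threshold assumes mono audio with 2 bytes per sample (PCM_16).

```python
class JitterBuffer:
    """Hold audio back until roughly `target_seconds` has been buffered."""

    def __init__(self, sample_rate=22050, bytes_per_sample=2, channels=1, target_seconds=1.0):
        # Bytes that correspond to `target_seconds` of raw PCM audio.
        self.threshold = int(sample_rate * bytes_per_sample * channels * target_seconds)
        self._buffer = bytearray()
        self._started = False

    def push(self, chunk: bytes) -> bytes:
        """Add a chunk and return the audio that is ready to play (possibly empty)."""
        self._buffer.extend(chunk)
        if not self._started and len(self._buffer) < self.threshold:
            return b""  # Still pre-buffering; nothing is released yet.
        self._started = True
        ready = bytes(self._buffer)
        self._buffer.clear()
        return ready

    def reset(self) -> None:
        """Drop buffered audio and pre-buffer again, e.g. after an interruption."""
        self._buffer.clear()
        self._started = False
```

At the default 22050 Hz with 16-bit mono samples, the threshold is 22050 × 2 = 44100 bytes. A larger threshold tolerates longer gaps between bursts at the cost of a longer delay before the first audio plays.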

Event Handlers

Resemble AI TTS supports the standard service connection events:
Event                  Description
on_connected           Connected to Resemble AI WebSocket
on_disconnected        Disconnected from Resemble AI WebSocket
on_connection_error    WebSocket connection error occurred

@tts.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Resemble AI")