Overview

PlayHT provides high-quality text-to-speech synthesis with two implementations:
  • PlayHTTTSService: WebSocket-based with real-time streaming
  • PlayHTHttpTTSService: HTTP-based for simpler synthesis
PlayHTTTSService is recommended for interactive applications requiring low latency.

Installation

To use PlayHT services, install the required dependencies:
pip install "pipecat-ai[playht]"
You’ll also need to set up your PlayHT credentials as environment variables:
  • PLAYHT_USER_ID
  • PLAYHT_API_KEY
Get your credentials from the PlayHT Dashboard.
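A missing credential otherwise tends to surface later as an opaque authentication error, so it can help to check the environment at startup. The following is a minimal sketch, not part of Pipecat: load_playht_credentials is a hypothetical helper, and the variable names are passed in so it works with whatever names your deployment uses.

```python
import os
from typing import Mapping, Tuple


def load_playht_credentials(
    user_id_var: str,
    api_key_var: str,
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, str]:
    """Return (user_id, api_key), raising if either variable is unset or empty.

    Hypothetical helper for illustration; Pipecat does not ship this function.
    """
    missing = [name for name in (user_id_var, api_key_var) if not env.get(name)]
    if missing:
        raise RuntimeError("Missing PlayHT credentials: " + ", ".join(missing))
    return env[user_id_var], env[api_key_var]
```

Call it once at startup with the two variable names listed above and pass the returned values to the service constructor as user_id and api_key.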

Frames

Input

  • TextFrame - Text content to synthesize into speech
  • TTSSpeakFrame - Text that should be spoken immediately
  • TTSUpdateSettingsFrame - Runtime configuration updates
  • LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries

Output

  • TTSStartedFrame - Signals start of synthesis
  • TTSAudioRawFrame - Generated audio data (WAV format)
  • TTSStoppedFrame - Signals completion of synthesis
  • ErrorFrame - API or processing errors
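The output list suggests that audio frames arrive between a start and a stop signal. The sketch below checks that ordering on a sequence of frame type names. It uses plain strings rather than Pipecat classes, and it assumes each utterance is bracketed by exactly one TTSStartedFrame and one TTSStoppedFrame, which the list implies but does not guarantee. The function name is hypothetical.

```python
from typing import Iterable


def is_well_formed_tts_output(frame_names: Iterable[str]) -> bool:
    """Check that audio frames only appear between a start and a stop frame.

    Assumption for illustration: utterances do not nest and always end
    with a stop frame.
    """
    speaking = False
    for name in frame_names:
        if name == "TTSStartedFrame":
            if speaking:
                return False  # second start before the previous stop
            speaking = True
        elif name == "TTSAudioRawFrame":
            if not speaking:
                return False  # audio outside a started utterance
        elif name == "TTSStoppedFrame":
            if not speaking:
                return False  # stop without a matching start
            speaking = False
        # Any other frame (for example ErrorFrame) is allowed anywhere.
    return not speaking
```

A check like this can be handy in tests that record the names of frames leaving the TTS service, for example type(frame).__name__ for each frame.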

Service Comparison

Feature      | PlayHTTTSService (WebSocket) | PlayHTHttpTTSService (HTTP)
Streaming    | ✅ Real-time chunks          | ❌ Single audio block
Latency      | 🚀 Ultra-low                 | 📈 Higher
Interruption | ✅ Advanced handling         | ⚠️ Basic support
Connection   | WebSocket-based              | HTTP-based

Language Support

Common languages supported include:
  • Language.EN - English
  • Language.ES - Spanish
  • Language.FR - French
  • Language.DE - German
  • Language.IT - Italian
  • Language.JA - Japanese

Usage Example

Initialize the PlayHTTTSService and use it in a pipeline:
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.playht.tts import PlayHTTTSService
from pipecat.transcriptions.language import Language

# Configure WebSocket service
tts = PlayHTTTSService(
    user_id=os.getenv("PLAYHT_USER_ID"),
    api_key=os.getenv("PLAYHT_API_KEY"),
    voice_url="s3://voice-cloning-zero-shot/your-voice-id/manifest.json",
    voice_engine="Play3.0-mini",
    params=PlayHTTTSService.InputParams(
        language=Language.EN,
        speed=1.2,
        seed=42  # For consistent voice output
    )
)

# Use in pipeline (transport, stt, llm, and context_aggregator are created elsewhere)
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

HTTP Service

Initialize the PlayHTHttpTTSService and use it in a pipeline:
import os

from pipecat.services.playht.tts import PlayHTHttpTTSService
from pipecat.transcriptions.language import Language

# For simpler, non-streaming use cases
http_tts = PlayHTHttpTTSService(
    user_id=os.getenv("PLAYHT_USER_ID"),
    api_key=os.getenv("PLAYHT_API_KEY"),
    voice_url="s3://voice-cloning-zero-shot/your-voice-id/manifest.json",
    voice_engine="Play3.0-mini",
    protocol="http",
    params=PlayHTHttpTTSService.InputParams(
        language=Language.EN,
        speed=1.0
    )
)

Dynamic Voice Switching

Update settings at runtime by pushing a TTSUpdateSettingsFrame:
from pipecat.frames.frames import TTSUpdateSettingsFrame

# The frame carries a settings mapping; the "voice" key selects the voice
await task.queue_frame(TTSUpdateSettingsFrame(
    settings={"voice": "your-voice-id"},
))

Metrics

Both services report the following metrics:
  • Time to First Byte (TTFB) - Latency from text input to first audio
  • Processing Duration - Total synthesis time
  • Character Usage - Text processed for billing
Learn how to enable Metrics in your Pipeline.
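To make the three metrics concrete, here is a minimal sketch that measures them around any asynchronous stream of audio chunks. It uses only the standard library and does not reflect how Pipecat collects metrics internally. Both measure_stream and fake_synthesis are hypothetical names, and fake_synthesis is a stand-in for a real TTS stream.

```python
import asyncio
import time
from typing import AsyncIterator, Optional, Tuple


async def measure_stream(
    chunks: AsyncIterator[bytes],
) -> Tuple[Optional[float], float, int]:
    """Return (ttfb_seconds, total_seconds, total_bytes) for an audio stream."""
    start = time.perf_counter()
    ttfb: Optional[float] = None
    total_bytes = 0
    async for chunk in chunks:
        if ttfb is None:
            # Time to First Byte: delay until the first audio chunk arrives
            ttfb = time.perf_counter() - start
        total_bytes += len(chunk)
    # Processing duration: time until the stream is exhausted
    return ttfb, time.perf_counter() - start, total_bytes


async def fake_synthesis(text: str) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a TTS stream: yields one chunk per word."""
    for word in text.split():
        await asyncio.sleep(0.01)  # simulate synthesis and network delay
        yield word.encode("utf-8")


if __name__ == "__main__":
    text = "hello from the pipeline"
    ttfb, total, size = asyncio.run(measure_stream(fake_synthesis(text)))
    # Character usage is simply the length of the text sent for synthesis
    print(f"TTFB={ttfb:.3f}s duration={total:.3f}s bytes={size} chars={len(text)}")
```

With a streaming service, TTFB should be much smaller than the total duration; with a service that returns a single audio block, the two values are nearly equal.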

Additional Notes

  • Voice URLs: Use S3 URLs for both standard and cloned voices from PlayHT
  • Engine Selection: Choose based on latency requirements and quality needs
  • WebSocket Recommended: Use PlayHTTTSService for real-time interactive applications
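Since the first note calls for S3 URLs, a quick shape check can catch a placeholder or a bare voice name before it reaches the API. This is a sketch under that assumption only: looks_like_s3_voice_url is a hypothetical helper, and it checks the URL's form, not whether the voice actually exists.

```python
from urllib.parse import urlparse


def looks_like_s3_voice_url(voice_url: str) -> bool:
    """Return True if voice_url is an s3:// URL with both a bucket and a key."""
    parsed = urlparse(voice_url)
    # netloc holds the bucket name; path holds "/" plus the object key
    return parsed.scheme == "s3" and bool(parsed.netloc) and len(parsed.path) > 1
```

For example, the check accepts "s3://voice-cloning-zero-shot/your-voice-id/manifest.json" and rejects a plain string such as "your-voice-url".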