Overview

PlayHT provides two TTS service implementations:

  • PlayHTTTSService: WebSocket-based service with real-time streaming
  • PlayHTHttpTTSService: HTTP-based service for simpler, non-streaming synthesis

Installation

To use PlayHT services, install the required dependencies:

pip install "pipecat-ai[playht]"

You’ll also need to set up your PlayHT credentials as environment variables:

  • PLAY_HT_USER_ID
  • PLAY_HT_API_KEY
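A minimal sketch of reading these two variables and failing early when one is missing. The helper name `load_playht_credentials` is hypothetical and not part of Pipecat; only the variable names come from the list above.

```python
import os


def load_playht_credentials(env=None):
    """Return (user_id, api_key), raising if either variable is unset or empty.

    Hypothetical helper, not part of Pipecat. `env` defaults to os.environ
    and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    required = ("PLAY_HT_USER_ID", "PLAY_HT_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing PlayHT environment variables: " + ", ".join(missing)
        )
    return env["PLAY_HT_USER_ID"], env["PLAY_HT_API_KEY"]
```

The returned values can then be passed as `user_id` and `api_key` to either service constructor.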

PlayHTTTSService

WebSocket-based implementation supporting real-time streaming synthesis.

Constructor Parameters

api_key
str
required

PlayHT API key

user_id
str
required

PlayHT user ID

voice_url
str
required

Voice identifier URL

voice_engine
str
default: "PlayHT3.0-mini"

TTS engine identifier. See the PlayHT docs for available engines.

sample_rate
int
default: 24000

Output audio sample rate in Hz

output_format
str
default: "wav"

Audio output format

text_filter
BaseTextFilter
default: None

Modifies text provided to the TTS. Learn more about the available filters.

Input Parameters

class InputParams(BaseModel):
    language: Optional[Language] = Language.EN
    speed: Optional[float] = 1.0
    seed: Optional[int] = None

PlayHTHttpTTSService

HTTP-based implementation for simpler synthesis requirements.

Constructor Parameters

api_key
str
required

PlayHT API key

user_id
str
required

PlayHT user ID

voice_url
str
required

Voice identifier URL

voice_engine
str
default: "PlayHT3.0-mini"

TTS engine identifier. See the PlayHT docs for available engines.

sample_rate
int
default: 24000

Output audio sample rate in Hz

Input Parameters

class InputParams(BaseModel):
    language: Optional[Language] = Language.EN
    speed: Optional[float] = 1.0
    seed: Optional[int] = None

Output Frames

Control Frames

TTSStartedFrame
Frame

Signals start of synthesis

TTSStoppedFrame
Frame

Signals completion of synthesis

Audio Frames

TTSAudioRawFrame
Frame

Contains generated audio data with:

  • WAV format
  • Specified sample rate
  • Single channel (mono)
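As an illustration of what the sample rate and mono channel imply for downstream code, the sketch below wraps raw audio bytes in a WAV container using the standard library and computes their duration. It assumes the payload is 16-bit signed PCM, which this page does not state; the function names are hypothetical.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 24000, sample_width: int = 2) -> bytes:
    """Wrap mono PCM bytes in a WAV container.

    Assumption: samples are `sample_width` bytes each (2 = 16-bit PCM).
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)              # single channel (mono)
        wav.setsampwidth(sample_width)   # bytes per sample
        wav.setframerate(sample_rate)    # samples per second
        wav.writeframes(pcm)
    return buf.getvalue()


def pcm_duration_seconds(pcm: bytes, sample_rate: int = 24000, sample_width: int = 2) -> float:
    """Duration of mono PCM audio: bytes / (samples per second * bytes per sample)."""
    return len(pcm) / (sample_rate * sample_width)
```

Under these assumptions, 48,000 bytes at 24,000 Hz correspond to one second of audio.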

Error Frames

ErrorFrame
Frame

Contains PlayHT error information

Methods

See the TTS base class methods for additional functionality.

Language Support

Supports multiple languages when using the PlayHT3.0-mini engine:

Language Code   Description   Service Code
Language.BG     Bulgarian     BULGARIAN
Language.CA     Catalan       CATALAN
Language.CS     Czech         CZECH
Language.DA     Danish        DANISH
Language.DE     German        GERMAN
Language.EN     English       ENGLISH
Language.ES     Spanish       SPANISH
Language.FR     French        FRENCH
Language.EL     Greek         GREEK
Language.HI     Hindi         HINDI
Language.HU     Hungarian     HUNGARIAN
Language.ID     Indonesian    INDONESIAN
Language.IT     Italian       ITALIAN
Language.JA     Japanese      JAPANESE
Language.KO     Korean        KOREAN
Language.MS     Malay         MALAY
Language.NL     Dutch         DUTCH
Language.PL     Polish        POLISH
Language.PT     Portuguese    PORTUGUESE
Language.RU     Russian       RUSSIAN
Language.SV     Swedish       SWEDISH
Language.TH     Thai          THAI
Language.TR     Turkish       TURKISH
Language.UK     Ukrainian     UKRAINIAN
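The mapping in this table can be sketched as a plain lookup. This is an illustration only, not the library's internal code: it assumes the `Language` enum values are lowercase two-letter codes such as "en", and the fallback from a regional code like "en-US" to its base language is also an assumption. The names `PLAYHT_LANGUAGE_NAMES` and `to_playht_language` are hypothetical.

```python
from typing import Optional

# Two-letter language code -> PlayHT service code, copied from the table above.
PLAYHT_LANGUAGE_NAMES = {
    "bg": "BULGARIAN", "ca": "CATALAN", "cs": "CZECH", "da": "DANISH",
    "de": "GERMAN", "en": "ENGLISH", "es": "SPANISH", "fr": "FRENCH",
    "el": "GREEK", "hi": "HINDI", "hu": "HUNGARIAN", "id": "INDONESIAN",
    "it": "ITALIAN", "ja": "JAPANESE", "ko": "KOREAN", "ms": "MALAY",
    "nl": "DUTCH", "pl": "POLISH", "pt": "PORTUGUESE", "ru": "RUSSIAN",
    "sv": "SWEDISH", "th": "THAI", "tr": "TURKISH", "uk": "UKRAINIAN",
}


def to_playht_language(code: str) -> Optional[str]:
    """Return the service code for a language code, or None if unsupported.

    Regional variants ("en-US", "pt_BR") fall back to the base language;
    that fallback is an assumption made for this sketch.
    """
    base = code.lower().replace("_", "-").split("-")[0]
    return PLAYHT_LANGUAGE_NAMES.get(base)
```

Returning None for unknown codes lets the caller decide whether to fall back to English or report an error.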

Usage Examples

WebSocket Service

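import os

# Import paths may differ between Pipecat versions
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.playht import PlayHTTTSService
from pipecat.transcriptions.language import Language

# Configure WebSocket service
ws_service = PlayHTTTSService(
    api_key=os.getenv("PLAY_HT_API_KEY"),
    user_id=os.getenv("PLAY_HT_USER_ID"),
    voice_url="voice-url",
    voice_engine="PlayHT3.0-mini",
    params=PlayHTTTSService.InputParams(
        language=Language.EN,
        speed=1.2,
    ),
)

# Use in pipeline (text_input and audio_output stand in for your own processors)
pipeline = Pipeline([
    text_input,
    ws_service,
    audio_output,
])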

HTTP Service

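import os

# Import paths may differ between Pipecat versions
from pipecat.services.playht import PlayHTHttpTTSService
from pipecat.transcriptions.language import Language

# Configure HTTP service
http_service = PlayHTHttpTTSService(
    api_key=os.getenv("PLAY_HT_API_KEY"),
    user_id=os.getenv("PLAY_HT_USER_ID"),
    voice_url="voice-url",
    voice_engine="PlayHT3.0-mini",
    params=PlayHTHttpTTSService.InputParams(
        language=Language.EN,
        speed=1.0,
    ),
)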

Frame Flow

[Diagram: WebSocket Service frame flow]

[Diagram: HTTP Service frame flow]

Metrics Support

Both services collect processing metrics:

  • Time to First Byte (TTFB)
  • Processing duration
  • Character usage
  • API calls
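To make the first three metrics concrete, here is a rough sketch of how they could be measured around any stream of audio chunks. It is not how Pipecat collects metrics internally; `measure_synthesis` and its result keys are hypothetical names, and the `clock` parameter exists only so the timing can be tested.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def measure_synthesis(
    text: str,
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Optional[float]]:
    """Consume an audio chunk stream and report simple timing and usage figures."""
    start = clock()
    ttfb = None
    audio_bytes = 0
    for chunk in chunks:
        if ttfb is None:
            # Time to First Byte: delay until the first chunk arrives.
            ttfb = clock() - start
        audio_bytes += len(chunk)
    return {
        "ttfb_seconds": ttfb,                  # None if no audio was produced
        "processing_seconds": clock() - start,
        "characters": len(text),               # character usage for this request
        "audio_bytes": audio_bytes,
    }
```

Passing a lazy generator as `chunks` matters here: the time spent producing the first chunk is then included in the TTFB figure.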

Notes

WebSocket Service

  • Real-time streaming support
  • Automatic reconnection
  • Interruption handling
  • WAV header management
  • Thread-safe processing
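As one illustration of what WAV header handling can involve, the sketch below locates the `data` chunk of a RIFF/WAVE buffer and returns only the PCM payload. It assumes the whole header has arrived in a single buffer and is not taken from the Pipecat source; `strip_wav_header` is a hypothetical name.

```python
import struct


def strip_wav_header(data: bytes) -> bytes:
    """Return the PCM payload of a RIFF/WAVE buffer; pass other input through."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        return data  # no WAV header present, assume raw audio
    offset = 12
    while offset + 8 <= len(data):
        chunk_id = data[offset:offset + 4]
        (chunk_size,) = struct.unpack_from("<I", data, offset + 4)
        body = offset + 8
        if chunk_id == b"data":
            # Streamed WAVs may declare a placeholder size (0 or too large);
            # in that case take everything that follows the chunk header.
            if 0 < chunk_size <= len(data) - body:
                return data[body:body + chunk_size]
            return data[body:]
        # RIFF chunks are padded to an even number of bytes.
        offset = body + chunk_size + (chunk_size & 1)
    return b""  # header seen, but no data chunk yet
```

Walking the chunk list rather than skipping a fixed 44 bytes keeps the sketch correct when extra chunks such as `LIST` precede the audio data.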

HTTP Service

  • Simpler implementation
  • Complete audio delivery
  • WAV header parsing
  • Chunked audio delivery
  • Lower latency for short texts

Common Features

  • Multiple voice engines
  • Speed control
  • Language support
  • Seed-based consistency
  • Error handling
  • Metrics collection