Piper
Text-to-speech service implementation using the Piper TTS server
Overview
PiperTTSService converts text to speech using the Piper TTS server. It integrates with a locally running Piper TTS server, so speech synthesis stays self-hosted.
Installation
To use PiperTTSService, no additional dependencies in Pipecat are required.
You’ll also need to set up a running Piper TTS server following the Piper HTTP server documentation.
Configuration
Constructor Parameters
- API base URL for the Piper TTS server (without a trailing slash)
- aiohttp ClientSession for making HTTP requests
- Output sample rate in Hz. When None, the sample rate depends on the voice model being used by the Piper server.
- Text filter that modifies the text provided to the TTS. Learn more about the available filters.
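Because the base URL is expected without a trailing slash, it can help to normalize the value before passing it to the service. The helper below is a minimal sketch: `normalize_base_url` is a hypothetical name, not part of Pipecat, and the port in the example is only an assumption about where your Piper server listens.

```python
def normalize_base_url(base_url: str) -> str:
    """Return base_url without surrounding whitespace or trailing slashes.

    Hypothetical helper for illustration; it is not part of Pipecat.
    """
    cleaned = base_url.strip().rstrip("/")
    if not cleaned.startswith(("http://", "https://")):
        raise ValueError(f"expected an http(s) URL, got {base_url!r}")
    return cleaned


# The host and port are assumptions; use the address of your own Piper server.
print(normalize_base_url("http://localhost:5000/"))
```

Rejecting values without an `http://` or `https://` scheme catches a common configuration mistake early, before any request is attempted.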
Input
The service accepts text input through its TTS pipeline.
Output Frames
TTSStartedFrame
Signals the start of audio generation.
TTSAudioRawFrame
Contains generated audio data:
- Raw audio data chunk
- Audio sample rate (depends on the Piper model)
- Number of audio channels (1 for mono)
TTSStoppedFrame
Signals the completion of audio generation.
ErrorFrame
Signals that an error occurred during audio generation:
- Error message
Methods
See the TTS base class methods for additional functionality.
Usage Example
Frame Flow
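On a successful request, the output frames described above arrive in a fixed order: one TTSStartedFrame, then one TTSAudioRawFrame per audio chunk, then one TTSStoppedFrame. The sketch below illustrates that ordering with stand-in dataclasses. These are not the real Pipecat classes, their attribute names are assumptions, and the error path is not modeled.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Union


# Stand-in frame types for illustration only; the real classes live in Pipecat.
@dataclass
class TTSStartedFrame:
    pass


@dataclass
class TTSAudioRawFrame:
    audio: bytes        # raw audio data chunk
    sample_rate: int    # depends on the Piper voice model
    num_channels: int   # 1 for mono


@dataclass
class TTSStoppedFrame:
    pass


Frame = Union[TTSStartedFrame, TTSAudioRawFrame, TTSStoppedFrame]


def frame_flow(chunks: Iterable[bytes], sample_rate: int) -> Iterator[Frame]:
    """Yield frames in the documented order: started, audio..., stopped."""
    yield TTSStartedFrame()
    for chunk in chunks:
        if chunk:  # skip empty chunks
            yield TTSAudioRawFrame(audio=chunk, sample_rate=sample_rate, num_channels=1)
    yield TTSStoppedFrame()


# 22050 Hz is an assumed sample rate; the real value comes from the voice model.
frames = list(frame_flow([b"\x00\x01" * 512, b"\x00\x01" * 100], sample_rate=22050))
print([type(f).__name__ for f in frames])
```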
Metrics Support
The service supports metrics collection:
- Time to First Byte (TTFB)
- TTS usage metrics
- Processing duration
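To make the first metric concrete: Time to First Byte is the delay between starting a request and receiving the first audio data. The sketch below measures it over any iterable of byte chunks. `with_ttfb` is a hypothetical helper that illustrates the idea and does not show how Pipecat records its metrics internally.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def with_ttfb(chunks: Iterable[bytes]) -> Iterator[Tuple[bytes, Optional[float]]]:
    """Yield (chunk, ttfb) pairs; ttfb is set only on the first non-empty chunk.

    The timer starts when iteration begins, since generators run lazily.
    """
    start = time.perf_counter()
    first_seen = False
    for chunk in chunks:
        if not chunk:
            continue  # empty chunks carry no audio, so they do not count
        if not first_seen:
            first_seen = True
            yield chunk, time.perf_counter() - start
        else:
            yield chunk, None


for chunk, ttfb in with_ttfb([b"", b"ab", b"cd"]):
    if ttfb is not None:
        print(f"TTFB: {ttfb:.6f} s")
```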
Audio Processing
- Streams audio in 1 KB chunks
- Automatically handles WAV headers in the response
- Outputs mono audio
- Supports the sample rate specified by your Piper voice model
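The points above can be illustrated with Python's standard `wave` module: parse the WAV header to recover the sample rate and channel count, then hand out the raw PCM payload in 1 KB chunks. This is a simplified sketch, not the service's implementation. It buffers the whole response instead of streaming it, `make_wav` and `split_wav` are hypothetical helpers, and the 22050 Hz rate is an assumption.

```python
import io
import wave
from typing import Iterator, Tuple


def make_wav(pcm: bytes, sample_rate: int) -> bytes:
    """Build an in-memory mono 16-bit WAV, standing in for a server response."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def split_wav(data: bytes, chunk_size: int = 1024) -> Tuple[int, int, Iterator[bytes]]:
    """Return (sample_rate, num_channels, iterator over raw PCM chunks)."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        pcm = wav.readframes(wav.getnframes())  # payload only, header removed

    def chunks() -> Iterator[bytes]:
        for offset in range(0, len(pcm), chunk_size):
            yield pcm[offset:offset + chunk_size]

    return sample_rate, channels, chunks()


wav_bytes = make_wav(b"\x00\x00" * 1500, sample_rate=22050)
rate, channels, pcm_chunks = split_wav(wav_bytes)
sizes = [len(c) for c in pcm_chunks]
print(rate, channels, sizes)  # 22050 1 [1024, 1024, 952]
```

Reading the sample rate from the header rather than hard-coding it mirrors the behavior described above, where the rate follows the Piper voice model.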
Notes
- Requires a running Piper TTS server
- Self-hosted solution with no external API dependencies
- Streams audio in chunks for efficient processing
- Automatically handles WAV headers in the response
- Provides metrics collection