Fish Audio
Real-time text-to-speech service using Fish Audio’s WebSocket API
Overview
The FishAudioTTSService provides real-time text-to-speech synthesis using Fish Audio’s WebSocket API. It supports streaming audio output, multiple voices, and various audio formats.
Installation
To use Fish Audio, install the required dependencies:
You’ll need to set up your Fish Audio API key as an environment variable: FISH_API_KEY.
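As a minimal sketch of reading that variable at startup, the helper below fails early when the key is missing. The function name `load_fish_api_key` and the optional `env` argument are hypothetical and exist only for illustration; only the variable name FISH_API_KEY comes from this page.

```python
import os


def load_fish_api_key(env=None):
    """Return the API key stored in FISH_API_KEY, or raise if it is absent.

    `env` is a hypothetical hook for passing a plain dict in tests;
    by default the real process environment is used.
    """
    env = os.environ if env is None else env
    key = env.get("FISH_API_KEY", "").strip()
    if not key:
        raise RuntimeError("FISH_API_KEY is not set")
    return key
```

Checking for the key once, before any connection is opened, tends to give a clearer error than a failed authentication later on.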
Constructor Parameters
Fish Audio API key
Reference ID for the voice model
Audio output format. Options: “opus”, “mp3”, “pcm”, “wav”
Output audio sample rate in Hz
Basic Usage
Input Parameters
Language for speech synthesis. See Language Support section for available options.
Latency mode for synthesis. Options: “normal” or “balanced”
Speech speed adjustment. Range: 0.5 to 2.0
Volume adjustment in decibels (dB)
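The constraints listed above can be checked before a request is sent. The sketch below is not the service's own parameter class: `SynthesisSettings`, its field names, and its default values are assumptions made for illustration. Only the allowed latency modes and the 0.5 to 2.0 speed range come from this page. The `db_to_gain` helper shows the standard decibel-to-amplitude conversion, which may help when choosing a volume value.

```python
from dataclasses import dataclass

ALLOWED_LATENCY = ("normal", "balanced")


@dataclass(frozen=True)
class SynthesisSettings:
    """Hypothetical container mirroring the documented input parameters."""

    latency: str = "normal"   # assumed default
    speed: float = 1.0        # assumed default; documented range is 0.5 to 2.0
    volume_db: float = 0.0    # assumed default; 0 dB leaves volume unchanged

    def __post_init__(self):
        if self.latency not in ALLOWED_LATENCY:
            raise ValueError(f"latency must be one of {ALLOWED_LATENCY}")
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError("speed must be between 0.5 and 2.0")


def db_to_gain(volume_db: float) -> float:
    """Convert a decibel adjustment to a linear amplitude multiplier."""
    return 10 ** (volume_db / 20)
```

For example, a volume of -6 dB corresponds to roughly half the original amplitude, while +6 dB roughly doubles it.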
Output Frames
Control Frames
Signals start of synthesis
Signals completion of synthesis
Audio Frames
Contains generated audio data with:
- Specified format (PCM, WAV, MP3, or Opus)
- Configured sample rate
- Single channel (mono)
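When working with raw PCM output, it can be useful to know how much audio a chunk holds. The sketch below assumes 16-bit samples (2 bytes each), which this page does not state, so treat `bytes_per_sample` as an assumption to verify. The function name is hypothetical.

```python
def pcm_duration_seconds(
    audio: bytes,
    sample_rate: int,
    bytes_per_sample: int = 2,  # assumption: 16-bit PCM
    channels: int = 1,          # mono, as described above
) -> float:
    """Return the playback duration of a raw PCM chunk in seconds."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    # Each sample frame holds one sample per channel.
    frame_count = len(audio) // (bytes_per_sample * channels)
    return frame_count / sample_rate
```

Under these assumptions, 48,000 bytes of mono audio at 24,000 Hz is one second of speech. The calculation does not apply to compressed formats such as MP3 or Opus.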
Error Frames
Contains Fish Audio error information
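The ordering of these frames can be illustrated with a small stand-alone model. Everything in the sketch below is hypothetical: the class names `Started`, `Audio`, `Stopped`, and `Error` are stand-ins rather than the service's actual frame types, and emitting a stop frame after an error is an assumption, not documented behavior.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Union


@dataclass
class Started:
    """Stand-in for the frame that signals the start of synthesis."""


@dataclass
class Audio:
    """Stand-in for an audio frame carrying one mono chunk."""
    data: bytes
    sample_rate: int
    channels: int = 1


@dataclass
class Stopped:
    """Stand-in for the frame that signals completion of synthesis."""


@dataclass
class Error:
    """Stand-in for a frame carrying error information."""
    message: str


Frame = Union[Started, Audio, Stopped, Error]


def frames_for(chunks: Iterable[bytes], sample_rate: int) -> Iterator[Frame]:
    """Wrap a stream of audio chunks in start and stop control frames."""
    yield Started()
    try:
        for chunk in chunks:
            yield Audio(chunk, sample_rate)
    except Exception as exc:
        # Assumption: a failure mid-stream surfaces as a single error frame.
        yield Error(str(exc))
    yield Stopped()
```

A consumer can rely on the start frame to prepare playback and on the stop frame to flush buffers, regardless of how many audio frames arrive in between.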
Language Support
Supports multiple languages through the Language enum:
Language Code | Service Code |
---|---|
Language.EN | en-US |
Language.ZH | zh-CN |
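The table above amounts to a lookup from enum member to service code. The sketch below uses a stand-in `Language` enum with assumed member values, since the real enum's definition is not shown on this page; `to_service_code` is a hypothetical helper.

```python
from enum import Enum
from typing import Optional


class Language(Enum):
    """Stand-in for the Language enum; member values are assumptions."""
    EN = "en"
    ZH = "zh"


# Mapping taken from the table above.
SERVICE_CODES = {
    Language.EN: "en-US",
    Language.ZH: "zh-CN",
}


def to_service_code(language: Language) -> Optional[str]:
    """Return the service code for a language, or None if unsupported."""
    return SERVICE_CODES.get(language)
```

Returning None for an unmapped language lets the caller decide whether to fall back to a default or report an error.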
Usage Example
Frame Flow
Metrics Support
The service collects processing metrics:
- Time to First Byte (TTFB)
- Processing duration
- Character usage
- API calls
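To make the four metrics concrete, the sketch below shows one way they could be tracked around a single synthesis request. It is not the service's own metrics implementation: the `SynthesisMetrics` class, its method names, and the injectable `clock` argument are all hypothetical.

```python
import time


class SynthesisMetrics:
    """Hypothetical tracker for TTFB, processing time, characters, and calls."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.started_at = None
        self.ttfb = None
        self.processing_time = None
        self.characters = 0
        self.api_calls = 0

    def start(self, text: str) -> None:
        """Record the start of a request and count its characters."""
        self.started_at = self._clock()
        self.ttfb = None
        self.processing_time = None
        self.characters += len(text)
        self.api_calls += 1

    def on_audio(self) -> None:
        """Record time to first byte when the first audio chunk arrives."""
        if self.ttfb is None and self.started_at is not None:
            self.ttfb = self._clock() - self.started_at

    def stop(self) -> None:
        """Record total processing duration for the current request."""
        if self.started_at is not None:
            self.processing_time = self._clock() - self.started_at
```

Using a monotonic clock avoids negative durations if the system time is adjusted while a request is in flight.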