Overview

The FishAudioTTSService provides real-time text-to-speech synthesis using Fish Audio’s WebSocket API. It supports streaming audio output, multiple voices, and several output formats (Opus, MP3, PCM, and WAV).

Installation

To use Fish Audio, install the required dependencies:

pip install "pipecat-ai[fish]"

You’ll need to set up your Fish Audio API key as an environment variable: FISH_API_KEY.
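The examples below pass os.getenv("FISH_API_KEY") straight to the constructor, so a missing variable only shows up later as a None key. If you prefer to fail fast at startup, a minimal sketch could look like this. The helper name require_fish_api_key is hypothetical and is not part of Pipecat.

```python
import os
from typing import Mapping, Optional


def require_fish_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return FISH_API_KEY, raising RuntimeError if it is unset or blank.

    Hypothetical helper for illustration; not part of Pipecat.
    """
    env = os.environ if env is None else env
    key = env.get("FISH_API_KEY", "").strip()  # treat whitespace-only as missing
    if not key:
        raise RuntimeError("FISH_API_KEY is not set; export it before starting the pipeline")
    return key
```

You could then pass api_key=require_fish_api_key() instead of os.getenv("FISH_API_KEY").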

Constructor Parameters

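api_key (str, required): Fish Audio API key.

model (str, required): Reference ID of the voice model.

output_format (str, default: "pcm"): Audio output format. Options: "opus", "mp3", "pcm", "wav".

sample_rate (int, default: 24000): Output audio sample rate in Hz.

params (InputParams, optional): Synthesis options such as language, latency, and prosody. See Input Parameters.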
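The constructor accepts output_format and sample_rate as plain values, so a typo such as "flac" is only caught once the service runs. A small pre-flight check could encode the format options listed above. This is a sketch under that assumption: validate_output_config is a hypothetical helper, and it only checks that the sample rate is a positive integer because the set of rates Fish Audio accepts is not listed here.

```python
# Options listed for output_format above.
VALID_OUTPUT_FORMATS = {"opus", "mp3", "pcm", "wav"}


def validate_output_config(output_format: str = "pcm", sample_rate: int = 24000) -> None:
    """Raise ValueError for an unknown format or a non-positive sample rate.

    Hypothetical pre-flight check; it does not query Fish Audio for supported rates.
    """
    if output_format not in VALID_OUTPUT_FORMATS:
        raise ValueError(
            f"output_format must be one of {sorted(VALID_OUTPUT_FORMATS)}, got {output_format!r}"
        )
    if not isinstance(sample_rate, int) or sample_rate <= 0:
        raise ValueError(f"sample_rate must be a positive int in Hz, got {sample_rate!r}")
```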

Basic Usage

import os

from pipecat.services.fish import FishAudioTTSService

tts = FishAudioTTSService(
    api_key=os.getenv("FISH_API_KEY"),
    model="your-model-id",  # Voice model reference ID from the Fish Audio playground
    output_format="pcm",    # One of "opus", "mp3", "pcm", "wav"
    sample_rate=24000,      # Output sample rate in Hz
    params=FishAudioTTSService.InputParams(
        latency="normal",
        prosody_speed=1.0,
    ),
)

Input Parameters

language (Language, default: Language.EN): Language for speech synthesis. See the Language Support section for available options.

latency (str, default: "normal"): Latency mode for synthesis. Options: "normal" or "balanced".

prosody_speed (float, default: 1.0): Speech speed multiplier, where 1.0 is normal speed. Range: 0.5 to 2.0.

prosody_volume (int, default: 0): Volume adjustment in decibels (dB); 0 leaves the volume unchanged.

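import os

from pipecat.services.fish import FishAudioTTSService
from pipecat.transcriptions.language import Language

tts = FishAudioTTSService(
    api_key=os.getenv("FISH_API_KEY"),
    model="your-model-id",
    params=FishAudioTTSService.InputParams(
        language=Language.EN,
        latency="normal",     # Default latency mode; the alternative is "balanced"
        prosody_speed=1.2,    # Slightly faster than normal speech
        prosody_volume=0,     # No volume adjustment (0 dB)
    ),
)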
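prosody_speed is limited to 0.5 to 2.0, and prosody_volume is given in decibels. The sketch below uses hypothetical helpers, not Pipecat code. One clamps a requested speed into the documented range. The other converts a dB value into a linear amplitude factor using the standard relation gain = 10^(dB / 20), so +6 dB roughly doubles the amplitude. This page does not specify how Fish Audio applies the value internally.

```python
# Documented prosody_speed range.
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def clamp_prosody_speed(speed: float) -> float:
    """Clamp a requested speech speed into the documented 0.5 to 2.0 range."""
    return max(MIN_SPEED, min(MAX_SPEED, speed))


def db_to_amplitude_gain(volume_db: float) -> float:
    """Convert a decibel adjustment into a linear amplitude factor, 10 ** (dB / 20)."""
    return 10 ** (volume_db / 20)
```

For example, prosody_speed=clamp_prosody_speed(user_speed) keeps a user-supplied value inside the range, and db_to_amplitude_gain(-6) is about 0.5, meaning roughly half the amplitude.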

Output Frames

Control Frames

TTSStartedFrame (Frame): Signals the start of synthesis.

TTSStoppedFrame (Frame): Signals the completion of synthesis.

Audio Frames

TTSAudioRawFrame (Frame): Contains the generated audio data:
  • Format as set by output_format (PCM, WAV, MP3, or Opus)
  • Sample rate as set by sample_rate
  • Single channel (mono)
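When output_format is "pcm", each TTSAudioRawFrame carries raw mono samples, so the playback length of a chunk follows from its byte count. The minimal sketch below assumes 16-bit signed samples (2 bytes each). That is the usual raw PCM layout in Pipecat, but this page does not state it.

```python
def pcm_duration_seconds(num_bytes: int, sample_rate: int = 24000,
                         num_channels: int = 1, bytes_per_sample: int = 2) -> float:
    """Return the duration of a raw PCM buffer in seconds.

    Assumes 16-bit samples (bytes_per_sample=2) and mono audio by default;
    pass different values for other layouts.
    """
    bytes_per_second = sample_rate * num_channels * bytes_per_sample
    return num_bytes / bytes_per_second
```

At the default 24000 Hz, mono 16-bit audio uses 48000 bytes per second, so a 960-byte chunk holds 20 ms of audio. If the frame exposes audio, sample_rate, and num_channels attributes (an assumption about the frame class), you could call pcm_duration_seconds(len(frame.audio), frame.sample_rate, frame.num_channels).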

Error Frames

ErrorFrame (Frame): Contains error information reported by Fish Audio.
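The control frames above bracket the audio for each synthesis. The sketch below checks a recorded sequence of frame type names against that pattern: a TTSStartedFrame, then zero or more TTSAudioRawFrames, then a TTSStoppedFrame. It works on plain strings so it runs without Pipecat installed. The ordering rule is inferred from the frame descriptions on this page, and check_tts_frame_order is a hypothetical test helper.

```python
from typing import Iterable


def check_tts_frame_order(frame_names: Iterable[str]) -> bool:
    """Return True if every TTSAudioRawFrame falls between a TTSStartedFrame
    and the next TTSStoppedFrame, and every start is eventually stopped.

    Frames with other names (e.g. ErrorFrame) are ignored. Hypothetical helper.
    """
    in_synthesis = False
    for name in frame_names:
        if name == "TTSStartedFrame":
            if in_synthesis:
                return False  # second start before the previous stop
            in_synthesis = True
        elif name == "TTSStoppedFrame":
            if not in_synthesis:
                return False  # stop without a matching start
            in_synthesis = False
        elif name == "TTSAudioRawFrame" and not in_synthesis:
            return False  # audio outside a synthesis window
    return not in_synthesis
```

With real frames collected from a pipeline, you could pass [type(f).__name__ for f in frames].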

Language Support

Supports multiple languages through the Language enum:

Language Code    Service Code
Language.EN      en-US
Language.ZH      zh-CN
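If your application picks the language at runtime, a lookup with an explicit fallback makes unsupported languages visible instead of silently passing them through. The sketch below encodes the table above and keys it on Language enum member names as strings, so it runs without Pipecat. The function name and the None fallback are assumptions, not the service's documented behavior.

```python
from typing import Optional

# Mapping from the table above, keyed by Language enum member name.
FISH_LANGUAGE_CODES = {
    "EN": "en-US",
    "ZH": "zh-CN",
}


def to_fish_language_code(language_name: str) -> Optional[str]:
    """Return the service code for a Language member name, or None if unsupported."""
    return FISH_LANGUAGE_CODES.get(language_name.upper())
```

With the enum itself, you would pass its member name, for example to_fish_language_code(Language.EN.name).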

Usage Example

import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.fish import FishAudioTTSService
from pipecat.transcriptions.language import Language

# Configure service
tts = FishAudioTTSService(
    api_key=os.getenv("FISH_API_KEY"),
    model="e58b0d7efca34eb38d5c4985e378abcb",  # Example model ID
    output_format="pcm",
    params=FishAudioTTSService.InputParams(
        language=Language.EN,
        latency="normal",
        prosody_speed=1.0,
        prosody_volume=0,
    ),
)

# Use in pipeline (llm and transport are created elsewhere in your application)
pipeline = Pipeline([
    ...,
    llm,
    tts,
    transport.output(),
])


Metrics Support

The service collects processing metrics:

  • Time to First Byte (TTFB)
  • Processing duration
  • Character usage
  • API calls
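Time to First Byte is the delay between starting a request and receiving the first audio chunk. The sketch below measures TTFB and total processing duration around any async iterator of audio chunks. The fake stream stands in for the service's WebSocket output and is purely illustrative. This is not how Pipecat's built-in metrics are implemented.

```python
import asyncio
import time
from typing import AsyncIterator, Optional, Tuple


async def measure_stream(chunks: AsyncIterator[bytes]) -> Tuple[Optional[float], float, int]:
    """Consume an audio stream and return (ttfb_seconds, total_seconds, total_bytes).

    ttfb_seconds is None if the stream produced no chunks.
    """
    start = time.perf_counter()
    ttfb: Optional[float] = None
    total_bytes = 0
    async for chunk in chunks:
        if ttfb is None:
            ttfb = time.perf_counter() - start  # first audio chunk arrived
        total_bytes += len(chunk)
    return ttfb, time.perf_counter() - start, total_bytes


async def fake_audio_stream(n_chunks: int = 3, delay: float = 0.01) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for synthesized audio: n_chunks of 960 bytes each."""
    for _ in range(n_chunks):
        await asyncio.sleep(delay)
        yield b"\x00" * 960


if __name__ == "__main__":
    ttfb, total, nbytes = asyncio.run(measure_stream(fake_audio_stream()))
    print(f"TTFB {ttfb * 1000:.1f} ms, total {total * 1000:.1f} ms, {nbytes} bytes")
```

Measuring both values separates connection and model startup delay (TTFB) from the time spent streaming the rest of the audio.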