Fish Audio

Overview

Fish Audio provides real-time text-to-speech synthesis through a WebSocket-based streaming API. The service offers custom voice models, prosody controls, and multiple audio formats optimized for conversational AI applications with low latency.

API Reference

Complete API documentation and method details

Fish Audio Docs

Official Fish Audio WebSocket API documentation

Example Code

Working example with custom voice model

Installation

To use Fish Audio services, install the required dependencies:

pip install "pipecat-ai[fish]"

You’ll also need to set up your Fish Audio API key as an environment variable: FISH_API_KEY.

Get your API key from the Fish Audio Console.

Frames

Input

TextFrame - Text content to synthesize into speech
TTSSpeakFrame - Text that should be spoken immediately
TTSUpdateSettingsFrame - Runtime configuration updates
LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries

Output

TTSStartedFrame - Signals start of synthesis
TTSAudioRawFrame - Generated audio data chunks (streaming)
TTSStoppedFrame - Signals completion of synthesis
ErrorFrame - API or processing errors

Sample Rate Options

Supported sample rates for different quality levels:

8000 Hz - Phone quality
16000 Hz - Standard quality
24000 Hz - High quality (recommended)
44100 Hz - CD quality
48000 Hz - Professional quality

Language Support

Fish Audio currently supports:

Language Code	Description	Service Code
`Language.EN`	English	`en`
`Language.JA`	Japanese	`ja`
`Language.ZH`	Chinese	`zh`

Fish Audio is expanding language support. Check the official documentation for the latest available languages.

Latency Modes

Choose the appropriate latency mode for your application:

Mode	Description	Best For
`normal`	Standard latency (Default)	General applications
`balanced`	Balanced quality/speed	Real-time conversations

Usage Example

Basic Configuration

from pipecat.services.fish.tts import FishAudioTTSService
from pipecat.transcriptions.language import Language
import os

# Configure service with custom voice
tts = FishAudioTTSService(
    api_key=os.getenv("FISH_API_KEY"),
    reference_id="4ce7e917cedd4bc2bb2e6ff3a46acaa1",  # Voice model ID
    model_id="speech-1.5",
    output_format="pcm",
    sample_rate=24000,
    params=FishAudioTTSService.InputParams(
        language=Language.EN,
        latency="normal",
        prosody_speed=1.0,
        prosody_volume=0
    )
)

# Use in pipeline
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

Advanced Prosody Control

# Custom prosody settings
tts = FishAudioTTSService(
    api_key=os.getenv("FISH_API_KEY"),
    reference_id="your-voice-model-id",
    params=FishAudioTTSService.InputParams(
        language=Language.EN,
        latency="balanced",      # Balance quality vs speed
        prosody_speed=1.2,       # 20% faster speech
        prosody_volume=3,        # +3dB volume boost
        normalize=True           # Normalize audio output
    )
)

Dynamic Configuration

Make settings updates by pushing a TTSUpdateSettingsFrame:

from pipecat.frames.frames import TTSUpdateSettingsFrame

await task.queue_frame(TTSUpdateSettingsFrame(
    reference_id="new-voice-model-id",  # Change voice model
  )
)

Metrics

The service provides comprehensive metrics:

Time to First Byte (TTFB) - Latency from text input to first audio
Processing Duration - Total synthesis time
Character Usage - Text processed for billing

Learn how to enable Metrics in your Pipeline.

Additional Notes

WebSocket Streaming: Real-time audio generation with automatic chunking
Interruption Handling: Built-in support for conversation interruptions
Custom Voice Models: Use your own trained voice models via reference IDs
Audio Buffering: Efficient streaming with configurable buffer sizes
Connection Management: Automatic reconnection on connection failures
Format Flexibility: Multiple audio formats for different deployment scenarios
Prosody Control: Fine-tune speech characteristics including speed and volume

API Reference

Services

Utilities

Frameworks

Pipeline

Overview

API Reference

Fish Audio Docs

Example Code

Installation

Frames

Input

Output

Sample Rate Options

Language Support

Latency Modes

Usage Example

Basic Configuration

Advanced Prosody Control

Dynamic Configuration

Metrics

Additional Notes

API Reference

Services

Utilities

Frameworks

Pipeline

​Overview

API Reference

Fish Audio Docs

Example Code

​Installation

​Frames

​Input

​Output

​Sample Rate Options

​Language Support

​Latency Modes

​Usage Example

​Basic Configuration

​Advanced Prosody Control

​Dynamic Configuration

​Metrics

​Additional Notes

Overview

Installation

Frames

Input

Output

Sample Rate Options

Language Support

Latency Modes

Usage Example

Basic Configuration

Advanced Prosody Control

Dynamic Configuration

Metrics

Additional Notes