Real-time text-to-speech service using Fish Audio’s WebSocket API
FISH_API_KEY.
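As a minimal sketch, assuming `FISH_API_KEY` is supplied as an environment variable, the key could be loaded and checked before the service is constructed. The helper name `load_fish_api_key` is hypothetical and not part of any library.

```python
import os


def load_fish_api_key(env=None):
    """Return the Fish Audio API key (hypothetical helper, not a library call).

    `env` defaults to the process environment; passing a plain dict
    makes the function easy to exercise in tests.
    """
    env = os.environ if env is None else env
    key = env.get("FISH_API_KEY", "").strip()
    if not key:
        # Fail early rather than letting the WebSocket connection be rejected later.
        raise RuntimeError("FISH_API_KEY is not set")
    return key
```

Failing at startup keeps a missing key from surfacing later as a connection error in the middle of a conversation.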
- TextFrame: Text content to synthesize into speech
- TTSSpeakFrame: Text that should be spoken immediately
- TTSUpdateSettingsFrame: Runtime configuration updates
- LLMFullResponseStartFrame / LLMFullResponseEndFrame: LLM response boundaries
- TTSStartedFrame: Signals start of synthesis
- TTSAudioRawFrame: Generated audio data chunks (streaming)
- TTSStoppedFrame: Signals completion of synthesis
- ErrorFrame: API or processing errors

Language Code | Description | Service Code |
---|---|---|
Language.EN | English | en |
Language.JA | Japanese | ja |
Language.ZH | Chinese | zh |
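
The table above amounts to a lookup from a language enum value to the code sent to the service. A minimal sketch of that lookup follows; the `Language` class here is a stand-in defined only for illustration (not the framework's real enum), and `to_service_code` is a hypothetical helper.

```python
from enum import Enum


class Language(Enum):
    """Stand-in enum for illustration only; not the framework's real class."""
    EN = "en"
    JA = "ja"
    ZH = "zh"


# Mirrors the table above: language code -> service code.
SERVICE_CODES = {
    Language.EN: "en",
    Language.JA: "ja",
    Language.ZH: "zh",
}


def to_service_code(language):
    """Return the service code for a language, or None if it is not in the table."""
    return SERVICE_CODES.get(language)
```

Returning `None` for anything outside the table lets the caller decide whether to fall back to a default language or report an error.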
Mode | Description | Best For |
---|---|---|
normal | Standard latency (Default) | General applications |
balanced | Balanced quality/speed | Real-time conversations |
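
To illustrate the two modes in the table above, here is a small sketch that validates a requested latency mode and falls back to the documented default. The names `LATENCY_MODES` and `resolve_latency` are assumptions made for this example, not part of the service's API.

```python
# Modes and descriptions taken from the table above.
LATENCY_MODES = {
    "normal": "Standard latency (default)",
    "balanced": "Balanced quality/speed",
}
DEFAULT_LATENCY = "normal"


def resolve_latency(mode=None):
    """Return a valid latency mode, using the default when none is given.

    Hypothetical helper: raises ValueError for a mode not listed in the table.
    """
    if mode is None:
        return DEFAULT_LATENCY
    if mode not in LATENCY_MODES:
        allowed = ", ".join(sorted(LATENCY_MODES))
        raise ValueError(f"unknown latency mode {mode!r}; expected one of: {allowed}")
    return mode
```

For a real-time conversation, `resolve_latency("balanced")` selects the mode the table recommends for that case.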
TTSUpdateSettingsFrame:
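
A minimal sketch of what such a runtime update might look like, assuming the frame carries a plain mapping of setting names to new values. The dataclass below is a stand-in defined for this example so that it runs without the framework installed; `apply_update` is a hypothetical helper showing how a service could merge the new values.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class TTSUpdateSettingsFrame:
    """Stand-in for the real frame; assumed to carry a settings mapping."""
    settings: Dict[str, Any] = field(default_factory=dict)


def apply_update(current, frame):
    """Return a new settings dict with the frame's values merged over the current ones."""
    updated = dict(current)          # copy, so the caller's dict is left untouched
    updated.update(frame.settings)   # keys in the frame override existing values
    return updated


# Switch from the default latency mode to "balanced" mid-session.
settings = {"language": "en", "latency": "normal"}
frame = TTSUpdateSettingsFrame(settings={"latency": "balanced"})
settings = apply_update(settings, frame)
```

Only the keys present in the frame change; here `language` stays `"en"` while `latency` becomes `"balanced"`.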