Overview

KokoroTTSService provides local, offline text-to-speech synthesis using the kokoro-onnx engine. It runs entirely on the host machine with no external API calls or authentication required. Model files are automatically downloaded to ~/.cache/kokoro-onnx/ on first use.

  • Kokoro TTS API Reference: Pipecat’s API methods for Kokoro TTS integration
  • Example Implementation: Complete example with interruption handling
  • kokoro-onnx Repository: Official kokoro-onnx project and documentation
  • Settings Update Example: Example showing runtime settings updates

Installation

To use Kokoro TTS, install the required dependencies:
pip install "pipecat-ai[kokoro]"
This installs kokoro-onnx>=0.5.0 and its dependencies.

Prerequisites

Local Setup

Kokoro runs locally and does not require an API key or external service. On first use, the service automatically downloads two model files to ~/.cache/kokoro-onnx/:
  • kokoro-v1.0.onnx — the ONNX speech synthesis model
  • voices-v1.0.bin — the voice data file
You can also provide custom paths to pre-downloaded model files via the model_path and voices_path constructor parameters.
The initial model download may take a few minutes depending on your connection speed. Subsequent runs use the cached files.

Configuration

KokoroTTSService

model_path (str, default: None)
Path to a custom ONNX model file. When None, the model is automatically downloaded to ~/.cache/kokoro-onnx/kokoro-v1.0.onnx.

voices_path (str, default: None)
Path to a custom voices binary file. When None, the file is automatically downloaded to ~/.cache/kokoro-onnx/voices-v1.0.bin.

voice_id (str, default: None, deprecated)
Voice identifier for synthesis. Deprecated in v0.0.105. Use settings=KokoroTTSService.Settings(voice=...) instead.

params (InputParams, default: None, deprecated)
Deprecated in v0.0.105. Use settings=KokoroTTSService.Settings(...) instead.

settings (KokoroTTSService.Settings, default: None)
Runtime-configurable settings. See Settings below.

Settings

Runtime-configurable settings passed via the settings constructor argument using KokoroTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
Parameter  Type            Default      Description
model      str             None         Model identifier. (Inherited from base settings.)
voice      str             None         Voice identifier (e.g. "af_heart").
language   Language | str  Language.EN  Language for synthesis. See supported languages.

Supported Languages

Kokoro supports the following languages:
Language           Code
English (US)       Language.EN_US
English (UK)       Language.EN_GB
English (generic)  Language.EN
Spanish            Language.ES
French             Language.FR
Hindi              Language.HI
Italian            Language.IT
Japanese           Language.JA
Portuguese         Language.PT
Chinese            Language.ZH
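
If the language comes from user or client input, it may help to check it against the table above before building the settings. The following is a minimal sketch in plain Python: the tag strings in SUPPORTED_TAGS are assumed to correspond to the Language members listed above, and resolve_language is a hypothetical helper, not part of Pipecat or kokoro-onnx.

```python
from typing import Optional

# Assumption: these tags mirror the Language members in the table above.
SUPPORTED_TAGS = ("en-US", "en-GB", "en", "es", "fr", "hi", "it", "ja", "pt", "zh")


def resolve_language(tag: str) -> Optional[str]:
    """Return the closest supported tag for a locale string, or None if unsupported."""
    normalized = tag.strip().replace("_", "-").lower()
    by_lower = {t.lower(): t for t in SUPPORTED_TAGS}
    # Exact match first (case-insensitive), e.g. "en_gb" -> "en-GB".
    if normalized in by_lower:
        return by_lower[normalized]
    # Otherwise fall back to the base language, e.g. "es-MX" -> "es".
    base = normalized.split("-", 1)[0]
    return by_lower.get(base)


if __name__ == "__main__":
    for requested in ("es-MX", "en_gb", "de-DE"):
        print(requested, "->", resolve_language(requested))
```

A None result signals that the requested locale has no entry in the table, so the caller can decide whether to fall back to the default language or reject the request.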

Usage

Basic Setup

from pipecat.services.kokoro import KokoroTTSService

tts = KokoroTTSService(
    settings=KokoroTTSService.Settings(
        voice="af_heart",
    ),
)

With Language Configuration

from pipecat.services.kokoro import KokoroTTSService
from pipecat.transcriptions.language import Language

tts = KokoroTTSService(
    settings=KokoroTTSService.Settings(
        voice="af_heart",
        language=Language.ES,
    ),
)

With Custom Model Paths

from pipecat.services.kokoro import KokoroTTSService

tts = KokoroTTSService(
    model_path="/path/to/kokoro-v1.0.onnx",
    voices_path="/path/to/voices-v1.0.bin",
    settings=KokoroTTSService.Settings(
        voice="af_heart",
    ),
)
The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.
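
For the custom model paths shown above, one option is to pass them only when both files are actually present and otherwise leave the parameters at their default of None, so the automatic download still applies. This is a sketch under that assumption; custom_path_kwargs and the directory passed to it are hypothetical, and only the two file names come from this page.

```python
from pathlib import Path
from typing import Dict


def custom_path_kwargs(model_dir: str) -> Dict[str, str]:
    """Return model_path/voices_path kwargs if both files exist in model_dir, else {}."""
    model = Path(model_dir) / "kokoro-v1.0.onnx"
    voices = Path(model_dir) / "voices-v1.0.bin"
    if model.is_file() and voices.is_file():
        return {"model_path": str(model), "voices_path": str(voices)}
    # An empty dict leaves both constructor parameters at None.
    return {}


# Hypothetical usage with the constructor shown above:
# tts = KokoroTTSService(
#     **custom_path_kwargs("/path/to/models"),
#     settings=KokoroTTSService.Settings(voice="af_heart"),
# )
```

Returning an empty dict rather than raising keeps a half-provisioned machine working through the cached download, at the cost of hiding a misconfigured directory; raise instead if silent fallback is not acceptable.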

Notes

  • Fully local: Kokoro runs entirely on the host machine using ONNX Runtime. No API keys, network access, or external services are required after the initial model download.
  • Automatic model caching: Model files are downloaded once to ~/.cache/kokoro-onnx/ and reused on subsequent runs. You can also pre-download models and specify custom paths.
  • Audio resampling: Kokoro’s native output is automatically resampled to match the pipeline’s configured sample rate.
  • Streaming output: The service uses kokoro-onnx’s async streaming API, delivering audio frames incrementally as they are generated.
  • Metrics support: The service supports TTFB (time to first byte) and usage metrics for performance monitoring.
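
To illustrate how streaming output and TTFB relate, the sketch below times the arrival of the first chunk from an async generator. It does not call Pipecat or kokoro-onnx: fake_audio_stream is a hypothetical stand-in for a streaming synthesizer, and consume_with_ttfb is an illustrative helper rather than the service's own metrics code.

```python
import asyncio
import time
from typing import AsyncIterator, Optional, Tuple


async def fake_audio_stream(chunks: int = 3, delay: float = 0.01) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a streaming synthesizer: yields audio chunks over time."""
    for _ in range(chunks):
        await asyncio.sleep(delay)  # simulate per-chunk synthesis time
        yield b"\x00" * 480  # placeholder audio bytes


async def consume_with_ttfb(stream: AsyncIterator[bytes]) -> Tuple[Optional[float], int]:
    """Consume a stream; return (seconds until the first chunk, total bytes received)."""
    start = time.monotonic()
    ttfb: Optional[float] = None
    total = 0
    async for chunk in stream:
        if ttfb is None:
            # The first chunk marks the time to first byte.
            ttfb = time.monotonic() - start
        total += len(chunk)
    return ttfb, total


if __name__ == "__main__":
    ttfb, total = asyncio.run(consume_with_ttfb(fake_audio_stream()))
    print(f"TTFB: {ttfb:.3f}s, bytes received: {total}")
```

Because chunks are consumed as they arrive, TTFB reflects only the wait for the first chunk, not the time to synthesize the whole utterance.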