Overview

Hume provides expressive text-to-speech synthesis through its Octave models, which adapt pronunciation, pitch, speed, and emotional style based on context. HumeTTSService offers real-time streaming with word-level timestamps, custom voice support, and synthesis controls for acting instructions, speed adjustment, and trailing silence.

Hume TTS API Reference

Pipecat’s API methods for Hume TTS integration

Example Implementation

Complete example with word timestamps and interruption handling

Hume Documentation

Official Hume TTS API documentation and features

Voice Library

Browse and manage available voices

Installation

To use Hume services, install the required dependencies:
pip install "pipecat-ai[hume]"

Prerequisites

Hume Account Setup

Before using Hume TTS services, you need:
  1. Hume Account: Sign up at Hume AI
  2. API Key: Generate an API key from your account dashboard
  3. Voice Selection: Choose voice IDs from the voice library or create custom voices

Required Environment Variables

  • HUME_API_KEY: Your Hume API key for authentication
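
Because the service falls back to HUME_API_KEY when api_key is omitted, it can help to check the variable at startup rather than hit an authentication failure mid-call. The following is a minimal sketch; load_hume_api_key is a hypothetical helper for illustration and is not part of Pipecat or the Hume SDK.

```python
import os


def load_hume_api_key(env=None):
    """Return the Hume API key or raise a clear error if it is missing.

    Hypothetical helper (not a Pipecat API). `env` defaults to os.environ
    and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    key = env.get("HUME_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "HUME_API_KEY is not set; export it before starting the pipeline"
        )
    return key
```

The returned value can then be passed as api_key= when constructing the service.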

Configuration

HumeTTSService

api_key (str, default: None)
Hume API key. If omitted, reads the HUME_API_KEY environment variable.

voice_id (str, required, deprecated)
ID of the voice to use. Only voice IDs are supported; voice names are not. Deprecated in v0.0.105. Use settings=HumeTTSService.Settings(...) instead.

sample_rate (int, default: 48000)
Output sample rate for PCM frames. Hume TTS streams at 48kHz.

params (InputParams, default: None, deprecated)
Runtime-configurable synthesis controls. See InputParams below. Deprecated in v0.0.105. Use settings=HumeTTSService.Settings(...) instead.

settings (HumeTTSService.Settings, default: None)
Runtime-configurable settings. See Settings below.

Settings

Runtime-configurable settings passed via the settings constructor argument using HumeTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.

Parameter         Type            Default     Description
model             str             None        Model identifier. (Inherited.)
voice             str             None        Voice identifier. (Inherited.)
language          Language | str  None        Language for synthesis. (Inherited.)
description       str             NOT_GIVEN   Description to guide voice synthesis.
speed             float           NOT_GIVEN   Speech rate control.
trailing_silence  float           NOT_GIVEN   Trailing silence duration in seconds.

Usage

Basic Setup

import os

from pipecat.services.hume import HumeTTSService

tts = HumeTTSService(
    api_key=os.getenv("HUME_API_KEY"),
    settings=HumeTTSService.Settings(
        voice="your-voice-id",
    ),
)

With Acting Directions

tts = HumeTTSService(
    api_key=os.getenv("HUME_API_KEY"),
    settings=HumeTTSService.Settings(
        voice="your-voice-id",
        description="Speak warmly and reassuringly",
        speed=1.1,
        trailing_silence=0.5,
    ),
)

Updating Settings at Runtime

Voice and synthesis parameters can be changed mid-conversation using TTSUpdateSettingsFrame:
from pipecat.frames.frames import TTSUpdateSettingsFrame
from pipecat.services.hume.tts import HumeTTSSettings

await task.queue_frame(
    TTSUpdateSettingsFrame(
        delta=HumeTTSSettings(
            speed=1.3,
            description="Speak with excitement",
        )
    )
)
Note: The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.
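
To illustrate what a delta update means, the sketch below models settings as a small dataclass in which only fields set on the delta override the current values. This is a stand-in for explanation only: SketchSettings, the local NOT_GIVEN sentinel, and apply_delta are hypothetical names, and treating both None and NOT_GIVEN as "unset" is an assumption. It is not Pipecat's implementation.

```python
from dataclasses import dataclass, fields, replace


class _NotGiven:
    """Local sentinel meaning 'field was not supplied' (illustrative only)."""

    def __repr__(self):
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


@dataclass(frozen=True)
class SketchSettings:
    # Field names mirror the Settings table; the class itself is hypothetical.
    voice: object = None
    description: object = NOT_GIVEN
    speed: object = NOT_GIVEN
    trailing_silence: object = NOT_GIVEN


def apply_delta(current, delta):
    """Return a copy of `current` with only the fields set on `delta` replaced."""
    changes = {}
    for f in fields(delta):
        value = getattr(delta, f.name)
        # Assumption: None and NOT_GIVEN both mean "leave unchanged".
        if value is not None and value is not NOT_GIVEN:
            changes[f.name] = value
    return replace(current, **changes)


current = SketchSettings(voice="your-voice-id", speed=1.1, trailing_silence=0.5)
delta = SketchSettings(speed=1.3, description="Speak with excitement")
updated = apply_delta(current, delta)
```

In this model, updated keeps the original voice and trailing_silence while taking the new speed and description, which is the behavior a partial update is meant to convey.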

Notes

  • Fixed sample rate: Hume TTS streams at 48kHz. Setting a different sample_rate will produce a warning.
  • Word timestamps: The service provides word-level timestamps for synchronized text display. Timestamps are tracked cumulatively across utterances within a turn.
  • Description versions: When description is provided, the service uses Hume API version "1". Without a description, it uses the newer version "2".
  • Audio buffering: Audio is buffered internally until a minimum chunk size is reached before being pushed as frames, reducing audio glitches.
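
The notes above can be sketched in plain Python to make the behavior concrete. Everything here is illustrative: ChunkBuffer, offset_timestamps, and pick_api_version are hypothetical names, the minimum chunk size is an arbitrary value, and none of this is the service's actual internal code.

```python
class ChunkBuffer:
    """Accumulate audio bytes and release them once a minimum size is reached."""

    def __init__(self, min_bytes):
        self.min_bytes = min_bytes  # threshold is an assumed, tunable value
        self._buf = bytearray()

    def add(self, data):
        """Append data; return the buffered bytes if the threshold is met, else None."""
        self._buf.extend(data)
        if len(self._buf) < self.min_bytes:
            return None
        out = bytes(self._buf)
        self._buf.clear()
        return out

    def flush(self):
        """Return any remaining bytes at end of stream, or None if empty."""
        out = bytes(self._buf)
        self._buf.clear()
        return out or None


def offset_timestamps(words, offset_seconds):
    """Shift per-utterance (word, start_seconds) pairs by a cumulative turn offset."""
    return [(word, round(start + offset_seconds, 3)) for word, start in words]


def pick_api_version(description=None):
    """Mirror the note: version "1" when a description is given, otherwise "2"."""
    return "1" if description else "2"
```

For example, with ChunkBuffer(4), adding b"ab" returns None and a following b"cde" returns the combined b"abcde"; offset_timestamps would be called with the running duration of earlier utterances in the turn so that word times keep increasing.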