Overview

MiniMaxHttpTTSService provides text-to-speech capabilities using MiniMax’s T2A (Text-to-Audio) API. It supports multiple voices, emotions, languages, and speech customization options.

Installation

MiniMaxHttpTTSService requires no additional dependencies.

You will need MiniMax API credentials (an API key and a Group ID).
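A minimal sketch of reading both credentials from the environment and failing early when one is missing. The variable names MINIMAX_API_KEY and MINIMAX_GROUP_ID follow the usage example in this page; the helper load_minimax_credentials is hypothetical and is not part of Pipecat or MiniMax.

```python
import os
from typing import Mapping, Optional, Tuple


def load_minimax_credentials(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (api_key, group_id), raising if either is missing or empty.

    Hypothetical helper for illustration; not part of Pipecat or MiniMax.
    """
    env = os.environ if env is None else env
    api_key = env.get("MINIMAX_API_KEY", "").strip()
    group_id = env.get("MINIMAX_GROUP_ID", "").strip()
    # Collect the names of any variables that are unset or blank
    missing = [
        name
        for name, value in (("MINIMAX_API_KEY", api_key), ("MINIMAX_GROUP_ID", group_id))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing MiniMax credentials: " + ", ".join(missing))
    return api_key, group_id
```

Checking at startup gives a clear error message instead of an authentication failure on the first synthesis request.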

Configuration

Constructor Parameters

api_key
str
required

MiniMax API key for authentication

group_id
str
required

MiniMax Group ID to identify your project

model
str
default:"speech-02-turbo"

MiniMax TTS model to use. Available options include:

  • speech-02-hd: HD model with superior rhythm and stability
  • speech-02-turbo: Turbo model with enhanced multilingual capabilities
  • speech-01-hd: Rich voices with expressive emotions
  • speech-01-turbo: Low-latency model with regular updates

voice_id
str
default:"Calm_Woman"

MiniMax voice identifier. Options include:

  • Wise_Woman
  • Friendly_Person
  • Inspirational_girl
  • Deep_Voice_Man
  • Calm_Woman
  • Casual_Guy
  • Lively_Girl
  • Patient_Man
  • Young_Knight
  • Determined_Man
  • Lovely_Girl
  • Decent_Boy
  • Imposing_Manner
  • Elegant_Man
  • Abbess
  • Sweet_Girl_2
  • Exuberant_Girl

See the MiniMax documentation for a complete list of available voices.

aiohttp_session
aiohttp.ClientSession
required

Aiohttp session for API communication

sample_rate
int
default:"None"

Output audio sample rate in Hz

params
InputParams

TTS configuration parameters

Input Parameters

language
Language
default:"Language.EN"

Language for TTS generation

speed
float
default:"1.0"

Speech speed (range: 0.5 to 2.0). Values greater than 1.0 speed up the speech; values less than 1.0 slow it down.

volume
float
default:"1.0"

Speech volume (range: 0 to 10). Values greater than 1.0 increase volume.

pitch
float
default:"0"

Pitch adjustment (range: -12 to 12). Positive values raise pitch, negative values lower pitch.

emotion
str

Emotional tone of the speech. Options include: “happy”, “sad”, “angry”, “fearful”, “disgusted”, “surprised”, and “neutral”.

english_normalization
bool

Whether to apply English text normalization, which improves performance in number-reading scenarios at the cost of slightly increased latency.
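The ranges listed above can be checked before a request is sent. The following is a sketch only: VoiceSettings is a hypothetical stand-in for InputParams so the example runs without Pipecat installed, and treating the range bounds as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Ranges and emotion names are copied from the parameter descriptions above.
# Treating the bounds as inclusive is an assumption.
SPEED_RANGE = (0.5, 2.0)
VOLUME_RANGE = (0.0, 10.0)
PITCH_RANGE = (-12.0, 12.0)
EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"}


@dataclass
class VoiceSettings:
    """Hypothetical stand-in for InputParams; not the Pipecat class."""

    speed: float = 1.0
    volume: float = 1.0
    pitch: float = 0
    emotion: Optional[str] = None


def validate(settings: VoiceSettings) -> VoiceSettings:
    """Raise ValueError for any out-of-range value, else return settings unchanged."""
    checks = (
        ("speed", settings.speed, SPEED_RANGE),
        ("volume", settings.volume, VOLUME_RANGE),
        ("pitch", settings.pitch, PITCH_RANGE),
    )
    for name, value, (low, high) in checks:
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    if settings.emotion is not None and settings.emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion: {settings.emotion!r}")
    return settings
```

For example, validate(VoiceSettings(speed=1.1, volume=1.2)) returns the settings unchanged, while VoiceSettings(speed=2.5) raises ValueError.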

Output Frames

Control Frames

TTSStartedFrame
Frame

Signals start of speech synthesis

TTSStoppedFrame
Frame

Signals completion of speech synthesis

Audio Frames

TTSAudioRawFrame
Frame

Contains generated audio data with:

  • PCM audio format
  • Sample rate as specified
  • Single channel (mono)
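Raw PCM has no header, so a consumer needs the sample rate and sample width to interpret it. The sketch below wraps a mono PCM buffer in a WAV container using the standard library and computes its duration. The 16-bit sample width is an assumption (the list above states only PCM and mono), and both helper names are hypothetical.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int, sample_width: int = 2) -> bytes:
    """Wrap raw mono PCM in a WAV container.

    sample_width=2 (16-bit samples) is an assumption; the text above only
    says the audio is PCM and mono.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)  # single channel (mono)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


def pcm_duration_seconds(pcm: bytes, sample_rate: int, sample_width: int = 2) -> float:
    """Duration of a mono PCM buffer: bytes / (bytes per sample * samples per second)."""
    return len(pcm) / (sample_width * sample_rate)
```

Writing the result of pcm_to_wav_bytes to a .wav file is a quick way to listen to captured audio while debugging.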

Error Frames

ErrorFrame
Frame

Contains MiniMax API error information

Methods

See the TTS base class methods for additional functionality.

Language Support

The service supports the following languages through the language_boost parameter. Each Language value maps to the service code shown:

Language Code    Service Code    Description
Language.AR      Arabic          Arabic
Language.CS      Czech           Czech
Language.DE      German          German
Language.EL      Greek           Greek
Language.EN      English         English
Language.ES      Spanish         Spanish
Language.FI      Finnish         Finnish
Language.FR      French          French
Language.HI      Hindi           Hindi
Language.ID      Indonesian      Indonesian
Language.IT      Italian         Italian
Language.JA      Japanese        Japanese
Language.KO      Korean          Korean
Language.NL      Dutch           Dutch
Language.PL      Polish          Polish
Language.PT      Portuguese      Portuguese
Language.RO      Romanian        Romanian
Language.RU      Russian         Russian
Language.TH      Thai            Thai
Language.TR      Turkish         Turkish
Language.UK      Ukrainian       Ukrainian
Language.VI      Vietnamese      Vietnamese
Language.YUE     Chinese,Yue     Chinese (Cantonese)
Language.ZH      Chinese         Chinese (Mandarin)
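The table amounts to a lookup from a language code to a service code. The sketch below uses plain strings so it runs without Pipecat installed. The lowercase keys (such as "en" and "yue"), the fallback that strips a region suffix ("en-US" to "en"), and the helper name are all assumptions for illustration; the service's own mapping may differ.

```python
from typing import Optional

# Service codes copied from the table above; keys are assumed lowercase codes.
SERVICE_CODES = {
    "ar": "Arabic", "cs": "Czech", "de": "German", "el": "Greek",
    "en": "English", "es": "Spanish", "fi": "Finnish", "fr": "French",
    "hi": "Hindi", "id": "Indonesian", "it": "Italian", "ja": "Japanese",
    "ko": "Korean", "nl": "Dutch", "pl": "Polish", "pt": "Portuguese",
    "ro": "Romanian", "ru": "Russian", "th": "Thai", "tr": "Turkish",
    "uk": "Ukrainian", "vi": "Vietnamese", "yue": "Chinese,Yue", "zh": "Chinese",
}


def to_service_code(language: str) -> Optional[str]:
    """Return the service code for a language tag, or None if unsupported."""
    tag = language.strip().lower()
    if tag in SERVICE_CODES:
        return SERVICE_CODES[tag]
    base = tag.split("-", 1)[0]  # assumed fallback: "en-us" -> "en"
    return SERVICE_CODES.get(base)
```

Returning None for an unsupported language lets the caller decide whether to fall back to a default or raise an error.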

Usage Example

import os

import aiohttp
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.minimax.tts import MiniMaxHttpTTSService
from pipecat.transcriptions.language import Language


async def create_tts_service(session: aiohttp.ClientSession):
    # Configure the service with credentials read from the environment
    return MiniMaxHttpTTSService(
        api_key=os.getenv("MINIMAX_API_KEY"),
        group_id=os.getenv("MINIMAX_GROUP_ID"),
        model="speech-02-turbo",
        voice_id="Patient_Man",
        aiohttp_session=session,
        params=MiniMaxHttpTTSService.InputParams(
            language=Language.EN,
            speed=1.1,          # Slightly faster speech
            volume=1.2,         # Slightly louder
            pitch=0,            # Default pitch
            emotion="neutral",  # Neutral emotional tone
        ),
    )


async def main():
    # The caller owns the HTTP session; it is closed when this block exits
    async with aiohttp.ClientSession() as session:
        tts = await create_tts_service(session)

        # Use in a pipeline; llm and transport are assumed to be created elsewhere
        pipeline = Pipeline([
            ...,
            llm,
            tts,
            transport.output(),
        ])

Metrics Support

The service collects processing metrics:

  • Time to First Byte (TTFB)
  • Processing duration
  • Character usage
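To make the first two metrics concrete, here is a sketch that times a stream of audio chunks. It illustrates what TTFB and processing duration mean and is not how Pipecat records them. Both function names are hypothetical, and fake_tts_stream stands in for a real streaming response.

```python
import asyncio
import time
from typing import AsyncIterator, Optional, Tuple


async def measure_stream(chunks: AsyncIterator[bytes]) -> Tuple[Optional[float], float, int]:
    """Consume an audio stream; return (ttfb_seconds, total_seconds, byte_count).

    ttfb_seconds is None if the stream yields no non-empty chunk.
    """
    start = time.perf_counter()
    ttfb = None
    total_bytes = 0
    async for chunk in chunks:
        if ttfb is None and chunk:
            ttfb = time.perf_counter() - start  # time until the first audio byte
        total_bytes += len(chunk)
    return ttfb, time.perf_counter() - start, total_bytes


async def fake_tts_stream() -> AsyncIterator[bytes]:
    """Stand-in for a streaming TTS response (hypothetical)."""
    for _ in range(3):
        await asyncio.sleep(0.01)
        yield b"\x00" * 320
```

Calling asyncio.run(measure_stream(fake_tts_stream())) returns the three values for the simulated stream.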

Notes

  • Uses streaming audio generation for faster initial response
  • Processes audio in chunks for efficient memory usage
  • Supports real-time applications with low latency
  • Automatically handles API authentication
  • Provides PCM audio compatible with most audio pipelines