MiniMax

Overview

MiniMax’s T2A (Text-to-Audio) API provides high-quality text-to-speech synthesis with streaming capabilities, emotional voice control, and support for multiple languages. The service offers various models optimized for different use cases, from low-latency to high-definition audio quality.

API Reference

Complete API documentation and method details

MiniMax T2A Docs

Official MiniMax T2A API documentation

Example Code

Working example with emotional voice settings

Installation

To use MiniMax services, no additional dependencies are required beyond the base installation:

pip install "pipecat-ai"

You’ll need MiniMax API credentials:

MINIMAX_API_KEY
MINIMAX_GROUP_ID

Get your API credentials from the MiniMax Platform.

Frames

Input

TextFrame - Text content to synthesize into speech
TTSSpeakFrame - Text that should be spoken immediately
TTSUpdateSettingsFrame - Runtime configuration updates
LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries

Output

TTSStartedFrame - Signals start of synthesis
TTSAudioRawFrame - Generated audio data chunks (streaming PCM)
TTSStoppedFrame - Signals completion of synthesis
ErrorFrame - API or processing errors

Model Comparison

Model	Quality	Latency	Features
speech-02-hd	Highest	Higher	Superior rhythm and stability
speech-02-turbo	High	Lower	Enhanced multilingual capabilities
speech-01-hd	High	Medium	Rich voices with expressive emotions
speech-01-turbo	Good	Lowest	Regular updates, fast response

Refer to the MiniMax documentation for up-to-date model information.

Voice Selection

MiniMax offers diverse voice personalities:

Voice ID	Description	Tone
`Wise_Woman`	Mature female voice	Authoritative, knowledgeable
`Friendly_Person`	Warm, approachable	Conversational, welcoming
`Patient_Man`	Calm male voice	Steady, reassuring
`Lively_Girl`	Young female voice	Energetic, enthusiastic
`Deep_Voice_Man`	Rich male voice	Professional, commanding
`Calm_Woman`	Serene female voice	Peaceful, soothing
`Elegant_Man`	Sophisticated male	Refined, articulate

See the MiniMax documentation for the complete list of available voices.

Supported Sample Rates

MiniMax supports multiple sample rates for different quality levels:

8000 Hz
16000 Hz
22050 Hz
24000 Hz
32000 Hz
44100 Hz

Language Support

View All Supported Languages

Language Code	Description	Service Code
`Language.AR`	Arabic	`Arabic`
`Language.CS`	Czech	`Czech`
`Language.DE`	German	`German`
`Language.EL`	Greek	`Greek`
`Language.EN`	English	`English`
`Language.ES`	Spanish	`Spanish`
`Language.FI`	Finnish	`Finnish`
`Language.FR`	French	`French`
`Language.HI`	Hindi	`Hindi`
`Language.ID`	Indonesian	`Indonesian`
`Language.IT`	Italian	`Italian`
`Language.JA`	Japanese	`Japanese`
`Language.KO`	Korean	`Korean`
`Language.NL`	Dutch	`Dutch`
`Language.PL`	Polish	`Polish`
`Language.PT`	Portuguese	`Portuguese`
`Language.RO`	Romanian	`Romanian`
`Language.RU`	Russian	`Russian`
`Language.TH`	Thai	`Thai`
`Language.TR`	Turkish	`Turkish`
`Language.UK`	Ukrainian	`Ukrainian`
`Language.VI`	Vietnamese	`Vietnamese`
`Language.YUE`	Chinese (Cantonese)	`Chinese,Yue`
`Language.ZH`	Chinese (Mandarin)	`Chinese`

Common languages supported include:

Language.EN - English
Language.ZH - Chinese (Mandarin)
Language.ES - Spanish
Language.FR - French
Language.DE - German
Language.JA - Japanese

Usage Example

Basic Configuration

Initialize the MiniMaxHttpTTSService and use it in a pipeline:

from pipecat.services.minimax.tts import MiniMaxHttpTTSService
from pipecat.transcriptions.language import Language
import os

# Configure service
tts = MiniMaxHttpTTSService(
    api_key=os.getenv("MINIMAX_API_KEY"),
    group_id=os.getenv("MINIMAX_GROUP_ID"),
    aiohttp_session=session,
    params=MiniMaxHttpTTSService.InputParams(
        language=Language.EN
    )
)

# Use in pipeline
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

Dynamic Configuration

Make settings updates by pushing a TTSUpdateSettingsFrame for the MiniMaxHttpTTSService:

from pipecat.frames.frames import TTSUpdateSettingsFrame

await task.queue_frame(
    TTSUpdateSettingsFrame(settings={"voice": "your-new-voice-id"})
)

Metrics

The service provides comprehensive metrics:

Time to First Byte (TTFB) - Latency from text input to first audio
Processing Duration - Total synthesis time
Character Usage - Text processed for billing

Learn how to enable Metrics in your Pipeline.

Additional Notes

HTTP Session Required: Must provide an aiohttp.ClientSession for API communication
Emotional AI: Advanced emotional expression capabilities with voice-specific optimizations
Text Normalization: Optional English normalization for better number and abbreviation handling

API Reference

Services

Utilities

Frameworks

Pipeline

Overview

API Reference

MiniMax T2A Docs

Example Code

Installation

Frames

Input

Output

Model Comparison

Voice Selection

Supported Sample Rates

Language Support

Usage Example

Basic Configuration

Dynamic Configuration

Metrics

Additional Notes

API Reference

Services

Utilities

Frameworks

Pipeline

​Overview

API Reference

MiniMax T2A Docs

Example Code

​Installation

​Frames

​Input

​Output

​Model Comparison

​Voice Selection

​Supported Sample Rates

​Language Support

​Usage Example

​Basic Configuration

​Dynamic Configuration

​Metrics

​Additional Notes

Overview

Installation

Frames

Input

Output

Model Comparison

Voice Selection

Supported Sample Rates

Language Support

Usage Example

Basic Configuration

Dynamic Configuration

Metrics

Additional Notes