MiniMax
Text-to-speech service implementation using MiniMax T2A API
Overview
MiniMaxHttpTTSService provides text-to-speech capabilities using MiniMax’s T2A (Text-to-Audio) API. It supports multiple voices, emotions, languages, and speech customization options.
Installation
To use MiniMaxHttpTTSService, no additional dependencies are required.
You’ll also need MiniMax API credentials (API key and Group ID).
Configuration
Constructor Parameters
MiniMax API key for authentication
MiniMax Group ID to identify your project
MiniMax TTS model to use. Available options include:
- speech-02-hd: HD model with superior rhythm and stability
- speech-02-turbo: Turbo model with enhanced multilingual capabilities
- speech-01-hd: Rich voices with expressive emotions
- speech-01-turbo: Low-latency model with regular updates
MiniMax voice identifier. Options include:
Wise_Woman
Friendly_Person
Inspirational_girl
Deep_Voice_Man
Calm_Woman
Casual_Guy
Lively_Girl
Patient_Man
Young_Knight
Determined_Man
Lovely_Girl
Decent_Boy
Imposing_Manner
Elegant_Man
Abbess
Sweet_Girl_2
Exuberant_Girl
See the MiniMax documentation for a complete list of available voices.
Aiohttp session for API communication
Output audio sample rate in Hz
TTS configuration parameters
Input Parameters
Language for TTS generation
Speech speed (range: 0.5 to 2.0). Values greater than 1.0 increase speed; values less than 1.0 decrease it.
Speech volume (range: 0 to 10). Values greater than 1.0 increase volume.
Pitch adjustment (range: -12 to 12). Positive values raise pitch, negative values lower pitch.
Emotional tone of the speech. Options include: “happy”, “sad”, “angry”, “fearful”, “disgusted”, “surprised”, and “neutral”.
Whether to apply English text normalization, which improves performance in number-reading scenarios at the cost of slightly increased latency.
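The documented ranges can be checked before a request is sent. The following is a minimal sketch, not the service's own validation: the SpeechSettings class, its field names, and its default values are hypothetical and chosen for illustration. Only the numeric ranges and the emotion values come from the parameter descriptions above.

```python
from dataclasses import dataclass
from typing import Optional

# Emotion values listed in the parameter description above.
ALLOWED_EMOTIONS = {
    "happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral",
}


@dataclass
class SpeechSettings:
    """Hypothetical container for the documented input parameters.

    Field names and defaults are assumptions made for this sketch.
    """

    speed: float = 1.0
    volume: float = 1.0
    pitch: int = 0
    emotion: Optional[str] = None

    def validate(self) -> None:
        """Raise ValueError if a field falls outside its documented range."""
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError(f"speed must be within 0.5 to 2.0, got {self.speed}")
        if not 0 <= self.volume <= 10:
            raise ValueError(f"volume must be within 0 to 10, got {self.volume}")
        if not -12 <= self.pitch <= 12:
            raise ValueError(f"pitch must be within -12 to 12, got {self.pitch}")
        if self.emotion is not None and self.emotion not in ALLOWED_EMOTIONS:
            raise ValueError(f"unsupported emotion: {self.emotion!r}")
```

Calling validate() on a settings object before building the service surfaces an out-of-range value as a local error rather than an API error.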
Output Frames
Control Frames
Signals start of speech synthesis
Signals completion of speech synthesis
Audio Frames
Contains generated audio data with:
- PCM audio format
- Sample rate as specified
- Single channel (mono)
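Because the audio is raw mono PCM at a known sample rate, the playback duration of a chunk follows directly from its byte length. The sketch below assumes 16-bit (2-byte) samples, which this page does not state; pcm_duration_seconds is a hypothetical helper, not part of the service.

```python
def pcm_duration_seconds(
    audio: bytes,
    sample_rate: int,
    channels: int = 1,          # mono, as documented above
    bytes_per_sample: int = 2,  # assumption: 16-bit PCM
) -> float:
    """Return the playback duration of a raw PCM buffer in seconds."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    frame_size = channels * bytes_per_sample
    if frame_size <= 0:
        raise ValueError("channels and bytes_per_sample must be positive")
    if len(audio) % frame_size != 0:
        raise ValueError("buffer ends in the middle of a sample")
    # Number of sample frames divided by frames per second.
    return (len(audio) // frame_size) / sample_rate
```

Under that assumption, a 48,000-byte buffer at a 24,000 Hz sample rate holds 24,000 samples, which is one second of audio.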
Error Frames
Contains MiniMax API error information
Methods
See the TTS base class methods for additional functionality.
Language Support
Supports a wide range of languages through the language_boost parameter:
Language Code | Service Code | Description |
---|---|---|
Language.AR | Arabic | Arabic |
Language.CS | Czech | Czech |
Language.DE | German | German |
Language.EL | Greek | Greek |
Language.EN | English | English |
Language.ES | Spanish | Spanish |
Language.FI | Finnish | Finnish |
Language.FR | French | French |
Language.HI | Hindi | Hindi |
Language.ID | Indonesian | Indonesian |
Language.IT | Italian | Italian |
Language.JA | Japanese | Japanese |
Language.KO | Korean | Korean |
Language.NL | Dutch | Dutch |
Language.PL | Polish | Polish |
Language.PT | Portuguese | Portuguese |
Language.RO | Romanian | Romanian |
Language.RU | Russian | Russian |
Language.TH | Thai | Thai |
Language.TR | Turkish | Turkish |
Language.UK | Ukrainian | Ukrainian |
Language.VI | Vietnamese | Vietnamese |
Language.YUE | Chinese,Yue | Chinese (Cantonese) |
Language.ZH | Chinese | Chinese (Mandarin) |
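The table above amounts to a lookup from language code to the string passed as language_boost. Here is a minimal sketch of that lookup. The keys reuse the member names from the Language Code column; the to_language_boost helper and its fallback from a regional code such as "en-US" to its base language are assumptions for illustration, not confirmed service behavior.

```python
from typing import Optional

# Service codes copied from the table above, keyed by Language member name.
LANGUAGE_BOOST = {
    "AR": "Arabic", "CS": "Czech", "DE": "German", "EL": "Greek",
    "EN": "English", "ES": "Spanish", "FI": "Finnish", "FR": "French",
    "HI": "Hindi", "ID": "Indonesian", "IT": "Italian", "JA": "Japanese",
    "KO": "Korean", "NL": "Dutch", "PL": "Polish", "PT": "Portuguese",
    "RO": "Romanian", "RU": "Russian", "TH": "Thai", "TR": "Turkish",
    "UK": "Ukrainian", "VI": "Vietnamese", "YUE": "Chinese,Yue",
    "ZH": "Chinese",
}


def to_language_boost(code: str) -> Optional[str]:
    """Map a language code to its service code, or None if unsupported.

    Hypothetical helper: regional variants fall back to the base language.
    """
    normalized = code.strip().upper().replace("_", "-")
    if normalized in LANGUAGE_BOOST:
        return LANGUAGE_BOOST[normalized]
    # Fall back from e.g. "EN-US" to "EN".
    return LANGUAGE_BOOST.get(normalized.split("-")[0])
```

Returning None for an unknown code leaves the caller free to decide whether to omit language_boost or raise an error.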
Usage Example
Frame Flow
Metrics Support
The service collects processing metrics:
- Time to First Byte (TTFB)
- Processing duration
- Character usage
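To make these three metrics concrete, the sketch below measures them around a generic stream of audio chunks. It is an illustration only: measure_stream and fake_tts_stream are hypothetical names, the stream is a stand-in that yields silence, and the service's own metrics collection may work differently.

```python
import asyncio
import time
from typing import AsyncIterator


async def measure_stream(text: str, chunks: AsyncIterator[bytes]) -> dict:
    """Consume an audio stream and record simple timing and usage figures."""
    start = time.perf_counter()
    ttfb = None
    total_bytes = 0
    async for chunk in chunks:
        # Time to First Byte: delay until the first non-empty chunk arrives.
        if ttfb is None and chunk:
            ttfb = time.perf_counter() - start
        total_bytes += len(chunk)
    return {
        "ttfb_seconds": ttfb,
        "processing_seconds": time.perf_counter() - start,
        "characters": len(text),
        "audio_bytes": total_bytes,
    }


async def fake_tts_stream() -> AsyncIterator[bytes]:
    """Stand-in for a streaming TTS response: three short chunks of silence."""
    for _ in range(3):
        await asyncio.sleep(0.01)
        yield b"\x00" * 320


if __name__ == "__main__":
    print(asyncio.run(measure_stream("Hello there", fake_tts_stream())))
```

Character usage is counted here as the length of the input text, which is an assumption about how usage is tallied.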
Notes
- Uses streaming audio generation for faster initial response
- Processes audio in chunks for efficient memory usage
- Supports real-time applications with low latency
- Automatically handles API authentication
- Provides PCM audio compatible with most audio pipelines