Overview
Mistral provides high-quality text-to-speech synthesis through the Voxtral TTS API. The service uses HTTP streaming with Server-Sent Events to deliver PCM-encoded audio at 24kHz, with automatic resampling to any requested sample rate.
Mistral TTS API Reference
Pipecat’s API methods for Mistral TTS integration
Example Implementation
Complete example with voice conversation
Mistral Documentation
Official Mistral API documentation and features
Installation
To use Mistral TTS, install the required dependencies:
Prerequisites
Mistral Account Setup
Before using Mistral TTS services, you need:
- Mistral Account: Sign up at Mistral
- API Key: Generate an API key from your account dashboard
- Voice Selection: Choose voice IDs from the Mistral voice library
Required Environment Variables
- MISTRAL_API_KEY: Your Mistral API key for authentication
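It can help to check for the key at startup so that a missing variable fails early rather than on the first synthesis request. The following is a minimal sketch using only the standard library; `resolve_api_key` and its `env` parameter are hypothetical names for illustration and are not part of Pipecat or the Mistral SDK.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "MISTRAL_API_KEY"


def resolve_api_key(
    api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: prefer an explicit key, else read the environment."""
    # Allow a custom mapping so the lookup can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    key = api_key or env.get(ENV_VAR)
    if not key:
        raise RuntimeError(f"Set {ENV_VAR} or pass an API key explicitly")
    return key
```

Calling `resolve_api_key()` with no arguments reads `MISTRAL_API_KEY` from the process environment and raises `RuntimeError` if it is unset or empty.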
Configuration
Mistral API key for authentication. If None, uses the MISTRAL_API_KEY environment variable.
Output audio sample rate in Hz. When None, uses the pipeline’s configured sample rate. Audio is automatically resampled from Mistral’s native 24kHz.
Settings
Runtime-configurable settings passed via the settings constructor argument using MistralTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | voxtral-mini-tts-2603 | TTS model identifier. |
| voice | str | None | Voice identifier for synthesis. |
| language | Language \| str | None | Language for speech synthesis. (Inherited from base settings.) |
Usage
Basic Setup
With Custom Model
With Custom Sample Rate
Notes
- Streaming: The service uses Server-Sent Events for real-time audio streaming.
- Resampling: Audio is automatically resampled from Mistral’s native 24kHz to any requested sample rate.
- Audio format: The service receives float32 PCM audio from the API and converts it to int16 PCM for the Pipecat pipeline.
- Metrics: The service supports metrics generation for monitoring TTS performance.
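The float32-to-int16 conversion mentioned in the audio format note can be illustrated with a short standard-library sketch. It assumes little-endian samples normalized to the range [-1.0, 1.0] and scales by 32767; the function name `float32_to_int16_pcm` is hypothetical, and the actual Pipecat implementation may differ (for example, in how it scales or buffers partial samples).

```python
import struct


def float32_to_int16_pcm(data: bytes) -> bytes:
    """Convert little-endian float32 PCM bytes to little-endian int16 PCM bytes."""
    # Assumption: drop any trailing bytes that do not form a whole
    # 4-byte float32 sample. A streaming client would more likely
    # buffer them until the next chunk arrives.
    usable = len(data) - (len(data) % 4)
    count = usable // 4
    samples = struct.unpack(f"<{count}f", data[:usable])
    # Clamp to [-1.0, 1.0] before scaling so out-of-range samples
    # saturate instead of overflowing the int16 range.
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{count}h", *ints)
```

Each 4-byte input sample becomes a 2-byte output sample, so the converted buffer is half the size of the input.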