Text-to-speech services using Cartesia’s WebSocket and HTTP APIs

- `CartesiaTTSService`: WebSocket-based with streaming and word timestamps
- `CartesiaHttpTTSService`: HTTP-based for simpler synthesis

`CartesiaTTSService` is recommended for real-time applications.

Set your Cartesia API key in the `CARTESIA_API_KEY` environment variable.
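
As a minimal sketch of reading the key at startup, the helper below fails early with a clear message when the variable is missing. The function name `load_cartesia_api_key` is hypothetical and not part of any library; only the `CARTESIA_API_KEY` variable name comes from the text above.

```python
import os


def load_cartesia_api_key(env=os.environ):
    """Return the Cartesia API key, or raise a clear error if it is unset.

    `env` is any mapping of environment variables; it defaults to the
    process environment. This helper is hypothetical, for illustration only.
    """
    key = env.get("CARTESIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CARTESIA_API_KEY is not set; export it before starting the application"
        )
    return key
```

Checking the key once at startup surfaces a configuration mistake immediately, instead of as a connection error during the first synthesis request.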

Input frames:

- `TextFrame` - Text content to synthesize into speech
- `TTSSpeakFrame` - Text that the TTS service should speak
- `TTSUpdateSettingsFrame` - Runtime configuration updates (e.g., voice)
- `LLMFullResponseStartFrame` / `LLMFullResponseEndFrame` - LLM response boundaries

Output frames:

- `TTSStartedFrame` - Signals start of synthesis
- `TTSAudioRawFrame` - Generated audio data chunks
- `TTSStoppedFrame` - Signals completion of synthesis
- `ErrorFrame` - Connection or processing errors

Feature | CartesiaTTSService (WebSocket) | CartesiaHttpTTSService (HTTP) |
---|---|---|
Streaming | ✅ Real-time chunks | ❌ Single audio block |
Word Timestamps | ✅ Precise timing | ❌ Not available |
Interruption | ✅ Advanced handling | ⚠️ Basic support |
Latency | 🚀 Low | 📈 Higher |
Best For | Interactive apps | Batch processing |
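
The comparison above can be read as a small decision rule. The sketch below encodes only the streaming and word-timestamp rows of the table; the `CAPABILITIES` mapping and the `choose_service` helper are hypothetical names introduced for illustration, and the function returns class names as strings rather than importing the services.

```python
from typing import Iterable

# Capabilities transcribed from the comparison table (streaming and
# word-timestamp rows only). The dictionary itself is illustrative.
CAPABILITIES = {
    "CartesiaTTSService": {"streaming": True, "word_timestamps": True},
    "CartesiaHttpTTSService": {"streaming": False, "word_timestamps": False},
}


def choose_service(required: Iterable[str]) -> str:
    """Pick the simplest service that offers every required feature.

    Hypothetical helper: the HTTP service is tried first because the table
    describes it as the simpler option; the WebSocket service is the fallback.
    """
    needed = tuple(required)
    for name in ("CartesiaHttpTTSService", "CartesiaTTSService"):
        caps = CAPABILITIES[name]
        if all(caps.get(feature, False) for feature in needed):
            return name
    raise ValueError(f"no service supports all of: {needed}")
```

Trying the HTTP service first is a design choice of this sketch. For interactive applications the WebSocket service remains the recommended default regardless of which features are strictly required.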
Languages are specified with the `Language` enum:

Language Code | Description | Service Code |
---|---|---|
Language.DE | German | de |
Language.EN | English | en |
Language.ES | Spanish | es |
Language.FR | French | fr |
Language.HI | Hindi | hi |
Language.IT | Italian | it |
Language.JA | Japanese | ja |
Language.KO | Korean | ko |
Language.NL | Dutch | nl |
Language.PL | Polish | pl |
Language.PT | Portuguese | pt |
Language.RU | Russian | ru |
Language.SV | Swedish | sv |
Language.TR | Turkish | tr |
Language.ZH | Chinese (Mandarin) | zh |
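
To illustrate the mapping in the table, here is a minimal sketch that resolves a language tag to one of the listed service codes. The helper `to_service_code` is hypothetical, and the fallback from a regional tag such as `pt-BR` to its base language is an assumption of this sketch, not documented behavior of the service.

```python
from typing import Optional

# Service codes transcribed from the table above.
SERVICE_CODES = frozenset(
    {"de", "en", "es", "fr", "hi", "it", "ja", "ko",
     "nl", "pl", "pt", "ru", "sv", "tr", "zh"}
)


def to_service_code(tag: str) -> Optional[str]:
    """Map a language tag to a listed service code, or None if unsupported.

    Assumption: regional variants ("en-US", "zh_CN") fall back to their
    base language. Verify this against the service before relying on it.
    """
    base = tag.strip().lower().replace("_", "-").split("-")[0]
    return base if base in SERVICE_CODES else None
```

Returning `None` for an unlisted language lets the caller decide whether to fall back to a default or report an error.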

Settings can be updated at runtime by sending a `TTSUpdateSettingsFrame` to the `CartesiaTTSService`.
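
The idea of a runtime settings update can be sketched without the library. In the example below, `TTSUpdateSettingsFrame` is a local stand-in dataclass, not the real frame class; its `settings` field, the `apply_settings` helper, and the voice identifiers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class TTSUpdateSettingsFrame:
    """Local stand-in for the real frame; the `settings` field is an assumption."""

    settings: Dict[str, Any] = field(default_factory=dict)


def apply_settings(current: Dict[str, Any], frame: TTSUpdateSettingsFrame) -> Dict[str, Any]:
    """Return a copy of `current` with the frame's settings merged on top.

    Hypothetical helper: keys present in the frame override existing values,
    and keys absent from the frame are left unchanged.
    """
    updated = dict(current)
    updated.update(frame.settings)
    return updated


# Placeholder values, not real voice identifiers.
current = {"voice": "voice-a", "language": "en"}
frame = TTSUpdateSettingsFrame(settings={"voice": "voice-b"})
updated = apply_settings(current, frame)
```

In a running pipeline the frame would be queued into the pipeline rather than applied by hand; the merge above only shows the intended effect of a partial update such as a voice change.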

- Use `CartesiaTTSService` for low-latency streaming and accurate context updates with word timestamps
- Set pipeline-wide options in `PipelineParams` rather than per-service for consistency