# Cartesia

> Text-to-speech services using Cartesia's WebSocket and HTTP APIs

## Overview

Cartesia provides high-quality text-to-speech synthesis with two service implementations: `CartesiaTTSService` (WebSocket-based) for real-time streaming with word timestamps, and `CartesiaHttpTTSService` (HTTP-based) for simpler batch synthesis. `CartesiaTTSService` is recommended for interactive applications requiring low latency and interruption handling.

## Installation

To use Cartesia services, install the required dependencies:

```bash
uv add "pipecat-ai[cartesia]"
```

## Prerequisites

### Cartesia Account Setup

Before using Cartesia TTS services, you need:

1. **Cartesia Account**: Sign up at [Cartesia](https://play.cartesia.ai/sign-up)
2. **API Key**: Generate an API key from your account dashboard
3. **Voice Selection**: Choose voice IDs from the [voice library](https://play.cartesia.ai/)

### Required Environment Variables

* `CARTESIA_API_KEY`: Your Cartesia API key for authentication

## Configuration

### CartesiaTTSService

* Cartesia API key for authentication.
* ID of the voice to use for synthesis. *Deprecated in v0.0.105. Use `settings=CartesiaTTSService.Settings(voice=...)` instead.*
* TTS model to use. *Deprecated in v0.0.105. Use `settings=CartesiaTTSService.Settings(model=...)` instead.*
* API version string for Cartesia service.
* WebSocket endpoint URL.
* Output audio sample rate in Hz. When `None`, uses the pipeline's configured sample rate.
* Audio encoding format.
* Audio container format.
* Server-side buffering window (in milliseconds) before generation starts.
  `0` disables server buffering (custom buffering); values in (0, 5000] enable managed buffering. When `None`, automatically derived from `text_aggregation_mode`: `0` for `SENTENCE` mode (avoids stacking client and server buffering), unset for `TOKEN` mode (uses Cartesia's 3000ms default).
* Controls how incoming text is aggregated before synthesis. `SENTENCE` (default) buffers text until sentence boundaries, producing more natural speech. `TOKEN` streams tokens directly for lower latency. Import `TextAggregationMode` from `pipecat.services.tts_service`.
* *Deprecated in v0.0.104.* Use `text_aggregation_mode` instead.
* *Deprecated in v0.0.105. Use `settings=CartesiaTTSService.Settings(...)` instead.*
* Runtime-configurable settings. See [Settings](#settings) below.

### CartesiaHttpTTSService

The HTTP service accepts similar parameters to the WebSocket service, with these differences:

* HTTP API base URL (instead of `url` for WebSocket).
* API version for HTTP service.
* Optional aiohttp ClientSession for HTTP requests. If not provided, a session will be created and managed internally.

The HTTP service does not accept `text_aggregation_mode` or `aggregate_sentences`.

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `CartesiaTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter               | Type               | Default     | Description                                                                                         |
| ----------------------- | ------------------ | ----------- | --------------------------------------------------------------------------------------------------- |
| `model`                 | `str`              | `None`      | TTS model identifier. Defaults to `sonic-3.5` when not specified. *(Inherited from base settings.)* |
| `voice`                 | `str`              | `None`      | Voice identifier. *(Inherited from base settings.)*                                                 |
| `language`              | `Language \| str`  | `None`      | Language for synthesis. *(Inherited from base settings.)*                                           |
| `generation_config`     | `GenerationConfig` | `NOT_GIVEN` | Generation configuration for Cartesia models. See below.                                            |
| `pronunciation_dict_id` | `str`              | `NOT_GIVEN` | ID of the pronunciation dictionary for custom pronunciations.                                       |

#### GenerationConfig

Configuration for Cartesia generation parameters. Applicable to sonic-3 and sonic-3.5 models:

| Parameter | Type    | Default | Description                                                                                           |
| --------- | ------- | ------- | ----------------------------------------------------------------------------------------------------- |
| `volume`  | `float` | `None`  | Volume multiplier. Valid range: \[0.5, 2.0].                                                          |
| `speed`   | `float` | `None`  | Speed multiplier. Valid range: \[0.6, 1.5].                                                           |
| `emotion` | `str`   | `None`  | Emotion string to guide tone (e.g., `"neutral"`, `"angry"`, `"excited"`). Over 60 emotions supported. |

## Usage

### Basic Setup

```python
import os

from pipecat.services.cartesia import CartesiaTTSService

tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    settings=CartesiaTTSService.Settings(
        voice="your-voice-id",
    ),
)
```

### With Generation Config

```python
import os

from pipecat.services.cartesia.tts import CartesiaTTSService, GenerationConfig

tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    settings=CartesiaTTSService.Settings(
        voice="your-voice-id",
        model="sonic-3",
        generation_config=GenerationConfig(
            speed=1.1,
            emotion="excited",
        ),
    ),
)
```

### HTTP Service

```python
import os

from pipecat.services.cartesia import CartesiaHttpTTSService

tts = CartesiaHttpTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    settings=CartesiaHttpTTSService.Settings(
        voice="your-voice-id",
    ),
)
```

## Customizing Speech

These helper methods use SSML tags and generation config parameters introduced for `sonic-3`.
`sonic-3.5` has significantly improved natural expressiveness, so most users will get better results by relying on the model and the context of their input text rather than manually tuning emotion, speed, or volume. These controls remain available on the API, but effectiveness may vary on `sonic-3.5`.

`CartesiaTTSService` provides a set of helper methods for implementing Cartesia-specific customizations, meant to be used as part of text transformers. These include methods for spelling out text, inserting emotion and pause tags, and adjusting volume and speech rate. See the [Text Transformers for TTS](/pipecat/learn/text-to-speech#text-transforms) section in the Text-to-Speech guide for usage examples.

### SPELL(text: str) -> str:

A convenience method to wrap text in [Cartesia's spell tag](https://docs.cartesia.ai/build-with-cartesia/sonic-3/ssml-tags#spelling-out-numbers-and-letters) for spelling out text character by character.

```python
# Text transformers for TTS
# This will insert Cartesia's spell tags around the provided text.
async def spell_out_text(text: str, type: str) -> str:
    return CartesiaTTSService.SPELL(text)

tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    text_transforms=[
        ("phone_number", spell_out_text),
    ],
)
```

### EMOTION\_TAG(emotion: CartesiaEmotion) -> str:

A convenience method to create an [emotion tag](https://docs.cartesia.ai/build-with-cartesia/sonic-3/volume-speed-emotion#emotion-controls-beta) for expressing emotions in speech.

```python
# Text transformers for TTS
# This will insert Cartesia's sarcasm tag in front of any sentence that is just "whatever"
# and reset the emotion to neutral afterwards.
# CartesiaEmotion must be imported alongside CartesiaTTSService.
async def maybe_insert_sarcasm(text: str, type: str) -> str:
    if text.strip(".!").lower() == "whatever":
        return (
            CartesiaTTSService.EMOTION_TAG(CartesiaEmotion.SARCASM)
            + text
            + CartesiaTTSService.EMOTION_TAG(CartesiaEmotion.NEUTRAL)
        )
    return text

tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    text_transforms=[
        ("sentence", maybe_insert_sarcasm),
    ],
)
```

### PAUSE\_TAG(seconds: float) -> str:

A convenience method to create Cartesia's [SSML tag for inserting pauses](https://docs.cartesia.ai/build-with-cartesia/sonic-3/ssml-tags#pauses-and-breaks) in speech.

```python
# Text transformers for TTS
# This will insert a one second pause after questions.
async def pause_after_questions(text: str, type: str) -> str:
    if text.endswith("?"):
        return f"{text}{CartesiaTTSService.PAUSE_TAG(1.0)}"
    return text

tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    text_transforms=[
        ("sentence", pause_after_questions),  # Only apply to sentence aggregations
    ],
)
```

### VOLUME\_TAG(volume: float) -> str:

A convenience method to create Cartesia's [SSML volume tag](https://docs.cartesia.ai/build-with-cartesia/sonic-3/ssml-tags#volume) for dynamically adjusting speech volume in situ.

```python
# Text transformers for TTS
# This will increase the volume for any full text aggregation that is in all caps,
# then reset the volume to 1.0 afterwards.
async def maybe_say_it_loud(text: str, type: str) -> str:
    # isupper() is False for text with no letters, so "123" or "?" is left untouched.
    if text.isupper():
        return f"{CartesiaTTSService.VOLUME_TAG(2.0)}{text}{CartesiaTTSService.VOLUME_TAG(1.0)}"
    return text

tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    text_transforms=[
        ("*", maybe_say_it_loud),  # Apply to all text
    ],
)
```

### SPEED\_TAG(speed: float) -> str:

A convenience method to create Cartesia's [SSML speed tag](https://docs.cartesia.ai/build-with-cartesia/sonic-3/ssml-tags#speed) for dynamically adjusting the speech rate in situ.
```python
# Text transformers for TTS
# This will make every occurrence of "slow" be spoken more slowly.
# Note: str.replace also matches "slow" inside longer words such as "slowly".
async def slow_down_slow_words(text: str, type: str) -> str:
    return text.replace(
        "slow",
        f"{CartesiaTTSService.SPEED_TAG(0.6)}slow{CartesiaTTSService.SPEED_TAG(1.0)}"
    )

tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    text_transforms=[
        ("*", slow_down_slow_words),  # Apply to all text
    ],
)
```

The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use `Settings` / `settings=` instead. See the [Service Settings guide](/pipecat/fundamentals/service-settings) for migration details.

## Notes

* **WebSocket vs HTTP**: The WebSocket service supports word-level timestamps, audio context management, and interruption handling, making it better for interactive conversations. The HTTP service is simpler but lacks these features.
* **Text aggregation**: Sentence aggregation is enabled by default (`text_aggregation_mode=TextAggregationMode.SENTENCE`). Buffering until sentence boundaries produces more natural speech. Set `text_aggregation_mode=TextAggregationMode.TOKEN` to stream tokens directly for lower latency. Cartesia handles token streaming well.
* **Connection timeout**: Cartesia WebSocket connections time out after 5 minutes of inactivity (no keepalive mechanism is available). The service automatically reconnects when needed.

## Event Handlers

Cartesia TTS supports the standard [service connection events](/api-reference/server/events/service-events):

| Event                 | Description                          |
| --------------------- | ------------------------------------ |
| `on_connected`        | Connected to Cartesia WebSocket      |
| `on_disconnected`     | Disconnected from Cartesia WebSocket |
| `on_connection_error` | WebSocket connection error occurred  |

```python
@tts.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Cartesia")
```
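
The valid ranges listed in the GenerationConfig table (volume in \[0.5, 2.0], speed in \[0.6, 1.5]) can be checked on the client before a config is built. The following is a minimal sketch, not part of Pipecat: `check_generation_values`, `VOLUME_RANGE`, and `SPEED_RANGE` are hypothetical names introduced here for illustration, and the ranges are copied from the table on this page.

```python
from typing import Optional

# Ranges copied from the GenerationConfig table on this page.
VOLUME_RANGE = (0.5, 2.0)
SPEED_RANGE = (0.6, 1.5)


def check_generation_values(
    volume: Optional[float] = None,
    speed: Optional[float] = None,
) -> dict:
    """Hypothetical helper: validate values and return only the ones that were set."""
    checked = {}
    for name, value, (low, high) in (
        ("volume", volume, VOLUME_RANGE),
        ("speed", speed, SPEED_RANGE),
    ):
        if value is None:
            continue  # None means the value was not set, so it is left out
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        checked[name] = value
    return checked
```

The returned dictionary could then be unpacked into the keyword arguments shown under "With Generation Config", for example `GenerationConfig(**check_generation_values(speed=1.1))`. Raising early keeps an out-of-range value from reaching the API, where the resulting behavior is not described on this page.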