# xAI

> Text-to-speech services using xAI's HTTP and WebSocket streaming APIs with support for 20 languages

## Overview

xAI provides two text-to-speech services:

* **XAIHttpTTSService**: Batch synthesis via HTTP API. Sends the complete text and receives the full audio response.
* **XAITTSService**: Streaming synthesis via WebSocket. Streams text incrementally and receives audio chunks as they are synthesized, reducing latency. Supports word-level timestamps for accurate timing of synthesized speech.

Both support multiple languages and audio encoding formats.

## Installation

```bash theme={null}
uv add "pipecat-ai[xai]"
```

## Prerequisites

1. **xAI Account**: Sign up at [xAI](https://x.ai/)
2. **API Key**: Generate an API key from your account dashboard (also works with Grok API keys)

Set the following environment variable:

```bash theme={null}
export GROK_API_KEY=your_api_key
```

## Configuration

### XAIHttpTTSService

* `api_key`: xAI API key for authentication.
* Endpoint URL: xAI TTS endpoint URL. Override for custom or proxied deployments.
* `sample_rate`: Output audio sample rate in Hz. When `None`, uses the pipeline's configured sample rate.
* `encoding`: Output audio encoding format. Supported formats: `"pcm"`, `"mp3"`, `"wav"`, `"mulaw"`, `"alaw"`.
* `aiohttp_session`: Optional shared aiohttp session for HTTP requests. If `None`, the service creates and manages its own session.
* `settings`: Runtime-configurable settings. See [Settings](#settings) below.

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `XAIHttpTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`.
See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter                    | Type              | Default       | Description                                              |
| ---------------------------- | ----------------- | ------------- | -------------------------------------------------------- |
| `model`                      | `str`             | `None`        | Model identifier. *(Inherited from base settings.)*      |
| `voice`                      | `str`             | `"eve"`       | Voice identifier. *(Inherited from base settings.)*      |
| `language`                   | `Language \| str` | `Language.EN` | Language code. *(Inherited from base settings.)*         |
| `speed`                      | `float`           | `None`        | Speech speed multiplier from 0.7 to 1.5 (1.0 is normal). |
| `optimize_streaming_latency` | `int`             | `None`        | Latency optimization level (0, 1, or 2).                 |
| `text_normalization`         | `bool`            | `None`        | Whether to normalize text before synthesis.              |

### XAITTSService

* `api_key`: xAI API key for authentication.
* Endpoint URL: xAI TTS WebSocket endpoint URL. Override for custom or proxied deployments.
* `sample_rate`: Output audio sample rate in Hz. When `None`, uses the pipeline's configured sample rate.
* `codec`: Output audio codec. Supported codecs: `"pcm"`, `"wav"`, `"mulaw"`, `"alaw"`. Defaults to `"pcm"` so emitted `TTSAudioRawFrame` objects need no decoding downstream.
* `settings`: Runtime-configurable settings. Includes all settings from `XAIHttpTTSService` plus `with_timestamps` for word-level timing. Changing voice, language, or tunable parameters at runtime reconnects the WebSocket with new query parameters.

### WebSocket Settings

Runtime-configurable settings for `XAITTSService` using `XAITTSService.Settings(...)`. Includes all HTTP service settings plus:

| Parameter         | Type   | Default | Description                                                                                                         |
| ----------------- | ------ | ------- | ------------------------------------------------------------------------------------------------------------------- |
| `with_timestamps` | `bool` | `True`  | Whether to request character timings. When enabled, the service converts them into per-word `TTSTextFrame` objects. |

## Supported Languages

xAI TTS supports 20 languages.
Use the `Language` enum from `pipecat.transcriptions.language`:

* Arabic (Egyptian, Saudi, UAE): `Language.AR`, `Language.AR_EG`, `Language.AR_SA`, `Language.AR_AE`
* Bengali: `Language.BN`
* Chinese: `Language.ZH`
* English: `Language.EN`
* French: `Language.FR`
* German: `Language.DE`
* Hindi: `Language.HI`
* Indonesian: `Language.ID`
* Italian: `Language.IT`
* Japanese: `Language.JA`
* Korean: `Language.KO`
* Portuguese (Brazil, Portugal): `Language.PT`, `Language.PT_BR`, `Language.PT_PT`
* Russian: `Language.RU`
* Spanish (Spain, Mexico): `Language.ES`, `Language.ES_ES`, `Language.ES_MX`
* Turkish: `Language.TR`
* Vietnamese: `Language.VI`

## Usage

### WebSocket Streaming (XAITTSService)

#### Basic Setup

```python theme={null}
import os

from pipecat.services.xai.tts import XAITTSService

tts = XAITTSService(
    api_key=os.getenv("GROK_API_KEY"),
    settings=XAITTSService.Settings(
        voice="eve",
    ),
)
```

#### With Custom Language

```python theme={null}
import os

from pipecat.services.xai.tts import XAITTSService
from pipecat.transcriptions.language import Language

tts = XAITTSService(
    api_key=os.getenv("GROK_API_KEY"),
    settings=XAITTSService.Settings(
        voice="eve",
        language=Language.ES,
    ),
)
```

#### With Custom Sample Rate and Codec

```python theme={null}
tts = XAITTSService(
    api_key=os.getenv("GROK_API_KEY"),
    sample_rate=24000,
    codec="wav",
    settings=XAITTSService.Settings(
        voice="eve",
    ),
)
```

#### With Tunable Parameters

```python theme={null}
tts = XAITTSService(
    api_key=os.getenv("GROK_API_KEY"),
    settings=XAITTSService.Settings(
        voice="eve",
        speed=1.2,  # Faster speech
        optimize_streaming_latency=2,  # Maximum latency optimization
        text_normalization=True,  # Enable text normalization
        with_timestamps=True,  # Enable word timestamps (default)
    ),
)
```

### HTTP Batch (XAIHttpTTSService)

#### Basic Setup

```python theme={null}
import os

from pipecat.services.xai.tts import XAIHttpTTSService

tts = XAIHttpTTSService(
    api_key=os.getenv("GROK_API_KEY"),
    settings=XAIHttpTTSService.Settings(
        voice="eve",
    ),
)
```

#### With Custom Encoding
```python theme={null}
tts = XAIHttpTTSService(
    api_key=os.getenv("GROK_API_KEY"),
    encoding="mp3",
    settings=XAIHttpTTSService.Settings(
        voice="eve",
    ),
)
```

#### With Shared HTTP Session

```python theme={null}
import os

import aiohttp

from pipecat.services.xai.tts import XAIHttpTTSService

# `async with` must run inside an async function.
async with aiohttp.ClientSession() as session:
    tts = XAIHttpTTSService(
        api_key=os.getenv("GROK_API_KEY"),
        aiohttp_session=session,
        settings=XAIHttpTTSService.Settings(
            voice="eve",
        ),
    )
```

### Updating Settings at Runtime

Voice settings can be changed mid-conversation using `TTSUpdateSettingsFrame`. This works for both services:

```python theme={null}
from pipecat.frames.frames import TTSUpdateSettingsFrame
from pipecat.services.xai.tts import XAITTSSettings
from pipecat.transcriptions.language import Language

await worker.queue_frame(
    TTSUpdateSettingsFrame(
        delta=XAITTSSettings(
            language=Language.FR,
        )
    )
)
```

Note: For `XAITTSService`, changing the voice, language, or tunable settings reconnects the WebSocket with updated query parameters.

## Notes

* **Service choice**:
  * Use `XAITTSService` (WebSocket) for lower-latency streaming synthesis, where audio begins playing before the full utterance finishes.
  * Use `XAIHttpTTSService` (HTTP) for simpler batch synthesis or when WebSocket connections are not available.
* **Default audio format**: Both services default to raw PCM output, which matches Pipecat's downstream expectations without extra decoding.
* **Encoding/codec options**: When using non-PCM formats (`mp3`, `wav`, `mulaw`, `alaw`), ensure your audio pipeline can handle the selected format.
* **Session management**:
  * `XAIHttpTTSService`: If you don't provide an `aiohttp_session`, the service creates and manages its own session lifecycle automatically.
  * `XAITTSService`: The WebSocket connection is managed automatically; settings changes that affect URL parameters (voice, language, or tunable settings) trigger a reconnection.
* **Interruption handling**: `XAITTSService` handles barge-in by sending a `text.clear` message over the existing WebSocket connection, avoiding the overhead of reconnecting on every interruption.
* **Word timestamps**: When `with_timestamps` is enabled (the default), xAI's per-character timings are converted into per-word `TTSTextFrame` objects with accurate `pts` values. Note that xAI delivers timestamps in batches that are decoupled from the audio stream (a batch can cover several seconds of speech), so word frames are emitted in bursts. Consumers should schedule off `pts` rather than arrival time.
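
To make the word-timestamp note concrete, here is a minimal sketch of how per-character timings could be grouped into per-word start times, and how a consumer might release words against a playback clock instead of arrival order. The function names `chars_to_word_times` and `words_due`, and the input shape (parallel lists of characters and start times in seconds), are assumptions for illustration only. This is not Pipecat's internal implementation or xAI's wire format.

```python
from typing import List, Tuple


def chars_to_word_times(
    chars: List[str], start_times: List[float]
) -> List[Tuple[str, float]]:
    """Group per-character start times into (word, start_time) pairs.

    Hypothetical helper: a word starts at the start time of its first
    non-whitespace character, and whitespace separates words.
    """
    if len(chars) != len(start_times):
        raise ValueError("chars and start_times must have the same length")

    words: List[Tuple[str, float]] = []
    current: List[str] = []
    word_start = 0.0
    for ch, t in zip(chars, start_times):
        if ch.isspace():
            # Whitespace closes the word in progress, if there is one.
            if current:
                words.append(("".join(current), word_start))
                current = []
        else:
            if not current:
                word_start = t  # First character of a new word.
            current.append(ch)
    if current:
        words.append(("".join(current), word_start))
    return words


def words_due(
    word_times: List[Tuple[str, float]], playback_position: float
) -> List[str]:
    """Return words whose start time has been reached by the playback clock.

    Hypothetical helper: a whole batch may arrive at once, so the decision
    is based on each word's timestamp, not on when the batch arrived.
    """
    return [word for word, start in word_times if start <= playback_position]


# A batch covering "hi there" arrives in one burst.
batch = chars_to_word_times(
    list("hi there"), [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
)
print(batch)
print(words_due(batch, 0.2))
```

With real frames, the same idea applies to the `pts` value carried by each `TTSTextFrame`: compare it against the audio playback position rather than reacting when the frame arrives.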