# Azure

> Text-to-speech service using Azure Cognitive Services Speech SDK

## Overview

Azure Cognitive Services provides high-quality text-to-speech synthesis with two service implementations: `AzureTTSService` (WebSocket-based) for real-time streaming with low latency, and `AzureHttpTTSService` (HTTP-based) for batch synthesis. `AzureTTSService` is recommended for interactive applications that need streaming.

## Installation

To use Azure services, install the required dependencies:

```bash
uv add "pipecat-ai[azure]"
```

## Prerequisites

### Azure Account Setup

Before using Azure TTS services, you need:

1. **Azure Account**: Sign up at [Azure Portal](https://portal.azure.com/)
2. **Speech Service**: Create a Speech resource in your Azure subscription
3. **API Key and Region**: Get your subscription key and service region
4. **Voice Selection**: Choose from available voices in the [Voice Gallery](https://speech.microsoft.com/portal/voicegallery)

### Required Environment Variables

* `AZURE_SPEECH_API_KEY`: Your Azure Speech service API key
* `AZURE_SPEECH_REGION`: Your Azure Speech service region (e.g., "eastus") *or* a custom endpoint URL when using Private Link

## Configuration

### AzureTTSService

* `api_key`: Azure Cognitive Services subscription key.
* `region`: Azure region identifier (e.g., `"eastus"`, `"westus2"`). Required unless `private_endpoint` is provided.
* `private_endpoint`: Custom endpoint URL for Azure Speech Services (e.g., `"https://my-resource.cognitiveservices.azure.com/"`). Use this when connecting via Private Link or a custom domain.
See [Azure Private Link documentation](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-services-private-link).
* `voice`: Voice name to use for synthesis. *Deprecated in v0.0.105. Use `settings=AzureTTSService.Settings(voice=...)` instead.*
* `sample_rate`: Output audio sample rate in Hz. When `None`, uses the pipeline's configured sample rate.
* `text_aggregation_mode`: Controls how incoming text is aggregated before synthesis. `SENTENCE` (default) buffers text until sentence boundaries, producing more natural speech. `TOKEN` streams tokens directly for lower latency. Import the mode from `pipecat.services.tts_service`.
* `aggregate_sentences`: *Deprecated in v0.0.104.* Use `text_aggregation_mode` instead.
* `params`: *Deprecated in v0.0.105. Use `settings=AzureTTSService.Settings(...)` instead.*
* `settings`: Runtime-configurable settings. See [Settings](#settings) below.

### AzureHttpTTSService

The HTTP service accepts the same parameters as the streaming service except `text_aggregation_mode` and `aggregate_sentences`:

* `api_key`: Azure Cognitive Services subscription key.
* `region`: Azure region identifier. Required unless `private_endpoint` is provided.
* `private_endpoint`: Custom endpoint URL for Azure Speech Services. Use this when connecting via Private Link or a custom domain. See [Azure Private Link documentation](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-services-private-link).
* `voice`: Voice name to use for synthesis. *Deprecated in v0.0.105. Use `settings=AzureHttpTTSService.Settings(voice=...)` instead.*
* `sample_rate`: Output audio sample rate in Hz.
* `params`: *Deprecated in v0.0.105. Use `settings=AzureHttpTTSService.Settings(...)` instead.*
* `settings`: Runtime-configurable settings. See [Settings](#settings) below.

### Settings

Runtime-configurable settings are passed via the `settings` constructor argument using `AzureTTSService.Settings(...)` (or `AzureHttpTTSService.Settings(...)` for the HTTP service). They can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter      | Type              | Default     | Description                            |
| -------------- | ----------------- | ----------- | -------------------------------------- |
| `model`        | `str`             | `None`      | Model identifier. *(Inherited.)*       |
| `voice`        | `str`             | `None`      | Voice identifier. *(Inherited.)*       |
| `language`     | `Language \| str` | `None`      | Language for synthesis. *(Inherited.)* |
| `emphasis`     | `str`             | `NOT_GIVEN` | Emphasis level for SSML.               |
| `pitch`        | `str`             | `NOT_GIVEN` | Pitch adjustment.                      |
| `rate`         | `str`             | `NOT_GIVEN` | Speaking rate.                         |
| `role`         | `str`             | `NOT_GIVEN` | Role for SSML.                         |
| `style`        | `str`             | `NOT_GIVEN` | Speaking style.                        |
| `style_degree` | `str`             | `NOT_GIVEN` | Degree of the speaking style.          |
| `volume`       | `str`             | `NOT_GIVEN` | Volume level.                          |

## Usage

### Basic Setup

```python
import os

from pipecat.services.azure import AzureTTSService

# Credentials come from the environment variables listed under Prerequisites.
tts = AzureTTSService(
    api_key=os.getenv("AZURE_SPEECH_API_KEY"),
    region=os.getenv("AZURE_SPEECH_REGION"),
    settings=AzureTTSService.Settings(
        voice="en-US-SaraNeural",
    ),
)
```

### With Voice Customization

```python
import os

from pipecat.services.azure import AzureTTSService
from pipecat.transcriptions.language import Language

tts = AzureTTSService(
    api_key=os.getenv("AZURE_SPEECH_API_KEY"),
    region="eastus",
    settings=AzureTTSService.Settings(
        voice="en-US-JennyMultilingualNeural",
        language=Language.EN_US,
        style="cheerful",
        style_degree="1.5",
        rate="1.1",
    ),
)
```

### HTTP Service

```python
import os

from pipecat.services.azure import AzureHttpTTSService

tts = AzureHttpTTSService(
    api_key=os.getenv("AZURE_SPEECH_API_KEY"),
    region=os.getenv("AZURE_SPEECH_REGION"),
    # The direct `voice=` argument is deprecated in v0.0.105; pass it via Settings.
    settings=AzureHttpTTSService.Settings(
        voice="en-US-SaraNeural",
    ),
)
```

### With Private Endpoint

```python
import os

from pipecat.services.azure import AzureTTSService

# `region` is omitted because `private_endpoint` is provided.
tts = AzureTTSService(
    api_key=os.getenv("AZURE_SPEECH_API_KEY"),
    private_endpoint="https://my-resource.cognitiveservices.azure.com/",
    settings=AzureTTSService.Settings(
        voice="en-US-SaraNeural",
    ),
)
```

The `InputParams` / `params=` pattern is deprecated as of v0.0.105.
Use `Settings` / `settings=` instead. See the [Service Settings guide](/pipecat/fundamentals/service-settings) for migration details.

## Notes

* **Streaming vs HTTP**: The streaming service (`AzureTTSService`) provides word-level timestamps and lower latency, making it better for interactive conversations. The HTTP service is simpler but returns the complete audio at once.
* **SSML support**: Both services automatically construct SSML from the `Settings`. Special characters in text are automatically escaped.
* **Word timestamps**: `AzureTTSService` supports word-level timestamps for synchronized text display. CJK languages receive special handling to merge individual characters into meaningful word units.
* **8kHz workaround**: At 8kHz sample rates, Azure's reported audio duration may not match word boundary offsets. The service uses word boundary offsets for timing in this case.
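
To make the SSML note above more concrete, here is a minimal sketch of how settings such as `voice`, `style`, `style_degree`, and `rate` might map onto Azure SSML elements, with special characters in the text escaped. This is an illustration only, not Pipecat's actual implementation: `build_ssml` is a hypothetical helper, and the element layout follows the general shape of Azure's documented SSML (`<voice>`, `<mstts:express-as>`, `<prosody>`) under that assumption.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice, language="en-US", style=None, style_degree=None,
               rate=None, pitch=None, volume=None):
    """Hypothetical helper: wrap text in Azure-style SSML (not Pipecat's code)."""
    # Escape &, < and > so user text cannot break the XML document.
    body = escape(text)

    # Only emit prosody attributes that were actually set.
    prosody = [("rate", rate), ("pitch", pitch), ("volume", volume)]
    attrs = " ".join(f"{k}={quoteattr(v)}" for k, v in prosody if v is not None)
    if attrs:
        body = f"<prosody {attrs}>{body}</prosody>"

    # Speaking style, with an optional intensity (styledegree).
    if style is not None:
        degree = ""
        if style_degree is not None:
            degree = f" styledegree={quoteattr(style_degree)}"
        body = (
            f"<mstts:express-as style={quoteattr(style)}{degree}>"
            f"{body}</mstts:express-as>"
        )

    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(language)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice>"
        "</speak>"
    )


ssml = build_ssml(
    "Tom & Jerry <3",
    voice="en-US-SaraNeural",
    style="cheerful",
    style_degree="1.5",
    rate="1.1",
)
print(ssml)
```

When using the services themselves, none of this is needed: you pass plain text and `Settings`, and the service handles the SSML. The sketch is only meant to show why text containing `&` or `<` is safe to send.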