# Azure

> Text-to-speech service using Azure Cognitive Services Speech SDK

## Overview

Pipecat offers two text-to-speech services built on Azure Cognitive Services: `AzureTTSService` (WebSocket-based), which streams audio in real time with low latency, and `AzureHttpTTSService` (HTTP-based), which synthesizes each request as a batch. Use `AzureTTSService` for interactive applications that need streaming.

<CardGroup cols={2}>
  <Card title="Azure TTS API Reference" icon="code" href="https://reference-server.pipecat.ai/en/latest/api/pipecat.services.azure.tts.html">
    Pipecat's API methods for Azure TTS integration
  </Card>

  <Card title="Example Implementation" icon="play" href="https://github.com/pipecat-ai/pipecat/blob/main/examples/voice/voice-azure.py">
    Complete example with streaming synthesis
  </Card>

  <Card title="Azure Speech Documentation" icon="book" href="https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/">
    Official Azure Speech Services documentation
  </Card>

  <Card title="Voice Gallery" icon="microphone" href="https://speech.microsoft.com/portal/voicegallery">
    Browse available voices and languages
  </Card>
</CardGroup>

## Installation

To use Azure services, install the required dependencies:

```bash theme={null}
uv add "pipecat-ai[azure]"
```

## Prerequisites

### Azure Account Setup

Before using Azure TTS services, you need:

1. **Azure Account**: Sign up at [Azure Portal](https://portal.azure.com/)
2. **Speech Service**: Create a Speech resource in your Azure subscription
3. **API Key and Region**: Get your subscription key and service region
4. **Voice Selection**: Choose from available voices in the [Voice Gallery](https://speech.microsoft.com/portal/voicegallery)

### Required Environment Variables

* `AZURE_SPEECH_API_KEY`: Your Azure Speech service API key
* `AZURE_SPEECH_REGION`: Your Azure Speech service region (e.g., "eastus")
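
Missing or empty variables tend to surface later as opaque authentication errors, so it can help to check them at startup. The following is a minimal sketch using only the standard library; the helper name `load_azure_speech_config` is hypothetical and not part of Pipecat.

```python
import os

# The two variables listed above.
REQUIRED_VARS = ("AZURE_SPEECH_API_KEY", "AZURE_SPEECH_REGION")


def load_azure_speech_config(env=None):
    """Return (api_key, region), or raise if either variable is missing or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ and
    can be replaced with a plain dict in tests.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["AZURE_SPEECH_API_KEY"], env["AZURE_SPEECH_REGION"]
```

The returned pair can then be passed as `api_key` and `region` when constructing either service.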

## Configuration

### AzureTTSService

<ParamField path="api_key" type="str" required>
  Azure Cognitive Services subscription key.
</ParamField>

<ParamField path="region" type="str" required>
  Azure region identifier (e.g., `"eastus"`, `"westus2"`).
</ParamField>

<ParamField path="voice" type="str" default="en-US-SaraNeural" deprecated>
  Voice name to use for synthesis. *Deprecated in v0.0.105. Use
  `settings=AzureTTSService.Settings(voice=...)` instead.*
</ParamField>

<ParamField path="sample_rate" type="int" default="None">
  Output audio sample rate in Hz. When `None`, uses the pipeline's configured
  sample rate.
</ParamField>

<ParamField path="text_aggregation_mode" type="TextAggregationMode" default="TextAggregationMode.SENTENCE">
  Controls how incoming text is aggregated before synthesis. `SENTENCE`
  (default) buffers text until sentence boundaries, producing more natural
  speech. `TOKEN` streams tokens directly for lower latency. Import from
  `pipecat.services.tts_service`.
</ParamField>

<ParamField path="aggregate_sentences" type="bool" default="None" deprecated>
  *Deprecated in v0.0.104.* Use `text_aggregation_mode` instead.
</ParamField>

<ParamField path="params" type="InputParams" default="None" deprecated>
  *Deprecated in v0.0.105. Use `settings=AzureTTSService.Settings(...)`
  instead.*
</ParamField>

<ParamField path="settings" type="AzureTTSService.Settings" default="None">
  Runtime-configurable settings. See [Settings](#settings) below.
</ParamField>
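
To make the difference between the two `text_aggregation_mode` values concrete, the sketch below buffers a stream of text fragments until a sentence boundary is reached. It is a simplified, self-contained illustration of the idea only, not Pipecat's aggregator; the function name and the boundary rule (`.`, `!`, or `?` followed by whitespace) are assumptions.

```python
import re

# Assumed boundary rule: sentence-ending punctuation followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def aggregate_sentences(tokens):
    """Yield complete sentences from an iterable of text fragments."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last one ends at a sentence boundary.
        for sentence in parts[:-1]:
            yield sentence
        buffer = parts[-1]
    # Flush whatever remains when the stream ends.
    if buffer.strip():
        yield buffer.strip()


tokens = ["Hello", " there.", " How", " are", " you?", " Fine"]
sentences = list(aggregate_sentences(tokens))
```

With sentence-style aggregation the synthesizer would receive three chunks (`"Hello there."`, `"How are you?"`, `"Fine"`), whereas token-style streaming would forward all six fragments as they arrive, trading prosody for latency.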

### AzureHttpTTSService

The HTTP service accepts the same parameters as the streaming service except `text_aggregation_mode` and `aggregate_sentences`:

<ParamField path="api_key" type="str" required>
  Azure Cognitive Services subscription key.
</ParamField>

<ParamField path="region" type="str" required>
  Azure region identifier.
</ParamField>

<ParamField path="voice" type="str" default="en-US-SaraNeural" deprecated>
  Voice name to use for synthesis. *Deprecated in v0.0.105. Use
  `settings=AzureHttpTTSService.Settings(voice=...)` instead.*
</ParamField>

<ParamField path="sample_rate" type="int" default="None">
  Output audio sample rate in Hz.
</ParamField>

<ParamField path="params" type="InputParams" default="None" deprecated>
  *Deprecated in v0.0.105. Use `settings=AzureHttpTTSService.Settings(...)`
  instead.*
</ParamField>

<ParamField path="settings" type="AzureHttpTTSService.Settings" default="None">
  Runtime-configurable settings. See [Settings](#settings) below.
</ParamField>

### Settings

Runtime-configurable settings are passed via the `settings` constructor argument using `AzureTTSService.Settings(...)` (or `AzureHttpTTSService.Settings(...)` for the HTTP service). They can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter      | Type              | Default     | Description                            |
| -------------- | ----------------- | ----------- | -------------------------------------- |
| `model`        | `str`             | `None`      | Model identifier. *(Inherited.)*       |
| `voice`        | `str`             | `None`      | Voice identifier. *(Inherited.)*       |
| `language`     | `Language \| str` | `None`      | Language for synthesis. *(Inherited.)* |
| `emphasis`     | `str`             | `NOT_GIVEN` | Emphasis level for SSML.               |
| `pitch`        | `str`             | `NOT_GIVEN` | Pitch adjustment.                      |
| `rate`         | `str`             | `NOT_GIVEN` | Speaking rate.                         |
| `role`         | `str`             | `NOT_GIVEN` | Role for SSML.                         |
| `style`        | `str`             | `NOT_GIVEN` | Speaking style.                        |
| `style_degree` | `str`             | `NOT_GIVEN` | Degree of the speaking style.          |
| `volume`       | `str`             | `NOT_GIVEN` | Volume level.                          |

## Usage

### Basic Setup

```python theme={null}
import os

from pipecat.services.azure.tts import AzureTTSService

tts = AzureTTSService(
    api_key=os.getenv("AZURE_SPEECH_API_KEY"),
    region=os.getenv("AZURE_SPEECH_REGION"),
    settings=AzureTTSService.Settings(
        voice="en-US-SaraNeural",
    ),
)
```

### With Voice Customization

```python theme={null}
import os

from pipecat.services.azure.tts import AzureTTSService
from pipecat.transcriptions.language import Language

tts = AzureTTSService(
    api_key=os.getenv("AZURE_SPEECH_API_KEY"),
    region="eastus",
    settings=AzureTTSService.Settings(
        voice="en-US-JennyMultilingualNeural",
        language=Language.EN_US,
        style="cheerful",
        style_degree="1.5",
        rate="1.1",
    ),
)
```

### HTTP Service

```python theme={null}
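import os

from pipecat.services.azure.tts import AzureHttpTTSService

tts = AzureHttpTTSService(
    api_key=os.getenv("AZURE_SPEECH_API_KEY"),
    region=os.getenv("AZURE_SPEECH_REGION"),
    # Pass the voice through Settings; the `voice=` argument is deprecated.
    settings=AzureHttpTTSService.Settings(
        voice="en-US-SaraNeural",
    ),
)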
```

<Tip>
  The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
  `Settings` / `settings=` instead. See the [Service Settings
  guide](/pipecat/fundamentals/service-settings) for migration details.
</Tip>

## Notes

* **Streaming vs HTTP**: The streaming service (`AzureTTSService`) provides word-level timestamps and lower latency, making it better for interactive conversations. The HTTP service is simpler but returns the complete audio at once.
* **SSML support**: Both services automatically construct SSML from the `Settings`. Special characters in text are automatically escaped.
* **Word timestamps**: `AzureTTSService` supports word-level timestamps for synchronized text display. CJK languages receive special handling to merge individual characters into meaningful word units.
* **8 kHz workaround**: At an 8 kHz sample rate, Azure's reported audio duration may not match the word boundary offsets. In this case the service uses the word boundary offsets for timing.
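
To illustrate the SSML note above, here is a rough sketch of how settings such as `voice`, `style`, `style_degree`, and `rate` could map onto Azure SSML elements, with special characters in the text escaped. This is not Pipecat's implementation: `build_ssml` is a hypothetical helper, and the element nesting is an assumption based on Azure's documented SSML markup.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice, language="en-US", style=None, style_degree=None, rate=None):
    """Wrap text in Azure-style SSML (hypothetical helper for illustration)."""
    # Escape &, <, and > so the text cannot break the XML document.
    body = escape(text)
    if rate is not None:
        body = f"<prosody rate={quoteattr(rate)}>{body}</prosody>"
    if style is not None:
        attrs = f"style={quoteattr(style)}"
        if style_degree is not None:
            attrs += f" styledegree={quoteattr(style_degree)}"
        body = f"<mstts:express-as {attrs}>{body}</mstts:express-as>"
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(language)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice></speak>"
    )


ssml = build_ssml(
    "Tom & Jerry <3",
    voice="en-US-JennyMultilingualNeural",
    style="cheerful",
    style_degree="1.5",
    rate="1.1",
)
```

When using the services you do not need to build SSML yourself; the sketch only shows why raw characters such as `&` and `<` in input text are safe to send.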
