Overview

AzureTTSService synthesizes speech from text using Azure’s Cognitive Services. It supports SSML for fine-grained voice control and can synthesize speech in multiple languages.

Installation

To use AzureTTSService, install the required dependencies:

pip install "pipecat-ai[azure]"

You’ll also need to set up the following environment variables:

  • AZURE_API_KEY
  • AZURE_REGION
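
As a minimal sketch, the two variables above can be read and validated before the service is constructed. The helper name `load_azure_credentials` is hypothetical and not part of pipecat; it only returns a dict whose keys match the `api_key` and `region` constructor parameters.

```python
import os
from typing import Mapping


def load_azure_credentials(env: Mapping[str, str] = os.environ) -> dict:
    """Read AZURE_API_KEY and AZURE_REGION, failing early if either is missing."""
    required = ("AZURE_API_KEY", "AZURE_REGION")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    # Keys mirror the api_key and region constructor parameters.
    return {"api_key": env["AZURE_API_KEY"], "region": env["AZURE_REGION"]}


# Hypothetical usage: tts_service = AzureTTSService(**load_azure_credentials())
```

Failing at startup with a clear message is usually easier to debug than an authentication error on the first synthesis request.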

Configuration

Constructor Parameters

api_key
str
required

Azure Speech Service API key

region
str
required

Azure region identifier

voice
str
default: "en-US-SaraNeural"

Voice identifier

sample_rate
int
default: 24000

Output audio sample rate in Hz

text_filter
BaseTextFilter
default: None

Modifies text before it is passed to the TTS. See the text filter documentation for the available filters.

Input Parameters

class InputParams(BaseModel):
    # Every field is optional; unset fields default to None
    emphasis: Optional[str] = None
    language: Optional[Language] = Language.EN_US
    pitch: Optional[str] = None
    rate: Optional[str] = "1.05"
    role: Optional[str] = None
    style: Optional[str] = None
    style_degree: Optional[str] = None
    volume: Optional[str] = None

Supported Sample Rates

  • 8000 Hz: Raw8Khz16BitMonoPcm
  • 16000 Hz: Raw16Khz16BitMonoPcm
  • 22050 Hz: Raw22050Hz16BitMonoPcm
  • 24000 Hz: Raw24Khz16BitMonoPcm
  • 44100 Hz: Raw44100Hz16BitMonoPcm
  • 48000 Hz: Raw48Khz16BitMonoPcm
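
The list above is effectively a lookup from sample rate to Azure output format name. A minimal sketch of that lookup follows; the names `SAMPLE_RATE_TO_FORMAT` and `output_format_for` are hypothetical, and this is not necessarily how the service resolves the format internally.

```python
# Sample rate in Hz -> Azure output format name, copied from the list above.
SAMPLE_RATE_TO_FORMAT = {
    8000: "Raw8Khz16BitMonoPcm",
    16000: "Raw16Khz16BitMonoPcm",
    22050: "Raw22050Hz16BitMonoPcm",
    24000: "Raw24Khz16BitMonoPcm",
    44100: "Raw44100Hz16BitMonoPcm",
    48000: "Raw48Khz16BitMonoPcm",
}


def output_format_for(sample_rate: int) -> str:
    """Return the format name for a supported rate, or raise ValueError."""
    try:
        return SAMPLE_RATE_TO_FORMAT[sample_rate]
    except KeyError:
        supported = ", ".join(str(rate) for rate in sorted(SAMPLE_RATE_TO_FORMAT))
        raise ValueError(
            f"Unsupported sample rate {sample_rate}; choose one of: {supported}"
        ) from None
```

Validating the rate up front makes it obvious when a value such as 11025 falls outside the supported set.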

Usage Example

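# Import paths are assumptions and may differ between pipecat versions
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.azure.tts import AzureTTSService
from pipecat.transcriptions.language import Language

# Configure service
tts_service = AzureTTSService(
    api_key="your-api-key",
    region="eastus",
    voice="en-US-JennyNeural",
    params=AzureTTSService.InputParams(
        language=Language.EN_US,
        rate="1.1",
        style="cheerful",
    ),
)

# Use in pipeline (text_input and audio_output are placeholders
# for the processors before and after the TTS service)
pipeline = Pipeline([
    text_input,
    tts_service,
    audio_output,
])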

Methods

See the TTS base class methods for additional functionality.

Language Support

Azure Speech Services support the following languages and regional variants:

Language Code   Description            Service Code
Language.BG     Bulgarian              bg-BG
Language.CA     Catalan                ca-ES
Language.ZH     Chinese (Simplified)   zh-CN
Language.ZH_TW  Chinese (Traditional)  zh-TW
Language.CS     Czech                  cs-CZ
Language.DA     Danish                 da-DK
Language.NL     Dutch (Netherlands)    nl-NL
Language.NL_BE  Dutch (Belgium)        nl-BE
Language.EN     English (US)           en-US
Language.EN_US  English (US)           en-US
Language.EN_AU  English (Australia)    en-AU
Language.EN_GB  English (UK)           en-GB
Language.EN_NZ  English (New Zealand)  en-NZ
Language.EN_IN  English (India)        en-IN
Language.ET     Estonian               et-EE
Language.FI     Finnish                fi-FI
Language.FR     French (France)        fr-FR
Language.FR_CA  French (Canada)        fr-CA
Language.DE     German (Germany)       de-DE
Language.DE_CH  German (Switzerland)   de-CH
Language.EL     Greek                  el-GR
Language.HI     Hindi                  hi-IN
Language.HU     Hungarian              hu-HU
Language.ID     Indonesian             id-ID
Language.IT     Italian                it-IT
Language.JA     Japanese               ja-JP
Language.KO     Korean                 ko-KR
Language.LV     Latvian                lv-LV
Language.LT     Lithuanian             lt-LT
Language.MS     Malay                  ms-MY
Language.NO     Norwegian              nb-NO
Language.PL     Polish                 pl-PL
Language.PT     Portuguese (Portugal)  pt-PT
Language.PT_BR  Portuguese (Brazil)    pt-BR
Language.RO     Romanian               ro-RO
Language.RU     Russian                ru-RU
Language.SK     Slovak                 sk-SK
Language.ES     Spanish                es-ES
Language.SV     Swedish                sv-SE
Language.TH     Thai                   th-TH
Language.TR     Turkish                tr-TR
Language.UK     Ukrainian              uk-UA
Language.VI     Vietnamese             vi-VN

Usage Examples

TTS Configuration

# Configure TTS with a specific language and a matching voice
tts_service = AzureTTSService(
    api_key="your-api-key",
    region="eastus",
    voice="fr-CA-SylvieNeural",  # voice is a constructor parameter, not an InputParams field
    params=AzureTTSService.InputParams(
        language=Language.FR_CA,  # Canadian French
    ),
)

Regional Considerations

  • Each language code includes both language and region (e.g., fr-FR for French in France)
  • Some languages have multiple regional variants (e.g., English has US, UK, Australian, Indian, and New Zealand variants)
  • Voice availability may vary by region and language
  • Neural voices are available for most language/region combinations
  • Some features (like custom pronunciation) may be limited to specific languages

Note: Voice selection should match the specified language code for optimal results. Check Azure’s documentation for the latest list of available voices for each language/region combination.
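
One way to catch a mismatched voice early is a simple prefix check. This is a sketch that assumes voice names begin with their locale code, as in the examples on this page (en-US-JennyNeural, fr-CA-SylvieNeural). The helper `voice_matches_language` is hypothetical and does not confirm that the voice actually exists in Azure.

```python
def voice_matches_language(voice: str, service_code: str) -> bool:
    """Return True if the voice name starts with the given locale code.

    Assumes names shaped like '<locale>-<Name>Neural'; comparison is
    case-insensitive.
    """
    return voice.lower().startswith(service_code.lower() + "-")


# A Canadian French voice matches fr-CA but not fr-FR.
is_match = voice_matches_language("fr-CA-SylvieNeural", "fr-CA")
is_mismatch = voice_matches_language("fr-CA-SylvieNeural", "fr-FR")
```

A check like this only guards against obvious mix-ups; Azure’s voice list remains the authority on which voices are available.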

SSML Support

The service exposes SSML customization through InputParams, including emphasis, prosody (pitch, rate, volume), and speaking style:

# Example with multiple SSML features
params = AzureTTSService.InputParams(
    emphasis="strong",
    pitch="+2st",
    rate="1.2",
    style="cheerful",
    style_degree="2",
    volume="loud"
)
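
To illustrate what these parameters correspond to, the sketch below builds an SSML string by hand. The function `build_ssml` is hypothetical and the nesting order is an assumption; the element names (`emphasis`, `prosody`, `mstts:express-as`) and the `mstts` namespace follow Azure’s published SSML examples. The service’s own SSML generation may differ, and `role` is omitted for brevity.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def build_ssml(
    text: str,
    voice: str,
    language: str = "en-US",
    emphasis: Optional[str] = None,
    pitch: Optional[str] = None,
    rate: Optional[str] = None,
    style: Optional[str] = None,
    style_degree: Optional[str] = None,
    volume: Optional[str] = None,
) -> str:
    """Wrap text in SSML elements for whichever options are set (a sketch)."""
    body = escape(text)  # escape &, <, > so the text is valid XML content
    if emphasis:
        body = f"<emphasis level={quoteattr(emphasis)}>{body}</emphasis>"
    prosody = {"pitch": pitch, "rate": rate, "volume": volume}
    attrs = " ".join(f"{k}={quoteattr(v)}" for k, v in prosody.items() if v)
    if attrs:
        body = f"<prosody {attrs}>{body}</prosody>"
    if style:
        degree = f" styledegree={quoteattr(style_degree)}" if style_degree else ""
        body = (
            f"<mstts:express-as style={quoteattr(style)}{degree}>"
            f"{body}</mstts:express-as>"
        )
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(language)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice></speak>"
    )
```

Calling `build_ssml("Hello!", "en-US-JennyNeural", style="cheerful", rate="1.2")` yields a `speak` document whose `voice` element wraps an `mstts:express-as` element around a `prosody` element.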

Metrics Support

The service collects processing metrics:

  • Time to First Byte (TTFB)
  • Processing duration
  • Character usage
  • API calls
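
For intuition about the first two metrics, the sketch below times a stream of audio chunks. The `StreamTimer` class is hypothetical and is not how the service records its metrics; it only shows that TTFB is the delay until the first chunk arrives, while processing duration covers the whole stream.

```python
import time
from typing import Iterable, Iterator, Optional


class StreamTimer:
    """Wrap an iterable of audio chunks and record simple timing figures."""

    def __init__(self, chunks: Iterable[bytes]):
        self._chunks = chunks
        self.ttfb: Optional[float] = None      # seconds until the first chunk
        self.duration: Optional[float] = None  # seconds until the stream ends
        self.total_bytes = 0

    def __iter__(self) -> Iterator[bytes]:
        start = time.perf_counter()
        for chunk in self._chunks:
            if self.ttfb is None:
                self.ttfb = time.perf_counter() - start
            self.total_bytes += len(chunk)
            yield chunk
        # Set only once the stream has been fully consumed.
        self.duration = time.perf_counter() - start


timer = StreamTimer([b"\x00\x01", b"\x02\x03\x04"])
audio = b"".join(timer)
```

After the loop finishes, `timer.ttfb` and `timer.duration` hold the two timings and `timer.total_bytes` the amount of audio received.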

Notes

  • Supports SSML-based speech customization
  • Delivers audio in chunks
  • Processes requests in a thread-safe manner
  • Handles errors automatically
  • Manages the Azure client lifecycle