Overview

AWSPollyTTSService provides text-to-speech using Amazon Polly (AWS Polly). It supports multiple voices and languages, and lets you customize speech (pitch, rate, volume) through SSML.

The older PollyTTSService class is still available but has been deprecated. Use AWSPollyTTSService instead.

Installation

To use AWSPollyTTSService, install the required dependencies:

pip install "pipecat-ai[aws]"

You’ll also need AWS credentials. Pass them as constructor arguments or set them as environment variables:

  • AWS_SECRET_ACCESS_KEY
  • AWS_ACCESS_KEY_ID
  • AWS_SESSION_TOKEN (if using temporary credentials)
  • AWS_REGION (defaults to “us-east-1”)

Configuration

Constructor Parameters

api_key
str

AWS secret access key. Falls back to the AWS_SECRET_ACCESS_KEY environment variable.

aws_access_key_id
str

AWS access key ID. Falls back to the AWS_ACCESS_KEY_ID environment variable.

aws_session_token
str

AWS session token for temporary credentials. Falls back to the AWS_SESSION_TOKEN environment variable.

region
str

AWS region name. Falls back to the AWS_REGION environment variable, then to “us-east-1”.

voice_id
str
default:"Joanna"

AWS Polly voice identifier

sample_rate
int
default:"None"

Output audio sample rate in Hz (resampled from Polly’s 16kHz)

text_filter
BaseTextFilter
default:"None"

Modifies text provided to the TTS. Learn more about the available filters.

params
InputParams

TTS configuration parameters
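Each credential parameter above falls back to an environment variable when you omit it. The sketch below illustrates that precedence using only the standard library: an explicit argument wins, then the environment variable, then the “us-east-1” region default. The helper name resolve_aws_settings, its dictionary keys, and the choice to raise ValueError when a key is missing are assumptions made for illustration. They are not part of the Pipecat API, and this page does not specify how the service handles missing credentials.

```python
import os
from typing import Mapping, Optional


def resolve_aws_settings(
    api_key: Optional[str] = None,
    aws_access_key_id: Optional[str] = None,
    aws_session_token: Optional[str] = None,
    region: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Hypothetical helper: prefer explicit arguments, fall back to env vars."""
    env = os.environ if env is None else env
    settings = {
        "aws_secret_access_key": api_key or env.get("AWS_SECRET_ACCESS_KEY"),
        "aws_access_key_id": aws_access_key_id or env.get("AWS_ACCESS_KEY_ID"),
        # Only needed for temporary credentials, so it may legitimately be None.
        "aws_session_token": aws_session_token or env.get("AWS_SESSION_TOKEN"),
        # Region falls back to "us-east-1" when neither source provides one.
        "region": region or env.get("AWS_REGION") or "us-east-1",
    }
    # Assumption of this sketch: fail early if the key pair is incomplete.
    missing = [
        key
        for key in ("aws_secret_access_key", "aws_access_key_id")
        if not settings[key]
    ]
    if missing:
        raise ValueError(f"Missing AWS credentials: {', '.join(missing)}")
    return settings
```

For example, resolve_aws_settings(region="us-west-2") keeps the explicit region even when AWS_REGION is set. Passing env explicitly makes the precedence easy to test without touching the real process environment.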

Input Parameters

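from typing import Optional

from pydantic import BaseModel
from pipecat.transcriptions.language import Language

class InputParams(BaseModel):
    engine: Optional[str] = None      # Polly engine: "standard", "neural", or "generative"
    language: Optional[Language] = Language.EN  # Synthesis language (see Language Support)
    pitch: Optional[str] = None       # SSML prosody pitch, e.g. "low"
    rate: Optional[str] = None        # SSML prosody rate, e.g. "+10%" ("1.1" for generative)
    volume: Optional[str] = None      # SSML prosody volume, e.g. "loud"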

Output Frames

Control Frames

TTSStartedFrame
Frame

Signals start of speech synthesis

TTSStoppedFrame
Frame

Signals completion of speech synthesis

Audio Frames

TTSAudioRawFrame
Frame

Contains generated audio data with:

  • 16-bit signed PCM audio
  • Sample rate as set by sample_rate (resampled from Polly’s 16 kHz output)
  • Single channel (mono)
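AWS Polly returns 16 kHz audio, and the service resamples it to the configured rate. Pipecat does this internally, and this page does not describe which resampler it uses. As an illustration only, the sketch below resamples mono, 16-bit, little-endian PCM (the layout Polly documents for its PCM output) using linear interpolation. resample_pcm16_linear is a hypothetical helper. Production resamplers usually also apply low-pass filtering to avoid aliasing when downsampling.

```python
import sys
from array import array


def resample_pcm16_linear(pcm: bytes, in_rate: int, out_rate: int) -> bytes:
    """Resample mono, 16-bit, little-endian PCM using linear interpolation."""
    if in_rate == out_rate or not pcm:
        return pcm
    samples = array("h")
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # input is little-endian; convert to native order

    n_in = len(samples)
    n_out = max(1, round(n_in * out_rate / in_rate))
    step = in_rate / out_rate  # input samples advanced per output sample
    out = array("h")
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), n_in - 1)
        right = min(left + 1, n_in - 1)
        frac = pos - left
        # Blend the two neighboring input samples by their distance to pos.
        value = samples[left] + (samples[right] - samples[left]) * frac
        out.append(int(round(value)))

    if sys.byteorder == "big":
        out.byteswap()  # back to little-endian for the caller
    return out.tobytes()
```

As a sanity check on sizes, one second of 16 kHz mono audio is 32,000 bytes (2 bytes per sample). After resampling to 24 kHz, the same second occupies 48,000 bytes.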

Error Frames

ErrorFrame
Frame

Contains AWS Polly error information

Methods

See the TTS base class methods for additional functionality.

Language Support

Supports an extensive range of languages and regional variants:

Language         Description             Service Code
---------------  ----------------------  ------------
Language.AR      Arabic                  arb
Language.AR_AE   Arabic (UAE)            ar-AE
Language.CA      Catalan                 ca-ES
Language.ZH      Chinese (Mandarin)      cmn-CN
Language.YUE     Chinese (Cantonese)     yue-CN
Language.YUE_CN  Chinese (Cantonese)     yue-CN
Language.CS      Czech                   cs-CZ
Language.DA      Danish                  da-DK
Language.NL      Dutch                   nl-NL
Language.NL_BE   Dutch (Belgium)         nl-BE
Language.EN      English (US)            en-US
Language.EN_AU   English (Australia)     en-AU
Language.EN_GB   English (UK)            en-GB
Language.EN_IN   English (India)         en-IN
Language.EN_NZ   English (New Zealand)   en-NZ
Language.EN_US   English (US)            en-US
Language.EN_ZA   English (South Africa)  en-ZA
Language.FI      Finnish                 fi-FI
Language.FR      French                  fr-FR
Language.FR_BE   French (Belgium)        fr-BE
Language.FR_CA   French (Canada)         fr-CA
Language.DE      German                  de-DE
Language.DE_AT   German (Austria)        de-AT
Language.DE_CH   German (Switzerland)    de-CH
Language.HI      Hindi                   hi-IN
Language.IS      Icelandic               is-IS
Language.IT      Italian                 it-IT
Language.JA      Japanese                ja-JP
Language.KO      Korean                  ko-KR
Language.NO      Norwegian               nb-NO
Language.NB      Norwegian (Bokmål)      nb-NO
Language.NB_NO   Norwegian (Bokmål)      nb-NO
Language.PL      Polish                  pl-PL
Language.PT      Portuguese              pt-PT
Language.PT_BR   Portuguese (Brazil)     pt-BR
Language.PT_PT   Portuguese (Portugal)   pt-PT
Language.RO      Romanian                ro-RO
Language.RU      Russian                 ru-RU
Language.ES      Spanish                 es-ES
Language.ES_MX   Spanish (Mexico)        es-MX
Language.ES_US   Spanish (US)            es-US
Language.SV      Swedish                 sv-SE
Language.TR      Turkish                 tr-TR
Language.CY      Welsh                   cy-GB
Language.CY_GB   Welsh                   cy-GB
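The table maps Pipecat Language values to Polly language codes, and several entries share a code. For example, Language.YUE and Language.YUE_CN both map to yue-CN. The sketch below expresses a subset of the table as a dictionary lookup. Plain BCP-47-style strings stand in for the Language enum. The fallback from a regional tag to its base language is an assumption of this sketch, not documented behavior of the service.

```python
from typing import Optional

# Hypothetical subset of the table above. Plain strings stand in for
# pipecat's Language enum values.
POLLY_LANGUAGE_CODES = {
    "ar": "arb",
    "ar-AE": "ar-AE",
    "zh": "cmn-CN",
    "yue": "yue-CN",
    "yue-CN": "yue-CN",
    "en": "en-US",
    "en-GB": "en-GB",
    "en-US": "en-US",
    "no": "nb-NO",
    "nb": "nb-NO",
    "nb-NO": "nb-NO",
    "pt": "pt-PT",
    "pt-BR": "pt-BR",
    "cy": "cy-GB",
}


def to_polly_language(tag: str) -> Optional[str]:
    """Map a language tag to a Polly code, falling back to the base language.

    The base-language fallback is an assumption of this sketch.
    """
    code = POLLY_LANGUAGE_CODES.get(tag)
    if code is None:
        base = tag.split("-", 1)[0]  # e.g. "en-IE" -> "en"
        code = POLLY_LANGUAGE_CODES.get(base)
    return code
```

The fallback trades precision for coverage: an unlisted variant such as "en-IE" resolves to en-US, which may not be what you want. Returning None for unknown languages leaves that decision to the caller.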

Usage Example

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.aws.tts import AWSPollyTTSService
from pipecat.transcriptions.language import Language

# Configure the service; credentials are read from the AWS_* environment variables
tts = AWSPollyTTSService(
    region="us-west-2",
    voice_id="Joanna",
    params=AWSPollyTTSService.InputParams(
        engine="neural",
        language=Language.EN,
        rate="+10%",
        volume="loud",
    ),
)

# Or pass credentials directly (avoid hard-coding real secrets in source code)
tts = AWSPollyTTSService(
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    api_key="YOUR_SECRET_ACCESS_KEY",
    region="us-west-2",
    voice_id="Joanna",
    params=AWSPollyTTSService.InputParams(
        engine="generative",  # For newer generative voices
        language=Language.EN,
        rate="1.1",           # Generative engine rate format (10% faster)
    ),
)

# Use in a pipeline; `llm` and `transport` are created elsewhere in your app
pipeline = Pipeline([
    ...,
    llm,
    tts,
    transport.output(),
])

SSML Support

The service automatically builds SSML prosody markup from the pitch, rate, and volume parameters:

# Example with SSML controls
service = AWSPollyTTSService(
    # ... other params ...
    params=AWSPollyTTSService.InputParams(
        engine="neural",
        rate="+20%",      # Increase speed
        pitch="low",      # Lower pitch
        volume="loud"     # Increase volume
    )
)

Prosody settings (pitch, rate, volume) behave differently depending on the engine:

  • Standard engine: supports all prosody settings
  • Neural engine: full prosody support
  • Generative engine: only rate is supported, using a different format (e.g., “1.1” for 10% faster)
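This page does not show the exact SSML the service emits. The following sketch assumes prosody values are passed through unchanged and shows one way to wrap text in speak and prosody markup while following the engine rules listed above. build_ssml is a hypothetical helper. Escaping the text matters because SSML is XML: an unescaped & or < in LLM output would otherwise produce invalid markup.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def build_ssml(
    text: str,
    engine: str = "neural",
    pitch: Optional[str] = None,
    rate: Optional[str] = None,
    volume: Optional[str] = None,
) -> str:
    """Hypothetical helper: wrap text in <speak>/<prosody> SSML for Polly."""
    attrs = {"pitch": pitch, "rate": rate, "volume": volume}
    if engine == "generative":
        # Per the engine notes above, only rate applies to the generative engine.
        attrs = {"rate": rate}
    # quoteattr adds surrounding quotes and escapes special characters.
    prosody = " ".join(f"{name}={quoteattr(value)}" for name, value in attrs.items() if value)
    body = escape(text)  # escape &, <, > so user text cannot break the markup
    if prosody:
        body = f"<prosody {prosody}>{body}</prosody>"
    return f"<speak>{body}</speak>"
```

With engine="neural", rate="+20%", pitch="low", and volume="loud", the text is wrapped in a single prosody element carrying all three attributes. With engine="generative", pitch and volume are dropped and only the rate attribute remains.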

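Frame Flow

For each synthesis request, the service emits a TTSStartedFrame, then one or more TTSAudioRawFrame chunks, and finally a TTSStoppedFrame. If AWS Polly returns an error, the service pushes an ErrorFrame with the error details.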

Metrics Support

The service collects processing metrics:

  • Time to First Byte (TTFB)
  • Processing duration
  • Character usage
  • API calls
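Time to First Byte is the delay between issuing a synthesis request and receiving the first audio data. Processing duration covers the whole stream. Pipecat collects these metrics itself. The sketch below only illustrates how the two numbers relate by timing an async iterator of audio chunks with time.perf_counter. measure_tts_timing and fake_polly_stream are hypothetical names, and the fake stream stands in for a real Polly response.

```python
import asyncio
import time
from typing import AsyncIterator, Optional, Tuple


async def measure_tts_timing(
    chunks: AsyncIterator[bytes],
) -> Tuple[Optional[float], float, int]:
    """Return (ttfb_seconds, total_seconds, total_bytes) for an audio stream.

    ttfb_seconds is None if the stream produced no audio.
    """
    start = time.perf_counter()
    ttfb: Optional[float] = None
    total_bytes = 0
    async for chunk in chunks:
        if ttfb is None and chunk:
            # First non-empty audio chunk: this is the TTFB point.
            ttfb = time.perf_counter() - start
        total_bytes += len(chunk)
    total = time.perf_counter() - start  # full processing duration
    return ttfb, total, total_bytes


async def fake_polly_stream(delay: float = 0.05, n: int = 3) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a Polly audio stream: n chunks of silence."""
    for _ in range(n):
        await asyncio.sleep(delay)  # simulated latency before each chunk
        yield b"\x00\x00" * 160  # 160 samples of 16-bit silence (10 ms at 16 kHz)


if __name__ == "__main__":
    ttfb, total, nbytes = asyncio.run(measure_tts_timing(fake_polly_stream()))
    print(f"TTFB {ttfb:.3f}s, total {total:.3f}s, {nbytes} bytes")
```

TTFB is usually what a listener perceives as latency, because playback can start as soon as the first chunk arrives, even while the rest of the audio is still streaming.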

Notes

  • Supports all AWS Polly engines:
    • Standard (non-neural voices)
    • Neural (improved quality voices)
    • Generative (high-quality, natural-sounding voices)
  • Automatic resampling from Polly’s 16 kHz output to the configured sample rate
  • Thread-safe processing
  • AWS Polly errors are reported downstream as ErrorFrames
  • Manages AWS client lifecycle