AWS Polly

Overview

AWS Polly provides text-to-speech synthesis through Amazon’s cloud service with support for standard, neural, and generative engines. The service offers extensive language support, SSML features, and voice customization options including prosody controls for pitch, rate, and volume.

API Reference

Complete API documentation and method details

AWS Polly Docs

Official AWS Polly documentation and features

Example Code

Working example with generative engine

Installation

To use AWS Polly services, install the required dependencies:

pip install "pipecat-ai[aws]"

You’ll also need to set up your AWS credentials as environment variables:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_SESSION_TOKEN (if using temporary credentials)
AWS_REGION (defaults to “us-east-1”)

Set up AWS credentials through the AWS Console or use AWS CLI configuration.

Frames

Input

TextFrame - Text content to synthesize into speech
TTSSpeakFrame - Text that should be spoken immediately
TTSUpdateSettingsFrame - Runtime configuration updates
LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries

Output

TTSStartedFrame - Signals start of synthesis
TTSAudioRawFrame - Generated audio data (PCM, resampled from 16kHz)
TTSStoppedFrame - Signals completion of synthesis
ErrorFrame - AWS API or processing errors

Language Support

View All Supported Languages

Language Code	Description	Service Code
`Language.AR`	Arabic	`arb`
`Language.AR_AE`	Arabic (UAE)	`ar-AE`
`Language.CA`	Catalan	`ca-ES`
`Language.ZH`	Chinese (Mandarin)	`cmn-CN`
`Language.YUE`	Chinese (Cantonese)	`yue-CN`
`Language.CS`	Czech	`cs-CZ`
`Language.DA`	Danish	`da-DK`
`Language.NL`	Dutch	`nl-NL`
`Language.NL_BE`	Dutch (Belgium)	`nl-BE`
`Language.EN`	English (US)	`en-US`
`Language.EN_AU`	English (Australia)	`en-AU`
`Language.EN_GB`	English (UK)	`en-GB`
`Language.EN_IN`	English (India)	`en-IN`
`Language.EN_NZ`	English (New Zealand)	`en-NZ`
`Language.EN_ZA`	English (South Africa)	`en-ZA`
`Language.FI`	Finnish	`fi-FI`
`Language.FR`	French	`fr-FR`
`Language.FR_BE`	French (Belgium)	`fr-BE`
`Language.FR_CA`	French (Canada)	`fr-CA`
`Language.DE`	German	`de-DE`
`Language.DE_AT`	German (Austria)	`de-AT`
`Language.DE_CH`	German (Switzerland)	`de-CH`
`Language.HI`	Hindi	`hi-IN`
`Language.IS`	Icelandic	`is-IS`
`Language.IT`	Italian	`it-IT`
`Language.JA`	Japanese	`ja-JP`
`Language.KO`	Korean	`ko-KR`
`Language.NO`	Norwegian	`nb-NO`
`Language.PL`	Polish	`pl-PL`
`Language.PT`	Portuguese	`pt-PT`
`Language.PT_BR`	Portuguese (Brazil)	`pt-BR`
`Language.RO`	Romanian	`ro-RO`
`Language.RU`	Russian	`ru-RU`
`Language.ES`	Spanish	`es-ES`
`Language.ES_MX`	Spanish (Mexico)	`es-MX`
`Language.ES_US`	Spanish (US)	`es-US`
`Language.SV`	Swedish	`sv-SE`
`Language.TR`	Turkish	`tr-TR`
`Language.CY`	Welsh	`cy-GB`

Common languages supported include:

Language.EN - English (US)
Language.ES - Spanish
Language.FR - French
Language.DE - German
Language.IT - Italian
Language.JA - Japanese

Usage Example

Basic Configuration

Initialize the AWSPollyTTSService and use it in a pipeline:

from pipecat.services.aws.tts import AWSPollyTTSService
from pipecat.transcriptions.language import Language
import os

tts = AWSPollyTTSService(
    aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    api_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    region="us-west-2",
    voice_id="Joanna",
    params=AWSPollyTTSService.InputParams(
        engine="neural",
        language=Language.EN,
        rate="+10%",
        volume="loud"
    )
)

# Use in pipeline
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

Dynamic Configuration

Make settings updates by pushing a TTSUpdateSettingsFrame for the AWSPollyTTSService:

from pipecat.frames.frames import TTSUpdateSettingsFrame

await task.queue_frame(
    TTSUpdateSettingsFrame(settings={"voice": "your-new-voice-id"})
)

SSML Features

AWS Polly automatically constructs SSML for advanced speech control:

# Prosody controls (engine-dependent)
service = AWSPollyTTSService(
    voice_id="Joanna",
    params=AWSPollyTTSService.InputParams(
        engine="standard",   # Full prosody support
        rate="slow",         # SSML rate values
        pitch="low",         # Pitch adjustment
        volume="loud"        # Volume control
    )
)

# Lexicon support for custom pronunciations
service = AWSPollyTTSService(
    voice_id="Joanna",
    params=AWSPollyTTSService.InputParams(
        lexicon_names=["custom-pronunciations"]
    )
)

Metrics

The service provides comprehensive metrics:

Time to First Byte (TTFB) - Latency from text input to first audio
Processing Duration - Total synthesis time
Character Usage - Text processed for billing

Learn how to enable Metrics in your Pipeline.

Additional Notes

Engine Selection: Use generative for highest quality, neural for balance, standard for lowest latency
Region Requirements: Generative engine only available in select regions (us-west-2, us-east-1, etc.)
Audio Format: Service outputs PCM audio resampled from 16kHz to your specified rate
Credential Management: Supports both environment variables and direct credential passing
SSML Automatic: Service automatically wraps text in appropriate SSML tags based on parameters
Prosody Limitations: Generative engine only supports rate adjustment, not pitch or volume

API Reference

Services

Utilities

Frameworks

Pipeline

Overview

API Reference

AWS Polly Docs

Example Code

Installation

Frames

Input

Output

Language Support

Usage Example

Basic Configuration

Dynamic Configuration

SSML Features

Metrics

Additional Notes

API Reference

Services

Utilities

Frameworks

Pipeline

​Overview

API Reference

AWS Polly Docs

Example Code

​Installation

​Frames

​Input

​Output

​Language Support

​Usage Example

​Basic Configuration

​Dynamic Configuration

​SSML Features

​Metrics

​Additional Notes

Overview

Installation

Frames

Input

Output

Language Support

Usage Example

Basic Configuration

Dynamic Configuration

SSML Features

Metrics

Additional Notes