AWS Polly
Text-to-speech service implementation using Amazon Polly
Overview
AWSTTSService provides text-to-speech capabilities using Amazon’s Polly service. It supports multiple voices, languages, and speech customization options through SSML.
Installation
To use AWSTTSService, install the required dependencies:
You’ll also need to set up your AWS credentials as environment variables:
- AWS_SECRET_ACCESS_KEY
- AWS_ACCESS_KEY_ID
- AWS_REGION
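As a minimal sketch of checking these variables before the service starts, the helper below reads them from a mapping and reports any that are missing. `load_aws_credentials` and `REQUIRED_VARS` are hypothetical names used for illustration; they are not part of AWSTTSService.

```python
import os
from typing import Mapping, Optional

# The three variables named in this section.
REQUIRED_VARS = ("AWS_SECRET_ACCESS_KEY", "AWS_ACCESS_KEY_ID", "AWS_REGION")


def load_aws_credentials(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the three AWS settings, or raise if any is unset or empty.

    Hypothetical helper: reads from os.environ unless a mapping is passed.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Failing early with the list of missing names tends to be easier to debug than an authentication error raised on the first synthesis request.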
Configuration
Constructor Parameters
AWS secret access key
AWS access key ID
AWS region name
AWS Polly voice identifier
Output audio sample rate in Hz
Modifies text provided to the TTS. Learn more about the available filters.
Input Parameters
Output Frames
Control Frames
Signals start of speech synthesis
Signals completion of speech synthesis
Audio Frames
Contains generated audio data with:
- PCM audio format
- Specified sample rate
- Single channel (mono)
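To make the audio format concrete, the sketch below computes the playback duration of a block of raw PCM bytes. It assumes 16-bit samples, which is the PCM format Polly documents, and a single channel; `pcm_duration_seconds` is an illustrative helper, not a method of the service.

```python
BYTES_PER_SAMPLE = 2  # assumption: 16-bit signed PCM
CHANNELS = 1          # mono, as listed above


def pcm_duration_seconds(audio: bytes, sample_rate: int) -> float:
    """Return the duration in seconds of mono 16-bit PCM audio."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    frame_size = BYTES_PER_SAMPLE * CHANNELS
    # Ignore a trailing partial sample, if any.
    sample_count = len(audio) // frame_size
    return sample_count / sample_rate
```

For example, 32,000 bytes at a 16,000 Hz sample rate hold 16,000 samples, which is one second of audio.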
Error Frames
Contains AWS Polly error information
Methods
See the TTS base class methods for additional functionality.
Language Support
Supports multiple languages and regional variants:
Language Code | Description | Service Code |
---|---|---|
Language.CA | Catalan | ca-ES |
Language.ZH | Chinese (Mandarin) | cmn-CN |
Language.DA | Danish | da-DK |
Language.NL | Dutch | nl-NL |
Language.NL_BE | Dutch (Belgium) | nl-BE |
Language.EN | English (US) | en-US |
Language.EN_AU | English (Australia) | en-AU |
Language.EN_GB | English (UK) | en-GB |
Language.EN_IN | English (India) | en-IN |
Language.EN_NZ | English (New Zealand) | en-NZ |
Language.FR | French | fr-FR |
Language.FR_CA | French (Canada) | fr-CA |
Language.DE | German | de-DE |
Language.HI | Hindi | hi-IN |
Language.IT | Italian | it-IT |
Language.JA | Japanese | ja-JP |
Language.KO | Korean | ko-KR |
Language.NO | Norwegian | nb-NO |
Language.PL | Polish | pl-PL |
Language.PT | Portuguese | pt-PT |
Language.PT_BR | Portuguese (Brazil) | pt-BR |
Language.RO | Romanian | ro-RO |
Language.RU | Russian | ru-RU |
Language.ES | Spanish | es-ES |
Language.SV | Swedish | sv-SE |
Language.TR | Turkish | tr-TR |
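The table can be read as a lookup from a language identifier to a Polly service code. The sketch below reproduces a subset of the rows as a dictionary keyed by the member names shown in the table; it does not import the real `Language` enum. Falling back from a regional variant to its base language is an assumption made for illustration, not documented behavior of the service.

```python
from typing import Optional

# Subset of the table above, keyed by the member name after "Language.".
POLLY_LANGUAGE_CODES = {
    "ZH": "cmn-CN",
    "EN": "en-US",
    "EN_AU": "en-AU",
    "EN_GB": "en-GB",
    "FR": "fr-FR",
    "FR_CA": "fr-CA",
    "NO": "nb-NO",
    "PT": "pt-PT",
    "PT_BR": "pt-BR",
}


def to_polly_language(name: str) -> Optional[str]:
    """Map a member name such as "EN_AU" to a Polly code such as "en-AU".

    Assumption: an unlisted regional variant falls back to its base language.
    Returns None when neither the variant nor the base is known.
    """
    if name in POLLY_LANGUAGE_CODES:
        return POLLY_LANGUAGE_CODES[name]
    base = name.split("_")[0]
    return POLLY_LANGUAGE_CODES.get(base)
```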
Usage Example
SSML Support
The service automatically constructs SSML tags for advanced speech control:
Note: Prosody tags (rate, pitch, volume) are only supported for standard and neural engines, not the generative engine.
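As a rough illustration of the kind of SSML involved, the sketch below wraps text in `<speak>` and adds a `<prosody>` element only when the engine is standard or neural, following the note above. `build_ssml` and its parameters are hypothetical; this is not the service's actual implementation, and the markup it emits may differ from what AWSTTSService produces.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr

# Per the note above: prosody is skipped for the generative engine.
PROSODY_ENGINES = {"standard", "neural"}


def build_ssml(
    text: str,
    engine: str = "neural",
    rate: Optional[str] = None,
    pitch: Optional[str] = None,
    volume: Optional[str] = None,
) -> str:
    """Return an SSML document for `text` (hypothetical helper)."""
    body = escape(text)  # escape &, <, > so the text is valid XML
    attrs = [
        (name, value)
        for name, value in (("rate", rate), ("pitch", pitch), ("volume", volume))
        if value
    ]
    if attrs and engine in PROSODY_ENGINES:
        attr_text = " ".join(f"{name}={quoteattr(value)}" for name, value in attrs)
        body = f"<prosody {attr_text}>{body}</prosody>"
    return f"<speak>{body}</speak>"
```

Escaping the input text matters because a stray `&` or `<` would otherwise make the SSML document malformed.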
Frame Flow
Metrics Support
The service collects processing metrics:
- Time to First Byte (TTFB)
- Processing duration
- Character usage
- API calls
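To show what the first three metrics measure, the sketch below wraps an iterator of audio chunks and records time to first byte, total processing duration, and character count. `StreamMetrics` and `measure_stream` are hypothetical names; the service collects its metrics through its own mechanism, which is not shown here.

```python
import time
from typing import Iterable, Iterator, Optional


class StreamMetrics:
    """Holds the values recorded while a stream is consumed."""

    def __init__(self) -> None:
        self.ttfb: Optional[float] = None      # seconds until first non-empty chunk
        self.duration: Optional[float] = None  # seconds until the stream ends
        self.characters: int = 0               # characters sent for synthesis


def measure_stream(
    text: str, chunks: Iterable[bytes], metrics: StreamMetrics
) -> Iterator[bytes]:
    """Yield chunks unchanged while filling in `metrics`."""
    metrics.characters = len(text)
    start = time.perf_counter()  # timing starts when iteration begins
    for chunk in chunks:
        if metrics.ttfb is None and chunk:
            metrics.ttfb = time.perf_counter() - start
        yield chunk
    metrics.duration = time.perf_counter() - start
```

Because `measure_stream` is a generator, the clock starts on the first call to `next()`, so the measured time covers only the period in which the consumer is actually waiting for audio.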
Notes
- Supports multiple AWS Polly engines (standard, neural, generative)
- Automatic audio resampling
- SSML-based speech customization
- Chunked audio delivery
- Thread-safe processing
- Automatic error handling
- AWS client lifecycle management
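Chunked audio delivery can be pictured as slicing the synthesized PCM into fixed-size pieces. The sketch below is one way to do that, assuming 16-bit samples so that an even chunk size never splits a sample; `iter_pcm_chunks` and the default size are illustrative choices, not values taken from the service.

```python
from typing import Iterator


def iter_pcm_chunks(audio: bytes, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield consecutive slices of `audio`, each at most `chunk_size` bytes.

    Assumption: 16-bit PCM, so `chunk_size` must be a positive even number
    to keep chunk boundaries aligned with sample boundaries.
    """
    if chunk_size < 2 or chunk_size % 2:
        raise ValueError("chunk_size must be a positive even number")
    for offset in range(0, len(audio), chunk_size):
        yield audio[offset:offset + chunk_size]
```

Smaller chunks let playback begin sooner at the cost of more frames per utterance; the last chunk may be shorter than the rest.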