Overview

PollyTTSService provides text-to-speech using Amazon Polly. It supports multiple voices and languages, and it customizes speech (rate, pitch, volume) through SSML.

Installation

To use PollyTTSService, install the required dependencies:

pip install pipecat-ai[aws]

You’ll also need to set up your AWS credentials as environment variables:

  • AWS_SECRET_ACCESS_KEY
  • AWS_ACCESS_KEY_ID
  • AWS_REGION
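
Before constructing the service, it can help to confirm that all three variables are actually set. The following is a minimal sketch of such a check; the helper name `missing_aws_vars` and the constant `REQUIRED_AWS_VARS` are hypothetical and not part of pipecat.

```python
import os

# The three variables listed above. The constant name is hypothetical.
REQUIRED_AWS_VARS = ("AWS_SECRET_ACCESS_KEY", "AWS_ACCESS_KEY_ID", "AWS_REGION")


def missing_aws_vars(env=None):
    """Return the names of required AWS variables that are unset or empty."""
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]
```

Calling `missing_aws_vars()` with no arguments inspects the current process environment; an empty list means all three values are available.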

Configuration

Constructor Parameters

api_key
str

AWS secret access key

aws_access_key_id
str

AWS access key ID

region
str

AWS region name

voice_id
str
default:
"Joanna"

AWS Polly voice identifier

sample_rate
int
default:
24000

Output audio sample rate in Hz

text_filter
BaseTextFilter
default:
None

Modifies text provided to the TTS. Learn more about the available filters.

Input Parameters

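class InputParams(BaseModel):
    engine: Optional[str] = None      # Polly engine type: "standard", "neural", or "generative"
    language: Optional[Language] = Language.EN  # Synthesis language (see Language Support)
    pitch: Optional[str] = None       # SSML pitch adjustment, e.g. "low" or "high"
    rate: Optional[str] = None        # SSML rate adjustment, e.g. "medium" or "+20%"
    volume: Optional[str] = None      # SSML volume adjustment, e.g. "loud"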

Output Frames

Control Frames

TTSStartedFrame
Frame

Signals start of speech synthesis

TTSStoppedFrame
Frame

Signals completion of speech synthesis

Audio Frames

TTSAudioRawFrame
Frame

Contains generated audio data with:

  • PCM audio format
  • Specified sample rate
  • Single channel (mono)
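
Because the audio is mono PCM at a known sample rate, the playback duration of a chunk follows directly from its byte length. The sketch below assumes 16-bit samples (two bytes each), which this page does not state explicitly; the function name `pcm_duration_seconds` is hypothetical.

```python
BYTES_PER_SAMPLE = 2  # Assumption: 16-bit signed PCM samples
CHANNELS = 1          # Single channel (mono), as listed above


def pcm_duration_seconds(num_bytes: int, sample_rate: int = 24000) -> float:
    """Estimate the playback duration of a raw PCM buffer in seconds."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    # Each second of audio holds sample_rate samples per channel.
    return num_bytes / (BYTES_PER_SAMPLE * CHANNELS * sample_rate)
```

Under these assumptions, a 48,000-byte chunk at the default 24000 Hz corresponds to one second of audio.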

Error Frames

ErrorFrame
Frame

Contains AWS Polly error information

Methods

See the TTS base class methods for additional functionality.

Language Support

Supports multiple languages and regional variants:

Language Code     Description              Service Code
Language.CA       Catalan                  ca-ES
Language.ZH       Chinese (Mandarin)       cmn-CN
Language.DA       Danish                   da-DK
Language.NL       Dutch                    nl-NL
Language.NL_BE    Dutch (Belgium)          nl-BE
Language.EN       English (US)             en-US
Language.EN_AU    English (Australia)      en-AU
Language.EN_GB    English (UK)             en-GB
Language.EN_IN    English (India)          en-IN
Language.EN_NZ    English (New Zealand)    en-NZ
Language.FR       French                   fr-FR
Language.FR_CA    French (Canada)          fr-CA
Language.DE       German                   de-DE
Language.HI       Hindi                    hi-IN
Language.IT       Italian                  it-IT
Language.JA       Japanese                 ja-JP
Language.KO       Korean                   ko-KR
Language.NO       Norwegian                nb-NO
Language.PL       Polish                   pl-PL
Language.PT       Portuguese               pt-PT
Language.PT_BR    Portuguese (Brazil)      pt-BR
Language.RO       Romanian                 ro-RO
Language.RU       Russian                  ru-RU
Language.ES       Spanish                  es-ES
Language.SV       Swedish                  sv-SE
Language.TR       Turkish                  tr-TR
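
Some service codes cannot be derived mechanically from the language name (for example, Norwegian maps to nb-NO and Mandarin to cmn-CN), so a lookup table is the natural representation. The sketch below copies a few rows from the table into a plain dictionary keyed by the enum member name. The names `POLLY_LANGUAGE_CODES` and `service_code`, and the fallback to en-US, are assumptions for illustration and do not describe how the service itself resolves languages.

```python
# A few rows from the table above, keyed by the Language member name.
POLLY_LANGUAGE_CODES = {
    "EN": "en-US",
    "EN_GB": "en-GB",
    "FR": "fr-FR",
    "FR_CA": "fr-CA",
    "PT": "pt-PT",
    "PT_BR": "pt-BR",
    "NO": "nb-NO",
    "ZH": "cmn-CN",
}


def service_code(language_name: str, default: str = "en-US") -> str:
    """Look up the service code for a language name such as "FR_CA" or "fr-ca"."""
    # Normalize "fr-ca" style input to the "FR_CA" key format.
    key = language_name.upper().replace("-", "_")
    # Assumption: unknown languages fall back to the default code.
    return POLLY_LANGUAGE_CODES.get(key, default)
```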

Usage Example

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.aws import PollyTTSService
from pipecat.transcriptions.language import Language

# Configure service
tts = PollyTTSService(
    api_key="your-aws-secret-key",
    aws_access_key_id="your-aws-access-key",
    region="us-east-1",
    voice_id="Joanna",
    params=PollyTTSService.InputParams(
        engine="neural",
        language=Language.EN,
        rate="medium",
        pitch="high",
    ),
)

# Use in pipeline (llm and transport are assumed to be configured elsewhere)
pipeline = Pipeline([
    ...,
    llm,
    tts,
    transport.output(),
])

SSML Support

The service automatically constructs SSML tags for advanced speech control:

# Example with SSML controls
service = PollyTTSService(
    # ... other params ...
    params=PollyTTSService.InputParams(
        rate="+20%",      # Increase speed
        pitch="low",      # Lower pitch
        volume="loud"     # Increase volume
    )
)

Note: Prosody tags (rate, pitch, volume) are only supported for standard and neural engines, not the generative engine.
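
To make the effect of these parameters concrete, here is a rough sketch of how prosody settings could be turned into an SSML string. It is an illustration only, not the service's actual implementation: the function name `build_ssml` is hypothetical, and the only rule it encodes is the one in the note above (no prosody tag for the generative engine).

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def build_ssml(
    text: str,
    engine: str = "standard",
    rate: Optional[str] = None,
    pitch: Optional[str] = None,
    volume: Optional[str] = None,
) -> str:
    """Wrap text in <speak>, adding a <prosody> tag when settings apply."""
    body = escape(text)  # Escape &, <, and > so the text is valid XML
    attrs = []
    # Per the note above, skip prosody entirely for the generative engine.
    if engine != "generative":
        for name, value in (("rate", rate), ("pitch", pitch), ("volume", volume)):
            if value:
                attrs.append(f"{name}={quoteattr(value)}")
    if attrs:
        body = f"<prosody {' '.join(attrs)}>{body}</prosody>"
    return f"<speak>{body}</speak>"
```

With the settings from the example above, `build_ssml("Hello", engine="neural", rate="+20%", pitch="low", volume="loud")` wraps the text in a single prosody tag carrying all three attributes, while the same call with `engine="generative"` returns the text wrapped only in `<speak>`.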

Metrics Support

The service collects processing metrics:

  • Time to First Byte (TTFB)
  • Processing duration
  • Character usage
  • API calls

Notes

  • Supports multiple AWS Polly engines (standard, neural, generative)
  • Automatic audio resampling
  • SSML-based speech customization
  • Chunked audio delivery
  • Thread-safe processing
  • Automatic error handling
  • Manages AWS client lifecycle