Overview

Async provides two TTS service implementations:
  • AsyncAITTSService: WebSocket-based streaming TTS with interruption support
  • AsyncAIHttpTTSService: HTTP-based streaming TTS service for simpler synthesis
AsyncAITTSService is recommended for real-time applications.

Installation

To use Async services, install the required dependencies:
pip install "pipecat-ai[asyncai]"
You’ll also need to set up your Async API key as an environment variable: ASYNCAI_API_KEY.
Get your API key by signing up for an Async account.

Frames

Input

  • TextFrame - Text content to synthesize into speech
  • TTSSpeakFrame - Text that the TTS service should speak
  • TTSUpdateSettingsFrame - Runtime configuration updates (e.g., voice)
  • LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries

Output

  • TTSStartedFrame - Signals start of synthesis
  • TTSAudioRawFrame - Generated audio data chunks
  • TTSStoppedFrame - Signals completion of synthesis
  • ErrorFrame - Connection or processing errors
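As a sketch of this frame lifecycle (using simplified stand-in classes, not Pipecat's actual frame types), a downstream processor might buffer audio chunks between the start and stop markers:

```python
from dataclasses import dataclass

# Simplified stand-ins for the frame types listed above.
@dataclass
class TTSStartedFrame:
    pass

@dataclass
class TTSAudioRawFrame:
    audio: bytes

@dataclass
class TTSStoppedFrame:
    pass

def collect_utterance(frames):
    """Accumulate raw audio between TTSStartedFrame and TTSStoppedFrame."""
    buffer = bytearray()
    for frame in frames:
        if isinstance(frame, TTSAudioRawFrame):
            buffer.extend(frame.audio)
        elif isinstance(frame, TTSStoppedFrame):
            break
    return bytes(buffer)

frames = [TTSStartedFrame(), TTSAudioRawFrame(b"\x01\x02"),
          TTSAudioRawFrame(b"\x03"), TTSStoppedFrame()]
print(collect_utterance(frames))  # b'\x01\x02\x03'
```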

Service Comparison

Feature      | AsyncAITTSService (WebSocket) | AsyncAIHttpTTSService (HTTP)
Streaming    | ✅ Low-latency chunks         | ✅ Response streaming
Interruption | ✅ Advanced handling          | ⚠️ Basic support
Latency      | 🚀 Low                        | 📈 Higher
Connection   | Persistent WebSocket          | HTTP per-request

Language Support

Async currently supports:
Language Code | Description | Service Code
Language.EN   | English     | en
Language.FR   | French      | fr
Language.ES   | Spanish     | es
Language.DE   | German      | de
Language.IT   | Italian     | it
Language support varies by model. Use the multilingual model (asyncflow_multilingual_v1.0) when you need to specify a language.
Async is expanding language support. Check the official documentation for the latest available languages.
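The mapping in the table above can be illustrated with a plain-Python sketch (this is not the library's actual conversion helper, just the Language enum pattern in miniature):

```python
from enum import Enum

class Language(Enum):
    """Stand-in for the Language enum; values are the service codes."""
    EN = "en"
    FR = "fr"
    ES = "es"
    DE = "de"
    IT = "it"

def to_service_code(language: Language) -> str:
    """Map a Language member to the two-letter code the service expects."""
    return language.value

print(to_service_code(Language.FR))  # fr
```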

Usage Example

Initialize the WebSocket service with your API key and desired voice:
from pipecat.services.asyncai.tts import AsyncAITTSService
import os

# Configure WebSocket service
tts = AsyncAITTSService(
    api_key=os.getenv("ASYNCAI_API_KEY"),
    voice_id=os.getenv("ASYNCAI_VOICE_ID"),
    model="asyncflow_v2.0"
)

# Use in pipeline
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

HTTP Service

Initialize the AsyncAIHttpTTSService and use it in a pipeline:
from pipecat.services.asyncai.tts import AsyncAIHttpTTSService
import aiohttp
import os

# For simpler, non-persistent connections
async with aiohttp.ClientSession() as session:
    http_tts = AsyncAIHttpTTSService(
        api_key=os.getenv("ASYNCAI_API_KEY"),
        voice_id=os.getenv("ASYNCAI_VOICE_ID"),
        aiohttp_session=session,
        model="asyncflow_v2.0"
    )

Dynamic Configuration

Update settings at runtime by pushing a TTSUpdateSettingsFrame to either service:
from pipecat.frames.frames import TTSUpdateSettingsFrame

await task.queue_frame(
    TTSUpdateSettingsFrame(settings={"voice": "your-new-voice-id"})
)

Metrics

Both services provide:
  • Time to First Byte (TTFB) - Latency from text input to first audio
  • Processing Duration - Total synthesis time
  • Usage Metrics - Character count and synthesis statistics
Learn how to enable Metrics in your Pipeline.
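To make the TTFB metric concrete, here is a generic sketch of how it could be measured around any chunk-producing synthesizer (a hypothetical helper, not Pipecat's metrics implementation):

```python
import time

def measure_ttfb(synthesize):
    """Return (ttfb_seconds, chunks) for a generator-based synthesize().

    TTFB is the delay from submitting the request to receiving the
    first audio chunk.
    """
    start = time.monotonic()
    ttfb = None
    chunks = []
    for chunk in synthesize():
        if ttfb is None:
            ttfb = time.monotonic() - start  # latency to first audio byte
        chunks.append(chunk)
    return ttfb, chunks

def fake_synthesize():
    """Stand-in generator simulating streamed audio chunks."""
    yield b"chunk1"
    yield b"chunk2"

ttfb, chunks = measure_ttfb(fake_synthesize)
print(ttfb >= 0, len(chunks))
```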

Additional Notes

  • WebSocket Recommended: Use AsyncAITTSService for low-latency streaming use cases
  • Connection Management: WebSocket maintains persistent connection with automatic keepalive (10-second intervals)
  • Sample Rate: Set globally in PipelineParams rather than per-service for consistency
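The keepalive behavior noted above can be sketched as a generic asyncio loop (a hypothetical implementation, not the service's internal one; the real interval is 10 seconds, shortened here so the example runs quickly):

```python
import asyncio

async def keepalive(send_ping, interval: float, stop: asyncio.Event):
    """Send a ping every `interval` seconds until `stop` is set."""
    while not stop.is_set():
        await send_ping()
        try:
            # Wake early if stop is set; otherwise wait out the interval.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass

async def main():
    pings = []
    stop = asyncio.Event()

    async def send_ping():
        pings.append("ping")  # stand-in for a WebSocket ping frame

    task = asyncio.create_task(keepalive(send_ping, 0.01, stop))
    await asyncio.sleep(0.05)
    stop.set()
    await task
    return pings

pings = asyncio.run(main())
print(len(pings) >= 2)
```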