Overview
TTSCacheMixin is a lightweight caching layer that transparently wraps an
existing Pipecat TTS service to eliminate API costs for repeated phrases and
reduce response latency for cached audio. It is a utility mixin rather than a TTS
provider: it does not synthesize speech itself, but caches the audio produced by
another TTS service (such as Cartesia, ElevenLabs, Deepgram, Google, or OpenAI)
and replays it on subsequent requests. Audio can be cached in process with
MemoryCacheBackend (LRU) or shared across instances with RedisCacheBackend.
Source Repository
Source code, examples, and issues for the TTS Cache integration
PyPI Package
The pipecat-tts-cache package on PyPI
Installation
This is a community-maintained package distributed separately from pipecat-ai.
How It Works
TTSCacheMixin is applied alongside an existing Pipecat TTS service class to
produce a cached variant. It intercepts frames in the pipeline to transparently
cache and replay audio:
- Deterministic key generation: Before requesting audio, a cache key is generated from the normalized text, voice ID, model, sample rate, and settings. API keys are excluded from the key.
- Cache check (run_tts): On a cache hit, the mixin immediately pushes the cached audio frames (and any word timestamps) to the pipeline. On a miss, it calls the wrapped parent TTS service.
- Collection (push_frame): As the parent service produces audio, the mixin intercepts and aggregates the frames, then stores them in the cache backend for future use.
- Interruption handling: When an InterruptionFrame is received, the mixin clears pending cache write tasks and resets its batch state so no partial audio is committed.
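The key generation step above can be sketched as follows. This is a minimal illustration, not the package's actual code: the function name `make_cache_key`, the normalization rule (collapse whitespace, lowercase), the credential filter, and the use of SHA-256 over JSON are all assumptions made for the example.

```python
import hashlib
import json


def make_cache_key(text, voice_id, model, sample_rate, settings, namespace=""):
    """Build a deterministic cache key (hypothetical helper, for illustration)."""
    # Assumed normalization: collapse runs of whitespace and lowercase the text.
    normalized = " ".join(text.split()).lower()
    # Assumed filter: drop credential-like settings so API keys never affect the key.
    safe_settings = {k: v for k, v in settings.items() if "api_key" not in k.lower()}
    payload = json.dumps(
        {
            "text": normalized,
            "voice": voice_id,
            "model": model,
            "sample_rate": sample_rate,
            "settings": safe_settings,
        },
        sort_keys=True,  # stable field ordering keeps the hash reproducible
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"{namespace}:{digest}" if namespace else digest


# Same phrase, different spacing, casing, and API key: the keys match.
key_a = make_cache_key("Hello,  world!", "voice-1", "model-x", 24000,
                       {"speed": 1.0, "api_key": "secret-a"})
key_b = make_cache_key("hello, world!", "voice-1", "model-x", 24000,
                       {"api_key": "secret-b", "speed": 1.0})
# A different voice produces a different key.
key_c = make_cache_key("hello, world!", "voice-2", "model-x", 24000, {"speed": 1.0})
```

Because every input that changes the audio is part of the hashed payload, two requests share a cache entry only when they would produce the same audio.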
The mixin can be combined with any TTSService subclass.
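The composition pattern can be sketched with stand-in classes. Nothing here is the real Pipecat or pipecat-tts-cache API: `FakeTTS`, `SimpleCacheMixin`, and `CachedFakeTTS` are hypothetical names, and `run_tts` is simplified to return bytes rather than push frames into a pipeline. The sketch only shows how listing the mixin first lets its `run_tts` wrap the parent's.

```python
import asyncio


class FakeTTS:
    """Stand-in for a real TTS service; counts how often it is called."""

    def __init__(self, voice):
        self.voice = voice
        self.calls = 0

    async def run_tts(self, text):
        self.calls += 1
        # Pretend the encoded string is synthesized audio.
        return f"{self.voice}:{text}".encode("utf-8")


class SimpleCacheMixin:
    """Hypothetical mixin: answer from a dict when possible, else call the parent."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._cache = {}

    async def run_tts(self, text):
        key = (self.voice, " ".join(text.split()).lower())
        if key in self._cache:               # cache hit: replay stored audio
            return self._cache[key]
        audio = await super().run_tts(text)  # cache miss: ask the parent service
        self._cache[key] = audio             # store for future requests
        return audio


# The mixin is listed first so its run_tts runs before the parent's.
class CachedFakeTTS(SimpleCacheMixin, FakeTTS):
    pass


async def demo():
    tts = CachedFakeTTS(voice="v1")
    first = await tts.run_tts("Hello there")
    second = await tts.run_tts("hello   there")  # normalizes to the same key
    return tts.calls, first == second


calls, same_audio = asyncio.run(demo())
```

In this sketch the parent service is called only once for the two requests, since the second one is answered from the cache.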
Configuration
TTSCacheMixin adds the following keyword arguments to the constructor of the
wrapped TTS service. All other positional and keyword arguments are passed
through to the parent class.
- Cache backend instance (MemoryCacheBackend or RedisCacheBackend). If None, caching is disabled and calls pass straight through to the parent service.
- Time-to-live for cache entries, in seconds. Defaults to 24 hours.
- Optional namespace prefix applied to cache keys.
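The pass-through behavior described above is a standard cooperative-constructor pattern, sketched here with stand-in classes. The keyword names `cache_backend`, `cache_ttl`, and `cache_namespace` are assumptions for illustration (check the package for the actual names), as are `BaseService`, `CacheOptionsMixin`, and the `caching_enabled` property.

```python
class BaseService:
    """Stand-in for a TTS service with its own constructor arguments."""

    def __init__(self, voice_id, sample_rate=24000):
        self.voice_id = voice_id
        self.sample_rate = sample_rate


class CacheOptionsMixin:
    """Hypothetical mixin that consumes cache options and forwards the rest."""

    def __init__(self, *args, cache_backend=None, cache_ttl=86400,
                 cache_namespace=None, **kwargs):
        # Everything not named above goes to the parent class untouched.
        super().__init__(*args, **kwargs)
        self.cache_backend = cache_backend
        self.cache_ttl = cache_ttl  # 86400 seconds = 24 hours
        self.cache_namespace = cache_namespace

    @property
    def caching_enabled(self):
        # With no backend, calls would pass straight through to the parent.
        return self.cache_backend is not None


class CachedService(CacheOptionsMixin, BaseService):
    pass


# No backend: behaves like the plain service, parent arguments still apply.
plain = CachedService("voice-1", sample_rate=16000)
# A dict stands in for a real backend object; the TTL is overridden to one hour.
cached = CachedService("voice-1", cache_backend={}, cache_ttl=3600)
```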
MemoryCacheBackend
In-memory LRU cache with TTL support, suitable for local development and single-process bots.
- Maximum number of cache entries to store before LRU eviction.
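To make the combination of LRU eviction and TTL expiry concrete, here is a small self-contained sketch. It is not the MemoryCacheBackend implementation: the class name `TinyLRUCache`, its `get`/`set` methods, and the injectable clock are assumptions chosen to keep the example deterministic.

```python
import time
from collections import OrderedDict


class TinyLRUCache:
    """Illustrative LRU cache with per-entry TTL (not the package's backend)."""

    def __init__(self, max_size, clock=time.monotonic):
        self.max_size = max_size
        self._clock = clock
        self._entries = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        item = self._entries.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:  # expired: drop it and report a miss
            del self._entries[key]
            return None
        self._entries.move_to_end(key)   # mark as most recently used
        return value

    def set(self, key, value, ttl):
        self._entries[key] = (self._clock() + ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict the least recently used


now = [0.0]  # fake clock so the demo does not depend on real time
cache = TinyLRUCache(max_size=2, clock=lambda: now[0])
cache.set("a", b"audio-a", ttl=10)
cache.set("b", b"audio-b", ttl=100)
cache.get("a")                       # touch "a" so "b" becomes the oldest entry
cache.set("c", b"audio-c", ttl=100)  # capacity exceeded: "b" is evicted
evicted = cache.get("b")
now[0] = 11.0                        # advance past the 10-second TTL of "a"
expired = cache.get("a")
kept = cache.get("c")
```

Entries leave this cache in two ways: the least recently used entry is evicted when the size limit is exceeded, and an entry past its TTL is dropped the next time it is read.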
RedisCacheBackend
Distributed Redis cache that persists across restarts and can be shared across multiple bot instances. Requires the redis extra.
- Redis connection URL.
- Prefix applied to all cache keys.
- Maximum number of Redis connections.
- Socket timeout in seconds.
- Additional keyword arguments forwarded to the underlying Redis client.
Usage
Basic in-memory cache
Distributed Redis cache
Monitoring and maintenance
Compatibility
The caching layer works with all Pipecat TTS services, applying a different caching strategy depending on the service architecture:

| Service type | Caching strategy | Supported providers (examples) |
|---|---|---|
| AudioContextWordTTS | Batch caching: splits audio at word boundaries per sentence | Cartesia, Rime |
| WordTTSService | Full caching with preserved word-level timestamps | ElevenLabs, Hume |
| TTSService | Standard caching of the full audio response (no alignment data) | Google, OpenAI, Deepgram (HTTP) |
| InterruptibleTTS | Sentence caching: single-sentence responses only | Sarvam, Deepgram (WebSocket) |