STT Latency Tuning

What is TTFS?

Time To Final Segment (TTFS) measures how long it takes from the moment a user stops speaking until the STT service delivers the final transcript. This latency directly affects how long your bot waits before it starts responding.

User stops → [TTFS latency] → Final transcript arrives → Bot starts
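A minimal sketch of the definition above. It assumes you can hook two events in your own code: the user stopping speech and the final transcript arriving. TTFS for one turn is then the difference between two monotonic timestamps. `TTFSTimer` and its method names are hypothetical illustration names, not part of Pipecat.

```python
import time
from typing import Callable, List, Optional


class TTFSTimer:
    """Hypothetical helper (not a Pipecat API) that measures TTFS per turn."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._stopped_at: Optional[float] = None
        self.samples: List[float] = []  # one TTFS value (seconds) per turn

    def user_stopped_speaking(self) -> None:
        # Start of the TTFS window: the user has gone silent.
        self._stopped_at = self._clock()

    def final_transcript_received(self) -> Optional[float]:
        # End of the TTFS window: the STT service delivered the final transcript.
        if self._stopped_at is None:
            return None  # no pending stop event, so nothing to measure
        ttfs = self._clock() - self._stopped_at
        self._stopped_at = None
        self.samples.append(ttfs)
        return ttfs


# Usage with a fake clock: user stops at t=12.0 s, final transcript at t=12.5 s.
timer = TTFSTimer(clock=iter([12.0, 12.5]).__next__)
timer.user_stopped_speaking()
print(timer.final_transcript_received())  # 0.5
```

A monotonic clock is used because wall-clock time can jump (for example, after an NTP adjustment), which would distort sub-second intervals.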

Every STT service has a different TTFS profile depending on its architecture, model complexity, and infrastructure. Pipecat ships with a measured P99 TTFS value for each supported service so that turn detection can account for this delay automatically. These values were benchmarked with the stt-benchmark tool.

Why TTFS matters

TTFS feeds directly into turn stop strategies, which decide when the user has finished speaking and the bot should respond.

Value too low: The turn stop strategy stops waiting before the final transcript arrives, so the bot responds to incomplete text or misses the user's input entirely.
Value too high: The bot waits longer than necessary after the user stops speaking, creating awkward pauses in the conversation.
Value just right: The bot waits long enough for the transcript to arrive, then responds immediately.
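To make the three cases concrete, here is a deliberately simplified model. It assumes a strategy that waits a fixed window of `ttfs_p99_latency` seconds after the user stops and then responds with whatever final transcript has arrived. `simulate_turn_stop` and `TurnOutcome` are hypothetical illustration names; Pipecat's actual turn stop strategies may combine this window with other signals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TurnOutcome:
    transcript: Optional[str]  # None if the final transcript missed the window
    response_delay: float      # seconds from user stop until the bot starts
    idle_pause: float          # extra silence after the transcript had arrived


def simulate_turn_stop(actual_ttfs: float, ttfs_p99_latency: float, final_text: str) -> TurnOutcome:
    """Hypothetical model: wait a fixed ttfs_p99_latency window after the user
    stops speaking, then respond with whatever final transcript has arrived."""
    arrived = actual_ttfs <= ttfs_p99_latency
    return TurnOutcome(
        transcript=final_text if arrived else None,
        response_delay=ttfs_p99_latency,
        idle_pause=round(ttfs_p99_latency - actual_ttfs, 3) if arrived else 0.0,
    )


# The STT service actually takes 0.40 s to deliver the final transcript.
for setting in (0.25, 0.40, 1.00):
    print(setting, simulate_turn_stop(0.40, setting, "book a table for two"))
```

In this model, 0.25 s returns no transcript (too low), 0.40 s responds with no idle pause (just right), and 1.00 s adds about 0.6 s of dead air (too high).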

Getting TTFS right is one of the most impactful tuning knobs for perceived conversation responsiveness.

Default P99 latency values

Pipecat includes measured P99 TTFS values for every supported STT service. These values are applied automatically when you create a service, so no configuration is required. You can find the default for each service in the Pipecat source code.

Local services (NVIDIA, Whisper) default to 1.0s since actual latency depends entirely on your hardware. Always measure and override for local deployments.

Measuring latency for your deployment

The default values are measured under standard conditions, but your actual latency depends on:

Network distance to the STT provider
Region where the service is hosted
Service configuration (model size, language, features enabled)
Audio quality and encoding settings

Use the stt-benchmark tool to measure TTFS for your specific setup. The tool sends standardized audio samples to your STT service and reports P50, P90, and P99 latency values.
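If you also collect raw TTFS samples yourself, a nearest-rank percentile is a simple way to reduce them to P50, P90, and P99. This is a sketch: `latency_percentiles` is a hypothetical helper, and the stt-benchmark tool may use a different percentile definition (interpolation, for instance), so small differences between the two are expected.

```python
import math
from typing import Dict, Iterable, Sequence


def latency_percentiles(
    samples: Sequence[float], percentiles: Iterable[float] = (50, 90, 99)
) -> Dict[float, float]:
    """Nearest-rank percentiles of TTFS samples, in seconds."""
    if not samples:
        raise ValueError("need at least one TTFS sample")
    ordered = sorted(samples)
    n = len(ordered)
    result: Dict[float, float] = {}
    for p in percentiles:
        # Nearest rank: smallest value with at least p% of samples at or below it.
        rank = max(1, math.ceil(p * n / 100))  # 1-based rank
        result[p] = ordered[rank - 1]
    return result


# 100 samples from 0.01 s to 1.00 s, in arbitrary order.
samples = [i / 100 for i in range(100, 0, -1)]
print(latency_percentiles(samples))  # {50: 0.5, 90: 0.9, 99: 0.99}
```

With fewer than 100 samples, the nearest-rank P99 is simply the slowest sample, so collect enough turns for the P99 figure to be meaningful.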

Overriding the default value

Pass the ttfs_p99_latency parameter to any STT service constructor to override the built-in default:

import os

from pipecat.services.deepgram.stt import DeepgramSTTService

# Use a value measured from your own deployment
stt = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    ttfs_p99_latency=0.45,  # Override with your measured P99 (seconds)
)

This value is broadcast to the pipeline via an STTMetadataFrame at startup, so turn stop strategies automatically adjust their timing.

If you’re deploying to a specific region or using a self-hosted STT service, always measure and override the default TTFS value. Even small differences (e.g., 0.35s vs 0.55s) can noticeably affect conversation responsiveness.
