Overview

AssemblyAISTTService provides real-time speech recognition using AssemblyAI’s WebSocket API with support for interim results, end-of-turn detection, and configurable audio processing parameters.

Installation

To use AssemblyAI services, install the required dependency:
pip install "pipecat-ai[assemblyai]"
You’ll also need to set up your AssemblyAI API key as an environment variable: ASSEMBLYAI_API_KEY.
Get your API key from AssemblyAI Console.

Frames

Input

  • InputAudioRawFrame - Raw PCM audio data (16-bit, 16kHz, mono)
  • UserStartedSpeakingFrame - VAD start signal (triggers TTFB metrics)
  • UserStoppedSpeakingFrame - VAD stop signal (triggers force endpoint if enabled)
  • STTUpdateSettingsFrame - Runtime transcription configuration updates
  • STTMuteFrame - Mute audio input for transcription

Output

  • InterimTranscriptionFrame - Real-time transcription updates
  • TranscriptionFrame - Final transcription results
  • TranslationFrame - Translated text (if translation is enabled)
  • ErrorFrame - Connection or processing errors

Language Support

AssemblyAI Streaming STT currently supports English only.

Usage Example

Basic Configuration

Initialize the AssemblyAISTTService and use it in a pipeline:
from pipecat.services.assemblyai.stt import AssemblyAISTTService
from pipecat.services.assemblyai.models import AssemblyAIConnectionParams

# Basic configuration
stt = AssemblyAISTTService(
    api_key=os.getenv("ASSEMBLYAI_API_KEY"),
)

# Configuration with custom parameters
stt = AssemblyAISTTService(
    api_key=os.getenv("ASSEMBLYAI_API_KEY"),
    connection_params=AssemblyAIConnectionParams(
        sample_rate=16000,
        formatted_finals=True,
        end_of_turn_confidence_threshold=0.8,
        max_turn_silence=1000
    ),
    vad_force_turn_endpoint=True
)

# Use in pipeline
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

Dynamic Configuration

Make settings updates by pushing an STTUpdateSettingsFrame for the AssemblyAISTTService:
from pipecat.frames.frames import STTUpdateSettingsFrame

await task.queue_frame(
    STTUpdateSettingsFrame(settings={"language": Language.FR})
)

Metrics

The service provides:
  • Time to First Byte (TTFB) - Latency from speech start to first transcription
  • Processing Duration - Total time spent processing audio
Learn how to enable Metrics in your Pipeline.

Additional Notes

  • Connection Management: Automatically handles WebSocket connections, reconnections, and proper termination handshakes
  • VAD Integration: Supports force endpoint triggering when VAD detects speech stop, requiring a VAD processor in the pipeline
  • Error Handling: Error handling for connection issues and message processing failures