Overview

FalSTTService provides speech-to-text capabilities using Fal’s Wizper API with Voice Activity Detection (VAD) to process only speech segments, optimizing API usage and improving response time.

Installation

To use Fal services, install the required dependency:
pip install "pipecat-ai[fal]"
You’ll also need to set up your Fal API key as an environment variable: FAL_KEY.
Get your API key from the Fal platform.

Frames

Input

  • InputAudioRawFrame - Raw PCM audio data (16-bit, mono)
  • UserStartedSpeakingFrame - VAD detection of speech start
  • UserStoppedSpeakingFrame - VAD detection of speech end (triggers processing)
  • STTUpdateSettingsFrame - Runtime transcription configuration updates
  • STTMuteFrame - Mute audio input for transcription

Output

  • TranscriptionFrame - Final transcription results after speech segment ends
  • ErrorFrame - API or processing errors

Models

Fal offers the Wizper model with version options:
Model    Version    Description
wizper   3          Latest Wizper model (default)
wizper   2          Previous version for compatibility

VAD-Based Processing

FalSTTService extends SegmentedSTTService, which uses Voice Activity Detection to process complete speech segments:
  • Segment Processing: Only processes complete utterances, not continuous audio
  • Audio Buffering: Maintains a 1-second buffer to capture speech before VAD detection
  • VAD Requirement: Requires a VAD component like SileroVADAnalyzer in your transport
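The buffering behavior described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not Pipecat's actual implementation: a rolling pre-speech buffer holds roughly the most recent second of audio so the start of an utterance, which arrives before the VAD fires, is not lost.

```python
from collections import deque

SAMPLE_RATE = 16000                  # 16 kHz mono, 16-bit PCM
BYTES_PER_SECOND = SAMPLE_RATE * 2   # 2 bytes per sample

class SegmentBuffer:
    """Illustrative sketch of VAD-segmented buffering (not Pipecat's real code)."""

    def __init__(self, pre_speech_secs=1.0):
        self._max_pre_bytes = int(BYTES_PER_SECOND * pre_speech_secs)
        self._pre_speech = deque()   # rolling buffer of audio chunks before speech
        self._pre_bytes = 0
        self._segment = bytearray()
        self._in_speech = False

    def add_audio(self, chunk: bytes):
        if self._in_speech:
            self._segment.extend(chunk)
        else:
            self._pre_speech.append(chunk)
            self._pre_bytes += len(chunk)
            # Drop the oldest chunks once the rolling window exceeds ~1 second
            while self._pre_bytes > self._max_pre_bytes:
                dropped = self._pre_speech.popleft()
                self._pre_bytes -= len(dropped)

    def user_started_speaking(self):
        # Seed the segment with the buffered pre-speech audio
        self._in_speech = True
        self._segment = bytearray(b"".join(self._pre_speech))
        self._pre_speech.clear()
        self._pre_bytes = 0

    def user_stopped_speaking(self) -> bytes:
        # Return the complete utterance; this is what would be sent to the API
        self._in_speech = False
        segment, self._segment = bytes(self._segment), bytearray()
        return segment
```

Because processing only starts after UserStoppedSpeakingFrame, the service sends one complete utterance per API call instead of streaming continuous audio.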

Language Support

Common languages:
  • Language.EN - English - en
  • Language.ES - Spanish - es
  • Language.FR - French - fr
  • Language.DE - German - de
  • Language.IT - Italian - it
  • Language.JA - Japanese - ja
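Wizper expects ISO 639-1 language codes like those listed above. As a minimal sketch of the kind of enum-to-code mapping involved (the helper name and dictionary here are hypothetical; Pipecat's Language enum ships its own conversion):

```python
# Hypothetical mapping sketch; mirrors the codes listed above.
LANGUAGE_CODES = {
    "EN": "en", "ES": "es", "FR": "fr",
    "DE": "de", "IT": "it", "JA": "ja",
}

def to_wizper_language(language_name: str, default: str = "en") -> str:
    """Return the ISO 639-1 code Wizper expects, falling back to English."""
    return LANGUAGE_CODES.get(language_name.upper(), default)
```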

Usage Example

Basic Configuration

Initialize the FalSTTService and use it in a pipeline:
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.fal.stt import FalSTTService
from pipecat.transcriptions.language import Language

# Simple setup
stt = FalSTTService(
    api_key=os.getenv("FAL_KEY")
)

# Use in pipeline
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

Dynamic Configuration

Update settings at runtime by pushing an STTUpdateSettingsFrame to the FalSTTService:
from pipecat.frames.frames import STTUpdateSettingsFrame

await task.queue_frame(
    STTUpdateSettingsFrame(settings={"language": Language.FR})
)

Metrics

The service provides performance metrics:
  • Time to First Byte (TTFB) - Latency from audio input to first transcription
  • Processing Duration - Total time spent processing audio
Learn how to enable Metrics in your Pipeline.
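As a rough sketch of what TTFB measures (the timer class here is hypothetical; Pipecat's metrics system reports this automatically when enabled):

```python
import time

class TTFBTimer:
    """Hypothetical sketch of Time to First Byte measurement."""

    def __init__(self):
        self._start = None
        self.ttfb = None

    def audio_submitted(self):
        # Clock starts when the speech segment is handed to the STT API
        self._start = time.monotonic()

    def first_result_received(self):
        # Clock stops at the first transcription result
        if self._start is not None and self.ttfb is None:
            self.ttfb = time.monotonic() - self._start
```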

Additional Notes

  • VAD Dependency: Requires a VAD component in your transport for speech segment detection
  • Segment Processing: Processes complete utterances rather than streaming audio
  • Translation Support: Can translate non-English speech directly to English when using the translate task
  • Error Handling: Comprehensive error handling for API failures and network issues