Overview

FalSTTService provides speech-to-text capabilities using Fal’s Wizper API, with minimal setup. The service uses Voice Activity Detection (VAD) to send only speech segments to the API, which reduces API usage and improves response time.

Installation

To use FalSTTService, install the required dependencies:

pip install "pipecat-ai[fal]"

Set your Fal API key in the FAL_KEY environment variable.

You can obtain a Fal API key from the Fal platform.
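
As a minimal sketch, an application could read the key at startup and fail early with a clear message when it is missing. The helper name load_fal_key is hypothetical; it is not part of Pipecat or the Fal client.

```python
import os


def load_fal_key(env=None):
    """Return the Fal API key from FAL_KEY, or raise if it is unset or blank.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("FAL_KEY", "").strip()
    if not key:
        raise RuntimeError("FAL_KEY is not set; export it before starting the pipeline")
    return key
```

The returned value can then be passed as api_key, or the check can simply act as a startup guard while the service reads FAL_KEY itself.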

Configuration

Constructor Parameters

api_key
str

Your Fal API key. If not provided, the service uses the FAL_KEY environment variable.

sample_rate
int

Audio sample rate in Hz. If not provided, uses the pipeline’s sample rate.

params
InputParams

Configuration parameters for the Wizper API. See InputParams below.

InputParams

language
Language
default:"Language.EN"

Language of the audio input. Defaults to English.

task
str
default:"transcribe"

Task to perform. Options are ‘transcribe’ or ‘translate’.

chunk_level
str
default:"segment"

Level of chunking for the audio. Default is ‘segment’.

version
str
default:"3"

Version of Wizper model to use.

Input

The service processes audio data with the following requirements:

  • PCM audio format
  • 16-bit depth
  • Single channel (mono)
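
To illustrate these requirements, the following sketch checks a WAV file for 16-bit mono audio using only the Python standard library. The function names check_wav_format and make_wav are hypothetical helpers for this example and are not part of Pipecat.

```python
import io
import wave


def check_wav_format(wav_bytes):
    """Return a list of problems; an empty list means 16-bit mono audio.

    The wave module only reads uncompressed PCM, so a file that opens
    successfully already satisfies the PCM requirement.
    """
    problems = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:
            problems.append(f"sample width is {wav.getsampwidth() * 8} bits, expected 16")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected 1")
    return problems


def make_wav(channels, sample_width, sample_rate=16000, frames=160):
    """Build a silent in-memory WAV file for trying out the check."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00" * frames * channels * sample_width)
    return buffer.getvalue()
```

For example, check_wav_format(make_wav(1, 2)) returns an empty list, while a stereo file yields one problem entry.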

Output Frames

The service produces two types of frames during transcription:

TranscriptionFrame

Generated for final transcriptions, containing:

text
string

Transcribed text

user_id
string

User identifier

timestamp
string

ISO 8601 formatted timestamp

language
Language

Detected language (if available)

ErrorFrame

Generated when transcription errors occur, containing error details.

Methods

Set Model

await service.set_model("wizper-v3")

See the STT base class methods for additional functionality.

Language Support

Fal Wizper supports a wide range of languages. The service automatically maps Language enum values to the appropriate Wizper language codes.

Language Code  Description        Wizper Code
Language.AF    Afrikaans          af
Language.AM    Amharic            am
Language.AR    Arabic             ar
Language.AS    Assamese           as
Language.AZ    Azerbaijani        az
Language.BA    Bashkir            ba
Language.BE    Belarusian         be
Language.BG    Bulgarian          bg
Language.BN    Bengali            bn
Language.BO    Tibetan            bo
Language.BR    Breton             br
Language.BS    Bosnian            bs
Language.CA    Catalan            ca
Language.CS    Czech              cs
Language.CY    Welsh              cy
Language.DA    Danish             da
Language.DE    German             de
Language.EL    Greek              el
Language.EN    English            en
Language.ES    Spanish            es
Language.ET    Estonian           et
Language.EU    Basque             eu
Language.FA    Persian            fa
Language.FI    Finnish            fi
Language.FO    Faroese            fo
Language.FR    French             fr
Language.GL    Galician           gl
Language.GU    Gujarati           gu
Language.HA    Hausa              ha
Language.HE    Hebrew             he
Language.HI    Hindi              hi
Language.HR    Croatian           hr
Language.HT    Haitian Creole     ht
Language.HU    Hungarian          hu
Language.HY    Armenian           hy
Language.ID    Indonesian         id
Language.IS    Icelandic          is
Language.IT    Italian            it
Language.JA    Japanese           ja
Language.JW    Javanese           jw
Language.KA    Georgian           ka
Language.KK    Kazakh             kk
Language.KM    Khmer              km
Language.KN    Kannada            kn
Language.KO    Korean             ko
Language.LA    Latin              la
Language.LB    Luxembourgish      lb
Language.LN    Lingala            ln
Language.LO    Lao                lo
Language.LT    Lithuanian         lt
Language.LV    Latvian            lv
Language.MG    Malagasy           mg
Language.MI    Maori              mi
Language.MK    Macedonian         mk
Language.ML    Malayalam          ml
Language.MN    Mongolian          mn
Language.MR    Marathi            mr
Language.MS    Malay              ms
Language.MT    Maltese            mt
Language.MY    Burmese            my
Language.NE    Nepali             ne
Language.NL    Dutch              nl
Language.NN    Norwegian Nynorsk  nn
Language.NO    Norwegian          no
Language.OC    Occitan            oc
Language.PA    Punjabi            pa
Language.PL    Polish             pl
Language.PS    Pashto             ps
Language.PT    Portuguese         pt
Language.RO    Romanian           ro
Language.RU    Russian            ru
Language.SA    Sanskrit           sa
Language.SD    Sindhi             sd
Language.SI    Sinhala            si
Language.SK    Slovak             sk
Language.SL    Slovenian          sl
Language.SN    Shona              sn
Language.SO    Somali             so
Language.SQ    Albanian           sq
Language.SR    Serbian            sr
Language.SU    Sundanese          su
Language.SV    Swedish            sv
Language.SW    Swahili            sw
Language.TA    Tamil              ta
Language.TE    Telugu             te
Language.TG    Tajik              tg
Language.TH    Thai               th
Language.TK    Turkmen            tk
Language.TL    Tagalog            tl
Language.TR    Turkish            tr
Language.TT    Tatar              tt
Language.UK    Ukrainian          uk
Language.UR    Urdu               ur
Language.UZ    Uzbek              uz
Language.VI    Vietnamese         vi
Language.YI    Yiddish            yi
Language.YO    Yoruba             yo
Language.ZH    Chinese            zh

For the most accurate transcription, set the language parameter to match your audio input.
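
The mapping described in this section amounts to a lookup from enum member to Wizper code. The sketch below shows the idea with a small stand-in enum. Lang, WIZPER_CODES, and to_wizper_code are hypothetical names for illustration only; this is not Pipecat's implementation, and the real Language enum is defined in pipecat.transcriptions.language.

```python
from enum import Enum


class Lang(Enum):
    """Hypothetical stand-in for a few members of Pipecat's Language enum."""

    EN = "en"
    ES = "es"
    NN = "nn"
    XX = "xx"  # made-up member with no Wizper equivalent


# Subset of the language table: enum member -> Wizper code.
WIZPER_CODES = {Lang.EN: "en", Lang.ES: "es", Lang.NN: "nn"}


def to_wizper_code(language):
    """Return the Wizper code for a language, or None if it is not in the table."""
    return WIZPER_CODES.get(language)
```

Returning None for an unmapped language lets the caller decide whether to fall back to a default or report an error.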

Usage Example

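import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.fal import FalSTTService
from pipecat.transcriptions.language import Language

# Configure service (omit api_key to fall back to the FAL_KEY environment variable)
stt = FalSTTService(
    api_key=os.getenv("FAL_KEY"),
    params=FalSTTService.InputParams(
        language=Language.EN,
    ),
)

# Use in pipeline (transport and llm are created elsewhere in your application)
pipeline = Pipeline([
    transport.input(),
    stt,
    llm,
    ...
])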

Voice Activity Detection Integration

This service inherits from SegmentedSTTService, which uses Voice Activity Detection (VAD) to identify speech segments for processing. This approach:

  • Processes only actual speech, not silence or background noise
  • Maintains a small audio buffer (default 1 second) to capture speech that occurs slightly before VAD detection
  • Receives UserStartedSpeakingFrame and UserStoppedSpeakingFrame from a VAD component in the pipeline
  • Only sends complete utterances to the API when speech has ended

Ensure your transport includes a VAD component (like SileroVADAnalyzer) to properly detect speech segments.
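
The buffering behavior described above can be sketched in plain Python. This is a simplified illustration under stated assumptions (16-bit mono PCM, a 1-second pre-speech window), not Pipecat's actual SegmentedSTTService code. The class name UtteranceBuffer and its method names are hypothetical.

```python
class UtteranceBuffer:
    """Hypothetical sketch of VAD-gated buffering; not Pipecat's implementation."""

    def __init__(self, sample_rate=16000, pre_speech_secs=1.0):
        # 16-bit mono PCM uses 2 bytes per sample.
        self._max_pre_bytes = int(sample_rate * pre_speech_secs) * 2
        self._pre = bytearray()        # rolling window of audio before speech
        self._utterance = bytearray()  # audio collected while the user speaks
        self._speaking = False

    def on_audio(self, chunk):
        """Store incoming audio in the utterance or the pre-speech window."""
        if self._speaking:
            self._utterance.extend(chunk)
            return
        self._pre.extend(chunk)
        # Drop the oldest bytes so only the most recent window is kept.
        excess = len(self._pre) - self._max_pre_bytes
        if excess > 0:
            del self._pre[:excess]

    def on_user_started_speaking(self):
        """Start an utterance, seeded with the buffered pre-speech audio."""
        self._speaking = True
        self._utterance = bytearray(self._pre)
        self._pre.clear()

    def on_user_stopped_speaking(self):
        """Return the complete utterance; this is what would go to the API."""
        self._speaking = False
        utterance = bytes(self._utterance)
        self._utterance.clear()
        return utterance
```

In this sketch the two speaking callbacks stand in for UserStartedSpeakingFrame and UserStoppedSpeakingFrame, and nothing leaves the buffer until speech has ended.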

Metrics Support

The service collects the following metrics:

  • Processing duration
  • API response time
  • Success/failure rates

Notes

  • Requires a valid Fal API key
  • Uses Fal’s Wizper model
  • Requires a VAD component in the transport
  • Processes complete utterances, not continuous audio
  • Thread-safe processing

Error Handling

The service handles common API errors including:

  • Authentication errors
  • API availability issues
  • Invalid audio format
  • Network connectivity issues
  • API timeouts

Errors are propagated through ErrorFrames with descriptive messages.