Fal (Wizper)
Speech-to-text service implementation using Fal’s Wizper API
Overview
FalSTTService provides speech-to-text capabilities using Fal’s Wizper API. It offers high-quality transcription with minimal setup. The service uses Voice Activity Detection (VAD) to process only speech segments, which reduces API usage and improves response time.
Installation
To use FalSTTService, install the required dependencies.
You’ll also need to set your Fal API key in the FAL_KEY environment variable.
You can obtain a Fal API key from the Fal platform.
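As a minimal sketch of the environment setup described above, the helper below reads FAL_KEY and fails early with a clear message when it is missing. The function name load_fal_key is hypothetical and not part of the service; it only illustrates checking the variable before a pipeline starts.

```python
import os


def load_fal_key(env=os.environ):
    """Return the Fal API key from FAL_KEY, or raise if it is unset or blank.

    `load_fal_key` is a hypothetical helper for illustration only.
    """
    key = env.get("FAL_KEY", "").strip()
    if not key:
        raise RuntimeError("FAL_KEY is not set; export it before starting the pipeline")
    return key
```

Passing the mapping as an argument keeps the check easy to exercise without touching the real process environment.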
Configuration
Constructor Parameters
Your Fal API key. If not provided, the service uses the FAL_KEY environment variable.
Audio sample rate in Hz. If not provided, uses the pipeline’s sample rate.
Configuration parameters for the Wizper API. See InputParams below.
InputParams
Language of the audio input. Defaults to English.
Task to perform. Options are ‘transcribe’ or ‘translate’.
Level of chunking for the audio. Default is ‘segment’.
Version of Wizper model to use.
Input
The service processes audio data with the following requirements:
- PCM audio format
- 16-bit depth
- Single channel (mono)
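The requirements above can be checked before audio reaches the service. The sketch below uses Python's standard wave module to confirm that WAV data is 16-bit mono; both function names are hypothetical, and the check assumes the audio is available as a WAV container rather than raw PCM bytes.

```python
import io
import wave


def check_pcm16_mono(wav_bytes):
    """Return the sample rate if the WAV data is 16-bit mono, else raise ValueError.

    The wave module itself raises wave.Error for non-PCM encodings.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("audio must be 16-bit (2 bytes per sample)")
        if wav.getnchannels() != 1:
            raise ValueError("audio must be single channel (mono)")
        return wav.getframerate()


def make_silence_wav(sample_rate=16000, seconds=1, channels=1):
    """Build an in-memory WAV of silence, used here only to exercise the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate * seconds * channels)
    return buf.getvalue()
```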
Output Frames
The service produces two types of frames during transcription:
TranscriptionFrame
Generated for final transcriptions, containing:
- Transcribed text
- User identifier
- ISO 8601 formatted timestamp
- Detected language (if available)
ErrorFrame
Generated when transcription errors occur, containing error details.
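To illustrate how downstream code might tell the two frame types apart, here is a sketch built on stand-in dataclasses. TranscriptionFrameSketch and ErrorFrameSketch are hypothetical and only mirror the fields listed above; their names and attribute names are assumptions, not the library's definitions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TranscriptionFrameSketch:
    """Stand-in carrying the fields listed for a final transcription."""
    text: str
    user_id: str
    timestamp: str  # ISO 8601 string
    language: Optional[str] = None  # only set when a language was detected


@dataclass
class ErrorFrameSketch:
    """Stand-in carrying error details."""
    error: str


def now_iso8601():
    """Return the current UTC time as an ISO 8601 formatted string."""
    return datetime.now(timezone.utc).isoformat()


def describe(frame):
    """Render a frame as a log line, branching on the frame type."""
    if isinstance(frame, TranscriptionFrameSketch):
        lang = frame.language or "unknown"
        return f"[{frame.timestamp}] {frame.user_id} ({lang}): {frame.text}"
    if isinstance(frame, ErrorFrameSketch):
        return f"transcription error: {frame.error}"
    return "ignored"
```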
Methods
Set Model
See the STT base class methods for additional functionality.
Language Support
Fal Wizper supports a wide range of languages. The service automatically maps Language enum values to the appropriate Wizper language codes.
Language Code | Description | Wizper Code |
---|---|---|
Language.AF | Afrikaans | af |
Language.AM | Amharic | am |
Language.AR | Arabic | ar |
Language.AS | Assamese | as |
Language.AZ | Azerbaijani | az |
Language.BA | Bashkir | ba |
Language.BE | Belarusian | be |
Language.BG | Bulgarian | bg |
Language.BN | Bengali | bn |
Language.BO | Tibetan | bo |
Language.BR | Breton | br |
Language.BS | Bosnian | bs |
Language.CA | Catalan | ca |
Language.CS | Czech | cs |
Language.CY | Welsh | cy |
Language.DA | Danish | da |
Language.DE | German | de |
Language.EL | Greek | el |
Language.EN | English | en |
Language.ES | Spanish | es |
Language.ET | Estonian | et |
Language.EU | Basque | eu |
Language.FA | Persian | fa |
Language.FI | Finnish | fi |
Language.FO | Faroese | fo |
Language.FR | French | fr |
Language.GL | Galician | gl |
Language.GU | Gujarati | gu |
Language.HA | Hausa | ha |
Language.HE | Hebrew | he |
Language.HI | Hindi | hi |
Language.HR | Croatian | hr |
Language.HT | Haitian Creole | ht |
Language.HU | Hungarian | hu |
Language.HY | Armenian | hy |
Language.ID | Indonesian | id |
Language.IS | Icelandic | is |
Language.IT | Italian | it |
Language.JA | Japanese | ja |
Language.JW | Javanese | jw |
Language.KA | Georgian | ka |
Language.KK | Kazakh | kk |
Language.KM | Khmer | km |
Language.KN | Kannada | kn |
Language.KO | Korean | ko |
Language.LA | Latin | la |
Language.LB | Luxembourgish | lb |
Language.LN | Lingala | ln |
Language.LO | Lao | lo |
Language.LT | Lithuanian | lt |
Language.LV | Latvian | lv |
Language.MG | Malagasy | mg |
Language.MI | Maori | mi |
Language.MK | Macedonian | mk |
Language.ML | Malayalam | ml |
Language.MN | Mongolian | mn |
Language.MR | Marathi | mr |
Language.MS | Malay | ms |
Language.MT | Maltese | mt |
Language.MY | Burmese | my |
Language.NE | Nepali | ne |
Language.NL | Dutch | nl |
Language.NN | Norwegian Nynorsk | nn |
Language.NO | Norwegian | no |
Language.OC | Occitan | oc |
Language.PA | Punjabi | pa |
Language.PL | Polish | pl |
Language.PS | Pashto | ps |
Language.PT | Portuguese | pt |
Language.RO | Romanian | ro |
Language.RU | Russian | ru |
Language.SA | Sanskrit | sa |
Language.SD | Sindhi | sd |
Language.SI | Sinhala | si |
Language.SK | Slovak | sk |
Language.SL | Slovenian | sl |
Language.SN | Shona | sn |
Language.SO | Somali | so |
Language.SQ | Albanian | sq |
Language.SR | Serbian | sr |
Language.SU | Sundanese | su |
Language.SV | Swedish | sv |
Language.SW | Swahili | sw |
Language.TA | Tamil | ta |
Language.TE | Telugu | te |
Language.TG | Tajik | tg |
Language.TH | Thai | th |
Language.TK | Turkmen | tk |
Language.TL | Tagalog | tl |
Language.TR | Turkish | tr |
Language.TT | Tatar | tt |
Language.UK | Ukrainian | uk |
Language.UR | Urdu | ur |
Language.UZ | Uzbek | uz |
Language.VI | Vietnamese | vi |
Language.YI | Yiddish | yi |
Language.YO | Yoruba | yo |
Language.ZH | Chinese | zh |
For the most accurate transcription, specify the language that matches your audio input.
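The enum-to-code mapping in the table can be pictured as a plain dictionary lookup. The sketch below uses a hypothetical LanguageSketch enum with a handful of members; it is not the real Language enum, and the XX member is made up to show what an unsupported language might look like.

```python
from enum import Enum
from typing import Optional


class LanguageSketch(Enum):
    """Hypothetical stand-in for the Language enum; only a few members shown."""
    EN = "en"
    ES = "es"
    FR = "fr"
    JA = "ja"
    XX = "xx"  # made-up member with no Wizper equivalent


# Subset of the table: enum member -> Wizper code.
WIZPER_CODES = {
    LanguageSketch.EN: "en",
    LanguageSketch.ES: "es",
    LanguageSketch.FR: "fr",
    LanguageSketch.JA: "ja",
}


def to_wizper_code(language: LanguageSketch) -> Optional[str]:
    """Return the Wizper code for a language, or None when it is not in the map."""
    return WIZPER_CODES.get(language)
```

Returning None for an unmapped language leaves the fallback decision to the caller.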
Usage Example
Voice Activity Detection Integration
This service inherits from SegmentedSTTService, which uses Voice Activity Detection (VAD) to identify speech segments for processing. This approach:
- Processes only actual speech, not silence or background noise
- Maintains a small audio buffer (default 1 second) to capture speech that occurs slightly before VAD detection
- Receives UserStartedSpeakingFrame and UserStoppedSpeakingFrame from a VAD component in the pipeline
- Only sends complete utterances to the API when speech has ended
Ensure your transport includes a VAD component (like SileroVADAnalyzer) to properly detect speech segments.
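The buffering behaviour described in this section can be sketched as follows. SegmentBuffer is a hypothetical class, not the SegmentedSTTService implementation: while idle it keeps only the most recent pre-roll window, and once speech starts it accumulates audio until the stop event, then hands back one complete utterance. It assumes 16-bit mono PCM, so two bytes per sample.

```python
class SegmentBuffer:
    """Collects one utterance between VAD start/stop events, with pre-roll.

    Illustrative sketch only; assumes 16-bit mono PCM (2 bytes per sample).
    """

    def __init__(self, sample_rate=16000, pre_roll_secs=1.0):
        self._pre_roll_bytes = int(sample_rate * pre_roll_secs) * 2
        self._buffer = bytearray()
        self._speaking = False

    def on_audio(self, chunk: bytes):
        """Append audio; while idle, trim to the most recent pre-roll window."""
        self._buffer.extend(chunk)
        if not self._speaking and len(self._buffer) > self._pre_roll_bytes:
            del self._buffer[: len(self._buffer) - self._pre_roll_bytes]

    def on_user_started_speaking(self):
        """Stop trimming so the whole utterance is retained."""
        self._speaking = True

    def on_user_stopped_speaking(self) -> bytes:
        """Return pre-roll plus utterance audio and reset for the next segment."""
        self._speaking = False
        utterance = bytes(self._buffer)
        self._buffer.clear()
        return utterance
```

In this sketch the bytes returned by on_user_stopped_speaking are what would be sent to the API as a single request.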
Metrics Support
The service collects the following metrics:
- Processing duration
- API response time
- Success/failure rates
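As a rough illustration of what collecting these metrics involves, the sketch below times each call and counts successes and failures. CallMetrics is a hypothetical helper and says nothing about how the service records its metrics internally.

```python
import time


class CallMetrics:
    """Tracks call durations and success/failure counts (illustrative only)."""

    def __init__(self):
        self.successes = 0
        self.failures = 0
        self.durations = []

    def record(self, func, *args, **kwargs):
        """Run func, timing it and counting the outcome; exceptions are re-raised."""
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            raise
        else:
            self.successes += 1
            return result
        finally:
            # Runs for both outcomes, so every call contributes a duration.
            self.durations.append(time.perf_counter() - start)

    def success_rate(self):
        """Return the fraction of successful calls, or None before any call."""
        total = self.successes + self.failures
        return self.successes / total if total else None
```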
Notes
- Requires valid Fal API key
- Uses Fal’s Wizper model
- Requires VAD component in transport
- Processes complete utterances, not continuous audio
- Thread-safe processing
Error Handling
The service handles common API errors including:
- Authentication errors
- API availability issues
- Invalid audio format
- Network connectivity issues
- API timeouts
Errors are propagated through ErrorFrames with descriptive messages.
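One way to picture this propagation is a wrapper that converts exceptions into descriptive error messages instead of letting them escape. The function below is a hypothetical sketch: the exception types are Python built-ins chosen for illustration, and the real service may raise and classify errors differently.

```python
def transcribe_or_error(transcribe, audio: bytes):
    """Call transcribe(audio) and return a ("transcription" | "error", text) pair.

    `transcribe` is any callable supplied by the caller; this wrapper is a
    hypothetical sketch, not the service's own error handling.
    """
    try:
        return ("transcription", transcribe(audio))
    except TimeoutError:
        return ("error", "API timeout")
    except ConnectionError as exc:
        return ("error", f"network connectivity issue: {exc}")
    except ValueError as exc:
        return ("error", f"invalid audio format: {exc}")
    except Exception as exc:
        # Catch-all so an unexpected failure still yields a descriptive message.
        return ("error", f"unexpected failure: {exc}")
```

TimeoutError and ConnectionError are handled before the catch-all so that the more specific message wins.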