Groq (Whisper)
Speech-to-text service implementation using Groq’s Whisper API
Overview
GroqSTTService provides speech-to-text transcription through Groq’s hosted Whisper API. It offers high-accuracy transcription and needs little setup beyond a Groq API key.
Installation
To use GroqSTTService, install the required dependencies.
You’ll also need to set your Groq API key in the GROQ_API_KEY environment variable.
You can obtain a Groq API key from the Groq Console.
Configuration
Constructor Parameters
- Model: the Whisper model to use. Currently only “whisper-large-v3-turbo” is available.
- API key: your Groq API key. If not provided, the GROQ_API_KEY environment variable is used.
- Base URL: a custom base URL for Groq API requests.
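The three options above can be illustrated with a small settings resolver. This is a minimal sketch, not GroqSTTService's actual constructor: the names `GroqSTTSettings`, `resolve_settings`, `model`, `api_key`, and `base_url` are hypothetical. It shows the documented fallback to GROQ_API_KEY when no key is passed and, as an illustrative choice, rejects any model other than “whisper-large-v3-turbo”.

```python
import os
from dataclasses import dataclass
from typing import Optional

# Per the parameter list, this is currently the only available model.
SUPPORTED_MODELS = {"whisper-large-v3-turbo"}


@dataclass(frozen=True)
class GroqSTTSettings:
    """Hypothetical container for the three constructor options."""
    model: str
    api_key: str
    base_url: Optional[str] = None  # None means "use the default Groq endpoint"


def resolve_settings(
    model: str = "whisper-large-v3-turbo",
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
) -> GroqSTTSettings:
    """Validate options and fill in the API key from the environment if needed."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"Unsupported Whisper model: {model!r}")
    # An explicit key wins; otherwise fall back to the GROQ_API_KEY variable.
    key = api_key or os.environ.get("GROQ_API_KEY")
    if not key:
        raise ValueError("No API key given and GROQ_API_KEY is not set")
    return GroqSTTSettings(model=model, api_key=key, base_url=base_url)
```

Failing early on a missing key surfaces configuration mistakes at startup rather than as an authentication error on the first request.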
Input
The service processes audio data with the following requirements:
- PCM audio format
- 16-bit depth
- Single channel (mono)
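Whisper-style transcription APIs accept audio as an uploaded file, so raw PCM is usually wrapped in a container before it is sent. The sketch below assumes a WAV upload and a 16 kHz sample rate (neither is stated above) and packages 16-bit mono PCM using only Python's standard `wave` module; `pcm_to_wav` is a hypothetical helper name, not part of the service.

```python
import io
import wave

SAMPLE_RATE = 16000  # assumption: the input requirements do not specify a rate


def pcm_to_wav(pcm: bytes, sample_rate: int = SAMPLE_RATE) -> bytes:
    """Wrap raw 16-bit mono PCM in an in-memory WAV container."""
    if len(pcm) % 2 != 0:
        raise ValueError("16-bit PCM must contain an even number of bytes")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # single channel (mono)
        wav.setsampwidth(2)   # 16-bit depth = 2 bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)  # header sizes are patched when the writer closes
    return buf.getvalue()
```

Because `wave.open` was given a file object rather than a path, closing the writer leaves the `BytesIO` open, so `getvalue()` still works afterwards.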
Output Frames
The service produces two types of frames during transcription:
TranscriptionFrame
Generated for final transcriptions, containing:
- Transcribed text
- User identifier
- ISO 8601 formatted timestamp
- Detected language (if available)
ErrorFrame
Generated when transcription errors occur, containing error details.
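The two frame types can be modeled as plain dataclasses to make their contents concrete. These are hypothetical stand-ins (`TranscriptionFrameSketch`, `ErrorFrameSketch`, `make_frame`), not the framework's real frame classes; the field names are assumptions that mirror the lists above, with the ISO 8601 timestamp produced by `datetime.isoformat()`.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional, Union


def utc_timestamp() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class TranscriptionFrameSketch:
    text: str                       # transcribed text
    user_id: str                    # user identifier
    timestamp: str = field(default_factory=utc_timestamp)  # ISO 8601
    language: Optional[str] = None  # detected language, if available


@dataclass
class ErrorFrameSketch:
    error: str  # error details


def make_frame(
    user_id: str,
    text: str = "",
    language: Optional[str] = None,
    error: Optional[str] = None,
) -> Union[TranscriptionFrameSketch, ErrorFrameSketch]:
    """Emit an error frame if transcription failed, else a transcription frame."""
    if error is not None:
        return ErrorFrameSketch(error=error)
    return TranscriptionFrameSketch(text=text, user_id=user_id, language=language)
```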
Methods
Set Model
See the STT base class methods for additional functionality.
Usage Example
Frame Flow
Metrics Support
The service collects the following metrics:
- Time to First Byte (TTFB)
- Processing duration
- API response time
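One way to collect TTFB and processing duration is to time a request with a monotonic clock. In this sketch, `send` is a hypothetical callable that performs the API request and yields response chunks; TTFB is the time until the first chunk arrives, and processing duration is the time until the response is fully read. How the service itself records and reports these metrics is not shown here.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class RequestMetrics:
    ttfb: Optional[float] = None  # seconds until the first response chunk
    processing: float = 0.0       # seconds until the whole response was read


def timed_request(send: Callable[[], Iterable[bytes]]) -> Tuple[bytes, RequestMetrics]:
    """Run a request and measure TTFB and total processing time."""
    metrics = RequestMetrics()
    start = time.perf_counter()  # monotonic clock, unaffected by wall-clock changes
    chunks = []
    for chunk in send():
        if metrics.ttfb is None:
            metrics.ttfb = time.perf_counter() - start
        chunks.append(chunk)
    metrics.processing = time.perf_counter() - start
    return b"".join(chunks), metrics
```

For a non-streaming transcription endpoint the whole response tends to arrive at once, so TTFB and processing duration will be close to each other.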
Notes
- Requires a valid Groq API key
- Uses Groq’s hosted Whisper model
- Handles API rate limiting
- Automatic error handling
- Thread-safe processing
Error Handling
The service handles common API errors including:
- Authentication errors
- Rate limiting
- Invalid audio format
- Network connectivity issues
- API timeouts
Errors are propagated through ErrorFrames with descriptive messages.
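The error categories listed above fall into two groups: transient ones (rate limiting, timeouts, network issues) that are worth retrying, and permanent ones (authentication, invalid audio) that are not. The sketch below encodes that split. `TranscriptionAPIError`, `transcribe_with_retry`, and the set of retryable HTTP status codes are hypothetical, not the service's documented behavior; the final failure is returned as a descriptive message of the kind an ErrorFrame would carry.

```python
import time
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical choice: timeouts, rate limits, and server errors are retried.
RETRYABLE_STATUS = {408, 429, 500, 502, 503, 504}


class TranscriptionAPIError(Exception):
    """Hypothetical API error that carries an HTTP status code."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status


def transcribe_with_retry(
    call: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Optional[T], Optional[str]]:
    """Return (result, None) on success or (None, error_message) on failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    message = "no attempts made"
    for attempt in range(1, max_attempts + 1):
        try:
            return call(), None
        except (ConnectionError, TimeoutError) as exc:
            # Network connectivity problems and timeouts are treated as transient.
            retryable, message = True, f"Network error: {exc}"
        except TranscriptionAPIError as exc:
            # e.g. 401 (authentication) or 400 (invalid audio) fail immediately.
            retryable = exc.status in RETRYABLE_STATUS
            message = f"API error {exc.status}: {exc}"
        if not retryable or attempt == max_attempts:
            break
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
    return None, message
```

Injecting `sleep` keeps the backoff testable without real delays.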