Overview
GroqSTTService provides speech recognition through Groq’s hosted Whisper API, which offers fast inference. It relies on Voice Activity Detection (VAD) to split audio into speech segments and transcribes each segment once it is complete.
- Groq STT API Reference: Pipecat’s API methods for Groq STT integration
- Example Implementation: Complete example with Groq ecosystem integration
- Groq Documentation: Official Groq STT documentation and features
- Groq Console: Access API keys and Whisper models
Installation
To use Groq services, install the required dependency:
pip install "pipecat-ai[groq]"
Prerequisites
Groq Account Setup
Before using Groq STT services, you need:
- Groq Account: Sign up at Groq Console
- API Key: Generate an API key from your console dashboard
- Model Access: Ensure access to Whisper transcription models
Required Environment Variables
GROQ_API_KEY: Your Groq API key for authentication
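An application may want to fail early when the key is missing rather than at the first transcription request. The following is a minimal sketch under that assumption; `resolve_groq_api_key` is a hypothetical helper written for illustration, not part of Pipecat or the Groq SDK.

```python
import os


def resolve_groq_api_key(explicit_key=None, env=None):
    """Return an explicitly passed key, else GROQ_API_KEY from the environment.

    Hypothetical helper for illustration only. `env` defaults to os.environ
    and can be replaced with a plain dict in tests.
    """
    env = os.environ if env is None else env
    key = explicit_key or env.get("GROQ_API_KEY")
    if not key:
        raise RuntimeError(
            "No Groq API key found: pass one explicitly or set GROQ_API_KEY"
        )
    return key
```

Calling this once at startup gives a clear error message before any audio is processed.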
Configuration
- Whisper model to use for transcription.
- Groq API key. If not provided, uses the GROQ_API_KEY environment variable.
- API base URL. Override for custom or proxied deployments.
- Language of the audio input.
- Optional text to guide the model’s style or continue a previous segment.
- Sampling temperature between 0 and 1. Lower values are more deterministic. Defaults to 0.0.
- P99 latency from speech end to final transcript in seconds. Override for your deployment.
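To make the options above concrete, here is a rough sketch of how the model, language, prompt, and temperature could map onto the form fields of an OpenAI-compatible transcription request. `TranscriptionOptions` and `to_form_fields` are hypothetical names used for illustration; they are not Pipecat's actual classes, and the model name in the usage lines is only an example value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptionOptions:
    """Hypothetical container for the per-request options described above."""

    model: str
    language: Optional[str] = None
    prompt: Optional[str] = None
    temperature: float = 0.0  # documented default

    def __post_init__(self):
        # The documented range for temperature is 0 to 1.
        if not 0.0 <= self.temperature <= 1.0:
            raise ValueError("temperature must be between 0 and 1")

    def to_form_fields(self) -> dict:
        """Build the non-file form fields, omitting unset optional values."""
        fields = {
            "model": self.model,
            "temperature": str(self.temperature),
            "response_format": "json",
        }
        if self.language:
            fields["language"] = self.language
        if self.prompt:
            fields["prompt"] = self.prompt
        return fields


# Example values only; check the Groq Console for available models.
options = TranscriptionOptions(model="whisper-large-v3-turbo", language="en")
fields = options.to_form_fields()
```

Leaving unset options out of the request lets the API apply its own defaults instead of receiving empty strings.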
Usage
Basic Setup
With Custom Model and Language
With Prompt and Temperature
Notes
- Segmented processing: GroqSTTService inherits from SegmentedSTTService (via BaseWhisperSTTService), which buffers audio during speech (detected by VAD) and sends complete segments for transcription. This means it does not provide interim results, only final transcriptions after each speech segment.
- Whisper API compatible: Groq uses the OpenAI-compatible Whisper API format. The service sends audio in WAV format and receives JSON transcription responses.
- Ultra-fast inference: Groq’s LPU (Language Processing Unit) infrastructure provides significantly faster inference than CPU/GPU-based Whisper deployments, making it suitable for real-time applications despite the segmented processing approach.
- Prompt guidance: Use the prompt parameter to provide context that helps the model with domain-specific terminology or to maintain consistency across segments.
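The segmented approach described in the notes can be sketched with the standard library alone. This is a simplified illustration, not Pipecat's implementation: `SegmentBuffer` and `pcm_to_wav` are hypothetical names, and the sketch assumes 16-bit mono PCM at 16 kHz with VAD start and stop events delivered by the caller.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int, channels: int) -> bytes:
    """Wrap raw 16-bit PCM in a WAV container held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return buf.getvalue()


class SegmentBuffer:
    """Hypothetical buffer: collects audio only while the VAD reports speech."""

    def __init__(self, sample_rate: int = 16000, channels: int = 1):
        self.sample_rate = sample_rate
        self.channels = channels
        self._chunks = []
        self._speaking = False

    def on_speech_started(self) -> None:
        self._chunks = []
        self._speaking = True

    def on_audio(self, pcm: bytes) -> None:
        # Audio outside a speech segment is dropped in this simplified sketch.
        if self._speaking:
            self._chunks.append(pcm)

    def on_speech_stopped(self) -> bytes:
        """Close the segment and return it as WAV bytes ready to upload."""
        self._speaking = False
        pcm = b"".join(self._chunks)
        self._chunks = []
        return pcm_to_wav(pcm, self.sample_rate, self.channels)
```

Because nothing is sent until `on_speech_stopped` runs, a design like this can only yield one final transcript per segment, which is why no interim results are available.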