Overview

GroqSTTService provides high-accuracy speech recognition using Groq’s hosted Whisper API with ultra-fast inference. It relies on Voice Activity Detection (VAD) to segment incoming audio into complete utterances, which are then sent to the API for transcription.

Installation

To use Groq services, install the required dependency:
pip install "pipecat-ai[groq]"
You’ll need to set up your Groq API key as an environment variable: GROQ_API_KEY.
Get your API key from the Groq Console.
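For example, you might export the key in your shell before starting your app (the value shown is a placeholder):

```shell
# Set your Groq API key (replace the placeholder with your key from the Groq Console)
export GROQ_API_KEY="your-api-key-here"
```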

Frames

Input

  • InputAudioRawFrame - Raw PCM audio data (16-bit, mono)
  • UserStartedSpeakingFrame - VAD signal to start buffering audio
  • UserStoppedSpeakingFrame - VAD signal to process buffered audio

Output

  • TranscriptionFrame - Final transcription results (no interim results)
  • ErrorFrame - API or processing errors
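The frame flow above can be sketched with simplified stand-in classes — these are illustrations only, not the actual Pipecat frame types: audio is buffered between the VAD start and stop signals, and transcription runs once on the completed segment.

```python
from dataclasses import dataclass

# Simplified stand-ins for the Pipecat frame types (illustration only)
@dataclass
class InputAudioRawFrame:
    audio: bytes  # raw 16-bit mono PCM

@dataclass
class UserStartedSpeakingFrame:
    pass

@dataclass
class UserStoppedSpeakingFrame:
    pass

class SegmentedSTT:
    """Buffers audio between VAD start/stop signals, then transcribes the segment."""

    def __init__(self, transcribe):
        self._transcribe = transcribe  # e.g. a call to Groq's Whisper API
        self._buffering = False
        self._buffer = bytearray()

    def process(self, frame):
        if isinstance(frame, UserStartedSpeakingFrame):
            # VAD detected speech: start collecting audio
            self._buffering = True
            self._buffer.clear()
        elif isinstance(frame, InputAudioRawFrame) and self._buffering:
            self._buffer.extend(frame.audio)
        elif isinstance(frame, UserStoppedSpeakingFrame):
            # VAD detected silence: transcribe the whole buffered utterance
            self._buffering = False
            return self._transcribe(bytes(self._buffer))
        return None  # no interim results
```

This is why only final transcriptions are emitted: nothing leaves the buffer until the VAD stop signal arrives.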

Models

Groq currently offers one optimized Whisper model:
  • whisper-large-v3-turbo - Groq’s optimized Whisper large v3; ultra-fast inference with high accuracy
Groq’s hardware acceleration makes this model significantly faster than standard Whisper implementations while maintaining accuracy.

Language Support

Groq’s Whisper API supports 60+ languages with automatic language detection:
Common languages:
  • Language.EN - English - en
  • Language.ES - Spanish - es
  • Language.FR - French - fr
  • Language.DE - German - de
  • Language.IT - Italian - it
  • Language.JA - Japanese - ja
Regional variants (like EN_US, FR_CA) are automatically mapped to their base language codes.
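The variant-to-base mapping can be sketched as follows; `to_base_language` is a hypothetical helper for illustration, not Pipecat's actual implementation:

```python
def to_base_language(code: str) -> str:
    """Map a regional variant like 'en-US' or 'fr_CA' to its base code.

    Hypothetical helper illustrating the mapping described above;
    not the actual Pipecat implementation.
    """
    return code.replace("_", "-").split("-")[0].lower()
```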

Usage Example

Basic Configuration

Initialize the GroqSTTService and use it in a pipeline:
import os

from pipecat.services.groq.stt import GroqSTTService
from pipecat.transcriptions.language import Language

# Simple setup with defaults
stt = GroqSTTService(
    api_key=os.getenv("GROQ_API_KEY"),
    language=Language.EN
)

# Use in pipeline with VAD
pipeline = Pipeline([
    transport.input(),  # Must include VAD analyzer
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

Advanced Configuration

# Optimized for specific use case
stt = GroqSTTService(
    api_key=os.getenv("GROQ_API_KEY"),
    model="whisper-large-v3-turbo",
    language=Language.EN,
    prompt="Transcribe this technical discussion about AI and machine learning.",
    temperature=0.0  # Deterministic output
)

Dynamic Configuration

Update the service’s settings at runtime by pushing an STTUpdateSettingsFrame to the pipeline:
from pipecat.frames.frames import STTUpdateSettingsFrame

await task.queue_frame(STTUpdateSettingsFrame(
    language=Language.FR,
))

Metrics

The service provides the following metrics:
  • Time to First Byte (TTFB) - API response latency
  • Processing Duration - Total transcription time
Learn how to enable Metrics in your Pipeline.
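The TTFB idea can be illustrated with a minimal timing sketch — this is not Pipecat's metrics API, which collects these values automatically when enabled:

```python
import time

def timed_call(fn, *args):
    """Measure wall-clock latency of a call, roughly what a TTFB metric captures.

    Illustrative sketch only; Pipecat's metrics system records this
    for you when metrics are enabled on the pipeline.
    """
    start = time.monotonic()
    result = fn(*args)       # e.g. the blocking API request
    latency = time.monotonic() - start
    return result, latency
```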

Additional Notes

  • Segmented Processing: Processes complete utterances, not continuous streams
  • No Interim Results: Only final transcriptions are provided (typical for batch APIs)
  • Audio Buffer: Maintains a 1-second rolling buffer to capture speech that precedes VAD detection
  • Language Variants: Regional language codes automatically map to base languages
  • Context Prompts: Use prompts to improve accuracy for specific domains or conversation styles
  • Rate Limits: Check your Groq plan for concurrent request and usage limits
  • Hardware Optimization: Leverages Groq’s custom inference chips for maximum performance
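The 1-second pre-VAD audio buffer mentioned above can be sketched with a bounded deque. The sample rate and chunk duration here are assumptions for illustration (16 kHz, 16-bit mono, 20 ms chunks), not values taken from Pipecat:

```python
from collections import deque

SAMPLE_RATE = 16_000   # assumed: 16 kHz
BYTES_PER_SAMPLE = 2   # 16-bit PCM
CHUNK_MS = 20          # assumed chunk duration
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS // 1000
CHUNKS_PER_SECOND = 1000 // CHUNK_MS

# Keep only the most recent ~1 second of audio chunks
pre_speech = deque(maxlen=CHUNKS_PER_SECOND)

def on_audio(chunk: bytes):
    # Older chunks fall off the left end automatically
    pre_speech.append(chunk)

def on_speech_started() -> bytes:
    """When VAD fires, prepend the buffered second so no leading audio is lost."""
    segment = b"".join(pre_speech)
    pre_speech.clear()
    return segment
```

The bounded deque drops old audio automatically, so memory stays constant regardless of how long the user is silent.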