
Overview

GroqSTTService provides high-accuracy speech recognition using Groq’s hosted Whisper API with ultra-fast inference speeds. It uses Voice Activity Detection (VAD) to process speech segments efficiently for optimal performance and accuracy.

Installation

To use Groq services, install the required dependency:
pip install "pipecat-ai[groq]"

Prerequisites

Groq Account Setup

Before using Groq STT services, you need:
  1. Groq Account: Sign up at Groq Console
  2. API Key: Generate an API key from your console dashboard
  3. Model Access: Ensure access to Whisper transcription models

Required Environment Variables

  • GROQ_API_KEY: Your Groq API key for authentication
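
For example, on macOS or Linux the key can be exported in your shell before starting your app (the key value below is a placeholder, not a real key):

```shell
# Make the API key available to the service; replace with your actual key.
export GROQ_API_KEY="gsk_your_key_here"
```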

Configuration

  • model (str, default "whisper-large-v3-turbo"): Whisper model to use for transcription.
  • api_key (str, default None): Groq API key. If not provided, uses the GROQ_API_KEY environment variable.
  • base_url (str, default "https://api.groq.com/openai/v1"): API base URL. Override for custom or proxied deployments.
  • language (Language, default Language.EN): Language of the audio input.
  • prompt (str, default None): Optional text to guide the model’s style or continue a previous segment.
  • temperature (float, default None): Sampling temperature between 0 and 1. Lower values are more deterministic. When unset, the API default of 0.0 applies.
  • ttfs_p99_latency (float, default GROQ_TTFS_P99): P99 latency from speech end to final transcript, in seconds. Override to match your deployment.
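
Collected in one place, the defaults above look like this (an illustrative mapping for reference only; in real code, language is a pipecat Language enum member rather than a string):

```python
# Defaults for GroqSTTService as documented above (illustrative mapping, not Pipecat code).
GROQ_STT_DEFAULTS = {
    "model": "whisper-large-v3-turbo",
    "api_key": None,            # falls back to the GROQ_API_KEY environment variable
    "base_url": "https://api.groq.com/openai/v1",
    "language": "Language.EN",  # a pipecat Language enum member in real code
    "prompt": None,
    "temperature": None,        # the API applies 0.0 when unset
}
```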

Usage

Basic Setup

import os

from pipecat.services.groq import GroqSTTService

stt = GroqSTTService(
    api_key=os.getenv("GROQ_API_KEY"),
)

With Custom Model and Language

import os

from pipecat.services.groq import GroqSTTService
from pipecat.transcriptions.language import Language

stt = GroqSTTService(
    api_key=os.getenv("GROQ_API_KEY"),
    model="whisper-large-v3-turbo",
    language=Language.ES,
)

With Prompt and Temperature

import os

from pipecat.services.groq import GroqSTTService

stt = GroqSTTService(
    api_key=os.getenv("GROQ_API_KEY"),
    prompt="This is a conversation about artificial intelligence and machine learning.",
    temperature=0.0,
)

Notes

  • Segmented processing: GroqSTTService inherits from SegmentedSTTService (via BaseWhisperSTTService), which buffers audio during speech (detected by VAD) and sends complete segments for transcription. This means it does not provide interim results — only final transcriptions after each speech segment.
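
The segmented approach can be illustrated with a minimal sketch (this is not Pipecat's actual implementation, just the idea: audio is buffered while VAD reports speech, and the complete segment is released when speech ends):

```python
class SegmentBuffer:
    """Minimal illustration of VAD-gated segmented STT (not Pipecat's code)."""

    def __init__(self):
        self._frames = []
        self._in_speech = False

    def on_vad(self, speaking: bool, audio: bytes):
        """Buffer audio while speech is active; return the full segment when speech ends."""
        if speaking:
            self._in_speech = True
            self._frames.append(audio)
            return None
        if self._in_speech:
            # Speech just ended: emit the buffered segment for transcription.
            self._in_speech = False
            segment = b"".join(self._frames)
            self._frames.clear()
            return segment
        return None
```

Because transcription only runs on each completed segment, the caller sees exactly one final result per utterance and no interim results, matching the behavior described above.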
  • Whisper API compatible: Groq uses the OpenAI-compatible Whisper API format. The service sends audio in WAV format and receives JSON transcription responses.
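
Since segments are uploaded as WAV, raw PCM audio can be wrapped in a WAV container with the standard library before sending (a sketch assuming 16 kHz mono 16-bit PCM; Pipecat handles this internally):

```python
import io
import wave

def pcm_to_wav(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM samples in a WAV container for upload."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 16-bit samples
        wav.setframerate(sample_rate) # e.g. 16 kHz
        wav.writeframes(pcm)
    return buf.getvalue()
```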
  • Ultra-fast inference: Groq’s LPU (Language Processing Unit) infrastructure provides significantly faster inference than CPU/GPU-based Whisper deployments, making it suitable for real-time applications despite the segmented processing approach.
  • Prompt guidance: Use the prompt parameter to provide context that helps the model with domain-specific terminology or to maintain consistency across segments.