Groq (Whisper)
Speech-to-text service implementation using Groq’s Whisper API
Overview
`GroqSTTService` provides speech-to-text capabilities using Groq’s hosted Whisper API. It offers high-accuracy transcription with minimal setup requirements. The service uses Voice Activity Detection (VAD) to process only speech segments, optimizing API usage and improving response time.
Installation
To use `GroqSTTService`, install the required dependencies:
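A typical install uses the Groq extra of the Pipecat package (the exact extra name is an assumption; check your package documentation if it differs):

```bash
pip install "pipecat-ai[groq]"
```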
You’ll need to set your Groq API key in the `GROQ_API_KEY` environment variable.
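For example (the key value is a placeholder):

```bash
export GROQ_API_KEY=your_groq_api_key
```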
You can obtain a Groq API key from the Groq Console.
Configuration
Constructor Parameters
- Whisper model to use. Currently only “whisper-large-v3-turbo” is available.
- Groq API key. If not provided, the `GROQ_API_KEY` environment variable is used.
- Custom base URL for Groq API requests.
- Language of the audio input. Defaults to English.
- Optional text to guide the model’s style or continue a previous segment.
- Sampling temperature between 0 and 1; lower values are more deterministic, higher values more creative. Defaults to 0.0.
- Audio sample rate in Hz. If not provided, the pipeline’s sample rate is used.
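A minimal construction sketch is shown below. The import path and keyword names (`model`, `api_key`, `base_url`, `language`, `prompt`, `temperature`) follow the descriptions above and common Pipecat conventions, but may differ slightly between versions:

```python
import os

from pipecat.services.groq import GroqSTTService  # import path may vary by version
from pipecat.transcriptions.language import Language

stt = GroqSTTService(
    model="whisper-large-v3-turbo",     # only available Whisper model
    api_key=os.getenv("GROQ_API_KEY"),  # falls back to the env var if omitted
    language=Language.EN,               # defaults to English
    prompt="Names: Pipecat, Groq.",     # optional style/continuation hint
    temperature=0.0,                    # deterministic output
)
```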
Input
The service processes audio data with the following requirements:
- PCM audio format
- 16-bit depth
- Single channel (mono)
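If you are producing audio frames yourself rather than relying on a transport, a frame along these lines meets the requirements (the `InputAudioRawFrame` name and fields are assumptions based on Pipecat’s raw-audio frame):

```python
from pipecat.frames.frames import InputAudioRawFrame

pcm_bytes = b"\x00\x00" * 16000  # one second of silence: 16-bit mono at 16 kHz

frame = InputAudioRawFrame(
    audio=pcm_bytes,   # raw little-endian 16-bit PCM
    sample_rate=16000,
    num_channels=1,    # mono
)
```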
Output Frames
The service produces two types of frames during transcription:
`TranscriptionFrame`
Generated for final transcriptions, containing:
- Transcribed text
- User identifier
- ISO 8601 formatted timestamp
- Detected language (if available)
`ErrorFrame`
Generated when transcription errors occur, containing error details.
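As a sketch of consuming these frames downstream (assuming Pipecat’s standard `FrameProcessor` API and frame fields):

```python
from pipecat.frames.frames import ErrorFrame, TranscriptionFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class TranscriptLogger(FrameProcessor):
    """Logs final transcriptions and reports STT errors."""

    async def process_frame(self, frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, TranscriptionFrame):
            print(f"[{frame.timestamp}] {frame.user_id}: {frame.text}")
        elif isinstance(frame, ErrorFrame):
            print(f"STT error: {frame.error}")

        # Pass every frame along so the rest of the pipeline still sees it.
        await self.push_frame(frame, direction)
```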
Methods
Set Model
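Assuming the base class’s asynchronous `set_model` method, the model can be switched at runtime, although only one Whisper model is currently offered:

```python
await stt.set_model("whisper-large-v3-turbo")
```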
See the STT base class methods for additional functionality.
Language Support
Groq’s Whisper API supports a wide range of languages. The service automatically maps `Language` enum values to the appropriate Whisper language codes.
Language Code | Description | Whisper Code |
---|---|---|
Language.AF | Afrikaans | af |
Language.AR | Arabic | ar |
Language.HY | Armenian | hy |
Language.AZ | Azerbaijani | az |
Language.BE | Belarusian | be |
Language.BS | Bosnian | bs |
Language.BG | Bulgarian | bg |
Language.CA | Catalan | ca |
Language.ZH | Chinese | zh |
Language.HR | Croatian | hr |
Language.CS | Czech | cs |
Language.DA | Danish | da |
Language.NL | Dutch | nl |
Language.EN | English | en |
Language.ET | Estonian | et |
Language.FI | Finnish | fi |
Language.FR | French | fr |
Language.GL | Galician | gl |
Language.DE | German | de |
Language.EL | Greek | el |
Language.HE | Hebrew | he |
Language.HI | Hindi | hi |
Language.HU | Hungarian | hu |
Language.IS | Icelandic | is |
Language.ID | Indonesian | id |
Language.IT | Italian | it |
Language.JA | Japanese | ja |
Language.KN | Kannada | kn |
Language.KK | Kazakh | kk |
Language.KO | Korean | ko |
Language.LV | Latvian | lv |
Language.LT | Lithuanian | lt |
Language.MK | Macedonian | mk |
Language.MS | Malay | ms |
Language.MR | Marathi | mr |
Language.MI | Maori | mi |
Language.NE | Nepali | ne |
Language.NO | Norwegian | no |
Language.FA | Persian | fa |
Language.PL | Polish | pl |
Language.PT | Portuguese | pt |
Language.RO | Romanian | ro |
Language.RU | Russian | ru |
Language.SR | Serbian | sr |
Language.SK | Slovak | sk |
Language.SL | Slovenian | sl |
Language.ES | Spanish | es |
Language.SW | Swahili | sw |
Language.SV | Swedish | sv |
Language.TL | Tagalog | tl |
Language.TA | Tamil | ta |
Language.TH | Thai | th |
Language.TR | Turkish | tr |
Language.UK | Ukrainian | uk |
Language.UR | Urdu | ur |
Language.VI | Vietnamese | vi |
Language.CY | Welsh | cy |
Groq’s Whisper implementation supports language variants (like `en-US`, `fr-CA`) by mapping them to their base language. For example, `Language.EN_US` and `Language.EN_GB` will both map to `en`.
The service will automatically detect the language if none is specified, but specifying the language typically improves transcription accuracy.
For the most up-to-date list of supported languages, refer to the Groq documentation.
Usage Example
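A minimal sketch of wiring the service into a pipeline, assuming a VAD-enabled transport named `transport` (see the next section) and standard Pipecat pipeline classes; import paths may vary by version:

```python
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.groq import GroqSTTService
from pipecat.transcriptions.language import Language

stt = GroqSTTService(
    api_key=os.getenv("GROQ_API_KEY"),
    language=Language.EN,
)

# Audio flows from the transport input, through STT, into the rest of the
# pipeline (context aggregation, LLM, TTS, transport output, ...).
pipeline = Pipeline([
    transport.input(),   # emits audio and VAD speaking frames
    stt,                 # emits a TranscriptionFrame for each utterance
    # ... downstream processors go here ...
    transport.output(),
])
```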
Voice Activity Detection Integration
This service inherits from `SegmentedSTTService`, which uses Voice Activity Detection (VAD) to identify speech segments for processing. This approach:
- Processes only actual speech, not silence or background noise
- Maintains a small audio buffer (default 1 second) to capture speech that occurs slightly before VAD detection
- Receives `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` from a VAD component in the pipeline
- Only sends complete utterances to the API when speech has ended
Ensure your transport includes a VAD component (like `SileroVADAnalyzer`) to properly detect speech segments.
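For example, with the Daily transport and Silero VAD (a sketch; class and parameter names follow common Pipecat usage and may differ in your version):

```python
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.transports.services.daily import DailyParams, DailyTransport

transport = DailyTransport(
    room_url="https://example.daily.co/my-room",  # placeholder room
    token=None,
    bot_name="Groq STT bot",
    params=DailyParams(
        audio_in_enabled=True,
        vad_analyzer=SileroVADAnalyzer(),  # emits UserStarted/StoppedSpeakingFrame
    ),
)
```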
Metrics Support
The service collects the following metrics:
- Time to First Byte (TTFB)
- Processing duration
- API response time
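These metrics are reported through the pipeline’s metrics frames once metrics collection is enabled on the task; a sketch assuming the standard `PipelineParams` option:

```python
from pipecat.pipeline.task import PipelineParams, PipelineTask

task = PipelineTask(
    pipeline,  # the pipeline built in the usage example above
    params=PipelineParams(enable_metrics=True),
)
```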
Notes
- Requires valid Groq API key
- Uses Groq’s hosted Whisper model
- Requires VAD component in transport
- Processes complete utterances, not continuous audio
- Handles API rate limiting
- Automatic error handling
- Thread-safe processing
Error Handling
The service handles common API errors including:
- Authentication errors
- Rate limiting
- Invalid audio format
- Network connectivity issues
- API timeouts
Errors are propagated through ErrorFrames with descriptive messages.