Overview
GradiumSTTService provides real-time speech recognition using Gradium’s WebSocket API with support for multilingual transcription, semantic voice activity detection for smart turn-taking, and robust performance in noisy environments.
Gradium STT API Reference
Pipecat’s API methods for Gradium STT integration
Example Implementation
Complete example with interruption handling
Gradium Documentation
Official Gradium STT API documentation
Gradium Platform
Access API keys and speech models
Installation
To use Gradium services, install the required dependency:
Prerequisites
Gradium Account Setup
Before using Gradium STT services, you need:
- Gradium Account: Sign up at Gradium
- API Key: Generate an API key from your account dashboard
- Region Selection: Choose your preferred region (EU or US)
Required Environment Variables
GRADIUM_API_KEY: Your Gradium API key for authentication
Configuration
GradiumSTTService
- Gradium API key for authentication.
- WebSocket endpoint URL. Override for different regions or custom deployments.
- Base audio encoding type. One of "pcm", "wav", or "opus". For PCM, the sample rate is appended automatically to form the input format (e.g., "pcm" becomes "pcm_16000"). PCM accepts 8000, 16000, and 24000 Hz sample rates.
- Audio sample rate in Hz. If None, uses the pipeline’s audio sample rate.
- Configuration parameters for language and delay settings. Deprecated in v0.0.105. Use settings=GradiumSTTService.Settings(...) instead.
- Optional JSON configuration string for additional model settings. Deprecated in favor of params.
- Runtime-configurable settings for the STT service. See Settings below.
- P99 latency from speech end to final transcript in seconds. Override for your deployment. See stt-benchmark.
Settings
Runtime-configurable settings passed via the settings constructor argument using GradiumSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | "default" | STT model identifier. (Inherited from base STT settings.) |
| language | Language \| str | None | Expected language of the audio. (Inherited from base STT settings.) Helps ground the model to a specific language and improve transcription quality. |
| delay_in_frames | int | None | Server-side delay in audio frames (80 ms each) before text is generated. Higher delays allow more context but increase latency. Allowed values: 7, 8, 10, 12, 14, 16, 20, 24, 36, 48. Default is 10 (800 ms). Sent to Gradium API via json_config. |
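The relationship between delay_in_frames and latency can be sketched with a small helper. This is an illustration only: the function and constant names below are hypothetical and not part of Pipecat or Gradium; the 80 ms frame length and the list of allowed values come from the table above.

```python
# Hypothetical helpers (not part of Pipecat or Gradium) illustrating the
# frame-to-latency arithmetic described in the Settings table.

FRAME_MS = 80  # each server-side audio frame is 80 ms
ALLOWED_DELAYS = (7, 8, 10, 12, 14, 16, 20, 24, 36, 48)


def delay_to_ms(delay_in_frames: int) -> int:
    """Convert a delay in frames to milliseconds, rejecting unsupported values."""
    if delay_in_frames not in ALLOWED_DELAYS:
        raise ValueError(
            f"delay_in_frames must be one of {ALLOWED_DELAYS}, got {delay_in_frames}"
        )
    return delay_in_frames * FRAME_MS


def closest_allowed_delay(target_ms: float) -> int:
    """Pick the allowed frame count whose delay is closest to target_ms."""
    return min(ALLOWED_DELAYS, key=lambda d: abs(d * FRAME_MS - target_ms))
```

For instance, delay_to_ms(10) gives 800, which matches the documented default of 10 frames (800 ms), and a latency budget of roughly one second maps to 12 frames (960 ms).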
Usage
Basic Setup
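A minimal sketch of a basic setup follows. It assumes the service is importable from pipecat.services.gradium.stt and that the constructor takes the key as api_key; neither name is confirmed by the text on this page, so check them against the API reference. The wrapper function build_gradium_stt is hypothetical and exists only to keep the example self-contained.

```python
import os


def build_gradium_stt():
    """Create a GradiumSTTService using the GRADIUM_API_KEY environment variable."""
    api_key = os.getenv("GRADIUM_API_KEY")
    if not api_key:
        raise RuntimeError("GRADIUM_API_KEY is not set")

    # Assumption: import path and `api_key` keyword follow Pipecat's usual
    # service layout. Imported lazily so the optional dependency is only
    # needed when the service is actually built.
    from pipecat.services.gradium.stt import GradiumSTTService

    return GradiumSTTService(api_key=api_key)
```

The returned service would then be placed in the pipeline like any other STT processor.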
With Language and Delay Configuration
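The sketch below passes language and delay through GradiumSTTService.Settings(...), as described in the Settings section. The import paths, the api_key keyword, and the wrapper function build_configured_stt are assumptions made for illustration; French is used because it is among the supported languages.

```python
# Allowed values taken from the Settings table on this page.
ALLOWED_DELAYS = (7, 8, 10, 12, 14, 16, 20, 24, 36, 48)


def build_configured_stt(api_key: str, delay_in_frames: int = 12):
    """Create a GradiumSTTService for French audio with an explicit delay."""
    if delay_in_frames not in ALLOWED_DELAYS:
        raise ValueError(f"delay_in_frames must be one of {ALLOWED_DELAYS}")

    # Assumption: import paths and the `api_key` keyword follow Pipecat's
    # usual service layout; verify against the API reference.
    from pipecat.services.gradium.stt import GradiumSTTService
    from pipecat.transcriptions.language import Language

    return GradiumSTTService(
        api_key=api_key,
        settings=GradiumSTTService.Settings(
            language=Language.FR,
            delay_in_frames=delay_in_frames,  # 12 frames = 960 ms
        ),
    )
```

A larger delay_in_frames gives the model more context at the cost of latency, so the value is worth tuning per deployment.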
Notes
- Supported languages: German, English, Spanish, French, and Portuguese.
- Audio format: Configurable via encoding and sample_rate parameters. Defaults to PCM with the pipeline’s sample rate. Supported PCM rates: 8000, 16000, and 24000 Hz. Audio is sent in 80ms chunks.
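The audio format rules above can be illustrated with a short sketch. The helper names are hypothetical and not part of Pipecat. Two assumptions are made: non-PCM encodings are passed through unchanged (the page only says the rate is appended for PCM), and PCM audio is 16-bit mono when computing chunk sizes.

```python
# Hypothetical helpers (not part of Pipecat) illustrating the format rules.

PCM_RATES = (8000, 16000, 24000)
CHUNK_MS = 80  # audio is sent in 80 ms chunks


def input_format(encoding: str, sample_rate: int) -> str:
    """Build the input format string, appending the sample rate for PCM."""
    if encoding == "pcm":
        if sample_rate not in PCM_RATES:
            raise ValueError(f"PCM sample rate must be one of {PCM_RATES}")
        return f"pcm_{sample_rate}"
    if encoding in ("wav", "opus"):
        # Assumption: non-PCM encodings are used as-is.
        return encoding
    raise ValueError(f"Unsupported encoding: {encoding}")


def pcm_chunk_bytes(sample_rate: int, channels: int = 1, sample_width: int = 2) -> int:
    """Size in bytes of one 80 ms PCM chunk (assumes 16-bit mono by default)."""
    samples = sample_rate * CHUNK_MS // 1000
    return samples * channels * sample_width
```

Under these assumptions, a 16000 Hz stream yields the format string "pcm_16000" and 2560-byte chunks (1280 samples of 2 bytes each).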
Event Handlers
Gradium STT supports the standard service connection events:
| Event | Description |
|---|---|
| on_connected | Connected to Gradium WebSocket |
| on_disconnected | Disconnected from Gradium WebSocket |
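Handlers for these events can be registered roughly as follows. The sketch assumes the service exposes Pipecat's event_handler decorator and passes the service instance as the handler's first argument; register_connection_handlers is a hypothetical helper, and stt stands for an already constructed GradiumSTTService.

```python
import logging

logger = logging.getLogger("gradium-stt")


def register_connection_handlers(stt):
    """Attach logging handlers for the two connection events."""

    # Assumption: `stt.event_handler(name)` is a decorator that registers
    # the coroutine and returns it unchanged.
    @stt.event_handler("on_connected")
    async def on_connected(service):
        logger.info("Connected to Gradium WebSocket")

    @stt.event_handler("on_disconnected")
    async def on_disconnected(service):
        logger.info("Disconnected from Gradium WebSocket")

    return on_connected, on_disconnected
```

Logging connection changes this way can help correlate transcription gaps with WebSocket reconnects.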