Overview

GladiaSTTService provides real-time speech recognition using Gladia’s WebSocket API with support for 99+ languages, custom vocabulary, translation, sentiment analysis, and advanced audio processing features.

Installation

To use Gladia services, install the required dependency:
pip install "pipecat-ai[gladia]"
You’ll also need to set up your Gladia API key as an environment variable: GLADIA_API_KEY.
Get your API key from Gladia.
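
A missing key usually surfaces later as an opaque connection error, so it can help to check for it at startup. The helper below is a minimal sketch using only the standard library; `load_gladia_api_key` is a hypothetical name and is not part of Pipecat.

```python
import os


def load_gladia_api_key(env=None):
    """Return the Gladia API key, or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    This helper is illustrative and is not part of Pipecat.
    """
    env = os.environ if env is None else env
    key = env.get("GLADIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GLADIA_API_KEY is not set; export it before starting the bot")
    return key
```

The returned value can then be passed as `api_key` when constructing the service.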

Frames

Input

  • InputAudioRawFrame - Raw PCM audio data (16-bit, 16 kHz, mono)
  • STTUpdateSettingsFrame - Runtime transcription configuration updates
  • STTMuteFrame - Mute audio input for transcription
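
The audio format listed above (16-bit, 16 kHz, mono) fixes how many bytes a chunk of a given duration contains, which is useful when sizing buffers. The function below is a plain arithmetic sketch; `pcm_chunk_bytes` is a hypothetical helper, not a Pipecat API.

```python
def pcm_chunk_bytes(duration_ms, sample_rate=16000, channels=1, sample_width=2):
    """Return the size in bytes of a raw PCM chunk.

    Defaults match the input format listed above: 16 kHz, mono,
    16-bit samples (2 bytes per sample).
    """
    samples = sample_rate * duration_ms // 1000  # samples per channel
    return samples * channels * sample_width


print(pcm_chunk_bytes(20))    # 20 ms of audio -> 640 bytes
print(pcm_chunk_bytes(1000))  # 1 second of audio -> 32000 bytes
```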

Output

  • InterimTranscriptionFrame - Real-time transcription updates
  • TranscriptionFrame - Final transcription results
  • TranslationFrame - Real-time translation results (when enabled)
  • ErrorFrame - Connection or processing errors
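
Interim and final results are typically handled differently downstream: an interim result replaces the previous interim text, while a final result is committed. The sketch below illustrates that pattern with stand-in classes; `Transcript` and `TranscriptBuffer` are hypothetical and do not mirror the Pipecat frame classes.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    """Stand-in for a transcription result (not a Pipecat frame)."""
    text: str
    final: bool


@dataclass
class TranscriptBuffer:
    """Keeps committed final text plus the latest interim text."""
    committed: list = field(default_factory=list)
    pending: str = ""

    def handle(self, result):
        if result.final:
            # A final result supersedes whatever interim text was showing.
            self.committed.append(result.text)
            self.pending = ""
        else:
            # Each interim result replaces the previous one.
            self.pending = result.text

    def display(self):
        parts = self.committed + ([self.pending] if self.pending else [])
        return " ".join(parts)
```

A consumer would call `handle` once per result and render `display()` after each call.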

Models

Gladia offers several models optimized for different use cases:
Model      Description                   Best For
solaria-1  Latest general-purpose model  High accuracy, balanced performance
See Gladia’s model documentation for detailed comparisons.

Language Support

Gladia STT supports 99+ languages with automatic detection and code-switching.
Common languages:
  • Language.EN - English - en
  • Language.ES - Spanish - es
  • Language.FR - French - fr
  • Language.DE - German - de
  • Language.IT - Italian - it
  • Language.JA - Japanese - ja

Advanced Features

Automatic Language Detection

  • Single language: Fixed language for entire session
  • Multiple languages: Auto-detect per utterance with code_switching=True
  • No languages specified: Auto-detect from all supported languages
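
The three cases above depend only on how many languages are listed and whether code switching is on. The function below is a sketch that makes that decision explicit; `detection_mode` is a hypothetical helper, and the fourth branch (several languages with code switching off) is not described in this document, so its label is deliberately neutral.

```python
def detection_mode(languages=None, code_switching=False):
    """Describe which detection behaviour a language setup selects."""
    languages = list(languages or [])
    if not languages:
        return "auto-detect from all supported languages"
    if len(languages) == 1:
        return f"fixed language: {languages[0]}"
    if code_switching:
        return "auto-detect per utterance among: " + ", ".join(languages)
    # Assumption: this case is not covered by the list above.
    return "detect among listed languages (per-utterance switching off)"


print(detection_mode())                                 # no languages given
print(detection_mode(["en"]))                           # single language
print(detection_mode(["en", "es"], code_switching=True))
```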

Custom Vocabulary

  • Add domain-specific terms with bias intensity (0.0-1.0)
  • Mix strings and CustomVocabularyItem objects
  • Configure default intensity for simple strings
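
Mixing plain strings with explicit items means every plain string has to pick up the default intensity at some point. The sketch below shows one way such a list could be normalized and range-checked; `VocabItem` and `normalize_vocabulary` are stand-ins written for illustration and are not the Pipecat `CustomVocabularyItem` or any Gladia API.

```python
from dataclasses import dataclass


@dataclass
class VocabItem:
    """Stand-in for a vocabulary entry (not the Pipecat class)."""
    value: str
    intensity: float


def normalize_vocabulary(entries, default_intensity=0.5):
    """Turn a mixed list of strings and VocabItem objects into VocabItems.

    Plain strings receive default_intensity; every intensity must lie in
    the 0.0-1.0 range mentioned above.
    """
    normalized = []
    for entry in entries:
        item = VocabItem(entry, default_intensity) if isinstance(entry, str) else entry
        if not 0.0 <= item.intensity <= 1.0:
            raise ValueError(f"intensity for {item.value!r} must be within 0.0-1.0")
        normalized.append(item)
    return normalized
```

Validating up front keeps an out-of-range value from reaching the service, where the failure would be harder to trace.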

Real-time Translation

  • Translate to multiple target languages simultaneously
  • Use an enhanced translation model for higher accuracy
  • Align translations with original utterances

Usage Example

Basic Configuration

Initialize the GladiaSTTService and use it in a pipeline:
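import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.gladia.stt import GladiaSTTService
from pipecat.services.gladia.config import GladiaInputParams, LanguageConfig

# Simple setup
stt = GladiaSTTService(
    api_key=os.getenv("GLADIA_API_KEY"),
)

# Use in pipeline (transport, context_aggregator, llm, and tts are created elsewhere)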
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

Dynamic Configuration

Update settings at runtime by pushing an STTUpdateSettingsFrame to the GladiaSTTService:
from pipecat.frames.frames import STTUpdateSettingsFrame
from pipecat.transcriptions.language import Language

await task.queue_frame(STTUpdateSettingsFrame(
    language=Language.FR,
))

Multi-language Configuration

import os

from pipecat.services.gladia.config import GladiaInputParams, LanguageConfig
from pipecat.services.gladia.stt import GladiaSTTService

# Multi-language with auto-detection
params = GladiaInputParams(
    language_config=LanguageConfig(
        languages=["en", "es", "fr"],  # English, Spanish, French
        code_switching=True  # Auto-detect language changes
    )
)

stt = GladiaSTTService(
    api_key=os.getenv("GLADIA_API_KEY"),
    params=params
)

Real-time Translation

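import os

from pipecat.services.gladia.config import (
    GladiaInputParams,
    LanguageConfig,
    RealtimeProcessingConfig,
    TranslationConfig
)
from pipecat.services.gladia.stt import GladiaSTTService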

# Enable real-time translation
params = GladiaInputParams(
    language_config=LanguageConfig(languages=["en"]),
    realtime_processing=RealtimeProcessingConfig(
        translation=True,
        translation_config=TranslationConfig(
            target_languages=["es"],
            match_original_utterances=True
        )
    )
)

stt = GladiaSTTService(
    api_key=os.getenv("GLADIA_API_KEY"),
    params=params
)

Custom Vocabulary

from pipecat.services.gladia.config import (
    CustomVocabularyConfig,
    CustomVocabularyItem,
    GladiaInputParams,
    RealtimeProcessingConfig
)

# Add domain-specific vocabulary
custom_vocab = CustomVocabularyConfig(
    vocabulary=[
        CustomVocabularyItem(value="Pipecat", intensity=0.9),
        CustomVocabularyItem(value="WebRTC", intensity=0.8),
        "JavaScript",  # Simple string with default intensity
    ],
    default_intensity=0.6
)

params = GladiaInputParams(
    realtime_processing=RealtimeProcessingConfig(
        custom_vocabulary=True,
        custom_vocabulary_config=custom_vocab
    )
)

Metrics

The service reports the following metrics:
  • Time to First Byte (TTFB) - Latency from audio input to first transcription
  • Processing Duration - Total time spent processing audio
Learn how to enable Metrics in your Pipeline.
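
To make the two measurements concrete, the sketch below times them by hand: TTFB is the gap between the start of audio input and the first result, and processing duration is the gap between start and stop. `LatencyTimer` is a hypothetical class for illustration only; it does not show how Pipecat computes its metrics.

```python
import time


class LatencyTimer:
    """Records time to first result and total processing time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the timer can be tested
        self._start = None
        self.ttfb = None
        self.processing = None

    def start(self):
        self._start = self._clock()

    def first_result(self):
        # Only the first result counts toward TTFB.
        if self.ttfb is None:
            self.ttfb = self._clock() - self._start

    def stop(self):
        self.processing = self._clock() - self._start
```

Call `start()` when audio begins, `first_result()` whenever a transcription arrives, and `stop()` when processing ends.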