Overview

SambaNovaSTTService provides speech-to-text capabilities using SambaNova’s hosted Whisper API. It relies on Voice Activity Detection (VAD) to segment incoming audio into complete utterances, which are then sent to the API for transcription.

Installation

To use SambaNova services, install the required dependency:

pip install "pipecat-ai[sambanova]"

You’ll also need to set up your SambaNova API key as an environment variable: SAMBANOVA_API_KEY.

Get your SambaNova API key from SambaNova Cloud.

Frames

Input

  • InputAudioRawFrame - Raw PCM audio data (16-bit, mono)
  • UserStartedSpeakingFrame - VAD start signal (begins audio buffering)
  • UserStoppedSpeakingFrame - VAD stop signal (triggers transcription)

Output

  • TranscriptionFrame - Final transcription results
  • ErrorFrame - API or processing errors
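
Conceptually, the service buffers raw audio between the VAD start and stop frames and only submits the completed segment for transcription. A minimal, self-contained sketch of that buffering logic (illustrative only, not the actual pipecat implementation):

```python
class SegmentedBuffer:
    """Sketch of VAD-driven segment buffering for an STT service."""

    def __init__(self):
        self._buffer = bytearray()
        self._speaking = False

    def start(self):
        # UserStartedSpeakingFrame: begin a fresh segment.
        self._speaking = True
        self._buffer.clear()

    def add_audio(self, chunk: bytes):
        # InputAudioRawFrame: only buffered while the user is speaking.
        if self._speaking:
            self._buffer.extend(chunk)

    def stop(self) -> bytes:
        # UserStoppedSpeakingFrame: the completed segment goes to the API.
        self._speaking = False
        return bytes(self._buffer)
```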

Models

SambaNova currently offers one Whisper model:

| Model | Description | Features |
|-------|-------------|----------|
| Whisper-Large-v3 | OpenAI’s Whisper Large v3 | High accuracy, 99+ languages, robust to noise |

Language Support

SambaNova’s Whisper implementation supports 99+ languages with automatic language detection:

Common languages:

  • Language.EN - English - en
  • Language.ES - Spanish - es
  • Language.FR - French - fr
  • Language.DE - German - de
  • Language.IT - Italian - it
  • Language.JA - Japanese - ja

Language variants (like en-US, fr-CA) are automatically mapped to their base language codes.
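
In practice this mapping amounts to dropping the region suffix. A hedged, illustrative helper (not the library’s actual code):

```python
def base_language_code(code: str) -> str:
    """Map a regional variant like 'en-US' or 'fr-CA' to its base code."""
    return code.split("-")[0].lower()
```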

Usage Example

Basic Configuration

Initialize the SambaNovaSTTService and use it in a pipeline:

import os

from pipecat.services.sambanova.stt import SambaNovaSTTService
from pipecat.transcriptions.language import Language

# Simple setup
stt = SambaNovaSTTService(
    api_key=os.getenv("SAMBANOVA_API_KEY"),
    model="Whisper-Large-v3",
    language=Language.EN
)

# Use in pipeline with VAD
pipeline = Pipeline([
    transport.input(),  # Must include VAD analyzer
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

Advanced Configuration

# Production-ready configuration
stt = SambaNovaSTTService(
    api_key=os.getenv("SAMBANOVA_API_KEY"),
    model="Whisper-Large-v3",
    language=Language.EN,
    prompt="Transcribe the following professional conversation:",
    temperature=0.1,  # More deterministic output
    base_url="https://api.sambanova.ai/v1"  # Custom endpoint if needed
)

Dynamic Configuration

To update settings at runtime, push an STTUpdateSettingsFrame to the pipeline:

from pipecat.frames.frames import STTUpdateSettingsFrame

await task.queue_frame(
    STTUpdateSettingsFrame(language=Language.FR)
)

Metrics

The service provides comprehensive metrics:

  • Time to First Byte (TTFB) - Latency from audio input to first transcription
  • Processing Duration - Total time spent processing audio
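
As an illustration of what the processing-duration measurement captures, here is a simple timing wrapper (hypothetical helper; pipecat records these metrics internally):

```python
import time

def timed_transcribe(transcribe_fn, audio: bytes):
    """Run a transcription callable and report how long it took.

    `transcribe_fn` stands in for an STT request; the returned duration
    corresponds to the processing-duration metric described above.
    """
    start = time.monotonic()
    text = transcribe_fn(audio)
    duration = time.monotonic() - start
    return text, duration
```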

Learn how to enable Metrics in your Pipeline.

Additional Notes

  • VAD Requirement: Must use a transport with VAD analyzer for proper operation
  • Segmented Processing: Transcribes complete utterances, not continuous streams
  • OpenAI Compatibility: Uses OpenAI-compatible API interface
  • Language Detection: Automatic language detection when no language is specified