
Overview

SimplismartSTTService is a segmented speech-to-text service that POSTs WAV audio segments to the Simplismart HTTP /predict endpoint and emits TranscriptionFrames. It requires upstream VAD (a VADProcessor or transport/user-aggregator VAD) so speech segments are delimited before transcription.
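To make "WAV audio segment" concrete, here is a minimal sketch of wrapping raw PCM samples in an in-memory WAV container using only the Python standard library. The helper name `pcm_to_wav_bytes` is hypothetical, and 16-bit mono audio at 16 kHz is an assumption; this illustrates the container format and is not the service's own implementation.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit PCM samples in a WAV container held in memory.

    Hypothetical helper for illustration; the sample width is assumed.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM (assumed)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


# One second of silence at 16 kHz, mono: 16000 samples of 2 bytes each.
segment = pcm_to_wav_bytes(b"\x00\x00" * 16000)
```

The resulting bytes start with a standard RIFF/WAVE header followed by the sample data, which is the kind of payload a segmented STT service sends once VAD marks the end of an utterance.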

Source Repository

Source code, examples, and issues for the Simplismart integration

Simplismart

Learn more about Simplismart’s AI platform

Installation

This is a community-maintained package distributed separately from pipecat-ai. It is not published to PyPI, so install it from source:
```bash
uv pip install git+https://github.com/simpli-smart/pipecat-simplismart.git
```

Prerequisites

Simplismart Account Setup

Before using the Simplismart STT service, you need a Simplismart account and an API key. See Simplismart to get started.

Required Environment Variables

  • SIMPLISMART_API_KEY: Bearer token used to authenticate requests. May be passed directly via the api_key constructor argument instead.
  • SIMPLISMART_STT_URL (optional): Full URL for the STT endpoint. Defaults to https://api.simplismart.live/predict.
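The precedence described for these variables can be sketched as follows. This is an illustration only: `resolve_stt_config` and `DEFAULT_STT_URL` are hypothetical names, and raising an error when no key is found is an assumption rather than documented behavior of the package.

```python
import os

# Default endpoint as stated in the documentation above.
DEFAULT_STT_URL = "https://api.simplismart.live/predict"


def resolve_stt_config(api_key=None, base_url=None, env=None):
    """Return (api_key, url), preferring explicit arguments over the environment.

    Hypothetical helper; `env` defaults to os.environ and exists to ease testing.
    """
    env = os.environ if env is None else env
    key = api_key or env.get("SIMPLISMART_API_KEY")
    if not key:
        # Assumption: a missing key is treated as a configuration error.
        raise ValueError("Set SIMPLISMART_API_KEY or pass api_key explicitly")
    url = base_url or env.get("SIMPLISMART_STT_URL") or DEFAULT_STT_URL
    return key, url


# An explicit argument wins over the environment; the URL falls back to the default.
key, url = resolve_stt_config(api_key="explicit-key", env={"SIMPLISMART_API_KEY": "env-key"})
```

Keeping the key in the environment rather than in source code is usually the safer choice for deployed pipelines.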

Configuration

  • api_key (str, default: None): Bearer token. Falls back to the SIMPLISMART_API_KEY environment variable if not provided.
  • base_url (str, default: None): Full URL to the predict endpoint. Falls back to the SIMPLISMART_STT_URL environment variable, then to https://api.simplismart.live/predict.
  • aiohttp_session (aiohttp.ClientSession, default: None): Optional shared aiohttp session. If not provided, the service creates and owns its own session.
  • sample_rate (int, default: None): Input audio sample rate. Usually supplied by the pipeline StartFrame.
  • settings (SimplismartSTTService.Settings, default: None): Runtime-configurable STT settings. See Settings below.

Settings

Runtime-configurable settings passed via the settings constructor argument using SimplismartSTTService.Settings(...). The settings dataclass extends Pipecat’s common STTSettings (which includes model and language).
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| vad_filter | bool | True | Enable server-side VAD filtering when supported. |
| vad_onset | float | 0.5 | VAD onset threshold. |
| beam_size | int | 4 | Beam search size for decoding. |
| temperature | float | 0.0 | Decoding temperature. |
| strict_hallucination_reduction | bool | True | Ask the server to apply extra anti-hallucination logic (Whisper). |

The default model is openai/whisper-large-v3-turbo and the default language is Language.EN.
See the source repository for the authoritative, up-to-date list of settings and defaults.
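As a configuration sketch, the settings listed above might be overridden like this. It assumes the package is installed from source, that the documented field names are accepted as keyword arguments by SimplismartSTTService.Settings, and that Language is imported from Pipecat's transcriptions module; the values shown are illustrative, not recommendations.

```python
from pipecat.transcriptions.language import Language
from pipecat_simplismart import SimplismartSTTService

# Override a few documented defaults; omitted fields keep their default values.
stt = SimplismartSTTService(
    api_key="YOUR_KEY",
    settings=SimplismartSTTService.Settings(
        model="openai/whisper-large-v3-turbo",  # documented default model
        language=Language.EN,                   # documented default language
        beam_size=2,                            # illustrative value (default is 4)
        temperature=0.0,
        vad_filter=True,
    ),
)
```

If a field name here does not match the installed version, the source repository's Settings definition takes precedence.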

Usage

Place a VADProcessor before SimplismartSTTService so VAD events reach the segmented STT layer.
```python
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.processors.audio.vad_processor import VADProcessor
from pipecat_simplismart import SimplismartSTTService

vad_processor = VADProcessor(vad_analyzer=SileroVADAnalyzer())
stt = SimplismartSTTService(
    api_key="YOUR_KEY",
    base_url="https://api.simplismart.live/predict",
)

# pipeline: ... transport.input(), vad_processor, stt, ...
```
The service outputs TranscriptionFrames.

Compatibility

Tested with Pipecat v1.1.0 (pipecat-ai>=0.0.86). Check the source repository for the latest tested version and changelog.