Overview

AzureSTTService provides real-time speech recognition using Azure’s Cognitive Services Speech SDK, with continuous recognition, broad language coverage, and configurable audio processing for enterprise applications.

Installation

To use Azure Speech services, install the required dependency:
pip install "pipecat-ai[azure]"

Prerequisites

Azure Account Setup

Before using Azure STT services, you need:
  1. Azure Account: Sign up at Azure Portal
  2. Speech Services Resource: Create a Speech Services resource in Azure
  3. API Credentials: Get your API key and region from the resource

Required Environment Variables

  • AZURE_SPEECH_API_KEY: Your Azure Speech API key
  • AZURE_SPEECH_REGION: Your Azure Speech region (required unless using private_endpoint)
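Since a missing credential otherwise surfaces only at connection time, it can help to fail fast at startup. A minimal sketch (`require_env` is an illustrative helper, not part of Pipecat):

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, raising if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Example: validate credentials before constructing AzureSTTService
# api_key = require_env("AZURE_SPEECH_API_KEY")
# region = require_env("AZURE_SPEECH_REGION")
```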

Configuration

api_key (str, required)
Azure Cognitive Services subscription key.

region (str, default: None)
Azure region for the Speech service (e.g., "eastus", "westus2"). Required unless private_endpoint is provided.

language (Language, default: Language.EN_US, deprecated)
Language for speech recognition. Deprecated in v0.0.105; use settings=AzureSTTService.Settings(...) instead.

sample_rate (int, default: None)
Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.

private_endpoint (str, default: None)
Private endpoint for running STT behind a firewall, enabling use in private networks. When provided, region becomes optional; if both are specified, private_endpoint takes priority. See the Azure Speech private link documentation for setup details.

endpoint_id (str, default: None)
Custom model endpoint ID. Use this for custom speech models deployed in Azure.

settings (AzureSTTService.Settings, default: None)
Runtime-configurable settings for the STT service. See Settings below.

ttfs_p99_latency (float, default: AZURE_TTFS_P99)
P99 latency from speech end to final transcript, in seconds. Override to match your deployment.

Settings

Runtime-configurable settings passed via the settings constructor argument using AzureSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| model | str | None | STT model identifier. (Inherited from base STT settings.) |
| language | Language \| str | Language.EN_US | Language for speech recognition. (Inherited from base STT settings.) |
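The runtime-update behavior can be pictured as a field-by-field merge: an update frame names only the fields to change, and the rest keep their current values. A self-contained sketch of that pattern (the `Settings` class below is a stand-in for `AzureSTTService.Settings`; the real service consumes `STTUpdateSettingsFrame` from Pipecat):

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Settings:
    # Mirrors the fields in the table above; this class is illustrative only.
    model: Optional[str] = None
    language: str = "en-US"


def apply_update(current: Settings, **updates) -> Settings:
    # A runtime settings update replaces only the fields it names,
    # leaving all other fields at their current values.
    return replace(current, **updates)
```

For example, `apply_update(Settings(), language="fr-FR")` changes the language while `model` stays at its current value.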

Usage

Basic Setup

import os

from pipecat.services.azure.stt import AzureSTTService

stt = AzureSTTService(
    api_key=os.getenv("AZURE_SPEECH_API_KEY"),
    region=os.getenv("AZURE_SPEECH_REGION"),
)

With Custom Language

import os

from pipecat.services.azure.stt import AzureSTTService
from pipecat.transcriptions.language import Language

stt = AzureSTTService(
    api_key=os.getenv("AZURE_SPEECH_API_KEY"),
    region="westus2",
    settings=AzureSTTService.Settings(
        language=Language.FR,
    ),
)
The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.

Notes

  • SDK-based (not WebSocket): Unlike most other STT services in Pipecat, Azure STT uses the Azure Cognitive Services Speech SDK rather than a raw WebSocket connection. Recognition callbacks run on SDK-managed threads and are bridged to asyncio via asyncio.run_coroutine_threadsafe.
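The thread-to-loop bridge can be sketched without the Azure SDK: a worker thread (standing in for an SDK callback thread) schedules a coroutine onto the running event loop via `asyncio.run_coroutine_threadsafe`. All names below are illustrative, not Pipecat internals:

```python
import asyncio
import threading


async def handle_transcript(text: str, results: list) -> None:
    # Runs on the asyncio event loop, even though the "SDK callback"
    # fired on a different thread.
    results.append(text)


def sdk_callback(loop: asyncio.AbstractEventLoop, results: list) -> None:
    # Simulates an Azure Speech SDK "recognized" callback, which the SDK
    # invokes on one of its own worker threads.
    future = asyncio.run_coroutine_threadsafe(
        handle_transcript("hello world", results), loop
    )
    future.result(timeout=5)  # block this worker thread until the coroutine finishes


async def main() -> list:
    results: list = []
    loop = asyncio.get_running_loop()
    # Fire the "callback" from a separate thread, as the SDK would.
    thread = threading.Thread(target=sdk_callback, args=(loop, results))
    thread.start()
    # Join in an executor thread so the event loop stays free to run the coroutine.
    await asyncio.to_thread(thread.join)
    return results
```

Running `asyncio.run(main())` returns `["hello world"]`: the append happened on the event loop even though it was requested from the worker thread.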
  • Continuous recognition: The service uses Azure’s start_continuous_recognition_async for always-on transcription. It provides both interim (recognizing) and final (recognized) results automatically.
  • Custom endpoints: Use the endpoint_id parameter to point to a custom speech model deployed in your Azure subscription for domain-specific accuracy improvements.
  • Region vs private endpoint: Either region or private_endpoint must be provided (but not both). If both are specified, private_endpoint takes priority and a warning is logged. If neither is provided, a ValueError is raised.
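The precedence rule in the last bullet can be expressed as a short validation function. A sketch under stated assumptions (the function name and the regional URL shape are illustrative, not the service's actual internals):

```python
import warnings
from typing import Optional


def resolve_speech_endpoint(region: Optional[str] = None,
                            private_endpoint: Optional[str] = None) -> str:
    """Illustrative sketch of the region / private_endpoint precedence rule."""
    if private_endpoint:
        if region:
            # Both provided: private_endpoint wins, with a warning.
            warnings.warn("Both region and private_endpoint provided; "
                          "using private_endpoint.")
        return private_endpoint
    if region:
        # Hypothetical URL shape for a regional endpoint.
        return f"wss://{region}.stt.speech.microsoft.com"
    raise ValueError("Either region or private_endpoint must be provided.")
```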