Overview

AzureSTTService provides real-time speech recognition using Azure’s Cognitive Services Speech SDK, with continuous recognition, broad language coverage, and configurable audio processing for enterprise applications.

Installation

To use Azure Speech services, install the required dependency:
pip install "pipecat-ai[azure]"

Prerequisites

Azure Account Setup

Before using Azure STT services, you need:
  1. Azure Account: Sign up at the Azure Portal
  2. Speech Services Resource: Create a Speech Services resource in Azure
  3. API Credentials: Get your API key and region from the resource

Required Environment Variables

  • AZURE_SPEECH_API_KEY: Your Azure Speech API key
  • AZURE_SPEECH_REGION: Your Azure Speech region
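
A minimal sketch that fails fast when either variable is unset, before the service is constructed:

import os

# Fail fast if the required Azure credentials are missing from the environment.
for var in ("AZURE_SPEECH_API_KEY", "AZURE_SPEECH_REGION"):
    if not os.getenv(var):
        raise RuntimeError(f"Missing required environment variable: {var}")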

Configuration

api_key (str, required)
  Azure Cognitive Services subscription key.

region (str, required)
  Azure region for the Speech service (e.g., "eastus", "westus2").

language (Language, default: Language.EN_US)
  Language for speech recognition.

sample_rate (int, default: None)
  Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.

endpoint_id (str, default: None)
  Custom model endpoint ID. Use this for custom speech models deployed in Azure.

ttfs_p99_latency (float, default: AZURE_TTFS_P99)
  P99 latency from speech end to final transcript in seconds. Override for your deployment.
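
As a sketch of how the optional parameters combine (the endpoint ID below is a placeholder, not a real deployment):

import os

from pipecat.services.azure import AzureSTTService
from pipecat.transcriptions.language import Language

stt = AzureSTTService(
    api_key=os.getenv("AZURE_SPEECH_API_KEY"),
    region=os.getenv("AZURE_SPEECH_REGION"),
    language=Language.EN_US,
    sample_rate=16000,  # match your pipeline's audio rate
    endpoint_id="your-custom-endpoint-id",  # placeholder custom model endpoint
)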

Usage

Basic Setup

import os

from pipecat.services.azure import AzureSTTService

stt = AzureSTTService(
    api_key=os.getenv("AZURE_SPEECH_API_KEY"),
    region=os.getenv("AZURE_SPEECH_REGION"),
)
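
Once constructed, the service slots into a pipeline like any other processor. A minimal sketch, assuming transport, llm, and tts are defined elsewhere in your app:

from pipecat.pipeline.pipeline import Pipeline

# Sketch only: transport, llm, and tts are assumed to exist elsewhere.
pipeline = Pipeline([
    transport.input(),   # audio frames from the user
    stt,                 # transcribes audio into text frames
    llm,
    tts,
    transport.output(),
])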

With Custom Language

import os

from pipecat.services.azure import AzureSTTService
from pipecat.transcriptions.language import Language

stt = AzureSTTService(
    api_key=os.getenv("AZURE_SPEECH_API_KEY"),
    region="westus2",
    language=Language.FR,
)
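
Since Language is a standard Python enum, you can inspect the available locales directly:

from pipecat.transcriptions.language import Language

# Print every locale the Language enum defines.
for lang in Language:
    print(lang.name, lang.value)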

Notes

  • SDK-based (not WebSocket): Unlike most other STT services in Pipecat, Azure STT uses the Azure Cognitive Services Speech SDK rather than a raw WebSocket connection. Recognition callbacks run on SDK-managed threads and are bridged to asyncio via asyncio.run_coroutine_threadsafe (see the sketch after this list).
  • Continuous recognition: The service uses Azure’s start_continuous_recognition_async for always-on transcription, providing both interim (recognizing) and final (recognized) results automatically.
  • Custom endpoints: Use the endpoint_id parameter to point to a custom speech model deployed in your Azure subscription for domain-specific accuracy improvements.
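
For context, here is a minimal sketch of the callback-to-asyncio pattern against the raw Azure Speech SDK (illustrative only, not Pipecat’s internal implementation; uses the default microphone input):

import asyncio
import os

import azure.cognitiveservices.speech as speechsdk

async def main():
    loop = asyncio.get_running_loop()
    transcripts = asyncio.Queue()

    speech_config = speechsdk.SpeechConfig(
        subscription=os.getenv("AZURE_SPEECH_API_KEY"),
        region=os.getenv("AZURE_SPEECH_REGION"),
    )
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

    # Callbacks fire on SDK-managed threads; run_coroutine_threadsafe hands
    # each result back to the asyncio event loop, as described in the notes.
    def on_recognized(evt):
        asyncio.run_coroutine_threadsafe(transcripts.put(evt.result.text), loop)

    recognizer.recognized.connect(on_recognized)
    recognizer.start_continuous_recognition_async().get()

    print(await transcripts.get())  # first final transcript

    recognizer.stop_continuous_recognition_async().get()

asyncio.run(main())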