Overview
AzureSTTService provides real-time speech recognition through Azure’s Cognitive Services Speech SDK. It offers continuous recognition, broad language coverage, and configurable audio processing for enterprise applications.
- Azure STT API Reference: Pipecat’s API methods for Azure Speech integration
- Example Implementation: Complete example with Azure services integration
- Azure Speech Documentation: Official Azure Speech Service documentation and features
- Azure Portal: Create a Speech Services resource and get API keys
Installation
To use Azure Speech services, install the required dependency, the azure extra of the pipecat-ai package (pip install "pipecat-ai[azure]").
Prerequisites
Azure Account Setup
Before using Azure STT services, you need:
- Azure Account: Sign up at Azure Portal
- Speech Services Resource: Create a Speech Services resource in Azure
- API Credentials: Get your API key and region from the resource
Required Environment Variables
- AZURE_SPEECH_API_KEY: Your Azure Speech API key
- AZURE_SPEECH_REGION: Your Azure Speech region
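A missing variable is easier to diagnose at startup than in the middle of a call. The sketch below reads the two variables and fails fast if either is absent. The helper name load_azure_speech_env and the returned key names (api_key, region) are illustrative choices, not part of Pipecat.

```python
import os

# The two variables named in this section.
REQUIRED_VARS = ("AZURE_SPEECH_API_KEY", "AZURE_SPEECH_REGION")


def load_azure_speech_env(env=None):
    """Return the Azure Speech credentials found in `env` (defaults to os.environ).

    Hypothetical helper: raises RuntimeError listing every missing variable.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "api_key": env["AZURE_SPEECH_API_KEY"],
        "region": env["AZURE_SPEECH_REGION"],
    }
```

Calling load_azure_speech_env() with no arguments reads the process environment. Passing a plain dict is convenient in tests.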
Configuration
- Azure Cognitive Services subscription key.
- Azure region for the Speech service (e.g., "eastus", "westus2").
- Language for speech recognition.
- Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
- Custom model endpoint ID. Use this for custom speech models deployed in Azure.
- P99 latency from speech end to final transcript in seconds. Override for your deployment.
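If you override the P99 latency value, one way to get a number is to measure it on your own deployment. The sketch below computes a nearest-rank 99th percentile from a list of latencies (speech end to final transcript, in seconds). The function name p99_latency is hypothetical, and how you collect the samples is up to you.

```python
def p99_latency(samples):
    """Nearest-rank 99th percentile of latency samples, in the samples' own unit."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    # 1-based rank = ceil(0.99 * n), computed with integers to avoid float rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

With few samples the result is simply the largest observation, so collect enough measurements for the estimate to be meaningful.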
Usage
Basic Setup
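A minimal sketch of creating the service from the environment variables described earlier. It assumes the class is importable from pipecat.services.azure.stt and accepts api_key and region keyword arguments; check the API reference for your installed Pipecat version. The factory function and its service_cls parameter are illustrative and exist only so the sketch can be exercised without the Azure SDK installed.

```python
import os


def create_azure_stt(service_cls=None, env=None):
    """Build an Azure STT service from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    if service_cls is None:
        # Assumed import path; requires the pipecat-ai package with its Azure extra.
        from pipecat.services.azure.stt import AzureSTTService as service_cls
    return service_cls(
        api_key=env["AZURE_SPEECH_API_KEY"],
        region=env["AZURE_SPEECH_REGION"],
    )
```

In an application you would call create_azure_stt() with no arguments and place the returned service in your pipeline after the transport input.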
With Custom Language
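A sketch of selecting a non-default recognition language, assuming the constructor takes a language keyword and that Pipecat’s Language enum (assumed to live in pipecat.transcriptions.language) provides an FR_FR member. The helper azure_stt_kwargs is hypothetical. It only assembles keyword arguments and leaves out any option you did not set, so the service’s own defaults apply.

```python
def azure_stt_kwargs(api_key, region, language=None, endpoint_id=None):
    """Assemble constructor keyword arguments, skipping unset options."""
    kwargs = {"api_key": api_key, "region": region}
    if language is not None:
        kwargs["language"] = language
    if endpoint_id is not None:
        kwargs["endpoint_id"] = endpoint_id
    return kwargs


def create_french_stt(api_key, region):
    """Create a French (France) recognizer; the import paths below are assumptions."""
    from pipecat.services.azure.stt import AzureSTTService
    from pipecat.transcriptions.language import Language

    return AzureSTTService(**azure_stt_kwargs(api_key, region, language=Language.FR_FR))
```

The same helper accepts an endpoint_id when you want to target a custom speech model.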
Notes
- SDK-based (not WebSocket): Unlike most other STT services in Pipecat, Azure STT uses the Azure Cognitive Services Speech SDK rather than a raw WebSocket connection. Recognition callbacks run on SDK-managed threads and are bridged to asyncio via asyncio.run_coroutine_threadsafe.
- Continuous recognition: The service uses Azure’s start_continuous_recognition_async for always-on transcription. It provides both interim (recognizing) and final (recognized) results automatically.
- Custom endpoints: Use the endpoint_id parameter to point to a custom speech model deployed in your Azure subscription for domain-specific accuracy improvements.
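The thread-to-asyncio bridge mentioned in the first note can be illustrated with the standard library alone. In the sketch below a plain thread stands in for an SDK-managed callback thread and hands each transcript to a coroutine on the event loop via asyncio.run_coroutine_threadsafe. All function names are illustrative. This is not Pipecat’s actual implementation.

```python
import asyncio
import threading


async def collect_transcripts(texts):
    """Receive `texts` from a non-asyncio thread and return them in arrival order."""
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()

    async def on_transcript(text):
        # Runs on the event loop thread, so asyncio objects are safe to touch here.
        await queue.put(text)

    def sdk_style_callback(text):
        # Runs on the worker thread: schedule the coroutine on the loop and return.
        asyncio.run_coroutine_threadsafe(on_transcript(text), loop)

    def fake_sdk_thread():
        # Stand-in for the SDK invoking a recognition callback once per result.
        for text in texts:
            sdk_style_callback(text)

    worker = threading.Thread(target=fake_sdk_thread)
    worker.start()
    received = [await queue.get() for _ in range(len(texts))]
    # Join off the loop thread so the event loop is never blocked.
    await asyncio.to_thread(worker.join)
    return received
```

Running asyncio.run(collect_transcripts(["hello", "hello world"])) returns the two strings in the order the worker thread submitted them. The callback never touches asyncio objects directly. It only schedules work on the loop.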