Overview
AzureSTTService provides real-time speech recognition using Azure’s Cognitive Services Speech SDK with support for continuous recognition, extensive language support, and configurable audio processing for enterprise applications.
- **Azure STT API Reference** — Pipecat's API methods for Azure Speech integration
- **Example Implementation** — Complete example with Azure services integration
- **Azure Speech Documentation** — Official Azure Speech Service documentation and features
- **Azure Portal** — Create a Speech Services resource and get API keys
Installation
To use Azure Speech services, install the required dependency:
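Assuming Pipecat's optional-dependency naming (the `azure` extra, as used by other Pipecat integrations; verify against your version):

```shell
pip install "pipecat-ai[azure]"
```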
Azure Account Setup
Before using Azure STT services, you need:
- Azure Account: Sign up at the Azure Portal
- Speech Services Resource: Create a Speech Services resource in Azure
- API Credentials: Get your API key and region from the resource
Required Environment Variables
- `AZURE_SPEECH_API_KEY`: Your Azure Speech API key
- `AZURE_SPEECH_REGION`: Your Azure Speech region (required unless using `private_endpoint`)
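For example, in your shell (placeholder values; `"eastus"` is one of the regions mentioned above):

```shell
export AZURE_SPEECH_API_KEY="your-api-key-here"
export AZURE_SPEECH_REGION="eastus"
```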
Configuration
- `api_key`: Azure Cognitive Services subscription key.
- `region`: Azure region for the Speech service (e.g., `"eastus"`, `"westus2"`). Required unless `private_endpoint` is provided.
- `language`: Language for speech recognition. Deprecated in v0.0.105; use `settings=AzureSTTService.Settings(...)` instead.
- `sample_rate`: Audio sample rate in Hz. When `None`, uses the pipeline's configured sample rate.
- `private_endpoint`: Private endpoint for STT behind a firewall; enables use in private networks. When provided, `region` becomes optional (`private_endpoint` takes priority if both are specified). See the Azure Speech private link documentation for setup details.
- `endpoint_id`: Custom model endpoint ID. Use this for custom speech models deployed in Azure.
- `settings`: Runtime-configurable settings for the STT service. See Settings below.
- P99 latency: P99 latency from speech end to final transcript, in seconds. Override for your deployment.
Settings
Runtime-configurable settings passed via the `settings` constructor argument using `AzureSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | `str` | `None` | STT model identifier. (Inherited from base STT settings.) |
| `language` | `Language \| str` | `Language.EN_US` | Language for speech recognition. (Inherited from base STT settings.) |
Usage
Basic Setup
With Custom Language
Notes
- **SDK-based (not WebSocket)**: Unlike most other STT services in Pipecat, Azure STT uses the Azure Cognitive Services Speech SDK rather than a raw WebSocket connection. Recognition callbacks run on SDK-managed threads and are bridged to asyncio via `asyncio.run_coroutine_threadsafe`.
- **Continuous recognition**: The service uses Azure's `start_continuous_recognition_async` for always-on transcription. It provides both interim (recognizing) and final (recognized) results automatically.
- **Custom endpoints**: Use the `endpoint_id` parameter to point to a custom speech model deployed in your Azure subscription for domain-specific accuracy improvements.
- **Region vs. private endpoint**: Either `region` or `private_endpoint` must be provided. If both are specified, `private_endpoint` takes priority and a warning is logged. If neither is provided, a `ValueError` is raised.
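The thread-to-asyncio bridging described in the first note can be illustrated with a self-contained stdlib sketch (no Azure SDK involved; the "SDK thread" is simulated with `threading.Thread`):

```python
import asyncio
import threading

results = []

async def handle_transcript(text):
    # Runs on the event loop, like Pipecat's frame-pushing coroutines.
    results.append(text)

async def main():
    loop = asyncio.get_running_loop()

    def sdk_callback(text):
        # Runs on a non-asyncio thread (like an Azure SDK recognizer
        # thread); schedules the coroutine safely onto the event loop.
        asyncio.run_coroutine_threadsafe(handle_transcript(text), loop)

    t = threading.Thread(target=sdk_callback, args=("hello world",))
    t.start()
    t.join()
    await asyncio.sleep(0.1)  # let the scheduled coroutine run

asyncio.run(main())
print(results)  # → ['hello world']
```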