Overview

AzureSTTService provides real-time speech recognition using Azure’s Cognitive Services Speech SDK, with continuous recognition, extensive language coverage, and configurable audio processing.

Installation

To use Azure Speech services, install the required dependency:

pip install "pipecat-ai[azure]"

You’ll also need to set up your Azure credentials as environment variables:

  • AZURE_API_KEY (or AZURE_SPEECH_API_KEY)
  • AZURE_REGION (or AZURE_SPEECH_REGION)

Get your API key and region from the Azure Portal by creating a Speech Services resource.
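Because either variable name is accepted, credential lookup can be sketched as below. This is a minimal illustration; the exact fallback order inside the service is an assumption, so check your installed version for the precise behavior.

```python
import os

def resolve_azure_credentials():
    """Read Azure Speech credentials from either supported variable name.

    The fallback order shown here is an assumption for illustration only.
    """
    api_key = os.getenv("AZURE_API_KEY") or os.getenv("AZURE_SPEECH_API_KEY")
    region = os.getenv("AZURE_REGION") or os.getenv("AZURE_SPEECH_REGION")
    if not api_key or not region:
        raise RuntimeError(
            "Set AZURE_API_KEY/AZURE_REGION (or the *_SPEECH_* variants)"
        )
    return api_key, region
```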

Frames

Input

  • InputAudioRawFrame - Raw PCM audio data (configurable sample rate, mono or stereo)
  • STTUpdateSettingsFrame - Runtime transcription configuration updates
  • STTMuteFrame - Mute audio input for transcription

Output

  • TranscriptionFrame - Final transcription results
  • ErrorFrame - Connection or processing errors
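To illustrate the flow, here is a pure-Python sketch of a downstream consumer that collects final results and surfaces errors. The frame classes are stubbed with dataclasses rather than imported from pipecat, and their fields are an approximation (the real TranscriptionFrame also carries user, timestamp, and language metadata).

```python
from dataclasses import dataclass

@dataclass
class TranscriptionFrame:
    # Stand-in for pipecat's frame, for illustration only.
    text: str

@dataclass
class ErrorFrame:
    # Stand-in for pipecat's error frame.
    error: str

def collect_transcripts(frames):
    """Accumulate final transcription text and collect any errors."""
    transcript, errors = [], []
    for frame in frames:
        if isinstance(frame, TranscriptionFrame):
            transcript.append(frame.text)
        elif isinstance(frame, ErrorFrame):
            errors.append(frame.error)
    return " ".join(transcript), errors
```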

Language Support

Azure Speech STT supports extensive language coverage with regional variants:

Common languages:

  • Language.EN_US - English (US) - en-US
  • Language.ES - Spanish - es-ES
  • Language.FR - French - fr-FR
  • Language.DE - German - de-DE
  • Language.IT - Italian - it-IT
  • Language.JA - Japanese - ja-JP
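If you need the underlying BCP-47 codes (for logging or interop), the list above corresponds to the mapping below. This is a reference sketch only; in practice the service derives the code from the `Language` enum internally.

```python
# BCP-47 codes for the common Language enum values listed above.
AZURE_LANGUAGE_CODES = {
    "EN_US": "en-US",
    "ES": "es-ES",
    "FR": "fr-FR",
    "DE": "de-DE",
    "IT": "it-IT",
    "JA": "ja-JP",
}
```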

Usage Example

Basic Configuration

Initialize the AzureSTTService and use it in a pipeline:

import os

from pipecat.services.azure.stt import AzureSTTService
from pipecat.transcriptions.language import Language

# Basic configuration
stt = AzureSTTService(
    api_key=os.getenv("AZURE_SPEECH_API_KEY"),
    region=os.getenv("AZURE_SPEECH_REGION"),
    language=Language.EN_US,
)

# Use in pipeline
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

Dynamic Configuration

Update settings at runtime by pushing an STTUpdateSettingsFrame to the AzureSTTService:

from pipecat.frames.frames import STTUpdateSettingsFrame

await task.queue_frame(STTUpdateSettingsFrame(
    language=Language.FR,
))

Metrics

The service provides:

  • Time to First Byte (TTFB) - Latency from audio input to first transcription
  • Processing Duration - Total time spent processing audio

Learn how to enable Metrics in your Pipeline.
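One way to enable metrics collection is through the task parameters, sketched below. This assumes pipecat's PipelineParams API; field names may vary across versions, so treat it as a starting point rather than a definitive configuration.

```python
from pipecat.pipeline.task import PipelineParams, PipelineTask

# Enable metrics on the task that wraps the pipeline.
task = PipelineTask(
    pipeline,
    params=PipelineParams(
        enable_metrics=True,        # emit TTFB / processing-duration metrics
        enable_usage_metrics=True,  # emit usage metrics where services report them
    ),
)
```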

Additional Notes

  • Continuous Recognition: Uses Azure’s continuous recognition mode for real-time processing
  • Audio Flexibility: Supports configurable sample rates and both mono/stereo input
  • Resource Management: Automatic cleanup of Azure speech recognizer and audio streams
  • Threading: Thread-safe operation with proper async event loop handling using asyncio.run_coroutine_threadsafe
  • Regional Support: Requires Azure region specification for optimal performance and compliance
  • Connection Management: Handles Azure SDK connection lifecycle with proper start/stop/cancel operations
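The threading note above follows the standard asyncio pattern: SDK callbacks fire on a worker thread, and results must be marshaled onto the event loop with `asyncio.run_coroutine_threadsafe`. A self-contained illustration of that pattern (no Azure SDK involved; the callback name is hypothetical):

```python
import asyncio
import threading

async def main():
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def sdk_callback(text: str):
        # Runs on a non-asyncio thread (like an Azure recognizer callback);
        # run_coroutine_threadsafe safely hands the result to the loop.
        asyncio.run_coroutine_threadsafe(queue.put(text), loop)

    worker = threading.Thread(target=sdk_callback, args=("hello",))
    worker.start()
    # The loop stays free here, so the scheduled coroutine can run.
    text = await asyncio.wait_for(queue.get(), timeout=5)
    worker.join()
    return text

result = asyncio.run(main())
```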