Overview

AzureSTTService provides real-time speech recognition using Azure’s Cognitive Services Speech SDK. It supports continuous recognition and multiple languages.

Installation

To use AzureSTTService, install the required dependencies:

pip install "pipecat-ai[azure]"

You’ll also need to set up the following environment variables:

  • AZURE_API_KEY
  • AZURE_REGION
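
As a minimal sketch, the two variables can be read and validated before the service is constructed. The helper name `load_azure_credentials` is hypothetical and not part of Pipecat; it only checks that both variables listed above are set.

```python
import os
from typing import Mapping, Tuple


def load_azure_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (api_key, region), or raise if either variable is missing or empty."""
    required = ("AZURE_API_KEY", "AZURE_REGION")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["AZURE_API_KEY"], env["AZURE_REGION"]
```

The returned pair can then be passed as the `api_key` and `region` constructor parameters described under Configuration.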

Configuration

Constructor Parameters

api_key
str
required

Azure Speech Service API key

region
str
required

Azure region identifier

language
Language
default: Language.EN_US

Recognition language

sample_rate
int
default: 24000

Input audio sample rate in Hz

channels
int
default: 1

Number of audio channels

Input

The service processes audio data through a PushAudioInputStream:

  • PCM format
  • Configurable sample rate
  • Mono or stereo input
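
To make the relationship between sample rate, channel count, and buffer size concrete, here is a rough sketch that computes how many bytes of raw PCM cover a given duration. The function name `pcm_chunk_bytes` is hypothetical, and the 2-byte (16-bit) sample width is an assumption; the text above only says "PCM".

```python
def pcm_chunk_bytes(sample_rate: int, channels: int, duration_ms: int,
                    sample_width: int = 2) -> int:
    """Bytes of raw PCM audio covering duration_ms milliseconds.

    sample_width is bytes per sample; 2 (16-bit) is an assumed default.
    """
    frames = sample_rate * duration_ms // 1000  # samples per channel
    return frames * channels * sample_width


# 20 ms of mono audio at the default 24000 Hz
print(pcm_chunk_bytes(24000, 1, 20))  # 960
```

A stereo stream doubles the figure for the same duration, so the `sample_rate` and `channels` values passed to the constructor should match the audio actually being pushed.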

Output Frames

TranscriptionFrame
Frame

Contains:

  • Recognized text
  • Empty user ID
  • ISO 8601 formatted timestamp
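
The shape of that payload can be illustrated with a small stand-in. `TranscriptionSketch` and `make_transcription` are hypothetical names used only for illustration, not Pipecat's actual frame class, and the use of a UTC clock is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class TranscriptionSketch:
    """Hypothetical stand-in mirroring the three fields listed above."""
    text: str
    user_id: str
    timestamp: str


def make_transcription(text: str) -> TranscriptionSketch:
    # ISO 8601 timestamp; UTC is an assumption, not confirmed by the source
    now = datetime.now(timezone.utc).isoformat()
    return TranscriptionSketch(text=text, user_id="", timestamp=now)


frame = make_transcription("hello world")
print(frame.text, repr(frame.user_id), frame.timestamp)
```

A downstream handler would typically read the text field and can parse the timestamp with `datetime.fromisoformat`.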

Methods

See the STT base class methods for additional functionality.

Language Setting

await service.set_language(Language.FR)

Language Support

Azure STT supports the following languages and regional variants:

Language Code     Description            Service Codes
Language.ZH       Chinese                zh-CN
Language.EN_US    English (US)           en-US
Language.EN_IN    English (India)        en-IN
Language.FR       French                 fr-FR
Language.DE       German                 de-DE
Language.HI       Hindi                  hi-IN
Language.IT       Italian                it-IT
Language.JA       Japanese               ja-JP
Language.KO       Korean                 ko-KR
Language.PT_BR    Portuguese (Brazil)    pt-BR
Language.ES       Spanish                es-ES, es-MX
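
The table can be expressed as a plain lookup, which is handy for validating a language choice before configuring the service. This is only a sketch: `SERVICE_CODES`, `codes_for`, and `language_for` are hypothetical names, the keys are the `Language` member names as strings rather than the real enum, and this is not Pipecat's internal mapping.

```python
from typing import Dict, List, Optional

# Language member name -> Azure service codes, copied from the table above
SERVICE_CODES: Dict[str, List[str]] = {
    "ZH": ["zh-CN"],
    "EN_US": ["en-US"],
    "EN_IN": ["en-IN"],
    "FR": ["fr-FR"],
    "DE": ["de-DE"],
    "HI": ["hi-IN"],
    "IT": ["it-IT"],
    "JA": ["ja-JP"],
    "KO": ["ko-KR"],
    "PT_BR": ["pt-BR"],
    "ES": ["es-ES", "es-MX"],
}


def codes_for(language_name: str) -> List[str]:
    """Service codes listed for a language, or an empty list if unsupported."""
    return SERVICE_CODES.get(language_name, [])


def language_for(service_code: str) -> Optional[str]:
    """Reverse lookup: which language member a service code belongs to."""
    for name, codes in SERVICE_CODES.items():
        if service_code in codes:
            return name
    return None


print(codes_for("ES"))        # ['es-ES', 'es-MX']
print(language_for("pt-BR"))  # PT_BR
```

Which of the two Spanish codes the service actually sends is not stated in the table, so the sketch keeps both rather than picking one.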

Usage Example

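import os

# Import paths may differ between Pipecat versions; adjust to your install.
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.azure import AzureSTTService
from pipecat.transcriptions.language import Language

# Configure service (credentials come from the environment variables above)
stt_service = AzureSTTService(
    api_key=os.getenv("AZURE_API_KEY"),
    region=os.getenv("AZURE_REGION"),
    language=Language.EN_US,
    sample_rate=16000,
    channels=1,
)

# Use in pipeline; audio_input and text_handler are placeholders for
# your own upstream and downstream processors
pipeline = Pipeline([
    audio_input,
    stt_service,
    text_handler,
])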

Notes

  • Supports continuous recognition
  • Handles automatic reconnection
  • Provides real-time transcription
  • Thread-safe processing
  • Automatic resource cleanup