Speech-to-Text
Azure
Speech-to-text service using Azure Cognitive Services Speech SDK
Overview
AzureSTTService provides real-time speech recognition using Azure’s Cognitive Services Speech SDK. It supports continuous recognition and multiple languages.
Installation
To use AzureSTTService, install the required dependencies:
You’ll also need to set up the following environment variables:
- AZURE_API_KEY
- AZURE_REGION
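As a minimal sketch of how these two variables might be read and validated before constructing the service: the helper name `load_azure_credentials` is hypothetical and not part of the service's API, and the returned keys simply mirror the `api_key` and `region` constructor parameters.

```python
import os


def load_azure_credentials(env=None):
    """Return the Azure key and region, failing early if either is missing.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    This function is a hypothetical convenience, not part of the service itself.
    """
    env = os.environ if env is None else env
    required = ("AZURE_API_KEY", "AZURE_REGION")
    # Treat unset and empty values the same way: both are unusable.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {"api_key": env["AZURE_API_KEY"], "region": env["AZURE_REGION"]}
```

The resulting dictionary can be unpacked into the constructor as keyword arguments, which keeps credentials out of source code.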
Configuration
Constructor Parameters
Parameter | Type | Default | Description |
---|---|---|---|
api_key | str | required | Azure Speech Service API key |
region | str | required | Azure region identifier |
language | Language | Language.EN_US | Recognition language |
sample_rate | int | 24000 | Input audio sample rate in Hz |
channels | int | 1 | Number of audio channels |
Input
The service processes audio data through a PushAudioInputStream:
- PCM format
- Configurable sample rate
- Mono or stereo input
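To make the input requirements concrete, the sketch below sizes a PCM chunk and builds a push stream with the Azure Speech SDK directly. It assumes 16-bit samples, which this page does not state, and the helper names `pcm_chunk_bytes` and `make_push_stream` are hypothetical. How AzureSTTService configures its own stream internally may differ.

```python
def pcm_chunk_bytes(sample_rate, channels, duration_ms, bits_per_sample=16):
    """Size in bytes of a raw PCM chunk of the given duration.

    bits_per_sample=16 is an assumption; the page only says "PCM format".
    """
    frames = sample_rate * duration_ms // 1000
    return frames * channels * (bits_per_sample // 8)


def make_push_stream(sample_rate=24000, channels=1, bits_per_sample=16):
    """Create a PushAudioInputStream matching the given PCM layout."""
    # Imported lazily so pcm_chunk_bytes works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    stream_format = speechsdk.audio.AudioStreamFormat(
        samples_per_second=sample_rate,
        bits_per_sample=bits_per_sample,
        channels=channels,
    )
    return speechsdk.audio.PushAudioInputStream(stream_format=stream_format)
```

With the SDK installed, 20 ms of silence could be pushed with `make_push_stream().write(b"\x00" * pcm_chunk_bytes(24000, 1, 20))`.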
Output Frames
TranscriptionFrame (Frame) contains:
- Recognized text
- Empty user ID
- ISO 8601 formatted timestamp
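The shape of that output can be illustrated with a stand-in dataclass. `TranscriptionFrameSketch` and `make_transcription_frame` are hypothetical names used only for illustration; the real TranscriptionFrame class may carry additional fields, and using UTC for the timestamp is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class TranscriptionFrameSketch:
    """Illustrative stand-in for the three fields listed above."""
    text: str       # recognized text
    user_id: str    # left empty by this service
    timestamp: str  # ISO 8601 formatted timestamp


def make_transcription_frame(text):
    """Build a frame with an empty user ID and the current time (UTC assumed)."""
    return TranscriptionFrameSketch(
        text=text,
        user_id="",
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
```

A downstream consumer can parse the timestamp back with `datetime.fromisoformat(frame.timestamp)`.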
Methods
See the STT base class methods for additional functionality.
Language Setting
Language Support
Azure STT supports the following languages and regional variants:
Language Code | Description | Service Codes |
---|---|---|
Language.ZH | Chinese | zh-CN |
Language.EN_US | English (US) | en-US |
Language.EN_IN | English (India) | en-IN |
Language.FR | French | fr-FR |
Language.DE | German | de-DE |
Language.HI | Hindi | hi-IN |
Language.IT | Italian | it-IT |
Language.JA | Japanese | ja-JP |
Language.KO | Korean | ko-KR |
Language.PT_BR | Portuguese (Brazil) | pt-BR |
Language.ES | Spanish | es-ES, es-MX |
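The table can be read as a lookup from a Language member to one or more Azure locale codes. The sketch below restates it as a plain dictionary keyed by member name; `SERVICE_CODES` and `service_codes` are hypothetical names, and the service's actual lookup logic (including which Spanish variant it prefers) is not shown on this page.

```python
from typing import Dict, List

# Keys are the member names from the table above (without the "Language." prefix).
SERVICE_CODES: Dict[str, List[str]] = {
    "ZH": ["zh-CN"],
    "EN_US": ["en-US"],
    "EN_IN": ["en-IN"],
    "FR": ["fr-FR"],
    "DE": ["de-DE"],
    "HI": ["hi-IN"],
    "IT": ["it-IT"],
    "JA": ["ja-JP"],
    "KO": ["ko-KR"],
    "PT_BR": ["pt-BR"],
    "ES": ["es-ES", "es-MX"],
}


def service_codes(language_name: str) -> List[str]:
    """Return the Azure locale codes listed for a language, or raise ValueError."""
    try:
        # Return a copy so callers cannot mutate the shared table.
        return list(SERVICE_CODES[language_name])
    except KeyError:
        raise ValueError(f"Unsupported language: {language_name}") from None
```

Returning a list rather than a single code keeps the Spanish row faithful to the table, where two regional variants are listed.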
Usage Example
Frame Flow
Notes
- Supports continuous recognition
- Handles automatic reconnection
- Provides real-time transcription
- Thread-safe processing
- Automatic resource cleanup