Speech-to-Text
Azure
Speech-to-text service using Azure Cognitive Services Speech SDK
Overview
AzureSTTService provides real-time speech recognition using Azure’s Cognitive Services Speech SDK. It supports continuous recognition and multiple languages.
Installation
To use AzureSTTService, install the required dependencies.
You’ll also need to set up the following environment variables:
- AZURE_API_KEY
- AZURE_REGION
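As a minimal sketch of reading these two variables before constructing the service, the snippet below uses only the standard library. The helper name `load_azure_credentials` is hypothetical and not part of any SDK; it simply fails early when a variable is unset or blank.

```python
import os
from typing import Mapping, Tuple


def load_azure_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (api_key, region), raising if either variable is unset or blank."""
    required = ("AZURE_API_KEY", "AZURE_REGION")
    # Collect every missing name so the error reports all of them at once.
    missing = [name for name in required if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["AZURE_API_KEY"].strip(), env["AZURE_REGION"].strip()
```

Calling `load_azure_credentials()` with no argument reads from the process environment; passing a plain dict makes the check easy to exercise in tests.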
Configuration
Constructor Parameters
- Azure Speech Service API key
- Azure region identifier
- Recognition language
- Input audio sample rate in Hz
- Number of audio channels
Input
The service processes audio data through a PushAudioInputStream:
- PCM format
- Configurable sample rate
- Mono or stereo input
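For illustration, the sketch below shows how raw PCM might be pushed into a `PushAudioInputStream` using the Azure Speech SDK for Python (`azure-cognitiveservices-speech`). It is not the internals of AzureSTTService: the helper names, the 16-bit sample width, and the 20 ms chunk size are all assumptions. The SDK import is deferred so the chunking helper can be used without the SDK installed.

```python
from typing import Iterator


def pcm_chunks(audio: bytes, sample_rate: int, channels: int,
               chunk_ms: int = 20, sample_width: int = 2) -> Iterator[bytes]:
    """Split raw PCM into chunks of about chunk_ms, aligned to whole frames."""
    frame_bytes = channels * sample_width  # one sample for every channel
    chunk_bytes = (sample_rate * chunk_ms // 1000) * frame_bytes
    if chunk_bytes <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(audio), chunk_bytes):
        yield audio[start:start + chunk_bytes]


def make_push_stream(sample_rate: int = 16000, channels: int = 1):
    """Create a PushAudioInputStream for 16-bit PCM (needs the Azure Speech SDK)."""
    import azure.cognitiveservices.speech as speechsdk  # deferred import

    stream_format = speechsdk.audio.AudioStreamFormat(
        samples_per_second=sample_rate, bits_per_sample=16, channels=channels)
    return speechsdk.audio.PushAudioInputStream(stream_format=stream_format)


def push_audio(stream, audio: bytes, sample_rate: int, channels: int) -> int:
    """Write audio to any object exposing write(); return the chunk count."""
    count = 0
    for chunk in pcm_chunks(audio, sample_rate, channels):
        stream.write(chunk)
        count += 1
    return count
```

With the SDK installed, `push_audio(make_push_stream(), pcm_bytes, 16000, 1)` would feed the stream, which would then typically be wrapped in `speechsdk.audio.AudioConfig(stream=...)` when creating a recognizer.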
Output Frames
Contains:
- Recognized text
- Empty user ID
- ISO 8601 formatted timestamp
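The fields listed above can be pictured with a small stand-in dataclass. `TranscriptionResult` is a hypothetical name used only for illustration; the actual frame class is not shown in this section. Generating the timestamp in UTC is also an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def iso8601_now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class TranscriptionResult:
    """Illustrative stand-in for the output frame's payload."""
    text: str                                            # recognized text
    user_id: str = ""                                    # left empty by the service
    timestamp: str = field(default_factory=iso8601_now)  # ISO 8601 formatted
```

Using a `default_factory` means each instance gets its own timestamp at creation time, rather than one value fixed when the class is defined.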
Methods
See the STT base class methods for additional functionality.
Language Setting
Language Support
Azure STT supports the following languages and regional variants:
Language Code | Description | Service Codes |
---|---|---|
Language.ZH | Chinese | zh-CN |
Language.EN_US | English (US) | en-US |
Language.EN_IN | English (India) | en-IN |
Language.FR | French | fr-FR |
Language.DE | German | de-DE |
Language.HI | Hindi | hi-IN |
Language.IT | Italian | it-IT |
Language.JA | Japanese | ja-JP |
Language.KO | Korean | ko-KR |
Language.PT_BR | Portuguese (Brazil) | pt-BR |
Language.ES | Spanish | es-ES, es-MX |
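The table can be expressed as a lookup from language value to service codes. The `Language` enum below is a local stand-in defined for this sketch, and choosing the first listed code as the default (es-ES for Spanish) is an assumption, not documented behavior.

```python
from enum import Enum
from typing import Dict, Tuple


class Language(Enum):
    """Local stand-in for the Language codes in the table above."""
    ZH = "zh"
    EN_US = "en-US"
    EN_IN = "en-IN"
    FR = "fr"
    DE = "de"
    HI = "hi"
    IT = "it"
    JA = "ja"
    KO = "ko"
    PT_BR = "pt-BR"
    ES = "es"


# Service codes copied from the table; Spanish has two regional variants.
AZURE_SERVICE_CODES: Dict[Language, Tuple[str, ...]] = {
    Language.ZH: ("zh-CN",),
    Language.EN_US: ("en-US",),
    Language.EN_IN: ("en-IN",),
    Language.FR: ("fr-FR",),
    Language.DE: ("de-DE",),
    Language.HI: ("hi-IN",),
    Language.IT: ("it-IT",),
    Language.JA: ("ja-JP",),
    Language.KO: ("ko-KR",),
    Language.PT_BR: ("pt-BR",),
    Language.ES: ("es-ES", "es-MX"),
}


def to_azure_code(language: Language) -> str:
    """Return the first service code listed for a language."""
    try:
        return AZURE_SERVICE_CODES[language][0]
    except KeyError:
        raise ValueError(f"Unsupported language: {language!r}") from None
```

Keeping every variant in a tuple preserves the full table while still giving callers a single default through `to_azure_code`.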
Usage Example
Frame Flow
Notes
- Supports continuous recognition
- Handles automatic reconnection
- Provides real-time transcription
- Thread-safe processing
- Automatic resource cleanup