Neuphonic
Text-to-speech service implementation using Neuphonic’s API
Overview
Neuphonic provides high-quality text-to-speech synthesis through two service implementations:
- NeuphonicTTSService: WebSocket-based implementation with interruption support
- NeuphonicHttpTTSService: HTTP-based implementation for simpler use cases
Both services support various voices, languages, and customization options.
Installation
To use Neuphonic TTS services, install the required dependencies:
You’ll also need to set up your Neuphonic API key as an environment variable: NEUPHONIC_API_KEY
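A minimal sketch of reading the key at startup, assuming it has already been exported in the shell. The helper name `load_neuphonic_api_key` is hypothetical and is not part of any Neuphonic SDK; it only illustrates failing early when the variable is missing.

```python
import os


def load_neuphonic_api_key(env=None):
    """Return the Neuphonic API key from the environment, or raise if unset.

    Hypothetical helper for illustration; `env` defaults to os.environ and
    can be replaced with a plain dict in tests.
    """
    env = os.environ if env is None else env
    key = env.get("NEUPHONIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError("NEUPHONIC_API_KEY is not set")
    return key
```

Calling the helper once when the application starts surfaces a missing key immediately, rather than at the first TTS request.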
NeuphonicTTSService (WebSocket)
Configuration
Your Neuphonic API key
Voice identifier to use for synthesis
Neuphonic WebSocket API endpoint
Output audio sample rate in Hz
Audio encoding format
Additional configuration parameters
InputParams
The language for TTS generation
Speech speed multiplier (0.5-2.0)
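The two parameters above can be pictured as a small validated structure. This is a sketch only: `ExampleInputParams`, its field names, and its default values are assumptions made for illustration, and only the 0.5-2.0 speed range comes from the description above.

```python
from dataclasses import dataclass

# Range taken from the parameter description above.
MIN_SPEED = 0.5
MAX_SPEED = 2.0


@dataclass
class ExampleInputParams:
    """Hypothetical stand-in for the service's InputParams."""

    language: str = "en"  # assumed default, not confirmed by the source
    speed: float = 1.0    # assumed default, not confirmed by the source

    def __post_init__(self):
        # Reject speeds outside the documented multiplier range.
        if not MIN_SPEED <= self.speed <= MAX_SPEED:
            raise ValueError(
                f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {self.speed}"
            )
```

Validating at construction time keeps an out-of-range speed from reaching the API request.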
NeuphonicHttpTTSService (HTTP)
Configuration
Your Neuphonic API key
Voice identifier to use for synthesis
Neuphonic HTTP API endpoint
Output audio sample rate in Hz
Audio encoding format
Additional configuration parameters (same as WebSocket implementation)
Input
Both services accept text input through their TTS pipeline.
Output Frames
TTSStartedFrame
Signals the start of audio generation.
TTSAudioRawFrame
Contains generated audio data:
Raw audio data chunk
Audio sample rate (22050Hz default)
Number of audio channels (1 for mono)
TTSStoppedFrame
Signals the completion of audio generation.
ErrorFrame
Sent when an error occurs during TTS generation:
Error message describing what went wrong
Methods
WebSocket Implementation
The WebSocket implementation (NeuphonicTTSService) inherits from InterruptibleTTSService and provides:
- Support for interrupting ongoing TTS generation
- Automatic websocket connection management
- Keep-alive mechanism for persistent connections
- Special handling for conversation flows
HTTP Implementation
The HTTP implementation (NeuphonicHttpTTSService) inherits from TTSService and provides:
- Simpler API integration using HTTP streaming
- Less overhead for single TTS requests
- Simplified error handling
Language Support
Neuphonic TTS supports the following languages:
Language Code | Description | Service Codes |
---|---|---|
Language.EN | English | en |
Language.ES | Spanish | es |
Language.DE | German | de |
Language.NL | Dutch | nl |
Language.AR | Arabic | ar |
Language.FR | French | fr |
Language.PT | Portuguese | pt |
Language.RU | Russian | ru |
Language.HI | Hindi | hi |
Language.ZH | Chinese | zh |
Regional variants (e.g., EN_US, ES_ES) are automatically mapped to their base language.
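The fallback from a regional variant to its base language might look like the sketch below. The function `to_service_code` is hypothetical and works on plain strings rather than the Language enum; the set of codes is copied from the table above.

```python
# Service codes listed in the language table above.
SUPPORTED_CODES = {"en", "es", "de", "nl", "ar", "fr", "pt", "ru", "hi", "zh"}


def to_service_code(language: str):
    """Map a name such as 'EN_US', 'en-US', or 'en' to its base service code.

    Returns None when the base language is not in the supported set.
    Hypothetical helper, not the service's actual mapping function.
    """
    base = language.replace("_", "-").split("-")[0].lower()
    return base if base in SUPPORTED_CODES else None
```

For example, `to_service_code("EN_US")` and `to_service_code("en")` both resolve to the same service code, while a language outside the table yields `None`.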
Usage Example
WebSocket Implementation
HTTP Implementation
Metrics Support
Both services support metrics collection:
- Time to First Byte (TTFB)
- TTS usage metrics
- Processing duration
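To make the timing metrics concrete, the sketch below measures time to first byte and total processing duration over any async stream of audio chunks. It does not use the services' built-in metrics; `measure_stream` and `fake_tts_stream` are hypothetical names, and the fake stream stands in for a real TTS response.

```python
import asyncio
import time


async def measure_stream(chunks):
    """Consume an async iterator of byte chunks.

    Returns (ttfb_seconds, total_seconds, byte_count). ttfb_seconds is None
    if the stream yields nothing.
    """
    start = time.perf_counter()
    ttfb = None
    byte_count = 0
    async for chunk in chunks:
        if ttfb is None:
            # First chunk arrived: record time to first byte.
            ttfb = time.perf_counter() - start
        byte_count += len(chunk)
    return ttfb, time.perf_counter() - start, byte_count


async def fake_tts_stream():
    """Stand-in for a TTS response: two chunks with simulated latency."""
    await asyncio.sleep(0.05)
    yield b"\x00" * 320
    await asyncio.sleep(0.01)
    yield b"\x00" * 320
```

Running `asyncio.run(measure_stream(fake_tts_stream()))` returns the three values as a tuple; with a real stream, the first value approximates the latency a listener perceives before audio begins.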
Audio Processing
- Configurable sample rate (defaults to 22050Hz)
- PCM linear encoding
- Single channel (mono) output
- Base64 decoding for audio data
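The points above can be sketched as a small decoding step. This assumes 16-bit samples (2 bytes per sample), which the source does not state; the function name `decode_audio_chunk` is hypothetical.

```python
import base64


def decode_audio_chunk(b64_payload, sample_rate=22050, channels=1, bytes_per_sample=2):
    """Decode a base64 audio payload and estimate its duration in seconds.

    bytes_per_sample=2 assumes 16-bit linear PCM, which is an assumption
    rather than something the service documentation confirms.
    """
    pcm = base64.b64decode(b64_payload)
    bytes_per_second = sample_rate * channels * bytes_per_sample
    return pcm, len(pcm) / bytes_per_second
```

Under these assumptions, one second of mono audio at 22050 Hz occupies 44100 bytes once decoded.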
Error Handling
Notes
- WebSocket implementation includes a keep-alive mechanism (10-second interval)
- WebSocket service maintains a persistent connection for faster responses
- Both services automatically select appropriate language codes
- The WebSocket implementation pauses frame processing during speech generation to prevent overlapping responses
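A keep-alive mechanism like the one in the first note is commonly built as a background task that sends a message at a fixed interval. The sketch below shows that general pattern with asyncio; it is not the service's actual implementation, and `keepalive_loop`, `send_ping`, and `demo` are hypothetical names.

```python
import asyncio


async def keepalive_loop(send_ping, interval=10.0):
    """Call `send_ping` every `interval` seconds until the task is cancelled."""
    while True:
        await asyncio.sleep(interval)
        await send_ping()


async def demo(interval=0.01, run_for=0.1):
    """Run the loop briefly with a fake sender and return the ping count."""
    pings = []

    async def send_ping():
        # A real sender would write a keep-alive message to the WebSocket.
        pings.append("ping")

    task = asyncio.create_task(keepalive_loop(send_ping, interval))
    await asyncio.sleep(run_for)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # Expected: the loop only ends by cancellation.
    return len(pings)
```

Cancelling the task on disconnect, as `demo` does, prevents the loop from writing to a connection that has already closed.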