Overview

Neuphonic provides high-quality text-to-speech synthesis through two service implementations:

  • NeuphonicTTSService: WebSocket-based implementation with interruption support
  • NeuphonicHttpTTSService: HTTP-based implementation for simpler use cases

Both services let you choose a voice, a language, and a speech speed, and both expose the same configuration parameters.

Installation

To use Neuphonic TTS services, install the required dependencies:

pip install "pipecat-ai[neuphonic]"

You’ll also need to set your Neuphonic API key in the NEUPHONIC_API_KEY environment variable.
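
As a minimal sketch of reading that variable before constructing a service, the helper below fails early when the key is missing. The function name load_neuphonic_api_key is hypothetical and is not part of Pipecat or the Neuphonic SDK.

```python
import os


def load_neuphonic_api_key(env=os.environ):
    """Return the key stored in NEUPHONIC_API_KEY, or raise if it is unset.

    Hypothetical helper for illustration; `env` is injectable so the
    function can be exercised without touching the real environment.
    """
    key = env.get("NEUPHONIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError("NEUPHONIC_API_KEY is not set")
    return key
```

The returned value can then be passed as the api_key argument of either service.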

NeuphonicTTSService (WebSocket)

Configuration

api_key
str
required

Your Neuphonic API key

voice_id
str
default:"None"

Voice identifier to use for synthesis

url
str
default:"wss://api.neuphonic.com"

Neuphonic WebSocket API endpoint

sample_rate
int
default:"22050"

Output audio sample rate in Hz

encoding
str
default:"pcm_linear"

Audio encoding format

params
InputParams
default:"InputParams()"

Additional configuration parameters

InputParams

language
Language
default:"Language.EN"

The language for TTS generation

speed
float
default:"1.0"

Speech speed multiplier (0.5-2.0)
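
The documented speed range can be checked before a request is made. The sketch below uses a stand-in dataclass, SketchInputParams, which is hypothetical and is not the Pipecat InputParams class. Whether the real service rejects or clamps out-of-range values is not stated here, so rejecting them is an assumption.

```python
from dataclasses import dataclass

MIN_SPEED = 0.5  # lower bound of the documented range
MAX_SPEED = 2.0  # upper bound of the documented range


@dataclass
class SketchInputParams:
    """Hypothetical stand-in for InputParams, for illustration only."""

    language: str = "en"
    speed: float = 1.0

    def __post_init__(self):
        # Assumption: reject out-of-range speeds instead of clamping them.
        if not MIN_SPEED <= self.speed <= MAX_SPEED:
            raise ValueError(
                f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {self.speed}"
            )
```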

NeuphonicHttpTTSService (HTTP)

Configuration

api_key
str
required

Your Neuphonic API key

voice_id
str
default:"None"

Voice identifier to use for synthesis

url
str
default:"https://api.neuphonic.com"

Neuphonic HTTP API endpoint

sample_rate
int
default:"22050"

Output audio sample rate in Hz

encoding
str
default:"pcm_linear"

Audio encoding format

params
InputParams
default:"InputParams()"

Additional configuration parameters (same as WebSocket implementation)

Input

Both services accept text input through their TTS pipeline.

Output Frames

TTSStartedFrame

Signals the start of audio generation.

TTSAudioRawFrame

Contains generated audio data:

audio
bytes

Raw audio data chunk

sample_rate
int

Audio sample rate (22050 Hz by default)

num_channels
int

Number of audio channels (1 for mono)
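
The three fields above are enough to estimate how much audio a chunk holds. The sketch below assumes 16-bit (2-byte) samples, which this page does not state explicitly. The function name chunk_duration_seconds is hypothetical.

```python
def chunk_duration_seconds(
    audio: bytes,
    sample_rate: int = 22050,
    num_channels: int = 1,
    sample_width: int = 2,  # assumption: 16-bit linear PCM
) -> float:
    """Return the playback duration of a raw PCM chunk in seconds."""
    bytes_per_frame = num_channels * sample_width
    return len(audio) / bytes_per_frame / sample_rate
```

With the defaults, one second of mono audio occupies 44100 bytes.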

TTSStoppedFrame

Signals the completion of audio generation.

ErrorFrame

Sent when an error occurs during TTS generation:

error
str

Error message describing what went wrong

Methods

WebSocket Implementation

The WebSocket implementation (NeuphonicTTSService) inherits from InterruptibleTTSService and provides:

  • Support for interrupting ongoing TTS generation
  • Automatic WebSocket connection management
  • Keep-alive mechanism for persistent connections
  • Special handling for conversation flows

HTTP Implementation

The HTTP implementation (NeuphonicHttpTTSService) inherits from TTSService and provides:

  • Simpler API integration using HTTP streaming
  • Less overhead for single TTS requests
  • Simplified error handling

Language Support

Neuphonic TTS supports the following languages:

Language Code    Description    Service Code
Language.EN      English        en
Language.ES      Spanish        es
Language.DE      German         de
Language.NL      Dutch          nl
Language.AR      Arabic         ar
Language.FR      French         fr
Language.PT      Portuguese     pt
Language.RU      Russian        ru
Language.HI      Hindi          hi
Language.ZH      Chinese        zh

Regional variants (e.g., EN_US, ES_ES) are automatically mapped to their base language.
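
The fallback from a regional variant to its base language can be sketched with plain strings. This is an illustration only: it does not use Pipecat's Language enum, and the function name to_service_code is hypothetical.

```python
# Service codes listed in the table above.
SUPPORTED_CODES = {"en", "es", "de", "nl", "ar", "fr", "pt", "ru", "hi", "zh"}


def to_service_code(language_tag: str):
    """Map a tag such as "en-US" or "ES_ES" to its base service code.

    Returns None when the base language is not in the supported set.
    """
    base = language_tag.replace("_", "-").split("-")[0].lower()
    return base if base in SUPPORTED_CODES else None
```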

Usage Example

WebSocket Implementation

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.neuphonic import NeuphonicTTSService
from pipecat.transcriptions.language import Language

# Configure service
tts = NeuphonicTTSService(
    api_key="your-neuphonic-api-key",
    voice_id="preferred-voice-id",
    params=NeuphonicTTSService.InputParams(
        language=Language.EN,
        speed=1.2
    )
)

# Use in pipeline (llm and transport are assumed to be defined elsewhere)
pipeline = Pipeline([
    ...,
    llm,
    tts,
    transport.output(),
])

HTTP Implementation

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.neuphonic import NeuphonicHttpTTSService
from pipecat.transcriptions.language import Language

# Configure service
tts = NeuphonicHttpTTSService(
    api_key="your-neuphonic-api-key",
    voice_id="preferred-voice-id",
    params=NeuphonicHttpTTSService.InputParams(
        language=Language.ES,
        speed=1.0
    )
)

# Use in pipeline (llm and transport are assumed to be defined elsewhere)
pipeline = Pipeline([
    ...,
    llm,
    tts,
    transport.output(),
])

Metrics Support

Both services support metrics collection:

  • Time to First Byte (TTFB)
  • TTS usage metrics
  • Processing duration
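
To make the first and last metrics concrete, the sketch below times a stream of audio chunks by hand. It does not use Pipecat's metrics machinery. Both measure_stream and fake_tts are hypothetical names, and fake_tts merely stands in for a real TTS stream.

```python
import asyncio
import time


async def measure_stream(chunks):
    """Consume an async iterator of byte chunks.

    Returns (ttfb_seconds, total_seconds, total_bytes); ttfb_seconds is
    None when the stream yields nothing.
    """
    start = time.monotonic()
    ttfb = None
    total_bytes = 0
    async for chunk in chunks:
        if ttfb is None:
            # Time from request start to the first audio bytes.
            ttfb = time.monotonic() - start
        total_bytes += len(chunk)
    return ttfb, time.monotonic() - start, total_bytes


async def fake_tts():
    """Stand-in stream: three 100-byte chunks with a short delay each."""
    for _ in range(3):
        await asyncio.sleep(0.01)
        yield b"\x00" * 100
```

Calling asyncio.run(measure_stream(fake_tts())) returns the time to the first chunk, the total processing time, and the byte count.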

Audio Processing

  • Configurable sample rate (defaults to 22050 Hz)
  • PCM linear encoding
  • Single channel (mono) output
  • Base64 decoding for audio data
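
The decoding step can be sketched with the standard library alone. The example assumes 16-bit mono PCM at the default sample rate and a payload that is a plain Base64 string. The message format the API wraps around that payload is not described here, and both function names are hypothetical.

```python
import base64
import io
import wave


def pcm_from_base64(payload: str) -> bytes:
    """Decode a Base64 string into raw PCM bytes."""
    return base64.b64decode(payload)


def pcm_to_wav_bytes(
    pcm: bytes,
    sample_rate: int = 22050,
    num_channels: int = 1,
    sample_width: int = 2,  # assumption: 16-bit samples
) -> bytes:
    """Wrap raw PCM in a WAV container so ordinary players can open it."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(num_channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Writing the result of pcm_to_wav_bytes to a .wav file is a quick way to listen to a captured chunk while debugging.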

Error Handling

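from loguru import logger
from pipecat.frames.frames import ErrorFrame

async def speak(service, text):
    try:
        # Generate speech and watch the stream for error frames
        async for frame in service.run_tts(text):
            if isinstance(frame, ErrorFrame):
                # handle_error is a placeholder for your own callback
                handle_error(frame.error)
    except Exception as e:
        logger.error(f"TTS error: {e}")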

Notes

  • The WebSocket implementation includes a keep-alive mechanism (10-second interval)
  • The WebSocket service maintains a persistent connection for faster responses
  • Both services automatically map the configured Language value to the matching Neuphonic language code
  • The WebSocket implementation pauses frame processing during speech generation to prevent overlapping responses
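
The keep-alive pattern mentioned in the first note can be sketched as a background asyncio task. This is not the service's actual implementation: the send callable is hypothetical, and what the real service transmits on each tick is not stated on this page.

```python
import asyncio


async def keepalive_loop(send, interval: float = 10.0):
    """Await `send()` every `interval` seconds until the task is cancelled."""
    while True:
        await asyncio.sleep(interval)
        await send()


async def demo(interval: float = 0.01, run_for: float = 0.05) -> int:
    """Run the loop briefly with a counting stub and return the ping count."""
    pings = 0

    async def send():
        # Stub: a real client would write a keep-alive message to the socket.
        nonlocal pings
        pings += 1

    task = asyncio.create_task(keepalive_loop(send, interval))
    await asyncio.sleep(run_for)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected when the loop is shut down
    return pings
```

Cancelling the task on disconnect, as demo does, keeps the loop from outliving the connection it serves.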