Rime
Text-to-speech service implementations using Rime AI
Overview
Rime AI’s text-to-speech capabilities are available through two service implementations:
- RimeTTSService: WebSocket-based implementation with word-level timing and interruption support
- RimeHttpTTSService: HTTP-based implementation for simpler use cases
You can obtain a Rime API key by signing up at Rime.
RimeTTSService (WebSocket Service)
Uses Rime’s WebSocket JSON API for real-time speech synthesis with word-level timing information.
Constructor Parameters
Rime API key
Rime voice identifier
Rime WebSocket API endpoint
Model ID to use for synthesis
Output audio sample rate in Hz
Speech generation parameters
Features
- Word-level timing information
- Support for interruptions
- Context tracking across multiple messages
- Real-time audio streaming
- Proper sentence aggregation
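Interruption support and context tracking can be pictured with a minimal sketch. It assumes each utterance is tagged with its own context ID and that audio arriving for an interrupted context is discarded rather than played. `ContextTracker` and its methods are hypothetical names for illustration only. They are not part of RimeTTSService's API.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContextTracker:
    """Hypothetical helper: tags audio with a per-utterance context ID and
    drops chunks that belong to a context that is no longer active."""

    active_context: Optional[str] = None
    played: List[bytes] = field(default_factory=list)

    def start_utterance(self) -> str:
        # Each new utterance gets a fresh, unique context ID.
        self.active_context = str(uuid.uuid4())
        return self.active_context

    def interrupt(self) -> None:
        # After an interruption, no context is active until the next utterance starts.
        self.active_context = None

    def on_audio(self, context_id: str, chunk: bytes) -> bool:
        # Accept audio only for the active context; late chunks from an
        # interrupted utterance are ignored.
        if context_id != self.active_context:
            return False
        self.played.append(chunk)
        return True
```

Keying audio on a context ID lets late chunks from an interrupted utterance be ignored without tearing down the WebSocket connection.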
RimeHttpTTSService (HTTP Service)
Constructor Parameters
Rime API key
Rime voice identifier. See Rime’s documentation for supported voices.
Model to use for synthesis. Choose mist for hyper-realistic conversational voices or v1 for Rime’s first-generation model.
Output audio sample rate in Hz
Speech generation parameters
Output Frames
Both services generate the following frames:
Control Frames
Signals the start of speech synthesis
Signals the completion of speech synthesis
Audio Frames
Contains generated audio data:
- PCM audio format
- The configured output sample rate
- Single channel (mono)
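Because the audio is raw mono PCM at a known sample rate, the playback length of each audio frame follows directly from its byte count. The sketch below assumes 16-bit samples (`sample_width=2`). This page does not state the bit depth, so treat that default as an assumption. `pcm_duration_seconds` is a hypothetical helper.

```python
def pcm_duration_seconds(num_bytes: int, sample_rate: int,
                         sample_width: int = 2, channels: int = 1) -> float:
    """Playback duration of a raw PCM buffer.

    sample_width=2 assumes 16-bit samples; channels=1 matches mono output.
    """
    frame_size = sample_width * channels  # bytes per sample frame
    if num_bytes % frame_size != 0:
        raise ValueError("buffer does not contain a whole number of sample frames")
    return num_bytes / (sample_rate * frame_size)


# 48,000 bytes of 16-bit mono audio at 24,000 Hz is one second of speech.
print(pcm_duration_seconds(48_000, 24_000))  # 1.0
```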
Text Frames (WebSocket only)
Contains word-level text with timing information
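Word-level timing lets text be emitted in step with the audio. The sketch below assumes timing arrives as parallel lists of words and start times, in seconds, measured from the beginning of the utterance's audio. That input shape is an assumption and is not taken from Rime's API. `WordTiming`, `align_words` and `words_due` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordTiming:
    word: str
    start_s: float  # when the word begins, on the same clock as audio playback


def align_words(words: List[str], starts: List[float],
                utterance_offset_s: float) -> List[WordTiming]:
    """Shift per-utterance word start times onto a shared playback clock.

    Assumed input: parallel lists of words and start times (seconds)
    relative to the start of this utterance's audio.
    """
    if len(words) != len(starts):
        raise ValueError("words and starts must have the same length")
    return [WordTiming(w, utterance_offset_s + s) for w, s in zip(words, starts)]


def words_due(timings: List[WordTiming], playback_time_s: float) -> List[str]:
    """Words whose start time has been reached and can be emitted as text."""
    return [t.word for t in timings if t.start_s <= playback_time_s]
```

For example, if an utterance starts 1.5 s into playback and a word is timed at 0.4 s within it, that word becomes due at about 1.9 s on the playback clock.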
Error Frames
Contains Rime TTS error information
Usage Example
Frame Flow
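The frame flow for a single utterance can be sketched as a generator. It yields a start control frame, then one audio frame per chunk, then a stop control frame. The classes below are local stand-ins named after the frame categories in Output Frames. They are hypothetical and are not the framework's own frame types. Text and error frames are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Union


@dataclass
class StartedFrame:
    """Stand-in for the control frame that signals the start of synthesis."""


@dataclass
class AudioFrame:
    """Stand-in for an audio frame: raw PCM plus its format."""
    audio: bytes
    sample_rate: int
    num_channels: int = 1  # mono, as described above


@dataclass
class StoppedFrame:
    """Stand-in for the control frame that signals completion of synthesis."""


Frame = Union[StartedFrame, AudioFrame, StoppedFrame]


def synthesize_frames(chunks: Iterable[bytes], sample_rate: int) -> Iterator[Frame]:
    # Bracket the audio chunks of one utterance with start/stop control frames.
    yield StartedFrame()
    for chunk in chunks:
        yield AudioFrame(chunk, sample_rate)
    yield StoppedFrame()
```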
Metrics Support
Both services collect processing metrics:
- Time to First Byte (TTFB)
- Character usage statistics
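The two metrics can be illustrated with a minimal sketch. TTFB is taken as the time between sending a request and receiving its first audio chunk, and character usage as the running total of characters submitted for synthesis. `TTSMetrics` is a hypothetical class. The clock is injectable, so the example is deterministic.

```python
import time
from typing import Callable, Optional


class TTSMetrics:
    """Hypothetical collector for time to first byte and character usage."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._start: Optional[float] = None
        self.ttfb_s: Optional[float] = None
        self.characters = 0

    def start(self, text: str) -> None:
        # Called when a synthesis request is sent: count characters, start the timer.
        self.characters += len(text)
        self._start = self._clock()
        self.ttfb_s = None

    def on_audio_chunk(self) -> None:
        # Only the first chunk after start() determines TTFB.
        if self.ttfb_s is None and self._start is not None:
            self.ttfb_s = self._clock() - self._start
```

Usage with a fake clock:

```python
ticks = iter([10.0, 10.25])
metrics = TTSMetrics(clock=lambda: next(ticks))
metrics.start("Hello")
metrics.on_audio_chunk()  # first chunk: TTFB = 0.25 s
metrics.on_audio_chunk()  # later chunks do not change TTFB
```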
Service Comparison
| Feature | WebSocket | HTTP |
|---|---|---|
| Word timing | ✓ | - |
| Interruption support | ✓ | - |
| Bracket-based pauses | - | ✓ |
| Phoneme control | - | ✓ |
| Inline speed control | - | ✓ |
| Streaming audio | ✓ | ✓ |
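The table lists bracket-based pauses as HTTP-only. The sketch below shows how input text with explicit pauses might be assembled. It assumes a pause is written as a millisecond value in angle brackets, such as `<200>`. That syntax is an assumption to verify against Rime's documentation. `join_with_pauses` is a hypothetical helper.

```python
from typing import List


def join_with_pauses(segments: List[str], pause_ms: int) -> str:
    """Join text segments with an explicit pause marker between them.

    Assumed marker syntax: a millisecond value in angle brackets, e.g. "<200>".
    """
    if pause_ms <= 0:
        raise ValueError("pause_ms must be positive")
    marker = f"<{pause_ms}>"
    return f" {marker} ".join(s.strip() for s in segments)


print(join_with_pauses(["Hi.", "How are you?"], 200))  # Hi. <200> How are you?
```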