Rime
Text-to-speech service implementations using Rime AI
Overview
Rime AI provides two TTS service implementations:
- RimeTTSService: WebSocket-based implementation with word-level timing and interruption support
- RimeHttpTTSService: HTTP-based implementation for simpler use cases
Installation
To use Rime services, install the required dependencies:
You can obtain a Rime API key by signing up at Rime.
Choosing a Rime service
Rime has two supported services:
- RimeTTSService, a WebSocket-based implementation
- RimeHttpTTSService, an HTTP-based implementation
RimeTTSService
The RimeTTSService is recommended for real-time interactive applications. It offers:
- Word-level timing information for precise synchronization
- Support for interruptions and context management
- Context tracking across multiple messages within a turn
- Non-blocking operation that allows other frames to be processed while audio is being generated
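To make context tracking and interruption support more concrete, here is a minimal sketch of one way a turn-scoped context ID could work. The TurnContext class and its methods are hypothetical illustrations, not part of RimeTTSService. The sketch assumes that every message in a turn is tagged with one shared ID and that an interruption simply invalidates that ID.

```python
import uuid
from typing import Optional


class TurnContext:
    """Hypothetical turn-scoped context tracker (not part of RimeTTSService)."""

    def __init__(self) -> None:
        self.context_id: Optional[str] = None

    def id_for_message(self) -> str:
        # Every message in the same turn reuses one context ID.
        if self.context_id is None:
            self.context_id = str(uuid.uuid4())
        return self.context_id

    def interrupt(self) -> None:
        # Drop the active context; audio still tagged with it becomes stale.
        self.context_id = None

    def accepts(self, context_id: str) -> bool:
        # Only audio tagged with the active context should be played.
        return context_id == self.context_id
```

Under this design, audio that arrives late for an interrupted turn fails the accepts check and can be discarded instead of played.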
RimeHttpTTSService
The RimeHttpTTSService is simpler and better suited to non-interactive use cases. It:
- Processes the entire text in one request
- Supports advanced text control features (pauses, phonemes, inline speed)
- Blocks during the HTTP request, preventing other frames from being processed until the audio is fully generated
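The difference between the non-blocking WebSocket service and the blocking HTTP service can be illustrated with a toy asyncio simulation. This is not Pipecat code: synthesize, http_style, websocket_style, and the 50 ms delay are made-up stand-ins, assuming "blocking" means the next frame waits until the current request's audio is complete.

```python
import asyncio
from typing import List


async def synthesize(log: List[str], text: str) -> None:
    # Stand-in for audio generation; the 50 ms delay is arbitrary.
    await asyncio.sleep(0.05)
    log.append(f"audio ready: {text}")


async def http_style(frames: List[str]) -> List[str]:
    log: List[str] = []
    for frame in frames:
        log.append(f"processing: {frame}")
        # Blocking: the next frame waits until this frame's audio is complete.
        await synthesize(log, frame)
    return log


async def websocket_style(frames: List[str]) -> List[str]:
    log: List[str] = []
    pending = []
    for frame in frames:
        log.append(f"processing: {frame}")
        # Non-blocking: start generation and move straight on to the next frame.
        pending.append(asyncio.create_task(synthesize(log, frame)))
    # Wait for outstanding audio only so the demo can return its log.
    await asyncio.gather(*pending)
    return log
```

Running asyncio.run(http_style(["a", "b"])) logs each "audio ready" entry before the next "processing" entry, while asyncio.run(websocket_style(["a", "b"])) processes both frames before any audio is ready.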
Input Parameters
Both services share the same base input parameter structure:
The language to use for synthesis. See the Language Support section for available options.
Speech rate multiplier. Values less than 1.0 speed speech up; values greater than 1.0 slow it down.
Trades accuracy for lower latency.
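Because the speed multiplier works in the opposite direction from what the name might suggest, a small helper can spell out the rule. speed_effect is a hypothetical illustration, and rejecting non-positive values is an assumption, not documented Rime behavior.

```python
def speed_effect(speed_alpha: float) -> str:
    """Hypothetical helper: describe how a speech rate multiplier changes speech."""
    # Assumption: only positive multipliers are meaningful.
    if speed_alpha <= 0:
        raise ValueError("speed multiplier must be positive")
    if speed_alpha < 1.0:
        return "faster"  # values below 1.0 speed speech up
    if speed_alpha > 1.0:
        return "slower"  # values above 1.0 slow speech down
    return "normal"
```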
When set to true, inserts a pause wherever a number is enclosed in angle brackets between words. The number specifies the pause duration in milliseconds.
Example: "Hi. <200> I'd love to have a conversation with you." adds a 200 ms pause between the first and second sentences.
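Pause markers can be generated rather than typed by hand. The with_pauses helper below is a hypothetical sketch, assuming sentences are joined with a single space on each side of the marker; the markers only take effect when the pause option is enabled.

```python
from typing import Sequence


def with_pauses(sentences: Sequence[str], pause_ms: int) -> str:
    """Hypothetical helper: join sentences with an angle-bracket pause marker."""
    if pause_ms < 0:
        raise ValueError("pause duration must be non-negative")
    # For pause_ms=200 this places " <200> " between consecutive sentences.
    return f" <{pause_ms}> ".join(sentences)


text = with_pauses(["Hi.", "I'd love to have a conversation with you."], 200)
# text == "Hi. <200> I'd love to have a conversation with you."
```

This reproduces the example string for the 200 ms pause.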
When set to true, you can specify the phonemes for a word by enclosing them in curly brackets.
Example: "{h'El.o} World" pronounces "Hello" as expected. See Rime’s docs for more details.
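Phoneme overrides can likewise be applied to selected words before the text is sent. with_phonemes is a hypothetical helper that treats runs of letters and apostrophes as words; the phoneme spelling itself must follow Rime's notation, which this sketch does not validate.

```python
import re
from typing import Dict


def with_phonemes(text: str, overrides: Dict[str, str]) -> str:
    """Hypothetical helper: wrap phoneme spellings in curly brackets for chosen words."""

    def swap(match: re.Match) -> str:
        word = match.group(0)
        # Words without an override are left untouched.
        return "{" + overrides[word] + "}" if word in overrides else word

    return re.sub(r"[A-Za-z']+", swap, text)
```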
RimeTTSService (WebSocket)
Uses Rime’s WebSocket JSON API for real-time speech synthesis with word-level timing information.
Constructor Parameters
Rime API key
Rime voice identifier
Rime WebSocket API endpoint
Model ID to use for synthesis
Output audio sample rate in Hz
Speech generation parameters (see Input Parameters section above)
Text aggregator for processing input text. Defaults to skipping content between spell( and ) tags when detecting sentence boundaries.
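The idea behind the default aggregator can be sketched as a sentence splitter that ignores sentence-ending punctuation inside spell( ... ) tags, so spelled-out text such as "A.B.C" is not broken apart. This is a simplified, hypothetical stand-in (aggregate_sentences is not a Pipecat function) that assumes "skipping" means suppressing sentence boundaries inside the tags; it ignores abbreviations, decimals, and nested tags.

```python
from typing import List


def aggregate_sentences(text: str, start: str = "spell(", end: str = ")") -> List[str]:
    """Hypothetical splitter that never ends a sentence inside start/end tags."""
    sentences: List[str] = []
    current = ""
    inside = False
    i = 0
    while i < len(text):
        # Consume a whole tag at once so its characters are not inspected individually.
        if not inside and text.startswith(start, i):
            inside, token = True, start
        elif inside and text.startswith(end, i):
            inside, token = False, end
        else:
            token = text[i]
        current += token
        i += len(token)
        # Sentence-ending punctuation only counts outside the tags.
        if not inside and token in {".", "!", "?"}:
            sentences.append(current.strip())
            current = ""
    # Flush any trailing text that has no final punctuation.
    if current.strip():
        sentences.append(current.strip())
    return sentences
```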
Features
- Word-level timing information with cumulative timing across messages
- Support for interruptions with context clearing
- Context tracking across multiple messages within a turn
- Real-time audio streaming
- Proper sentence aggregation with skip tags support
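Cumulative timing means word timestamps keep counting up across the messages in a turn rather than restarting at zero for each message. A minimal sketch, assuming each message reports word start times relative to its own beginning and that messages play back to back; cumulative_timings and the tuple layout are hypothetical.

```python
from typing import List, Tuple

WordTiming = Tuple[str, float]  # (word, start time in seconds)


def cumulative_timings(messages: List[Tuple[List[WordTiming], float]]) -> List[WordTiming]:
    """Shift each message's word start times by the total duration of earlier messages."""
    result: List[WordTiming] = []
    offset = 0.0
    for words, duration in messages:
        for word, start in words:
            result.append((word, offset + start))
        # The next message starts where this message's audio ended.
        offset += duration
    return result
```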
RimeHttpTTSService (HTTP)
HTTP-based implementation for simpler synthesis requirements.
Constructor Parameters
Rime API key
Rime voice identifier. See Rime’s documentation for supported voices.
HTTP session for making requests
Model to use for synthesis. Choose mistv2 for hyper-realistic conversational voices, mist for Rime’s previous-generation model, or arcana for the latest model.
Output audio sample rate in Hz
Speech generation parameters
Output Frames
Both services generate the following frames:
Control Frames
Signals start of speech synthesis
Signals completion of speech synthesis
Audio Frames
Contains generated audio data: PCM audio format, specified sample rate, single channel (mono)
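Since the audio is raw PCM, its playback length follows directly from the byte count. The helper below assumes 16-bit (2-byte) samples, which this section does not state; pcm_duration_seconds is a hypothetical name.

```python
def pcm_duration_seconds(
    num_bytes: int, sample_rate: int, channels: int = 1, sample_width: int = 2
) -> float:
    """Duration of raw PCM audio; sample_width=2 assumes 16-bit samples."""
    bytes_per_second = sample_rate * channels * sample_width
    return num_bytes / bytes_per_second
```

For example, 48,000 bytes of mono 16-bit audio at 24,000 Hz is one second of speech.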
Error Frames
Contains Rime TTS error information
Language Support
Supports multiple languages through the Language enum:
Language Code | Description | Service Code |
---|---|---|
Language.DE | German | ger |
Language.EN | English | eng |
Language.ES | Spanish | spa |
Language.FR | French | fra |
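The table can be expressed as a lookup from Language enum member names to Rime service codes. This sketch uses plain strings instead of the real Language enum; RIME_LANGUAGE_CODES and rime_language_code are hypothetical names.

```python
# Service codes transcribed from the Language Support table.
RIME_LANGUAGE_CODES = {
    "DE": "ger",  # German
    "EN": "eng",  # English
    "ES": "spa",  # Spanish
    "FR": "fra",  # French
}


def rime_language_code(language: str) -> str:
    """Return the Rime service code for a Language member name such as "EN"."""
    try:
        return RIME_LANGUAGE_CODES[language.upper()]
    except KeyError:
        raise ValueError(f"unsupported language: {language}") from None
```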
Usage Examples
WebSocket Service
HTTP Service
Frame Flow
Metrics Support
Both services collect processing metrics:
- Time to First Byte (TTFB)
- Character usage statistics
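A minimal sketch of how these two metrics might be gathered: TTFB is the time from sending a request to receiving the first audio chunk, and character usage is the number of characters sent for synthesis. TTSMetrics is a hypothetical class, not the services' metrics API; the injectable clock exists only to make the timing testable.

```python
import time
from typing import Callable, Optional


class TTSMetrics:
    """Hypothetical collector for TTFB and character usage."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._start: Optional[float] = None
        self.ttfb: Optional[float] = None
        self.characters = 0

    def start_request(self, text: str) -> None:
        # Count the characters sent and start the TTFB timer.
        self.characters += len(text)
        self._start = self._clock()
        self.ttfb = None

    def on_audio_chunk(self) -> None:
        # Only the first chunk after a request sets TTFB.
        if self._start is not None and self.ttfb is None:
            self.ttfb = self._clock() - self._start
```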
Service Comparison
Feature | WebSocket | HTTP |
---|---|---|
Word timing | ✓ | - |
Interruption support | ✓ | - |
Context tracking | ✓ | - |
Bracket-based pauses | ✓ | ✓ |
Phoneme control | ✓ | ✓ |
Inline speed control | - | ✓ |
Streaming audio | ✓ | ✓ |
Arcana model support | - | ✓ |
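The comparison table can also be encoded as data to check which service covers a given set of requirements. The feature names come from the table; FEATURES and services_supporting are hypothetical.

```python
from typing import Dict, List, Set

# Feature support per service, transcribed from the Service Comparison table.
FEATURES: Dict[str, Set[str]] = {
    "RimeTTSService": {
        "word timing", "interruption support", "context tracking",
        "bracket-based pauses", "phoneme control", "streaming audio",
    },
    "RimeHttpTTSService": {
        "bracket-based pauses", "phoneme control", "inline speed control",
        "streaming audio", "arcana model support",
    },
}


def services_supporting(required: Set[str]) -> List[str]:
    """Return the services, sorted by name, that support every required feature."""
    return sorted(name for name, features in FEATURES.items() if required <= features)
```

For example, requiring word timing leaves only RimeTTSService, while requiring the Arcana model leaves only RimeHttpTTSService.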