Overview
Rime AI provides two TTS service implementations: RimeTTSService (WebSocket-based), which offers word-level timing and interruption support, and RimeHttpTTSService (HTTP-based) for simpler use cases. RimeTTSService is recommended for real-time interactive applications.
Rime TTS API Reference
Pipecat’s API methods for Rime TTS integration
Example Implementation
Complete example with word timestamps
Rime Documentation
Official Rime WebSocket and HTTP API documentation
Voice Models
Explore available voice models and features
Installation
To use Rime services, install the required dependencies, for example with `pip install "pipecat-ai[rime]"`.
Prerequisites
Rime Account Setup
Before using Rime TTS services, you need:
- Rime Account: Sign up at Rime AI
- API Key: Generate an API key from your account dashboard
- Voice Selection: Choose from available voice models
Required Environment Variables
RIME_API_KEY: Your Rime API key for authentication
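Reading the variable once at startup makes a missing key fail fast instead of surfacing later as an authentication error. A minimal sketch using only the standard library; the helper name `load_rime_api_key` is hypothetical and not part of Pipecat:

```python
import os


def load_rime_api_key(env=os.environ):
    """Return the Rime API key, or raise a clear error if it is unset."""
    # Treat a blank value the same as a missing one.
    key = env.get("RIME_API_KEY", "").strip()
    if not key:
        raise RuntimeError("RIME_API_KEY is not set; export it before starting the bot.")
    return key
```

The `env` argument defaults to the process environment and can be replaced with a plain dict in tests.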
Configuration
RimeTTSService
Rime API key for authentication.
ID of the voice to use for synthesis.
Rime WebSocket API endpoint.
Model ID to use for synthesis.
Output audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
Buffer text until sentence boundaries before sending to Rime.
Runtime-configurable voice and generation settings. See InputParams (WebSocket) below.
RimeHttpTTSService
Rime API key for authentication.
ID of the voice to use for synthesis.
An aiohttp session for HTTP requests.
Model ID to use for synthesis.
Output audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
Runtime-configurable voice and generation settings. See InputParams (HTTP) below.
RimeNonJsonTTSService
A non-JSON WebSocket service for models like Arcana that use plain text messages.
Rime API key for authentication.
ID of the voice to use for synthesis.
Rime WebSocket API endpoint.
Model ID to use for synthesis.
Audio output format.
Output audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
Buffer text until sentence boundaries before sending.
Runtime-configurable settings. See InputParams (Non-JSON) below.
InputParams (WebSocket)
| Parameter | Type | Default | Description |
|---|---|---|---|
| language | Language | Language.EN | Language for synthesis. |
| speed_alpha | float | 1.0 | Speech speed multiplier. |
| reduce_latency | bool | False | Whether to reduce latency at potential quality cost. |
| pause_between_brackets | bool | False | Whether to add pauses between bracketed content. |
| phonemize_between_brackets | bool | False | Whether to phonemize bracketed content. |
InputParams (HTTP)
| Parameter | Type | Default | Description |
|---|---|---|---|
| language | Language | Language.EN | Language for synthesis. |
| speed_alpha | float | 1.0 | Speech speed multiplier. |
| reduce_latency | bool | False | Whether to reduce latency at potential quality cost. |
| pause_between_brackets | bool | False | Whether to add pauses between bracketed content. |
| phonemize_between_brackets | bool | False | Whether to phonemize bracketed content. |
| inline_speed_alpha | str | None | Inline speed control markup. |
InputParams (Non-JSON)
| Parameter | Type | Default | Description |
|---|---|---|---|
| language | Language | None | Language for synthesis. |
| segment | str | None | Text segmentation mode ("immediate", "bySentence", "never"). |
| repetition_penalty | float | None | Token repetition penalty (1.0-2.0). |
| temperature | float | None | Sampling temperature (0.0-1.0). |
| top_p | float | None | Cumulative probability threshold (0.0-1.0). |
| extra | dict | None | Additional parameters to pass to the API. |
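The ranges in the Non-JSON table can be checked before values reach the service. The sketch below is a hypothetical stand-in that mirrors the documented field names; it is not a Pipecat class, and it assumes the listed ranges are inclusive:

```python
from dataclasses import dataclass
from typing import Optional

# Segmentation modes listed in the table above.
SEGMENT_MODES = {"immediate", "bySentence", "never"}


@dataclass
class ArcanaParamsCheck:
    """Hypothetical validator mirroring the documented Non-JSON fields."""

    segment: Optional[str] = None
    repetition_penalty: Optional[float] = None
    temperature: Optional[float] = None
    top_p: Optional[float] = None

    def problems(self):
        """Return a list of human-readable violations (empty if valid)."""
        issues = []
        if self.segment is not None and self.segment not in SEGMENT_MODES:
            issues.append(f"segment must be one of {sorted(SEGMENT_MODES)}")
        # Documented ranges, assumed inclusive at both ends.
        ranges = {
            "repetition_penalty": (1.0, 2.0),
            "temperature": (0.0, 1.0),
            "top_p": (0.0, 1.0),
        }
        for name, (low, high) in ranges.items():
            value = getattr(self, name)
            if value is not None and not (low <= value <= high):
                issues.append(f"{name} must be between {low} and {high}")
        return issues
```

Fields left as None are skipped, which matches the table's None defaults.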
Usage
Basic Setup (WebSocket)
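A minimal sketch of constructing the WebSocket service. The import path, the `api_key` and `voice_id` keyword names, and the voice ID `rex` are assumptions inferred from the Configuration section and should be checked against the API reference:

```python
import os


def build_rime_tts():
    """Create a RimeTTSService from the RIME_API_KEY environment variable."""
    # Imported lazily so this module loads even where Pipecat is not installed.
    from pipecat.services.rime.tts import RimeTTSService  # assumed import path

    return RimeTTSService(
        api_key=os.environ["RIME_API_KEY"],
        voice_id="rex",  # placeholder; choose a voice from Rime's voice list
    )
```

The returned service would then be placed in the pipeline after the LLM and before the transport output.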
With Customization (WebSocket)
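Generation settings from the InputParams (WebSocket) table could be passed at construction time. In this sketch the `params` keyword, both import paths, and the voice ID are assumptions; the field names come from the table:

```python
import os


def build_custom_rime_tts():
    """Create a RimeTTSService with non-default generation settings."""
    # Lazy imports keep the module loadable without Pipecat installed.
    from pipecat.services.rime.tts import RimeTTSService  # assumed import path
    from pipecat.transcriptions.language import Language  # assumed import path

    params = RimeTTSService.InputParams(
        language=Language.EN,
        speed_alpha=1.2,  # non-default speed multiplier
        reduce_latency=True,  # trades some quality for lower latency
        pause_between_brackets=True,
    )
    return RimeTTSService(
        api_key=os.environ["RIME_API_KEY"],
        voice_id="rex",  # placeholder voice ID
        params=params,
    )
```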
HTTP Service
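The HTTP service needs an aiohttp session that stays open for as long as the service is in use. A sketch under that assumption; the `aiohttp_session` keyword name, the import path, and the `run` callback (a hypothetical coroutine that builds and runs the pipeline) are not confirmed by this page:

```python
import os


async def run_with_http_tts(run):
    """Create a RimeHttpTTSService inside an aiohttp session and pass it to `run`."""
    # Lazy imports keep the module loadable without aiohttp or Pipecat installed.
    import aiohttp
    from pipecat.services.rime.tts import RimeHttpTTSService  # assumed import path

    async with aiohttp.ClientSession() as session:
        tts = RimeHttpTTSService(
            api_key=os.environ["RIME_API_KEY"],
            voice_id="rex",  # placeholder voice ID
            aiohttp_session=session,
        )
        # The session closes only after `run` has finished using the service.
        await run(tts)
```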
Non-JSON WebSocket (Arcana)
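For Arcana, the non-JSON service might be configured as follows. The import path, the keyword names, the model ID string `arcana`, and the voice ID are assumptions; the InputParams fields come from the InputParams (Non-JSON) table:

```python
import os


def build_arcana_tts():
    """Create a RimeNonJsonTTSService configured for an Arcana model."""
    # Lazy import keeps the module loadable without Pipecat installed.
    from pipecat.services.rime.tts import RimeNonJsonTTSService  # assumed import path

    params = RimeNonJsonTTSService.InputParams(
        segment="bySentence",
        temperature=0.5,
        top_p=0.9,
    )
    return RimeNonJsonTTSService(
        api_key=os.environ["RIME_API_KEY"],
        voice_id="luna",  # placeholder voice ID
        model="arcana",  # assumed model ID
        params=params,
    )
```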
Customizing Speech
RimeTTSService provides a set of helper methods for implementing Rime-specific customizations, meant to be used as part of text transformers. These include methods for spelling out text, adjusting speech rate, and modifying pitch. See the Text Transformers for TTS section in the Text-to-Speech guide for usage examples.
SPELL(text: str) -> str:
Implements Rime’s spell function to spell out text character by character.
PAUSE_TAG(seconds: float) -> str:
Implements Rime’s custom pause functionality to generate a properly formatted pause tag you can insert into the text.
PRONOUNCE(self, text: str, word: str, phoneme: str) -> str:
A convenience method to support Rime’s custom pronunciations feature. It takes a word and its desired phoneme representation, returning the text with the provided word replaced by the appropriate phoneme tag.
INLINE_SPEED(self, text: str, speed: float) -> str:
A convenience method to support Rime’s inline speed adjustment feature. It wraps the provided text in [] tags and adds the provided speed to the inlineSpeedAlpha field in the request metadata.
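To make the markup concrete, the sketch below shows standalone functions that produce the kind of tags these helpers are described as generating. The function names are hypothetical, and the exact tag formats (`spell(...)`, a millisecond count in angle brackets, phonemes in curly braces) are assumptions about Rime's markup rather than Pipecat's implementation:

```python
def spell(text: str) -> str:
    """Wrap text so it is read out character by character."""
    return f"spell({text})"


def pause_tag(seconds: float) -> str:
    """Format a pause as a whole number of milliseconds in angle brackets."""
    return f"<{round(seconds * 1000)}>"


def pronounce(text: str, word: str, phoneme: str) -> str:
    """Replace every occurrence of `word` with its phoneme form in curly braces."""
    return text.replace(word, "{" + phoneme + "}")


def inline_speed(text: str) -> str:
    """Wrap text in square brackets; the speed itself travels in request metadata."""
    return f"[{text}]"
```

For example, `pause_tag(0.5)` yields `<500>`. The real PRONOUNCE and INLINE_SPEED helpers take `self`, so they can presumably also adjust the service's request settings, which these standalone functions do not do.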
Notes
- Word-level timestamps: RimeTTSService provides word-level timing information, enabling synchronized text highlighting.
- WebSocket vs HTTP: The WebSocket service supports word-level timestamps, interruption handling, and maintains context across messages within a turn. The HTTP service is simpler but lacks these features.
- Non-JSON WebSocket: RimeNonJsonTTSService is for models like Arcana that use plain text messages instead of JSON. It does not support word-level timestamps.
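Synchronized highlighting amounts to finding the last word whose start time is not after the current playback position. A minimal sketch, assuming the timing data has already been collected as a sorted list of start times in seconds alongside the words; that data shape and both function names are hypothetical:

```python
from bisect import bisect_right


def current_word_index(start_times, playback_time):
    """Return the index of the word being spoken, or -1 before the first word."""
    # bisect_right counts how many start times are <= playback_time.
    return bisect_right(start_times, playback_time) - 1


def highlight(words, start_times, playback_time):
    """Return the sentence with the current word wrapped in asterisks."""
    idx = current_word_index(start_times, playback_time)
    return " ".join(f"*{w}*" if i == idx else w for i, w in enumerate(words))
```

With words `["Hello", "there", "world"]` starting at `[0.0, 0.4, 0.9]`, a playback time of 0.5 highlights `there`.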
Event Handlers
Rime WebSocket TTS services support the standard service connection events:
| Event | Description |
|---|---|
| on_connected | Connected to Rime WebSocket |
| on_disconnected | Disconnected from Rime WebSocket |
| on_connection_error | WebSocket connection error occurred |
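Handlers for these events could be registered with a decorator on the service instance. The sketch assumes the service exposes an `event_handler(name)` decorator and that handlers receive the service (plus the error for `on_connection_error`); `register_connection_handlers` is a hypothetical helper:

```python
def register_connection_handlers(tts, log=print):
    """Attach logging handlers for the three connection events to a TTS service."""

    @tts.event_handler("on_connected")
    async def on_connected(service):
        log("Connected to Rime WebSocket")

    @tts.event_handler("on_disconnected")
    async def on_disconnected(service):
        log("Disconnected from Rime WebSocket")

    @tts.event_handler("on_connection_error")
    async def on_connection_error(service, error):
        # Assumed signature: the error is passed as a second argument.
        log(f"Rime WebSocket connection error: {error}")
```

Passing a custom `log` callable makes it easy to route these messages into an application's own logger.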