Overview
Rime AI provides two TTS service implementations: RimeTTSService (WebSocket-based) with word-level timing and interruption support, and RimeHttpTTSService (HTTP-based) for simpler use cases. RimeTTSService is recommended for real-time interactive applications.
Rime TTS API Reference
Pipecat’s API methods for Rime TTS integration
Example Implementation
Complete example with word timestamps
Rime Documentation
Official Rime WebSocket and HTTP API documentation
Voice Models
Explore available voice models and features
Installation
To use Rime services, install Pipecat with the Rime extra, for example: pip install "pipecat-ai[rime]"
Prerequisites
Rime Account Setup
Before using Rime TTS services, you need:
- Rime Account: Sign up at Rime AI
- API Key: Generate an API key from your account dashboard
- Voice Selection: Choose from available voice models
Required Environment Variables
RIME_API_KEY: Your Rime API key for authentication
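As a minimal sketch of reading this variable at startup, the hypothetical helper below (get_rime_api_key is not part of Pipecat) fails early with a clear message when the key is missing, rather than at the first synthesis request:

```python
import os


def get_rime_api_key() -> str:
    """Return the Rime API key from the environment, or raise if it is unset."""
    api_key = os.getenv("RIME_API_KEY")
    if not api_key:
        # Failing here gives a clearer error than a rejected request later.
        raise RuntimeError("RIME_API_KEY is not set")
    return api_key
```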
Configuration
RimeTTSService
- Rime API key for authentication.
- ID of the voice to use for synthesis. Deprecated in v0.0.105. Use settings=RimeTTSService.Settings(voice=...) instead.
- Rime WebSocket API endpoint.
- Model ID to use for synthesis. Deprecated in v0.0.105. Use settings=RimeTTSService.Settings(model=...) instead.
- Output audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
- Controls how incoming text is aggregated before synthesis. SENTENCE (default) buffers text until sentence boundaries, producing more natural speech. TOKEN streams tokens directly for lower latency. Import from pipecat.services.tts_service.
- Deprecated in v0.0.104. Use text_aggregation_mode instead.
- Deprecated in v0.0.105. Use settings=RimeTTSService.Settings(...) instead.
- Runtime-configurable settings. See RimeTTSService Settings below.
RimeHttpTTSService
- Rime API key for authentication.
- ID of the voice to use for synthesis. Deprecated in v0.0.105. Use settings=RimeHttpTTSService.Settings(voice=...) instead.
- An aiohttp session for HTTP requests.
- Model ID to use for synthesis. Deprecated in v0.0.105. Use settings=RimeHttpTTSService.Settings(model=...) instead.
- Output audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
- Deprecated in v0.0.105. Use settings=RimeHttpTTSService.Settings(...) instead.
- Runtime-configurable settings. See RimeTTSService Settings below.
RimeNonJsonTTSService
A non-JSON WebSocket service for models like Arcana that use plain text messages.
- Rime API key for authentication.
- ID of the voice to use for synthesis. Deprecated in v0.0.105. Use settings=RimeNonJsonTTSService.Settings(voice=...) instead.
- Rime WebSocket API endpoint.
- Model ID to use for synthesis. Deprecated in v0.0.105. Use settings=RimeNonJsonTTSService.Settings(model=...) instead.
- Audio output format.
- Output audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
- Controls how incoming text is aggregated before synthesis. SENTENCE (default) buffers text until sentence boundaries. TOKEN streams tokens directly for lower latency. Import from pipecat.services.tts_service.
- Deprecated in v0.0.104. Use text_aggregation_mode instead.
- Deprecated in v0.0.105. Use settings=RimeNonJsonTTSService.Settings(...) instead.
- Runtime-configurable settings. See RimeNonJsonTTSService Settings below.
RimeTTSService Settings
Runtime-configurable settings passed via the settings constructor argument using RimeTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | None | Model identifier. (Inherited.) |
| voice | str | None | Voice identifier. (Inherited.) |
| language | Language \| str | None | Language for synthesis. (Inherited.) |
| segment | str | NOT_GIVEN | Segment type for synthesis. |
| speedAlpha | float | NOT_GIVEN | Speed alpha parameter. |
| reduceLatency | bool | NOT_GIVEN | Whether to reduce latency. |
| pauseBetweenBrackets | bool | NOT_GIVEN | Pause between brackets. |
| phonemizeBetweenBrackets | bool | NOT_GIVEN | Phonemize between brackets. |
| noTextNormalization | bool | NOT_GIVEN | Disable text normalization. |
| saveOovs | bool | NOT_GIVEN | Save out-of-vocabulary words. |
| inlineSpeedAlpha | str | NOT_GIVEN | Inline speed alpha. |
| repetition_penalty | float | NOT_GIVEN | Repetition penalty. |
| temperature | float | NOT_GIVEN | Temperature for sampling. |
| top_p | float | NOT_GIVEN | Top-p sampling parameter. |
RimeNonJsonTTSService Settings
Runtime-configurable settings passed via the settings constructor argument using RimeNonJsonTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | None | Model identifier. (Inherited.) |
| voice | str | None | Voice identifier. (Inherited.) |
| language | Language \| str | None | Language for synthesis. (Inherited.) |
| segment | str | NOT_GIVEN | Segment type for synthesis. |
| repetition_penalty | float | NOT_GIVEN | Repetition penalty. |
| temperature | float | NOT_GIVEN | Temperature for sampling. |
| top_p | float | NOT_GIVEN | Top-p sampling parameter. |
Usage
Basic Setup (WebSocket)
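A minimal sketch of the WebSocket setup. The import path pipecat.services.rime.tts and the placeholder voice and model IDs are assumptions, not values taken from this page; the settings=RimeTTSService.Settings(voice=..., model=...) form follows the Configuration section. The import is deferred into the function so the snippet loads even where Pipecat is not installed.

```python
import os


def build_rime_tts():
    """Create a RimeTTSService using the RIME_API_KEY environment variable."""
    # Assumed import path; adjust if your Pipecat version differs.
    from pipecat.services.rime.tts import RimeTTSService

    return RimeTTSService(
        api_key=os.environ["RIME_API_KEY"],
        # Placeholder IDs: substitute a voice and model from your Rime account.
        settings=RimeTTSService.Settings(
            voice="your-voice-id",
            model="your-model-id",
        ),
    )
```

The returned service is typically placed in the pipeline after the LLM and before the transport output.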
With Customization (WebSocket)
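A sketch of passing Rime-specific options through the settings object. The field names speedAlpha and reduceLatency come from the RimeTTSService Settings table; the values, the import path, and the placeholder IDs are illustrative assumptions.

```python
import os


def build_customized_rime_tts():
    """Create a RimeTTSService with a few Rime-specific settings applied."""
    # Assumed import path; adjust if your Pipecat version differs.
    from pipecat.services.rime.tts import RimeTTSService

    return RimeTTSService(
        api_key=os.environ["RIME_API_KEY"],
        settings=RimeTTSService.Settings(
            voice="your-voice-id",   # placeholder
            model="your-model-id",   # placeholder
            speedAlpha=1.0,          # illustrative value for the speed alpha setting
            reduceLatency=True,      # illustrative value for the latency setting
        ),
    )
```

Fields left out keep their NOT_GIVEN default from the settings table.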
HTTP Service
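A sketch of the HTTP variant, which needs an aiohttp session. The keyword name aiohttp_session and the import path are assumptions (this page only says the service takes "an aiohttp session"); check them against the API reference for your Pipecat version.

```python
import os


def build_rime_http_tts(session):
    """Create a RimeHttpTTSService that sends its requests through `session`."""
    # Assumed import path; adjust if your Pipecat version differs.
    from pipecat.services.rime.tts import RimeHttpTTSService

    return RimeHttpTTSService(
        api_key=os.environ["RIME_API_KEY"],
        aiohttp_session=session,  # assumed keyword name for the aiohttp session
        settings=RimeHttpTTSService.Settings(
            voice="your-voice-id",   # placeholder
            model="your-model-id",   # placeholder
        ),
    )
```

The caller owns the session: create it with async with aiohttp.ClientSession() as session: and keep it open for as long as the pipeline runs.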
Non-JSON WebSocket (Arcana)
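A sketch of the plain-text WebSocket service. The sampling fields come from the RimeNonJsonTTSService Settings table; the model ID "arcana", the import path, the voice placeholder, and the numeric values are assumptions for illustration.

```python
import os


def build_rime_arcana_tts():
    """Create a RimeNonJsonTTSService for a plain-text model such as Arcana."""
    # Assumed import path; adjust if your Pipecat version differs.
    from pipecat.services.rime.tts import RimeNonJsonTTSService

    return RimeNonJsonTTSService(
        api_key=os.environ["RIME_API_KEY"],
        settings=RimeNonJsonTTSService.Settings(
            voice="your-voice-id",  # placeholder
            model="arcana",         # assumed model ID for Arcana
            temperature=0.7,        # illustrative sampling values
            top_p=0.9,
        ),
    )
```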
Customizing Speech
RimeTTSService provides a set of helper methods for implementing Rime-specific customizations, meant to be used as part of text transformers. These include methods for spelling out text, adjusting speech rate, and modifying pitch. See the Text Transformers for TTS section in the Text-to-Speech guide for usage examples.
- SPELL(text: str) -> str: Implements Rime’s spell function to spell out text character by character.
- PAUSE_TAG(seconds: float) -> str: Implements Rime’s custom pause functionality to generate a properly formatted pause tag you can insert into the text.
- PRONOUNCE(self, text: str, word: str, phoneme: str) -> str: Convenience method to support Rime’s custom pronunciations feature. It takes a word and its desired phoneme representation, returning the text with the provided word replaced by the appropriate phoneme tag.
- INLINE_SPEED(self, text: str, speed: float) -> str: A convenience method to support Rime’s inline speed adjustment feature. It wraps the provided text in [] tags and adds the provided speed to the inlineSpeedAlpha field in the request metadata.
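To make the markup concrete, here is a standalone sketch of the kind of strings these helpers are described as producing. The functions below are hypothetical stand-ins, not Pipecat's implementation, and they assume Rime's markup conventions of spell(...), a pause written as milliseconds in angle brackets, phonemes in curly braces, and inline-speed spans in square brackets. The real helpers may format their output differently.

```python
def spell(text: str) -> str:
    """Wrap text so it is spelled out character by character (assumed syntax)."""
    return f"spell({text})"


def pause_tag(seconds: float) -> str:
    """Build a pause tag, assuming pauses are written as <milliseconds>."""
    return f"<{int(round(seconds * 1000))}>"


def pronounce(text: str, word: str, phoneme: str) -> str:
    """Replace every occurrence of `word` with a curly-brace phoneme tag."""
    return text.replace(word, "{" + phoneme + "}")


def inline_speed(text: str) -> str:
    """Mark a span for inline speed adjustment with square brackets."""
    # The speed value itself travels separately, in inlineSpeedAlpha.
    return f"[{text}]"


print(spell("ABC"))      # spell(ABC)
print(pause_tag(0.5))    # <500>
```

Per the settings table, bracket-based markup of this kind is governed by the pauseBetweenBrackets, phonemizeBetweenBrackets, and inlineSpeedAlpha settings.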
Notes
- Word-level timestamps: RimeTTSService provides word-level timing information, enabling synchronized text highlighting.
- WebSocket vs HTTP: The WebSocket service supports word-level timestamps, interruption handling, and maintains context across messages within a turn. The HTTP service is simpler but lacks these features.
- Non-JSON WebSocket: RimeNonJsonTTSService is for models like Arcana that use plain text messages instead of JSON. It does not support word-level timestamps.
Event Handlers
Rime WebSocket TTS services support the standard service connection events:

| Event | Description |
|---|---|
| on_connected | Connected to Rime WebSocket |
| on_disconnected | Disconnected from Rime WebSocket |
| on_connection_error | WebSocket connection error occurred |
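A sketch of registering handlers for these events. It assumes the service exposes Pipecat's event_handler decorator and that handlers receive the service (plus an error for on_connection_error); those signatures are assumptions, and register_connection_handlers is a hypothetical helper name.

```python
import logging

logger = logging.getLogger("rime-events")


def register_connection_handlers(tts):
    """Attach logging handlers for the three connection events to `tts`."""

    @tts.event_handler("on_connected")
    async def on_connected(service):
        logger.info("Connected to Rime WebSocket")

    @tts.event_handler("on_disconnected")
    async def on_disconnected(service):
        logger.info("Disconnected from Rime WebSocket")

    @tts.event_handler("on_connection_error")
    async def on_connection_error(service, error):
        # Assumed signature: the error is passed as a second argument.
        logger.error("Rime WebSocket connection error: %s", error)
```

Call register_connection_handlers(tts) once, right after constructing the service and before the pipeline starts.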