Overview

Rime AI provides two TTS service implementations:
  • RimeTTSService: WebSocket-based with word-level timing and interruption support
  • RimeHttpTTSService: HTTP-based for simpler use cases
RimeTTSService is recommended for real-time interactive applications.

Installation

To use Rime services, install the required dependencies:
pip install "pipecat-ai[rime]"
You also need to set your Rime API key in the RIME_API_KEY environment variable.
Get your API key by signing up at Rime.
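One way to read the key at startup and fail early when it is missing is sketched below. The helper name `load_rime_api_key` is hypothetical and not part of Pipecat; only the RIME_API_KEY variable name comes from the text above.

```python
import os


def load_rime_api_key(env=None):
    """Return the Rime API key, or raise if RIME_API_KEY is unset or empty.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("RIME_API_KEY", "").strip()
    if not key:
        raise RuntimeError("RIME_API_KEY is not set; export it before starting the bot")
    return key
```

The returned value can then be passed as `api_key=` when constructing either service, instead of calling `os.getenv` inline and discovering a missing key only when the first request fails.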

Frames

Input

  • TextFrame - Text content to synthesize into speech
  • TTSSpeakFrame - Text that should be spoken immediately
  • TTSUpdateSettingsFrame - Runtime configuration updates
  • LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries

Output

  • TTSStartedFrame - Signals start of synthesis
  • TTSAudioRawFrame - Generated audio data chunks (PCM format)
  • TTSTextFrame - Word-level timing information (WebSocket service only)
  • TTSStoppedFrame - Signals completion of synthesis
  • ErrorFrame - API or processing errors

Service Comparison

| Feature | RimeTTSService (WebSocket) | RimeHttpTTSService (HTTP) |
| --- | --- | --- |
| Word Timestamps | ✅ Precise timing | ❌ Not available |
| Interruption | ✅ Context tracking | ⚠️ Basic support |
| Streaming | ✅ Real-time chunks | ✅ Chunked response |
| Inline Speed | ❌ Not supported | ✅ Word-level control |
| Arcana Model | ❌ Not supported | ✅ Latest model |

Model Options

| Model | Description | Availability |
| --- | --- | --- |
| mistv2 | Hyper-realistic conversational voices (recommended) | Both services |
| mist | Previous generation model | Both services |
| arcana | Latest high-quality model | HTTP only |
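The availability column can be checked in code before a service is constructed. This is a minimal sketch: the `check_model` helper and the "websocket" / "http" labels are illustrative names, not Pipecat identifiers.

```python
# Availability as listed in the Model Options table.
# The "websocket" / "http" labels are illustrative, not Pipecat names.
MODEL_SERVICES = {
    "mistv2": {"websocket", "http"},
    "mist": {"websocket", "http"},
    "arcana": {"http"},
}


def check_model(model, service):
    """Return `model` if it is offered on `service`, otherwise raise ValueError."""
    if model not in MODEL_SERVICES:
        raise ValueError(f"unknown Rime model: {model!r}")
    if service not in MODEL_SERVICES[model]:
        raise ValueError(f"{model!r} is not available on the {service} service")
    return model
```

For example, `check_model("arcana", "websocket")` raises, which surfaces the HTTP-only restriction at configuration time.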

Supported Sample Rates

WebSocket Service

Sample rates must be between 4000 Hz and 44100 Hz. Default: 24000 Hz.

HTTP Service

Sample rates must be between 8000 Hz and 96000 Hz. Default: 24000 Hz. Rates above 24000 Hz are produced by upsampling.
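The two ranges can be enforced with a small guard before a service is created. The sketch below assumes both bounds are inclusive, which the text does not state; `validate_sample_rate` and the service labels are hypothetical names.

```python
# (min_hz, max_hz) per service, taken from the ranges above.
# Assumption: both bounds are inclusive.
SAMPLE_RATE_LIMITS = {
    "websocket": (4000, 44100),
    "http": (8000, 96000),
}
DEFAULT_SAMPLE_RATE = 24000


def validate_sample_rate(service, sample_rate=DEFAULT_SAMPLE_RATE):
    """Return `sample_rate` if it is inside the range for `service`."""
    low, high = SAMPLE_RATE_LIMITS[service]
    if not low <= sample_rate <= high:
        raise ValueError(
            f"{sample_rate} Hz is outside {low}-{high} Hz for the {service} service"
        )
    return sample_rate
```

A rate such as 48000 Hz passes for the HTTP service but is rejected for the WebSocket service.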

Language Support

| Language Code | Description | Service Code |
| --- | --- | --- |
| Language.DE | German | ger |
| Language.EN | English | eng |
| Language.ES | Spanish | spa |
| Language.FR | French | fra |
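The table amounts to a lookup from a language tag to Rime's three-letter code. The sketch below uses plain strings and assumes that the `Language` enum values are the lowercase tags ("de", "en", and so on). Falling back from a regional tag such as "en-US" to its base language is also an assumption, not documented behavior.

```python
# Mapping taken from the Language Support table; keys are lowercase language tags.
RIME_LANGUAGE_CODES = {
    "de": "ger",
    "en": "eng",
    "es": "spa",
    "fr": "fra",
}


def to_rime_language(tag):
    """Map a tag such as 'en' or 'en-US' to Rime's service code, or None if unsupported."""
    base = tag.lower().split("-")[0]  # assumption: regional variants use the base language
    return RIME_LANGUAGE_CODES.get(base)
```

Returning `None` for an unsupported language lets the caller decide whether to fall back to English or report an error.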

Usage Example

Initialize the WebSocket service with your API key and desired voice:
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.rime.tts import RimeTTSService
from pipecat.transcriptions.language import Language

# Configure WebSocket service
tts = RimeTTSService(
    api_key=os.getenv("RIME_API_KEY"),
    voice_id="rex",
    model="mistv2",
    params=RimeTTSService.InputParams(
        language=Language.EN,
        speed_alpha=1.0,
        reduce_latency=False,
        pause_between_brackets=True,
        phonemize_between_brackets=False
    )
)

# Use in pipeline (transport, stt, llm, and context_aggregator are assumed to be created elsewhere)
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,  # Word timestamps enable precise context updates
    transport.output(),
    context_aggregator.assistant()
])

HTTP Service

Initialize the HTTP service with a shared aiohttp session:
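import os

import aiohttp
from pipecat.services.rime.tts import RimeHttpTTSService
from pipecat.transcriptions.language import Language

# Configure HTTP service (run inside an async function, since `async with` requires one)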
async with aiohttp.ClientSession() as session:
    http_tts = RimeHttpTTSService(
        api_key=os.getenv("RIME_API_KEY"),
        voice_id="eva",
        aiohttp_session=session,
        model="arcana",  # Latest model
        params=RimeHttpTTSService.InputParams(
            language=Language.EN,
            speed_alpha=1.2,
            inline_speed_alpha="0.8,1.5",  # Word-level speed control
            pause_between_brackets=True
        )
    )

Dynamic Configuration

Update settings at runtime by queuing a TTSUpdateSettingsFrame:
from pipecat.frames.frames import TTSUpdateSettingsFrame

await task.queue_frame(
    TTSUpdateSettingsFrame(
        voice_id="your-voice-id",
    )
)

Metrics

Both services report the following metrics:
  • Time to First Byte (TTFB) - Latency from text input to first audio
  • Processing Duration - Total synthesis time
  • Character Usage - Text processed for billing
Learn how to enable Metrics in your Pipeline.
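To make the three numbers concrete, the sketch below measures them against a stand-in audio stream. It is not how Pipecat computes its metrics; `fake_tts_stream` and `measure` are hypothetical names, and the stream only simulates synthesis latency.

```python
import asyncio
import time


async def fake_tts_stream(text, first_chunk_delay=0.05, chunks=3):
    """Stand-in for a TTS audio stream: waits, then yields dummy PCM chunks."""
    await asyncio.sleep(first_chunk_delay)  # simulated synthesis latency
    for _ in range(chunks):
        yield b"\x00\x00" * 160  # 160 silent 16-bit samples
        await asyncio.sleep(0)


async def measure(text, stream_factory=fake_tts_stream):
    """Return (ttfb_seconds, total_seconds, character_count) for one utterance."""
    start = time.perf_counter()
    ttfb = None
    async for _chunk in stream_factory(text):
        if ttfb is None:
            # Time to First Byte: text submitted -> first audio chunk received.
            ttfb = time.perf_counter() - start
    total = time.perf_counter() - start  # Processing Duration
    return ttfb, total, len(text)  # len(text) stands in for Character Usage
```

Calling `asyncio.run(measure("Hello there"))` returns the three values for one utterance. A real measurement would replace `fake_tts_stream` with the service's audio output.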

Additional Notes

  • WebSocket Recommended: Use RimeTTSService for interactive applications requiring word timestamps and precise context management
  • Context Tracking: WebSocket service maintains context across multiple messages within a turn
  • Text Aggregation: WebSocket service uses SkipTagsAggregator by default to handle Rime’s spell() tags
  • Model Selection: Use mistv2 for best balance of quality and performance, arcana for highest quality (HTTP only)
  • Advanced Controls: HTTP service supports more text markup features like inline speed control
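The inline speed control mentioned above takes a comma-separated string such as "0.8,1.5". The sketch below assumes that each value applies, in order, to one span wrapped in square brackets in the input text; check this against Rime's API documentation before relying on it. The helper `build_inline_speed_alpha` is a hypothetical name.

```python
import re


def build_inline_speed_alpha(text, speeds):
    """Join `speeds` into a comma-separated string, one value per [bracketed] span.

    Assumption: Rime pairs the values with square-bracketed spans in order.
    """
    spans = re.findall(r"\[[^\[\]]+\]", text)
    if len(spans) != len(speeds):
        raise ValueError(
            f"text has {len(spans)} bracketed span(s) but {len(speeds)} speed value(s)"
        )
    return ",".join(str(speed) for speed in speeds)
```

For the text "This is [slow] and this is [fast]." with speeds `[0.8, 1.5]`, the helper produces the same string used for `inline_speed_alpha` in the HTTP example, and it raises if the counts do not match.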