Overview

SimliVideoService creates AI avatar video responses by converting audio input into synchronized video and audio output through Simli’s WebRTC platform. It handles real-time audio streaming, video generation, and automatic audio resampling.

Installation

Install the required dependencies:

pip install "pipecat-ai[simli]"

Configuration

Required Environment Variables

SIMLI_API_KEY=your_api_key
SIMLI_FACE_ID=your_face_id

Get your API key and Face ID by signing up at Simli.
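
As a minimal sketch, the snippets below assume these values are exported in your shell and read with os.getenv; the variable names match the configuration example that follows:

import os

# Assumes SIMLI_API_KEY and SIMLI_FACE_ID are set in the environment.
SIMLI_API_KEY = os.getenv("SIMLI_API_KEY")
SIMLI_FACE_ID = os.getenv("SIMLI_FACE_ID")

if not SIMLI_API_KEY or not SIMLI_FACE_ID:
    raise RuntimeError("Set SIMLI_API_KEY and SIMLI_FACE_ID before starting the service")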

Basic Configuration

SimliVideoService(
    SimliConfig(SIMLI_API_KEY, SIMLI_FACE_ID),
    use_turn_server=False,
    latency_interval=60,
)

Constructor Parameters for SimliConfig

apiKey (str, required)
Your Simli API key. This key is required for authenticating API requests.

faceId (str, required)
The face identifier for Simli. This is used to associate API interactions with a specific face or persona.

syncAudio (bool, default: True)
Indicates whether to synchronize audio streams. Set to False to disable audio synchronization.

handleSilence (bool, default: True)
Determines whether silence in audio streams is handled automatically.

maxSessionLength (int, default: 600)
The maximum length of a session, in seconds (the default is 10 minutes).

maxIdleTime (int, default: 30)
The maximum idle time, in seconds, allowed during a session before it is automatically terminated.
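
As a sketch of how these options fit together, assuming each documented parameter maps directly to a keyword argument of the same name on SimliConfig:

import os
from simli import SimliConfig

# Illustrative values; the settings shown mirror the defaults documented above.
config = SimliConfig(
    apiKey=os.getenv("SIMLI_API_KEY"),
    faceId=os.getenv("SIMLI_FACE_ID"),
    syncAudio=True,         # synchronize audio streams
    handleSilence=True,     # handle silent audio automatically
    maxSessionLength=600,   # end the session after 10 minutes
    maxIdleTime=30,         # end the session after 30 seconds of inactivity
)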

Constructor Parameters for SimliVideoService

simli_config (SimliConfig, required)
The configuration object for Simli. This must be an instance of SimliConfig and contains essential settings such as the API key, face ID, and other session-related configuration.

use_turn_server (bool, default: False)
Determines whether a TURN server should be used for establishing connections. Set to True if your network requires TURN for WebRTC connections.

latency_interval (int, default: 0)
Delay, in seconds, between ping calls used to measure latency between your machine and the Simli server. Set to 0 to disable latency measurement.
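
Putting both parameter sets together, here is a minimal sketch of constructing the service; treat the TURN and latency settings as placeholders, since network requirements vary:

import os
from simli import SimliConfig
from pipecat.services.simli import SimliVideoService

simli = SimliVideoService(
    simli_config=SimliConfig(
        apiKey=os.getenv("SIMLI_API_KEY"),
        faceId=os.getenv("SIMLI_FACE_ID"),
    ),
    use_turn_server=False,   # set True if your network blocks direct WebRTC
    latency_interval=60,     # ping Simli every 60 seconds; 0 disables the check
)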

Input Frames

Audio Input

  • TTSAudioRawFrame: Raw audio data for avatar speech

Control Frames

  • TTSStartedFrame: Signals the start of speech synthesis
  • TTSStoppedFrame: Signals the end of speech synthesis
  • StartInterruptionFrame: Signals a conversation interruption
  • EndFrame: Signals the end of the conversation
  • CancelFrame: Signals conversation cancellation
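
In a normal pipeline the TTS service emits these frames for you; the sketch below only illustrates the ordering the service expects, assuming a PipelineTask named task and a hypothetical iterable of raw PCM chunks:

from pipecat.frames.frames import (
    TTSAudioRawFrame,
    TTSStartedFrame,
    TTSStoppedFrame,
)

async def speak(task, audio_chunks):
    # Signal that speech synthesis is starting.
    await task.queue_frame(TTSStartedFrame())

    # Stream raw audio; the service resamples it for Simli as needed.
    for chunk in audio_chunks:
        await task.queue_frame(
            TTSAudioRawFrame(audio=chunk, sample_rate=16000, num_channels=1)
        )

    # Signal that speech synthesis has finished.
    await task.queue_frame(TTSStoppedFrame())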

Usage Example

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.simli import SimliVideoService
from simli import SimliConfig
import os

async def create_avatar_pipeline():
    # Initialize Simli service
    simli = SimliVideoService(
        SimliConfig(
            apiKey=os.getenv("SIMLI_API_KEY"),
            faceId=os.getenv("SIMLI_FACE_ID"),
        )
    )

    # Create pipeline with Simli
    pipeline = Pipeline([
        transport.input(),    # Your audio input
        llm,                  # Language model service
        tts_service,          # Text-to-speech service
        simli,                # Simli video generation
        transport.output(),   # Your video output handler
    ])

    return pipeline
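
To actually run the pipeline returned above, wrap it in a task and runner; a minimal sketch using Pipecat's PipelineTask and PipelineRunner:

import asyncio
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask

async def main():
    pipeline = await create_avatar_pipeline()
    task = PipelineTask(pipeline)   # wraps the pipeline for execution
    runner = PipelineRunner()       # drives the task until it completes
    await runner.run(task)

if __name__ == "__main__":
    asyncio.run(main())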

Frame Flow

TTS audio frames (TTSStartedFrame, TTSAudioRawFrame, TTSStoppedFrame) flow into the service, which resamples the audio, streams it to Simli over WebRTC, and pushes the resulting synchronized video and audio frames downstream to the transport output.

Metrics Support

The service collects processing metrics:

  • Processing duration
  • Time to First Byte (TTFB)
  • API response times
  • Audio processing metrics
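
Metrics are reported through Pipecat's standard metrics mechanism; the sketch below enables them when creating the pipeline task (this PipelineParams usage follows general Pipecat conventions and is not Simli-specific):

from pipecat.pipeline.task import PipelineParams, PipelineTask

task = PipelineTask(
    pipeline,
    params=PipelineParams(
        enable_metrics=True,        # processing duration and TTFB
        enable_usage_metrics=True,  # usage metrics where supported
    ),
)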

Common Use Cases

  1. AI Video Avatars

    • Virtual assistants
    • Customer service
    • Educational content
  2. Interactive Presentations

    • Product demonstrations
    • Training materials
    • Marketing content
  3. Real-time Communication

    • Video conferencing
    • Virtual meetings
    • Interactive broadcasts

Notes

  • Handles real-time audio streaming
  • Supports conversation interruptions
  • Manages conversation lifecycle
  • Automatic audio resampling
  • Thread-safe processing
  • WebRTC integration through Simli’s platform
  • Includes comprehensive error handling