Overview

CerebrasLLMService provides access to Cerebras’s language models through an OpenAI-compatible interface. It inherits from OpenAILLMService and supports streaming responses, function calling, and context management.

Installation

To use CerebrasLLMService, install the required dependencies:

pip install "pipecat-ai[cerebras]"

You’ll need to set up your Cerebras API key as an environment variable: CEREBRAS_API_KEY
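For example, export the key in your shell (export CEREBRAS_API_KEY="...") and read it at startup. A minimal sketch:

import os

# Fail fast if the key is missing rather than at the first API call.
api_key = os.environ["CEREBRAS_API_KEY"]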

Configuration

Constructor Parameters

api_key (str, required)
    Your Cerebras API key.

model (str, default: "llama-3.3-70b")
    Model identifier.

base_url (str, default: "https://api.cerebras.ai/v1")
    Cerebras API endpoint.
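Only api_key is required; the other two default to the values above. A minimal construction sketch (defaults shown explicitly for clarity):

import os

from pipecat.services.cerebras import CerebrasLLMService

llm = CerebrasLLMService(
    api_key=os.environ["CEREBRAS_API_KEY"],
    model="llama-3.3-70b",                  # default
    base_url="https://api.cerebras.ai/v1",  # default
)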

Input Parameters

Inherits OpenAI-compatible parameters:

max_completion_tokens (Optional[int])
    Maximum number of tokens to generate. Must be greater than or equal to 1.

seed (Optional[int])
    Random seed for deterministic generation. Must be greater than or equal to 0.

temperature (Optional[float])
    Controls randomness in the output. Range: [0.0, 1.5].

top_p (Optional[float])
    Controls diversity via nucleus sampling. Range: [0.0, 1.0].
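As a sketch, assuming CerebrasLLMService exposes the same InputParams model and params keyword as Pipecat's other OpenAI-compatible services (check your Pipecat version), these can be set at construction time:

from pipecat.services.cerebras import CerebrasLLMService

# InputParams and the params keyword are assumed here to match the
# OpenAI-compatible base service.
llm = CerebrasLLMService(
    api_key="your-cerebras-api-key",
    model="llama-3.3-70b",
    params=CerebrasLLMService.InputParams(
        temperature=0.7,             # [0.0, 1.5]
        top_p=0.9,                   # [0.0, 1.0]
        seed=42,                     # >= 0, for reproducible sampling
        max_completion_tokens=1024,  # >= 1
    ),
)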

Usage Example

from pipecat.services.cerebras import CerebrasLLMService
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from openai.types.chat import ChatCompletionToolParam
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineParams, PipelineTask

# Configure service
llm = CerebrasLLMService(
    api_key="your-cerebras-api-key",
    model="llama-3.3-70b"
)

# Define tools for function calling
tools = [
    ChatCompletionToolParam(
        type="function",
        function={
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use"
                    }
                },
                "required": ["location", "format"]
            }
        }
    )
]

# Create context with system message and tools
context = OpenAILLMContext(
    messages=[
        {
            "role": "system",
            "content": """You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way.

You have one function available:

1. get_current_weather is used to get current weather information.

Infer whether to use Fahrenheit or Celsius automatically based on the location, unless the user specifies a preference. Start by asking me for my location. Then, use 'get_current_weather' to give me a forecast. Respond to what the user said in a creative and helpful way.""",
        },
    ],
    tools=tools,
)

# Register a function handler. Passing None as the function name registers
# fetch_weather as a catch-all handler for every function call.
async def fetch_weather(function_name, tool_call_id, args, llm, context, result_callback):
    await result_callback({"conditions": "nice", "temperature": "75"})

llm.register_function(None, fetch_weather)

# Create context aggregator for message handling
context_aggregator = llm.create_context_aggregator(context)

# Set up pipeline (assumes a `transport` and a `tts` service were created earlier)
pipeline = Pipeline([
    transport.input(),
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant()
])

# Create and configure task
task = PipelineTask(
    pipeline,
    PipelineParams(
        allow_interruptions=True,
        enable_metrics=True,
        enable_usage_metrics=True,
    ),
)
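To run the configured task, hand it to a PipelineRunner:

from pipecat.pipeline.runner import PipelineRunner

runner = PipelineRunner()
await runner.run(task)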

Methods

See the LLM base class methods for additional functionality.

Function Calling

Supports OpenAI-compatible function calling. For optimal function calling performance, provide clear instructions in the system message about when and how to use functions.
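Instead of the catch-all registration in the example above, a handler can also be tied to a specific function name (a minimal sketch reusing the fetch_weather handler defined earlier):

# Only calls to "get_current_weather" are routed to this handler.
llm.register_function("get_current_weather", fetch_weather)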

Available Models

Cerebras provides access to these models:

Model Name       Description
llama3.1-8b      Llama 3.1 8B model
llama3.1-70b     Llama 3.1 70B model
llama-3.3-70b    Llama 3.3 70B model
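Pass any of these identifiers as the model argument, e.g.:

llm = CerebrasLLMService(api_key="your-cerebras-api-key", model="llama3.1-8b")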

Frame Flow

Inherits the frame flow of the OpenAI LLM service.

Metrics Support

The service collects standard LLM metrics:

  • Token usage (prompt and completion)
  • Processing duration
  • Time to First Byte (TTFB)
  • Function call metrics
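One way to inspect these values is a small pass-through processor that logs MetricsFrame instances as they flow through the pipeline (a sketch; the exact fields carried by MetricsFrame vary across Pipecat versions):

from pipecat.frames.frames import Frame, MetricsFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

class MetricsLogger(FrameProcessor):
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, MetricsFrame):
            # Recent Pipecat releases carry a list of metrics entries
            # (TTFB, processing time, token usage) on the frame.
            print(f"Metrics: {frame.data}")
        await self.push_frame(frame, direction)

An instance of this processor could be placed after llm in the Pipeline list to log the metrics the service emits.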

Notes

  • OpenAI-compatible interface
  • Supports streaming responses
  • Handles function calling
  • Manages conversation context
  • Thread-safe processing
  • Automatic error handling