# Cerebras

LLM service implementation using Cerebras’s API with an OpenAI-compatible interface.
## Overview

`CerebrasLLMService` provides access to Cerebras’s language models through an OpenAI-compatible interface. It inherits from `OpenAILLMService` and supports streaming responses, function calling, and context management.
## Installation

To use `CerebrasLLMService`, install the required dependencies:
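With pip, this is typically Pipecat’s `cerebras` extra (assumed here; the extra name may differ across Pipecat versions):

```bash
pip install "pipecat-ai[cerebras]"
```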
You’ll also need to set your Cerebras API key as an environment variable: `CEREBRAS_API_KEY`.
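For example, in a POSIX shell:

```bash
export CEREBRAS_API_KEY=your_api_key_here
```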
## Configuration

### Constructor Parameters

- `api_key` (required): Your Cerebras API key
- `model`: Model identifier
- `base_url`: Cerebras API endpoint
### Input Parameters

Inherits OpenAI-compatible parameters:

- `max_tokens`: Maximum number of tokens to generate. Must be greater than or equal to 1
- `seed`: Random seed for deterministic generation. Must be greater than or equal to 0
- `temperature`: Controls randomness in the output. Range: [0.0, 1.5]
- `top_p`: Controls diversity via nucleus sampling. Range: [0.0, 1.0]
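These are typically bundled into an `InputParams` object at construction time. A minimal sketch (the import path and `InputParams` fields come from the OpenAI-compatible base service and may differ across Pipecat versions):

```python
import os

from pipecat.services.cerebras.llm import CerebrasLLMService

llm = CerebrasLLMService(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    model="llama-3.3-70b",
    # Tune generation behavior via the inherited InputParams model.
    params=CerebrasLLMService.InputParams(
        temperature=0.7,
        max_tokens=1000,
    ),
)
```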
## Usage Example
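A minimal sketch of wiring the service into a Pipecat pipeline (import paths vary across Pipecat versions; `transport` and `tts` are assumed to be processors configured elsewhere):

```python
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cerebras.llm import CerebrasLLMService

llm = CerebrasLLMService(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    model="llama-3.3-70b",
)

# Seed the conversation context with a system message.
context = OpenAILLMContext(
    messages=[{"role": "system", "content": "You are a helpful assistant."}]
)
context_aggregator = llm.create_context_aggregator(context)

pipeline = Pipeline([
    transport.input(),               # assumed transport, defined elsewhere
    context_aggregator.user(),       # collect user messages into the context
    llm,                             # stream completions from Cerebras
    tts,                             # assumed TTS service, defined elsewhere
    transport.output(),
    context_aggregator.assistant(),  # record assistant responses
])
```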
## Methods

See the LLM base class methods for additional functionality.
## Function Calling

Supports OpenAI-compatible function calling. For optimal function-calling performance, provide clear instructions in the system message about when and how to use each function.
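A sketch of registering a function with Pipecat’s schema helpers (the `FunctionCallParams` handler style and import paths vary across Pipecat versions, and `get_current_weather` is a hypothetical example):

```python
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.services.llm_service import FunctionCallParams

# Describe the function so the model knows when and how to call it.
weather_function = FunctionSchema(
    name="get_current_weather",
    description="Get the current weather for a location",
    properties={
        "location": {
            "type": "string",
            "description": "City and state, e.g. San Francisco, CA",
        },
    },
    required=["location"],
)
# Pass this to the context, e.g. OpenAILLMContext(messages, tools=tools).
tools = ToolsSchema(standard_tools=[weather_function])

async def fetch_weather(params: FunctionCallParams):
    # Hypothetical lookup; replace with a real weather API call.
    await params.result_callback({"conditions": "sunny", "temperature": "75°F"})

# "llm" is the CerebrasLLMService instance from the usage example.
llm.register_function("get_current_weather", fetch_weather)
```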
## Available Models

Cerebras provides access to these models:

| Model Name      | Description         |
| --------------- | ------------------- |
| `llama3.1-8b`   | Llama 3.1 8B model  |
| `llama3.1-70b`  | Llama 3.1 70B model |
| `llama-3.3-70b` | Llama 3.3 70B model |
## Frame Flow

Inherits the OpenAI LLM Service frame flow:
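In outline (frame names come from the parent `OpenAILLMService` and may vary with Pipecat version): the service consumes `OpenAILLMContextFrame` or `LLMMessagesFrame` input, then emits `LLMFullResponseStartFrame`, a stream of `LLMTextFrame`s carrying the generated text, function call frames when tools are invoked, and finally `LLMFullResponseEndFrame`.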
## Metrics Support

The service collects standard LLM metrics:
- Token usage (prompt and completion)
- Processing duration
- Time to First Byte (TTFB)
- Function call metrics
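Metrics collection is enabled on the pipeline task rather than the service itself. A sketch (assuming `pipeline` is a `Pipeline` built as in the usage example):

```python
from pipecat.pipeline.task import PipelineParams, PipelineTask

# Turn on processing/TTFB metrics and token usage metrics.
task = PipelineTask(
    pipeline,
    params=PipelineParams(
        enable_metrics=True,
        enable_usage_metrics=True,
    ),
)
```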
## Notes
- OpenAI-compatible interface
- Supports streaming responses
- Handles function calling
- Manages conversation context
- Thread-safe processing
- Automatic error handling