Overview
AnthropicLLMService provides integration with Anthropic’s Claude models, supporting streaming responses, function calling, and prompt caching. It includes specialized context handling for Anthropic’s message format and support for advanced reasoning capabilities.
- Anthropic LLM API Reference: Pipecat’s API methods for Anthropic Claude integration
- Example Implementation: Complete example with function calling
- Anthropic Documentation: Official Anthropic API documentation and features
- Anthropic Console: Access Claude models and API keys
Installation
To use Anthropic services, install the required dependency:
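```bash
# Installs Pipecat with its Anthropic extra (assumes the standard extras pattern)
pip install "pipecat-ai[anthropic]"
```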
Prerequisites
Anthropic Account Setup
Before using Anthropic LLM services, you need:
- Anthropic Account: Sign up at Anthropic Console
- API Key: Generate an API key from your console dashboard
- Model Selection: Choose from available Claude models (Claude Sonnet 4.5, Claude Opus 4.6, etc.)
Required Environment Variables
ANTHROPIC_API_KEY: Your Anthropic API key for authentication
Configuration
| Parameter | Description |
|---|---|
| api_key | Anthropic API key for authentication. |
| model | Claude model name to use (e.g., "claude-sonnet-4-5-20250929", "claude-opus-4-6-20250929"). |
| params | Runtime-configurable model settings. See InputParams below. |
| client | Optional custom Anthropic client instance. Useful for custom clients like AsyncAnthropicBedrock or AsyncAnthropicVertex. |
| retry_timeout_secs | Request timeout in seconds. Used when retry_on_timeout is enabled to determine when to retry. |
| retry_on_timeout | Whether to retry the request once if it times out. The retry attempt has no timeout limit. |
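As a sketch of how these options fit together (the import path below is an assumption; check it against your installed Pipecat version):

```python
import os

from pipecat.services.anthropic.llm import AnthropicLLMService  # assumed import path

llm = AnthropicLLMService(
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    model="claude-sonnet-4-5-20250929",
    retry_timeout_secs=30.0,  # first attempt is limited to 30 seconds
    retry_on_timeout=True,    # on timeout, retry once with no timeout limit
)
```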
InputParams
Model inference settings that can be set at initialization via the params constructor argument, or changed at runtime via UpdateSettingsFrame.
| Parameter | Type | Default | Description |
|---|---|---|---|
| enable_prompt_caching | bool | None | Whether to enable Anthropic’s prompt caching feature. Reduces costs for repeated context. |
| max_tokens | int | 4096 | Maximum tokens to generate. Must be at least 1. |
| temperature | float | NOT_GIVEN | Sampling temperature (0.0 to 1.0). Lower values are more focused, higher values are more creative. |
| top_k | int | NOT_GIVEN | Top-k sampling parameter. Limits tokens to the top k most likely. |
| top_p | float | NOT_GIVEN | Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output. |
| thinking | ThinkingConfig | NOT_GIVEN | Extended thinking configuration. See ThinkingConfig below. |
| extra | dict | {} | Additional parameters passed directly to the API. |
NOT_GIVEN values are omitted from the API request entirely, letting the Anthropic API use its own defaults.
ThinkingConfig
Configuration for Anthropic’s extended thinking feature, which causes the model to spend more time reasoning before responding.
| Parameter | Type | Description |
|---|---|---|
| type | "enabled" or "disabled" | Whether extended thinking is enabled. |
| budget_tokens | int | Maximum number of tokens for thinking. Minimum 1024 with current models. |
When thinking is enabled, the service emits LLMThoughtStartFrame, LLMThoughtTextFrame, and LLMThoughtEndFrame during response generation.
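A minimal sketch of a processor that surfaces the streamed thinking text is shown below; the frame import path, the FrameProcessor pass-through pattern, and the frame’s text attribute are assumptions to verify against your Pipecat version.

```python
from pipecat.frames.frames import LLMThoughtTextFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

class ThoughtLogger(FrameProcessor):
    """Logs thinking text as it streams; passes all frames through unchanged."""

    async def process_frame(self, frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, LLMThoughtTextFrame):
            print(f"[thinking] {frame.text}")  # .text attribute is an assumption
        await self.push_frame(frame, direction)
```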
Usage
Basic Setup
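A minimal construction sketch, assuming the pipecat.services.anthropic.llm import path and an ANTHROPIC_API_KEY environment variable:

```python
import os

from pipecat.services.anthropic.llm import AnthropicLLMService

# Simplest setup: default InputParams, model chosen explicitly.
llm = AnthropicLLMService(
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    model="claude-sonnet-4-5-20250929",
)
```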
With Custom Parameters
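A sketch with inference settings supplied at initialization, assuming InputParams is exposed as a nested class on the service (the common Pipecat pattern):

```python
import os

from pipecat.services.anthropic.llm import AnthropicLLMService

# Fields left unset default to NOT_GIVEN and are omitted from requests.
llm = AnthropicLLMService(
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    model="claude-sonnet-4-5-20250929",
    params=AnthropicLLMService.InputParams(
        temperature=0.7,
        max_tokens=2048,
        enable_prompt_caching=True,
    ),
)
```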
With Extended Thinking
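A sketch enabling extended thinking, assuming the thinking field accepts a dict with the ThinkingConfig fields from the table above:

```python
import os

from pipecat.services.anthropic.llm import AnthropicLLMService

llm = AnthropicLLMService(
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    model="claude-sonnet-4-5-20250929",
    params=AnthropicLLMService.InputParams(
        # budget_tokens must be at least 1024 with current models.
        thinking={"type": "enabled", "budget_tokens": 2048},
    ),
)
```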
Updating Settings at Runtime
Model settings can be changed mid-conversation using UpdateSettingsFrame:
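A sketch assuming an LLMUpdateSettingsFrame frame class and an existing PipelineTask named task; the settings keys mirror InputParams fields:

```python
from pipecat.frames.frames import LLMUpdateSettingsFrame  # frame name assumed

# Inside an async function, queue the update into the running pipeline.
await task.queue_frame(
    LLMUpdateSettingsFrame(settings={"temperature": 0.3, "max_tokens": 1024})
)
```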
Notes
- Prompt caching: When enable_prompt_caching is enabled, Anthropic caches repeated context to reduce costs. Cache control markers are automatically added to the most recent user messages. This is most effective for conversations with large system prompts or long conversation histories.
- Extended thinking: Enabling thinking increases response quality for complex tasks but adds latency. The budget_tokens value must be at least 1024 with current models. Extended thinking is disabled by default.
- Custom clients: You can pass custom Anthropic client instances (e.g., AsyncAnthropicBedrock or AsyncAnthropicVertex) via the client parameter to use Anthropic models through other cloud providers, as sketched below.
- Retry behavior: When retry_on_timeout=True, the first attempt uses the retry_timeout_secs timeout. If it times out, a second attempt is made with no timeout limit.
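A sketch of the custom-client path, assuming the anthropic SDK’s AsyncAnthropicBedrock client; the Bedrock model ID shown is hypothetical:

```python
from anthropic import AsyncAnthropicBedrock

from pipecat.services.anthropic.llm import AnthropicLLMService

# Route Claude requests through AWS Bedrock instead of the Anthropic API.
llm = AnthropicLLMService(
    client=AsyncAnthropicBedrock(aws_region="us-east-1"),
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",  # hypothetical Bedrock model ID
)
```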
Event Handlers
AnthropicLLMService supports the following event handlers, inherited from LLMService:
| Event | Description |
|---|---|
| on_completion_timeout | Called when an LLM completion request times out |
| on_function_calls_started | Called when function calls are received and execution is about to start |
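A sketch of registering handlers with the event_handler decorator, where llm is the service instance from the setup examples; the handler signatures shown are assumptions based on the common Pipecat pattern:

```python
@llm.event_handler("on_function_calls_started")
async def on_function_calls_started(service, function_calls):
    # Fires before any registered function handlers execute.
    print(f"Model requested {len(function_calls)} function call(s)")

@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
    print("LLM completion timed out")
```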