Overview

AnthropicLLMService provides integration with Anthropic’s Claude models. It supports streaming responses, function calling, prompt caching, and extended thinking, with specialized context handling for Anthropic’s message format.

Installation

To use Anthropic services, install the required dependency:
pip install "pipecat-ai[anthropic]"

Prerequisites

Anthropic Account Setup

Before using Anthropic LLM services, you need:
  1. Anthropic Account: Sign up at Anthropic Console
  2. API Key: Generate an API key from your console dashboard
  3. Model Selection: Choose from available Claude models (Claude Sonnet 4.5, Claude Opus 4.6, etc.)

Required Environment Variables

  • ANTHROPIC_API_KEY: Your Anthropic API key for authentication
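For local development, the key is typically exported in the shell (or loaded from a .env file); the key value below is a placeholder:

```shell
export ANTHROPIC_API_KEY="sk-ant-..."
```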

Configuration

  • api_key (str, required): Anthropic API key for authentication.
  • model (str, default: None, deprecated): Claude model name to use (e.g., "claude-sonnet-4-5-20250929", "claude-opus-4-6-20250929"). Deprecated in v0.0.105; use settings=AnthropicLLMService.Settings(...) instead.
  • settings (AnthropicLLMService.Settings, default: None): Runtime-configurable model settings. See Settings below.
  • params (InputParams, default: None, deprecated): Runtime-configurable model settings. Deprecated in v0.0.105; use settings=AnthropicLLMService.Settings(...) instead.
  • client (AsyncAnthropic, default: None): Optional custom Anthropic client instance. Useful for custom clients like AsyncAnthropicBedrock or AsyncAnthropicVertex.
  • retry_timeout_secs (float, default: 5.0): Request timeout in seconds. Used when retry_on_timeout is enabled to determine when to retry.
  • retry_on_timeout (bool, default: False): Whether to retry the request once if it times out. The retry attempt has no timeout limit.
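The documented retry behavior (a bounded first attempt, then one unbounded retry) can be sketched as follows. This is an illustrative sketch of the semantics described above, not Pipecat's actual implementation; `run_with_retry` and `coro_factory` are hypothetical names:

```python
import asyncio


async def run_with_retry(coro_factory, retry_timeout_secs=5.0, retry_on_timeout=False):
    """Sketch: first attempt is bounded by retry_timeout_secs; if it times
    out and retry_on_timeout is set, retry once with no timeout limit."""
    try:
        return await asyncio.wait_for(coro_factory(), timeout=retry_timeout_secs)
    except asyncio.TimeoutError:
        if not retry_on_timeout:
            raise
        # Second attempt: no timeout limit, mirroring retry_on_timeout=True.
        return await coro_factory()
```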

Settings

Runtime-configurable settings passed via the settings constructor argument using AnthropicLLMService.Settings(...). These can be updated mid-conversation with LLMUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | None | Anthropic model identifier. (Inherited from base settings.) |
| system_instruction | str | None | System instruction/prompt for the model. (Inherited from base settings.) |
| max_tokens | int | NOT_GIVEN | Maximum tokens to generate. |
| temperature | float | NOT_GIVEN | Sampling temperature (0.0 to 1.0). Lower values are more focused; higher values are more creative. |
| top_k | int | NOT_GIVEN | Top-k sampling parameter. Limits tokens to the top k most likely. |
| top_p | float | NOT_GIVEN | Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output. |
| enable_prompt_caching | bool | NOT_GIVEN | Whether to enable Anthropic’s prompt caching feature. Reduces costs for repeated context. |
| thinking | AnthropicThinkingConfig | NOT_GIVEN | Extended thinking configuration. See AnthropicThinkingConfig below. |
NOT_GIVEN values are omitted from the API request entirely, letting the Anthropic API use its own defaults.
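The omission semantics can be illustrated with a small sketch. This mirrors the behavior described above rather than reproducing Pipecat's code; the local `NOT_GIVEN` sentinel and `build_request_kwargs` helper are illustrative stand-ins (the real SDK exports `anthropic.NOT_GIVEN`):

```python
# Local sentinel standing in for anthropic.NOT_GIVEN.
NOT_GIVEN = object()


def build_request_kwargs(**settings):
    """Keep only settings that were explicitly given; NOT_GIVEN values
    are dropped so the Anthropic API falls back to its own defaults."""
    return {k: v for k, v in settings.items() if v is not NOT_GIVEN}


kwargs = build_request_kwargs(max_tokens=2048, temperature=NOT_GIVEN, top_p=NOT_GIVEN)
# Only max_tokens is sent; temperature and top_p are omitted entirely.
```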

AnthropicThinkingConfig

Configuration for Anthropic’s extended thinking feature, which causes the model to spend more time reasoning before responding.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| type | "enabled" or "disabled" | — | Whether extended thinking is enabled. |
| budget_tokens | int (optional) | None | Maximum number of tokens for thinking. Currently required when type is "enabled", minimum 1024 with today’s models. Not allowed when "disabled". |
When extended thinking is enabled, the service emits LLMThoughtStartFrame, LLMThoughtTextFrame, and LLMThoughtEndFrame during response generation.

Usage

Basic Setup

import os

from pipecat.services.anthropic import AnthropicLLMService

llm = AnthropicLLMService(
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    model="claude-sonnet-4-5-20250929",
)

With Custom Settings

import os

from pipecat.services.anthropic import AnthropicLLMService

llm = AnthropicLLMService(
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    settings=AnthropicLLMService.Settings(
        model="claude-sonnet-4-5-20250929",
        enable_prompt_caching=True,
        max_tokens=2048,
        temperature=0.7,
    ),
)

With Extended Thinking

import os

from pipecat.services.anthropic import AnthropicLLMService

llm = AnthropicLLMService(
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    settings=AnthropicLLMService.Settings(
        model="claude-sonnet-4-5-20250929",
        max_tokens=16384,
        thinking=AnthropicLLMService.AnthropicThinkingConfig(
            type="enabled",
            budget_tokens=10000,
        ),
    ),
)

Updating Settings at Runtime

Model settings can be changed mid-conversation using LLMUpdateSettingsFrame:
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.anthropic.llm import AnthropicLLMSettings

await task.queue_frame(
    LLMUpdateSettingsFrame(
        delta=AnthropicLLMSettings(
            temperature=0.3,
            max_tokens=1024,
        )
    )
)
The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.

Notes

  • Prompt caching: When enable_prompt_caching is enabled, Anthropic caches repeated context to reduce costs. Cache control markers are automatically added to the most recent user messages. This is most effective for conversations with large system prompts or long conversation histories.
  • Extended thinking: Enabling thinking increases response quality for complex tasks but adds latency. When type="enabled", you must provide a budget_tokens value (minimum 1024 with current models). Extended thinking is disabled by default.
  • Custom clients: You can pass custom Anthropic client instances (e.g., AsyncAnthropicBedrock or AsyncAnthropicVertex) via the client parameter to use Anthropic models through other cloud providers.
  • Retry behavior: When retry_on_timeout=True, the first attempt uses the retry_timeout_secs timeout. If it times out, a second attempt is made with no timeout limit.
  • System instruction precedence: If both system_instruction (from the constructor) and a system message in the context are set, the constructor’s system_instruction takes precedence and a warning is logged.
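The precedence rule in the last note can be sketched as follows. This is an illustration of the documented behavior, not Pipecat's actual code; `resolve_system_prompt` is a hypothetical helper:

```python
import logging

logger = logging.getLogger(__name__)


def resolve_system_prompt(constructor_instruction, context_system_message):
    """Sketch: the constructor's system_instruction wins over a system
    message found in the context, and a warning is logged when both are set."""
    if constructor_instruction and context_system_message:
        logger.warning(
            "Both system_instruction and a context system message are set; "
            "using system_instruction."
        )
        return constructor_instruction
    return constructor_instruction or context_system_message
```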

Event Handlers

AnthropicLLMService supports the following event handlers, inherited from LLMService:
| Event | Description |
| --- | --- |
| on_completion_timeout | Called when an LLM completion request times out |
| on_function_calls_started | Called when function calls are received and execution is about to start |
@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
    print("LLM completion timed out")