Overview

AnthropicLLMService provides integration with Anthropic’s Claude models. It supports streaming responses, function calling, prompt caching, and extended thinking, with specialized context handling for Anthropic’s message format.

Installation

To use Anthropic services, install the required dependency:
pip install "pipecat-ai[anthropic]"

Prerequisites

Anthropic Account Setup

Before using Anthropic LLM services, you need:
  1. Anthropic Account: Sign up at Anthropic Console
  2. API Key: Generate an API key from your console dashboard
  3. Model Selection: Choose from available Claude models (Claude Sonnet 4.5, Claude Opus 4.1, etc.)

Required Environment Variables

  • ANTHROPIC_API_KEY: Your Anthropic API key for authentication

Configuration

  • api_key (str, required): Anthropic API key for authentication.
  • model (str, default: "claude-sonnet-4-5-20250929"): Claude model name to use (e.g., "claude-sonnet-4-5-20250929", "claude-opus-4-1-20250805").
  • params (InputParams, default: None): Runtime-configurable model settings. See InputParams below.
  • client (AsyncAnthropic, default: None): Optional custom Anthropic client instance. Useful for custom clients such as AsyncAnthropicBedrock or AsyncAnthropicVertex.
  • retry_timeout_secs (float, default: 5.0): Request timeout in seconds. Used when retry_on_timeout is enabled to determine when to retry.
  • retry_on_timeout (bool, default: False): Whether to retry the request once if it times out. The retry attempt has no timeout limit.

InputParams

Model inference settings that can be set at initialization via the params constructor argument, or changed at runtime via UpdateSettingsFrame.
  • enable_prompt_caching (bool, default: None): Whether to enable Anthropic’s prompt caching feature. Reduces costs for repeated context.
  • max_tokens (int, default: 4096): Maximum tokens to generate. Must be at least 1.
  • temperature (float, default: NOT_GIVEN): Sampling temperature (0.0 to 1.0). Lower values are more focused, higher values are more creative.
  • top_k (int, default: NOT_GIVEN): Top-k sampling parameter. Limits tokens to the top k most likely.
  • top_p (float, default: NOT_GIVEN): Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output.
  • thinking (ThinkingConfig, default: NOT_GIVEN): Extended thinking configuration. See ThinkingConfig below.
  • extra (dict, default: {}): Additional parameters passed directly to the API.
NOT_GIVEN values are omitted from the API request entirely, letting the Anthropic API use its own defaults.

ThinkingConfig

Configuration for Anthropic’s extended thinking feature, which causes the model to spend more time reasoning before responding.
  • type ("enabled" or "disabled"): Whether extended thinking is enabled.
  • budget_tokens (int): Maximum number of tokens for thinking. Minimum 1024 with current models.
When extended thinking is enabled, the service emits LLMThoughtStartFrame, LLMThoughtTextFrame, and LLMThoughtEndFrame during response generation.
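For example, a downstream frame processor can watch for these frames to surface the model’s reasoning for logging or debugging. The sketch below assumes the thought frames are importable from pipecat.frames.frames and that LLMThoughtTextFrame carries its text in a text attribute:

from pipecat.frames.frames import (
    LLMThoughtEndFrame,
    LLMThoughtStartFrame,
    LLMThoughtTextFrame,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

class ThoughtLogger(FrameProcessor):
    """Logs extended-thinking frames as they stream through the pipeline."""

    async def process_frame(self, frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, LLMThoughtStartFrame):
            print("-- thinking started --")
        elif isinstance(frame, LLMThoughtTextFrame):
            print(frame.text, end="")  # assumed `text` attribute
        elif isinstance(frame, LLMThoughtEndFrame):
            print("\n-- thinking finished --")

        # Always forward the frame so downstream processors still receive it.
        await self.push_frame(frame, direction)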

Usage

Basic Setup

import os

from pipecat.services.anthropic import AnthropicLLMService

llm = AnthropicLLMService(
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    model="claude-sonnet-4-5-20250929",
)

With Custom Parameters

import os

from pipecat.services.anthropic import AnthropicLLMService

llm = AnthropicLLMService(
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    model="claude-sonnet-4-5-20250929",
    params=AnthropicLLMService.InputParams(
        enable_prompt_caching=True,
        max_tokens=2048,
        temperature=0.7,
    ),
)

With Extended Thinking

import os

from pipecat.services.anthropic import AnthropicLLMService

llm = AnthropicLLMService(
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    model="claude-sonnet-4-5-20250929",
    params=AnthropicLLMService.InputParams(
        max_tokens=16384,
        thinking=AnthropicLLMService.ThinkingConfig(
            type="enabled",
            budget_tokens=10000,
        ),
    ),
)

Updating Settings at Runtime

Model settings can be changed mid-conversation using UpdateSettingsFrame:
from pipecat.frames.frames import UpdateSettingsFrame

await task.queue_frame(
    UpdateSettingsFrame(
        settings={
            "llm": {
                "temperature": 0.3,
                "max_tokens": 1024,
            }
        }
    )
)

Notes

  • Prompt caching: When enable_prompt_caching is enabled, Anthropic caches repeated context to reduce costs. Cache control markers are automatically added to the most recent user messages. This is most effective for conversations with large system prompts or long conversation histories.
  • Extended thinking: Enabling thinking increases response quality for complex tasks but adds latency. The budget_tokens value must be at least 1024 with current models. Extended thinking is disabled by default.
  • Custom clients: You can pass custom Anthropic client instances (e.g., AsyncAnthropicBedrock or AsyncAnthropicVertex) via the client parameter to use Anthropic models through other cloud providers.
  • Retry behavior: When retry_on_timeout=True, the first attempt uses the retry_timeout_secs timeout. If it times out, a second attempt is made with no timeout limit (see the sketch after this list).
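As a brief illustration of the retry options described above, the following sketch retries a timed-out request once; the 3-second first-attempt timeout is only an example value:

import os

from pipecat.services.anthropic import AnthropicLLMService

# Retry once on timeout: the first attempt is capped at 3 seconds;
# the retry attempt runs with no timeout limit.
llm = AnthropicLLMService(
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    model="claude-sonnet-4-5-20250929",
    retry_timeout_secs=3.0,
    retry_on_timeout=True,
)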

Event Handlers

AnthropicLLMService supports the following event handlers, inherited from LLMService:
  • on_completion_timeout: Called when an LLM completion request times out.
  • on_function_calls_started: Called when function calls are received and execution is about to start.
@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
    print("LLM completion timed out")
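The on_function_calls_started handler is registered the same way. The callback signature below, with the list of pending function calls as the second argument, is an assumption based on the handler’s description:

@llm.event_handler("on_function_calls_started")
async def on_function_calls_started(service, function_calls):
    # `function_calls` is assumed to be the collection of calls about to execute.
    print(f"About to execute {len(function_calls)} function call(s)")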