
Claude Python SDK Tutorial: Complete Setup and Usage Guide (2026)

Step-by-step Claude Python SDK tutorial — install, authenticate, send messages, use streaming, tool use, and prompt caching.

The Claude Python SDK (anthropic) lets you call Claude's API in Python with a clean, typed interface — install with pip install anthropic, set your ANTHROPIC_API_KEY, and you're sending messages in under 10 lines of code. This tutorial covers everything from installation to advanced features: streaming, tool use, prompt caching, and multi-turn conversations. All code examples are tested against the current API.


Installation and Authentication

pip install anthropic

Set your API key as an environment variable:

export ANTHROPIC_API_KEY="sk-ant-..."

Or load it from a .env file using python-dotenv. Install the package first:

pip install python-dotenv

Then load the key in Python:

from dotenv import load_dotenv
import os

load_dotenv()
api_key = os.getenv("ANTHROPIC_API_KEY")

Your First API Call

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain prompt caching in one paragraph."}
    ]
)

print(message.content[0].text)

This is the minimal working example. The client reads ANTHROPIC_API_KEY from your environment automatically. No additional configuration needed.
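
If you prefer not to rely on environment variables, the client accepts configuration directly. A minimal sketch; api_key, timeout, and max_retries are real constructor options, but the values here are placeholders:

import anthropic

client = anthropic.Anthropic(
    api_key="sk-ant-...",  # placeholder; prefer environment variables in production
    timeout=60.0,          # seconds per request
    max_retries=2,         # the SDK default
)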


Understanding the Response Object

print(message.id)                   # msg_01XFDUDYJgAACzvnptvVoYEL
print(message.model)                # claude-sonnet-4-5
print(message.stop_reason)          # end_turn
print(message.usage.input_tokens)   # 23
print(message.usage.output_tokens)  # 120
print(message.content[0].text)      # The actual response text

message.usage is critical for cost tracking: input tokens × input price + output tokens × output price = your cost per call. See Claude API Cost and Prompt Caching Break-Even for the full pricing breakdown.
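
Applied to the message object above, cost tracking is a few lines. A sketch with placeholder prices; substitute the current list prices for your model:

# Hypothetical per-token prices in dollars; replace with current rates
INPUT_PRICE_PER_TOKEN = 3.00 / 1_000_000
OUTPUT_PRICE_PER_TOKEN = 15.00 / 1_000_000

cost = (
    message.usage.input_tokens * INPUT_PRICE_PER_TOKEN
    + message.usage.output_tokens * OUTPUT_PRICE_PER_TOKEN
)
print(f"Cost for this call: ${cost:.6f}")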


System Prompts

System prompts set the persona and behavior for the entire conversation:

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system="You are a senior Python developer. Answer concisely with code examples.",
    messages=[
        {"role": "user", "content": "How do I handle rate limit errors?"}
    ]
)

Multi-Turn Conversations

Build conversation history by appending messages:

conversation = []

def chat(user_message):
    conversation.append({"role": "user", "content": user_message})
    
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=conversation
    )
    
    assistant_message = response.content[0].text
    conversation.append({"role": "assistant", "content": assistant_message})
    return assistant_message

# Multi-turn exchange
print(chat("What is the difference between a list and a tuple in Python?"))
print(chat("When would you choose a tuple over a list?"))
print(chat("Give me a code example with both."))

The conversation list acts as the full history. Claude has no persistent memory — you manage state client-side.
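
Because you own the history, you also own its growth: every turn you resend costs input tokens. One common pattern is to cap the number of turns you send back; a sketch, where the 20-message cap is arbitrary:

MAX_MESSAGES = 20  # arbitrary cap; tune for your context budget

def trimmed_history(conversation):
    trimmed = conversation[-MAX_MESSAGES:]
    # The API expects the first message to come from the user,
    # so drop a leading assistant turn if the slice split a pair
    while trimmed and trimmed[0]["role"] == "assistant":
        trimmed = trimmed[1:]
    return trimmed

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=trimmed_history(conversation)
)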


Streaming Responses

For long outputs or interactive UIs, use streaming to display text as it arrives:

with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a detailed explanation of async/await in Python."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Streaming benchmark: the first token typically arrives in 300-500 ms for Sonnet. Without streaming, you wait for the full response, which can take 5-15 seconds for long outputs. For any user-facing application, streaming is strongly recommended.
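
The stream helper can also hand you the fully assembled message once streaming ends, which is handy for logging usage. The get_final_message() helper is part of the SDK's streaming interface:

with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize PEP 8 in three bullets."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    # After the stream completes, retrieve the full Message for usage accounting
    final = stream.get_final_message()

print(final.usage.output_tokens)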


Build production-ready Claude integrations

Agent SDK Cookbook ($49) includes 30+ Python recipes: streaming, tool use, multi-agent pipelines, error handling, and cost optimization patterns.

Get Agent SDK Cookbook — $49


Tool Use (Function Calling)

Tool use lets Claude call your Python functions when it needs external data or actions.
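
The example below assumes a get_stock_price function exists. Here is a hypothetical stub so the snippet runs end to end; swap in a real market-data source:

def get_stock_price(ticker: str) -> float:
    # Hypothetical stub; replace with a real quote API
    return 189.84

First, describe the tool so Claude knows when to call it: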

tools = [
    {
        "name": "get_stock_price",
        "description": "Get the current stock price for a given ticker symbol",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {
                    "type": "string",
                    "description": "Stock ticker symbol, e.g. AAPL"
                }
            },
            "required": ["ticker"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the current price of Apple stock?"}]
)

# Check if Claude wants to call a tool
if response.stop_reason == "tool_use":
    tool_call = next(b for b in response.content if b.type == "tool_use")
    print(f"Claude wants to call: {tool_call.name}")
    print(f"With inputs: {tool_call.input}")
    
    # Call your actual function
    result = get_stock_price(tool_call.input["ticker"])  # your function
    
    # Return the result to Claude
    final_response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "What's the current price of Apple stock?"},
            {"role": "assistant", "content": response.content},
            {
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_call.id,
                    "content": str(result)
                }]
            }
        ]
    )
    print(final_response.content[0].text)

For a deeper dive on tool use patterns, see Claude Agent SDK Guide.


Prompt Caching

Prompt caching dramatically reduces cost when you reuse large system prompts or document context:

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a code review assistant. [... large instructions ...]",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "Review this function: [code]"}]
)

# Check cache status
print(response.usage.cache_creation_input_tokens)  # First call: tokens written to cache
print(response.usage.cache_read_input_tokens)       # Subsequent calls: tokens read from cache

Cache reads cost 90% less than standard input tokens (cache writes cost 25% more). For a system prompt over 2,000 tokens, caching pays for itself on the first cache hit.
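
The arithmetic behind that claim, as a sketch (the input price is a placeholder; cache writes bill at 1.25x the input rate, cache reads at 0.1x):

INPUT = 3.00 / 1_000_000  # assumed input price per token in dollars
WRITE = 1.25 * INPUT      # cache write premium
READ = 0.10 * INPUT       # cache read discount
TOKENS = 2_000            # size of the cached system prompt

without_cache = 2 * TOKENS * INPUT           # two calls, no caching
with_cache = TOKENS * WRITE + TOKENS * READ  # one cache write + one read
print(with_cache < without_cache)            # True: caching wins by the second call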


Error Handling

import time

import anthropic
from anthropic import APIError, APIConnectionError, RateLimitError

client = anthropic.Anthropic()

def call_claude_safely(prompt: str, retries: int = 3):
    for attempt in range(retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}]
            )
        except RateLimitError:
            if attempt < retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
            else:
                raise
        except APIConnectionError as e:
            print(f"Connection error: {e}")
            raise
        except APIError as e:
            print(f"API error {e.status_code}: {e.message}")
            raise
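
The client also retries certain failures (connection errors and some status codes) on its own; the count is set at construction time:

# Raise the SDK's built-in retry count (the default is 2)
client = anthropic.Anthropic(max_retries=4)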

Async Usage

For async applications (FastAPI, async scripts):

import asyncio
import anthropic

async def main():
    client = anthropic.AsyncAnthropic()
    
    message = await client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello, Claude!"}]
    )
    
    print(message.content[0].text)

asyncio.run(main())

Use AsyncAnthropic for any async context. The interface is identical to the sync client, with await added.
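
The practical payoff is fanning out independent requests concurrently. A sketch using asyncio.gather; the prompts are illustrative:

import asyncio
import anthropic

async def ask(client, prompt):
    message = await client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=256,
        messages=[{"role": "user", "content": prompt}]
    )
    return message.content[0].text

async def main():
    client = anthropic.AsyncAnthropic()
    # Both requests run concurrently instead of back to back
    answers = await asyncio.gather(
        ask(client, "Define the GIL in one sentence."),
        ask(client, "Define asyncio in one sentence.")
    )
    for answer in answers:
        print(answer)

asyncio.run(main())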


Counting Tokens Before Calling

Estimate cost before sending a large prompt:

response = client.messages.count_tokens(
    model="claude-sonnet-4-5",
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Your large prompt here..."}]
)

print(f"Input tokens: {response.input_tokens}")

This uses the same tokenizer as the actual API call, so the count is accurate.
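
A natural use is a pre-flight budget gate. A sketch, where big_prompt and MAX_INPUT_TOKENS are placeholders you define:

MAX_INPUT_TOKENS = 50_000  # hypothetical budget
big_prompt = "..."         # your large prompt

count = client.messages.count_tokens(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": big_prompt}]
)
if count.input_tokens > MAX_INPUT_TOKENS:
    raise ValueError(f"Prompt too large: {count.input_tokens} tokens")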


Model Selection

# For most tasks — best balance of capability and cost
model = "claude-sonnet-4-5"

# For simple, high-volume tasks (roughly 3x cheaper than Sonnet)
model = "claude-haiku-4-5"

# For complex reasoning (the most capable and most expensive tier)
model = "claude-opus-4-5"

See Claude Haiku vs Sonnet vs Opus: Which Model to Use for the full comparison with cost benchmarks.


Frequently Asked Questions

How do I install the Claude Python SDK?

Run pip install anthropic. Then set your API key: export ANTHROPIC_API_KEY="sk-ant-...". You can also pass the key directly: client = anthropic.Anthropic(api_key="sk-ant-..."), but environment variables are preferred for security.

What Python version does the Anthropic SDK support?

The anthropic package supports Python 3.8 and above. Python 3.9+ is recommended for full type hint support and best performance.

How do I handle long documents that exceed the context window?

For documents that exceed 200K tokens, split them into chunks and process sequentially, or use prompt caching with the cache_control parameter to cache a shared context. A 1M token context window is available in beta on Claude Sonnet 4 and later Sonnet models; see the Claude 1M Context Window guide for details.

Is the Python SDK thread-safe?

The synchronous Anthropic client is thread-safe. You can share a single client instance across threads. For async applications, use AsyncAnthropic and await each call.

How do I track API costs in Python?

Use response.usage.input_tokens and response.usage.output_tokens from each response. Multiply by the per-token price for your model. Log these to a database or monitoring service. The Claude API Cost guide has current per-token prices.

What's the difference between stream=True and client.messages.stream()?

client.messages.stream() is the recommended streaming interface: it returns a context manager with a text_stream iterator plus helpers like get_final_message(). The lower-level stream=True parameter on create() returns a raw event stream that you iterate and assemble yourself. Use client.messages.stream() for most use cases.
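
For comparison, a minimal sketch of the lower-level form; with stream=True you process the raw events yourself:

stream = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude!"}],
    stream=True
)
for event in stream:
    # Only text deltas carry output text; other events mark message structure
    if event.type == "content_block_delta" and event.delta.type == "text_delta":
        print(event.delta.text, end="", flush=True)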


30+ Python recipes for Claude API

Agent SDK Cookbook ($49) goes deep on production patterns: streaming pipelines, multi-agent coordination, tool use chains, and error recovery.

Get Agent SDK Cookbook — $49
