Claude API Python SDK: Complete Quickstart (2026)
The Anthropic Python SDK is the fastest way to get Claude running in your code. This guide covers everything from installation to the patterns you'll use in every real project.
Install
pip install anthropic
Or with uv (faster):
uv add anthropic
Authentication
Get your API key from console.anthropic.com → API Keys → Create Key.
The SDK reads the key from the ANTHROPIC_API_KEY environment variable automatically:
export ANTHROPIC_API_KEY="sk-ant-..."
Or pass it explicitly:
import anthropic
client = anthropic.Anthropic(api_key="sk-ant-...")
For production, use environment variables. Never hardcode keys.
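If you want a misconfigured deployment to fail at startup rather than on the first API call, a small check works. This is a minimal sketch, not part of the SDK; the function name and error message are our own:

```python
import os

def require_api_key() -> str:
    """Fail fast if ANTHROPIC_API_KEY is not set in the environment."""
    key = os.environ.get("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set. Export it before starting the app."
        )
    return key
```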
Your first message
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(message.content[0].text)
# Paris
The response is a Message object. The text lives at message.content[0].text.
The response object
Understanding the full response structure:
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)
print(message.id) # msg_01XFDUDYJgAACzvnptvVoYEL
print(message.model) # claude-3-5-sonnet-20241022
print(message.stop_reason) # end_turn
print(message.usage.input_tokens) # 10
print(message.usage.output_tokens) # 25
print(message.content[0].type) # text
print(message.content[0].text) # Hello! How can I help you today?
System prompts
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="You are a helpful assistant that responds only in haiku.",
    messages=[
        {"role": "user", "content": "What is Python?"}
    ]
)
The system parameter takes a plain string (or a list of content blocks, as used for prompt caching). It sets Claude's persona, task context, and constraints for the entire conversation.
Multi-turn conversations
messages = []
# Turn 1
messages.append({"role": "user", "content": "My name is Alex."})
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=512,
    messages=messages
)
messages.append({"role": "assistant", "content": response.content[0].text})
# Turn 2
messages.append({"role": "user", "content": "What's my name?"})
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=512,
    messages=messages
)
print(response.content[0].text)
# Your name is Alex.
The API is stateless — you send the full conversation history each time. Build a list, append user and assistant turns, send the list.
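The append-and-resend pattern above is easy to wrap in a small helper. This is a sketch, not part of the SDK; the class name Conversation is our own:

```python
class Conversation:
    """Keeps the message list and resends the full history on every turn."""

    def __init__(self, client, model="claude-3-5-sonnet-20241022", max_tokens=512):
        self.client = client
        self.model = model
        self.max_tokens = max_tokens
        self.messages = []

    def say(self, text: str) -> str:
        # Append the user turn, call the API, then append the assistant turn.
        self.messages.append({"role": "user", "content": text})
        response = self.client.messages.create(
            model=self.model,
            max_tokens=self.max_tokens,
            messages=self.messages,
        )
        reply = response.content[0].text
        self.messages.append({"role": "assistant", "content": reply})
        return reply

# Usage (requires an API key):
# chat = Conversation(anthropic.Anthropic())
# chat.say("My name is Alex.")
# chat.say("What's my name?")  # the history carries the earlier turn
```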
Streaming
For long responses, stream tokens as they arrive:
with client.messages.stream(
    model="claude-3-5-sonnet-20241022",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Write a short story about a robot."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    # Get the full message after streaming completes
    final = stream.get_final_message()

print(f"\nTokens used: {final.usage.input_tokens + final.usage.output_tokens}")
Async support
For async applications (FastAPI, async scripts):
import asyncio
import anthropic
async def main():
    client = anthropic.AsyncAnthropic()
    message = await client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(message.content[0].text)

asyncio.run(main())
For async streaming:
client = anthropic.AsyncAnthropic()

async def stream_response():
    async with client.messages.stream(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Count to 10"}]
    ) as stream:
        async for text in stream.text_stream:
            print(text, end="", flush=True)
Choosing the right model
| Model | Use case | Cost |
|---|---|---|
| claude-3-5-haiku-20241022 | High-volume, speed-sensitive tasks | Cheapest |
| claude-3-5-sonnet-20241022 | Most tasks — best balance | Mid |
| claude-3-7-sonnet-20250219 | Extended thinking, complex reasoning | Higher |
| claude-opus-4 | Hardest tasks, highest quality | Most expensive |
Start with Haiku for anything repetitive. Use Sonnet as default. Only reach for Opus or extended thinking when Sonnet falls short.
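One way to encode the "Haiku by default, escalate when needed" advice is a tiny routing helper. This is a sketch; the tier names are our own invention, not an SDK feature:

```python
# Hypothetical task tiers mapped to the model IDs from the table above.
MODEL_TIERS = {
    "bulk": "claude-3-5-haiku-20241022",       # high-volume, repetitive
    "default": "claude-3-5-sonnet-20241022",   # most tasks
    "reasoning": "claude-3-7-sonnet-20250219", # extended thinking
    "hardest": "claude-opus-4",                # highest quality
}

def pick_model(tier: str = "default") -> str:
    """Return the model ID for a task tier, falling back to the default."""
    return MODEL_TIERS.get(tier, MODEL_TIERS["default"])
```

Centralizing model choice in one function makes later tier changes a one-line edit instead of a codebase-wide search.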
Error handling
import anthropic
client = anthropic.Anthropic()
try:
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}]
    )
except anthropic.APIConnectionError:
    print("Network error — check your connection")
except anthropic.RateLimitError:
    print("Rate limit hit — back off and retry")
except anthropic.APIStatusError as e:
    print(f"API error {e.status_code}: {e.message}")
The SDK has built-in retry logic for transient errors (429s, 529s). By default it retries 2 times with exponential backoff:
# Configure retries
client = anthropic.Anthropic(
    max_retries=3,  # default is 2
    timeout=60.0,   # seconds
)
Tracking token usage and cost
message = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize this article..."}]
)
input_tokens = message.usage.input_tokens
output_tokens = message.usage.output_tokens
# Haiku pricing (April 2026)
HAIKU_INPUT = 0.80 / 1_000_000 # $0.80 per 1M input tokens
HAIKU_OUTPUT = 4.00 / 1_000_000 # $4.00 per 1M output tokens
cost = (input_tokens * HAIKU_INPUT) + (output_tokens * HAIKU_OUTPUT)
print(f"This call cost: ${cost:.6f}")
Prompt caching (50-90% cost reduction)
For repeated system prompts or document analysis, add cache_control to cache the expensive prefix:
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": very_long_document,  # Only charged on cache miss
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "Summarize the key points."}]
)
First call writes the cache. Subsequent calls with the same prefix get a 90% discount on those tokens. See the prompt caching cost analysis for the break-even math.
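To get a feel for the break-even point, here is rough arithmetic under assumed Sonnet pricing ($3 per 1M input tokens, cache writes at 1.25x base and cache reads at 0.1x base — verify against the current price list before relying on these numbers):

```python
# Assumed Sonnet input pricing, per token. Check current pricing first.
INPUT = 3.00 / 1_000_000
CACHE_WRITE = INPUT * 1.25  # first call pays a 25% premium to write the cache
CACHE_READ = INPUT * 0.10   # later calls pay 10% of base to read it

def prefix_cost(prefix_tokens: int, calls: int) -> tuple[float, float]:
    """Cost of a shared prefix across `calls` requests: (cached, uncached)."""
    uncached = prefix_tokens * INPUT * calls
    cached = prefix_tokens * (CACHE_WRITE + CACHE_READ * (calls - 1))
    return cached, uncached

# A 50k-token document reused across 20 calls:
cached, uncached = prefix_cost(50_000, 20)
```

Under these assumptions a single call is slightly more expensive with caching (the write premium), but the cached path wins from the second call onward.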
A minimal production wrapper
import anthropic
from typing import Optional
client = anthropic.Anthropic()
def ask_claude(
    prompt: str,
    system: Optional[str] = None,
    model: str = "claude-3-5-haiku-20241022",
    max_tokens: int = 1024,
) -> str:
    """Simple wrapper for one-shot Claude calls."""
    kwargs = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if system:
        kwargs["system"] = system
    message = client.messages.create(**kwargs)
    return message.content[0].text

# Use it
result = ask_claude(
    "Extract the sentiment from this review: 'Great product, fast shipping!'",
    system="Return only: positive, negative, or neutral."
)
print(result)  # positive
Next steps
- Claude API pricing and cost optimization — understand the cost levers
- Prompt caching break-even analysis — when caching pays off
- Streaming guide — send tokens to the browser as they arrive
- Batch API for async workloads — 50% discount for non-real-time jobs
- Error handling in production — retry strategies and error taxonomy
The Claude API Cost Optimization Masterclass covers the full stack: prompt caching, model tiering, batch API, and the exact implementation order to maximize savings.
Frequently Asked Questions
Do I need to install anything other than the anthropic package to get started?
No. pip install anthropic is the only dependency. The SDK handles authentication, retries, and error handling out of the box. You just need a valid API key from console.anthropic.com, set in the ANTHROPIC_API_KEY environment variable.
What is the difference between client.messages.create() and client.messages.stream()?
messages.create() waits for the full response before returning — best for programmatic tasks where you process the result as a string. messages.stream() yields tokens as they arrive, which is better for chat interfaces or any UX where you want to show text appearing progressively.
How do I handle multi-turn conversations with the Python SDK?
The Claude API is stateless — you must send the full conversation history on every call. Build a list of {"role": "user", "content": "..."} and {"role": "assistant", "content": "..."} dicts, append each new turn, and pass the full list as the messages parameter each time.
Which model should I use by default when starting a new project?
Start with claude-3-5-haiku-20241022 for anything repetitive or high-volume to keep costs low. Use claude-3-5-sonnet-20241022 as your default for tasks requiring better reasoning or instruction following. Only reach for Opus or extended thinking when Sonnet demonstrably falls short on your specific task.
Does the SDK automatically retry on rate limit errors?
Yes. The official Anthropic Python SDK retries on 429 (rate limit) and 529 (overloaded) responses using exponential backoff. The default is 2 retries; you can increase this by passing max_retries=N when creating the client. For production systems, 3–4 retries is a reasonable setting.
Take It Further
Claude API Cost Optimization Masterclass — The practical guide to cutting Claude API costs by 60–90% in production. Model tiering, prompt caching, Batch API, and token compression — with real numbers from 12 production deployments.
120-page PDF + Excel cost calculator.
→ Get Cost Optimization Masterclass — $59
30-day money-back guarantee. Instant download.