Claude API Python SDK: Complete Quickstart (2026)
The Anthropic Python SDK is the fastest way to get Claude running in your code. This guide covers everything from installation to the patterns you'll use in every real project.
Install
pip install anthropic
Or with uv (faster):
uv add anthropic
Authentication
Get your API key from console.anthropic.com → API Keys → Create Key.
The SDK reads the key from the ANTHROPIC_API_KEY environment variable automatically:
export ANTHROPIC_API_KEY="sk-ant-..."
Or pass it explicitly:
import anthropic
client = anthropic.Anthropic(api_key="sk-ant-...")
For production, use environment variables. Never hardcode keys.
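If you want a misconfigured deployment to fail at startup rather than on the first API call, a small check works. This is a minimal sketch, not part of the SDK; the function name and error message are our own:

```python
import os

def require_api_key() -> str:
    """Fail fast if ANTHROPIC_API_KEY is not set in the environment."""
    key = os.environ.get("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set. Export it before starting the app."
        )
    return key
```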
Your first message
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(message.content[0].text)
# Paris
The response is a Message object. The text lives at message.content[0].text.
The response object
Understanding the full response structure:
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)
print(message.id) # msg_01XFDUDYJgAACzvnptvVoYEL
print(message.model) # claude-3-5-sonnet-20241022
print(message.stop_reason) # end_turn
print(message.usage.input_tokens) # 10
print(message.usage.output_tokens) # 25
print(message.content[0].type) # text
print(message.content[0].text) # Hello! How can I help you today?
System prompts
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="You are a helpful assistant that responds only in haiku.",
    messages=[
        {"role": "user", "content": "What is Python?"}
    ]
)
The system parameter takes a plain string (or a list of content blocks, as used for prompt caching). It sets Claude's persona, task context, and constraints for the entire conversation.
Multi-turn conversations
messages = []
# Turn 1
messages.append({"role": "user", "content": "My name is Alex."})
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=512,
    messages=messages
)
messages.append({"role": "assistant", "content": response.content[0].text})
# Turn 2
messages.append({"role": "user", "content": "What's my name?"})
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=512,
    messages=messages
)
print(response.content[0].text)
# Your name is Alex.
The API is stateless — you send the full conversation history each time. Build a list, append user and assistant turns, send the list.
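The append-and-resend pattern above is easy to wrap in a small helper. This is a sketch, not part of the SDK; the class name Conversation is our own:

```python
class Conversation:
    """Keeps the message list and resends the full history on every turn."""

    def __init__(self, client, model="claude-3-5-sonnet-20241022", max_tokens=512):
        self.client = client
        self.model = model
        self.max_tokens = max_tokens
        self.messages = []

    def say(self, text: str) -> str:
        # Append the user turn, call the API, then append the assistant turn.
        self.messages.append({"role": "user", "content": text})
        response = self.client.messages.create(
            model=self.model,
            max_tokens=self.max_tokens,
            messages=self.messages,
        )
        reply = response.content[0].text
        self.messages.append({"role": "assistant", "content": reply})
        return reply

# Usage (requires an API key):
# chat = Conversation(anthropic.Anthropic())
# chat.say("My name is Alex.")
# chat.say("What's my name?")  # the history carries the earlier turn
```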
Streaming
For long responses, stream tokens as they arrive:
with client.messages.stream(
    model="claude-3-5-sonnet-20241022",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Write a short story about a robot."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    # Get the full message after streaming completes
    final = stream.get_final_message()

print(f"\nTokens used: {final.usage.input_tokens + final.usage.output_tokens}")
Async support
For async applications (FastAPI, async scripts):
import asyncio
import anthropic
async def main():
    client = anthropic.AsyncAnthropic()
    message = await client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(message.content[0].text)

asyncio.run(main())
For async streaming:
client = anthropic.AsyncAnthropic()

async def stream_response():
    async with client.messages.stream(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Count to 10"}]
    ) as stream:
        async for text in stream.text_stream:
            print(text, end="", flush=True)
Choosing the right model
| Model | Use case | Cost |
|---|---|---|
| claude-3-5-haiku-20241022 | High-volume, speed-sensitive tasks | Cheapest |
| claude-3-5-sonnet-20241022 | Most tasks — best balance | Mid |
| claude-3-7-sonnet-20250219 | Extended thinking, complex reasoning | Higher |
| claude-opus-4 | Hardest tasks, highest quality | Most expensive |
Start with Haiku for anything repetitive. Use Sonnet as default. Only reach for Opus or extended thinking when Sonnet falls short.
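One way to encode the "Haiku by default, escalate when needed" advice is a tiny routing helper. This is a sketch; the tier names are our own invention, not an SDK feature:

```python
# Hypothetical task tiers mapped to the model IDs from the table above.
MODEL_TIERS = {
    "bulk": "claude-3-5-haiku-20241022",       # high-volume, repetitive
    "default": "claude-3-5-sonnet-20241022",   # most tasks
    "reasoning": "claude-3-7-sonnet-20250219", # extended thinking
    "hardest": "claude-opus-4",                # highest quality
}

def pick_model(tier: str = "default") -> str:
    """Return the model ID for a task tier, falling back to the default."""
    return MODEL_TIERS.get(tier, MODEL_TIERS["default"])
```

Centralizing model choice in one function makes later tier changes a one-line edit instead of a codebase-wide search.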
Error handling
import anthropic
client = anthropic.Anthropic()
try:
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}]
    )
except anthropic.APIConnectionError:
    print("Network error — check your connection")
except anthropic.RateLimitError:
    print("Rate limit hit — back off and retry")
except anthropic.APIStatusError as e:
    print(f"API error {e.status_code}: {e.message}")
The SDK has built-in retry logic for transient errors (429s, 529s). By default it retries 2 times with exponential backoff:
# Configure retries
client = anthropic.Anthropic(
    max_retries=3,  # default is 2
    timeout=60.0,   # seconds
)
Tracking token usage and cost
message = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize this article..."}]
)
input_tokens = message.usage.input_tokens
output_tokens = message.usage.output_tokens
# Haiku pricing (April 2026)
HAIKU_INPUT = 0.80 / 1_000_000 # $0.80 per 1M input tokens
HAIKU_OUTPUT = 4.00 / 1_000_000 # $4.00 per 1M output tokens
cost = (input_tokens * HAIKU_INPUT) + (output_tokens * HAIKU_OUTPUT)
print(f"This call cost: ${cost:.6f}")
Prompt caching (50-90% cost reduction)
For repeated system prompts or document analysis, add cache_control to cache the expensive prefix:
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": very_long_document,  # Only charged on cache miss
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "Summarize the key points."}]
)
First call writes the cache. Subsequent calls with the same prefix get a 90% discount on those tokens. See the prompt caching cost analysis for the break-even math.
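To get a feel for the break-even point, here is rough arithmetic under assumed Sonnet pricing ($3 per 1M input tokens, cache writes at 1.25x base and cache reads at 0.1x base — verify against the current price list before relying on these numbers):

```python
# Assumed Sonnet input pricing, per token. Check current pricing first.
INPUT = 3.00 / 1_000_000
CACHE_WRITE = INPUT * 1.25  # first call pays a 25% premium to write the cache
CACHE_READ = INPUT * 0.10   # later calls pay 10% of base to read it

def prefix_cost(prefix_tokens: int, calls: int) -> tuple[float, float]:
    """Cost of a shared prefix across `calls` requests: (cached, uncached)."""
    uncached = prefix_tokens * INPUT * calls
    cached = prefix_tokens * (CACHE_WRITE + CACHE_READ * (calls - 1))
    return cached, uncached

# A 50k-token document reused across 20 calls:
cached, uncached = prefix_cost(50_000, 20)
```

Under these assumptions a single call is slightly more expensive with caching (the write premium), but the cached path wins from the second call onward.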
A minimal production wrapper
import anthropic
from typing import Optional
client = anthropic.Anthropic()
def ask_claude(
    prompt: str,
    system: Optional[str] = None,
    model: str = "claude-3-5-haiku-20241022",
    max_tokens: int = 1024,
) -> str:
    """Simple wrapper for one-shot Claude calls."""
    kwargs = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if system:
        kwargs["system"] = system
    message = client.messages.create(**kwargs)
    return message.content[0].text

# Use it
result = ask_claude(
    "Extract the sentiment from this review: 'Great product, fast shipping!'",
    system="Return only: positive, negative, or neutral."
)
print(result)  # positive
Next steps
- Claude API pricing and cost optimization — understand the cost levers
- Prompt caching break-even analysis — when caching pays off
- Streaming guide — send tokens to the browser as they arrive
- Batch API for async workloads — 50% discount for non-real-time jobs
- Error handling in production — retry strategies and error taxonomy
The Claude API Cost Optimization Masterclass covers the full stack: prompt caching, model tiering, batch API, and the exact implementation order to maximize savings.
Frequently Asked Questions
Do I need to install anything other than the anthropic package to get started?
No. pip install anthropic is the only dependency. The SDK handles authentication, retries, and error handling out of the box. You just need a valid API key from console.anthropic.com, set in the ANTHROPIC_API_KEY environment variable.
What is the difference between client.messages.create() and client.messages.stream()?
messages.create() waits for the full response before returning — best for programmatic tasks where you process the result as a string. messages.stream() yields tokens as they arrive, which is better for chat interfaces or any UX where you want to show text appearing progressively.
How do I handle multi-turn conversations with the Python SDK?
The Claude API is stateless — you must send the full conversation history on every call. Build a list of {"role": "user", "content": "..."} and {"role": "assistant", "content": "..."} dicts, append each new turn, and pass the full list as the messages parameter each time.
Which model should I use by default when starting a new project?
Start with claude-3-5-haiku-20241022 for anything repetitive or high-volume to keep costs low. Use claude-3-5-sonnet-20241022 as your default for tasks requiring better reasoning or instruction following. Only reach for Opus or extended thinking when Sonnet demonstrably falls short on your specific task.
Does the SDK automatically retry on rate limit errors?
Yes. The official Anthropic Python SDK retries on 429 (rate limit) and 529 (overloaded) responses using exponential backoff. The default is 2 retries; you can increase this by passing max_retries=N when creating the client. For production systems, 3–4 retries is a reasonable setting.
Take It Further
Claude API Cost Optimization Masterclass — The practical guide to cutting Claude API costs by 60–90% in production. Model tiering, prompt caching, Batch API, and token compression — with real numbers from 12 production deployments.
120-page PDF + Excel cost calculator.
→ Get Cost Optimization Masterclass — $59
30-day money-back guarantee. Instant download.