Claude Python SDK Tutorial: Complete Setup and Usage Guide (2026)
The Claude Python SDK (anthropic) lets you call Claude's API in Python with a clean, typed interface — install with pip install anthropic, set your ANTHROPIC_API_KEY, and you're sending messages in under 10 lines of code. This tutorial covers everything from installation to advanced features: streaming, tool use, prompt caching, and multi-turn conversations. All code examples are tested against the current API.
Installation and Authentication
pip install anthropic
Set your API key as an environment variable:
export ANTHROPIC_API_KEY="sk-ant-..."
Or load it in Python via .env:
pip install python-dotenv
from dotenv import load_dotenv
import os
load_dotenv()
api_key = os.getenv("ANTHROPIC_API_KEY")
Your First API Call
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain prompt caching in one paragraph."}
    ]
)
print(message.content[0].text)
This is the minimal working example. The client reads ANTHROPIC_API_KEY from your environment automatically. No additional configuration needed.
Understanding the Response Object
print(message.id) # msg_01XFDUDYJgAACzvnptvVoYEL
print(message.model) # claude-sonnet-4-5
print(message.stop_reason) # end_turn
print(message.usage.input_tokens) # 23
print(message.usage.output_tokens) # 120
print(message.content[0].text) # The actual response text
message.usage is critical for cost tracking: input tokens × the model's input price + output tokens × the model's output price = your cost per call. See Claude API Cost and Prompt Caching Break-Even for the full pricing breakdown.
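That arithmetic is easy to wrap in a small helper. A minimal sketch — the per-million-token prices below are illustrative placeholders, not authoritative; check the current pricing page before relying on them:

```python
# Rough per-call cost estimator. The prices below are assumed example
# rates (USD per million tokens), not authoritative.
PRICES_PER_MTOK = {
    "claude-sonnet-4-5": {"input": 3.00, "output": 15.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one call from its usage counts."""
    p = PRICES_PER_MTOK[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Using the usage numbers from the example response above:
print(estimate_cost("claude-sonnet-4-5", 23, 120))  # → 0.001869
```

Log these per-call estimates to a database or monitoring service and you have a running cost dashboard for free.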
System Prompts
System prompts set the persona and behavior for the entire conversation:
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system="You are a senior Python developer. Answer concisely with code examples.",
    messages=[
        {"role": "user", "content": "How do I handle rate limit errors?"}
    ]
)
Multi-Turn Conversations
Build conversation history by appending messages:
conversation = []

def chat(user_message):
    conversation.append({"role": "user", "content": user_message})
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=conversation
    )
    assistant_message = response.content[0].text
    conversation.append({"role": "assistant", "content": assistant_message})
    return assistant_message
# Multi-turn exchange
print(chat("What is the difference between a list and a tuple in Python?"))
print(chat("When would you choose a tuple over a list?"))
print(chat("Give me a code example with both."))
The conversation list acts as the full history. Claude has no persistent memory — you manage state client-side.
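Because you manage state client-side, a long-running chat grows without bound and eventually blows past the context window. One minimal approach is to trim old turns before each call — the `trim_history` helper and the message cap below are an illustrative sketch, not an SDK feature:

```python
def trim_history(conversation, max_messages=20):
    """Keep only the most recent messages to bound token usage.

    The Messages API expects the conversation to start with a user turn,
    so if the cut lands on an assistant message we drop it as well.
    """
    trimmed = conversation[-max_messages:]
    while trimmed and trimmed[0]["role"] == "assistant":
        trimmed = trimmed[1:]
    return trimmed

history = [{"role": "user", "content": "a"},
           {"role": "assistant", "content": "b"},
           {"role": "user", "content": "c"},
           {"role": "assistant", "content": "d"}]
print(trim_history(history, max_messages=3))  # keeps the last user/assistant pair
```

A fixed message cap is crude; for tighter control, count tokens with `client.messages.count_tokens` (covered below) and trim by token budget instead.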
Streaming Responses
For long outputs or interactive UIs, use streaming to display text as it arrives:
with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a detailed explanation of async/await in Python."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
Streaming benchmark: First token typically arrives in 300-500ms for Sonnet. Without streaming, you wait for the full response (can be 5-15 seconds for long outputs). For any user-facing application, streaming is strongly recommended.
Build production-ready Claude integrations
Agent SDK Cookbook ($49) includes 30+ Python recipes: streaming, tool use, multi-agent pipelines, error handling, and cost optimization patterns.
Tool Use (Function Calling)
Tool use lets Claude call your Python functions when it needs to:
tools = [
    {
        "name": "get_stock_price",
        "description": "Get the current stock price for a given ticker symbol",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {
                    "type": "string",
                    "description": "Stock ticker symbol, e.g. AAPL"
                }
            },
            "required": ["ticker"]
        }
    }
]
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the current price of Apple stock?"}]
)
# Check if Claude wants to call a tool
if response.stop_reason == "tool_use":
    tool_call = next(b for b in response.content if b.type == "tool_use")
    print(f"Claude wants to call: {tool_call.name}")
    print(f"With inputs: {tool_call.input}")

    # Call your actual function
    result = get_stock_price(tool_call.input["ticker"])  # your function

    # Return the result to Claude
    final_response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "What's the current price of Apple stock?"},
            {"role": "assistant", "content": response.content},
            {
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_call.id,
                    "content": str(result)
                }]
            }
        ]
    )
    print(final_response.content[0].text)
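Once you expose more than one tool, the "call your actual function" step is easier to manage with a name-to-handler registry. A minimal sketch — `TOOL_REGISTRY`, `dispatch_tool`, and the stubbed `get_stock_price` are illustrative names, not part of the SDK:

```python
# Map tool names to local handlers so a tool_use block can be routed
# generically, whatever tool Claude picked.
def get_stock_price(ticker: str) -> float:
    return 189.30  # stub — replace with a real market-data lookup

TOOL_REGISTRY = {
    "get_stock_price": lambda args: get_stock_price(args["ticker"]),
}

def dispatch_tool(name: str, tool_input: dict):
    """Run the registered handler for a tool_use block, or fail loudly."""
    if name not in TOOL_REGISTRY:
        raise ValueError(f"Unknown tool: {name}")
    return TOOL_REGISTRY[name](tool_input)

print(dispatch_tool("get_stock_price", {"ticker": "AAPL"}))  # → 189.3
```

In the loop above you would replace the hard-coded `get_stock_price(...)` call with `dispatch_tool(tool_call.name, tool_call.input)`.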
For a deeper dive on tool use patterns, see Claude Agent SDK Guide.
Prompt Caching
Prompt caching dramatically reduces cost when you reuse large system prompts or document context:
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a code review assistant. [... large instructions ...]",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "Review this function: [code]"}]
)

# Check cache status
print(response.usage.cache_creation_input_tokens)  # First call: tokens written to cache
print(response.usage.cache_read_input_tokens)      # Subsequent calls: tokens read from cache
Cache reads cost 90% less than standard input tokens, while cache writes cost 25% more. For a reused system prompt of a couple thousand tokens, a single cache hit more than pays back the write premium.
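The break-even claim is quick to verify, assuming the commonly cited multipliers (writes at about 1.25× the base input price, reads at about 0.1×):

```python
# Break-even check for prompt caching, in units of the base input price.
# Multipliers assumed: cache writes ~1.25x base, cache reads ~0.1x base.
write_premium = 0.25   # extra paid once, when the cache entry is created
saving_per_hit = 0.90  # saved on every later call that reads the cache

hits_to_break_even = write_premium / saving_per_hit
print(hits_to_break_even)  # ≈ 0.28 — one cache hit already pays it back
```

Note that cache entries are ephemeral (roughly a five-minute TTL by default), so the savings only materialize when calls reusing the prompt arrive close together.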
Error Handling
import time

import anthropic
from anthropic import APIConnectionError, APIStatusError, RateLimitError

client = anthropic.Anthropic()

def call_claude_safely(prompt: str, retries: int = 3):
    for attempt in range(retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}]
            )
        except RateLimitError:
            # RateLimitError subclasses APIStatusError, so catch it first
            if attempt < retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
            else:
                raise
        except APIConnectionError as e:
            print(f"Connection error: {e}")
            raise
        except APIStatusError as e:
            print(f"API error {e.status_code}: {e.message}")
            raise
Async Usage
For async applications (FastAPI, async scripts):
import asyncio
import anthropic

async def main():
    client = anthropic.AsyncAnthropic()
    message = await client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello, Claude!"}]
    )
    print(message.content[0].text)

asyncio.run(main())
Use AsyncAnthropic for any async context. The interface is identical to the sync client, with await added.
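The real payoff of the async client is concurrency: independent prompts can run in parallel with asyncio.gather. The sketch below demonstrates the pattern with a stand-in coroutine (`fake_claude_call` is a stub, not an SDK function) so it runs without an API key; with AsyncAnthropic you would await real create() calls the same way:

```python
import asyncio

async def fake_claude_call(prompt: str) -> str:
    # Stand-in for `await client.messages.create(...)` — sleeps instead of
    # hitting the network, purely to show the concurrency pattern.
    await asyncio.sleep(0.01)
    return f"answer to: {prompt}"

async def main() -> list[str]:
    # gather runs all three calls concurrently and preserves input order.
    return await asyncio.gather(
        fake_claude_call("q1"),
        fake_claude_call("q2"),
        fake_claude_call("q3"),
    )

print(asyncio.run(main()))  # → ['answer to: q1', 'answer to: q2', 'answer to: q3']
```

With real API calls, remember that concurrent requests still count against your rate limits — pair this pattern with the retry logic from the error-handling section.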
Counting Tokens Before Calling
Estimate cost before sending a large prompt:
response = client.messages.count_tokens(
    model="claude-sonnet-4-5",
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Your large prompt here..."}]
)
print(f"Input tokens: {response.input_tokens}")
This uses the same tokenizer as the actual API call, so the count is accurate.
Model Selection
# For most tasks — best balance of capability and cost
model = "claude-sonnet-4-5"
# For simple, high-volume tasks (10x cheaper than Sonnet)
model = "claude-haiku-4-5"
# For complex reasoning (3x more expensive than Sonnet)
model = "claude-opus-4-5"
See Claude Haiku vs Sonnet vs Opus: Which Model to Use for the full comparison with cost benchmarks.
Frequently Asked Questions
How do I install the Claude Python SDK?
Run pip install anthropic. Then set your API key: export ANTHROPIC_API_KEY="sk-ant-...". You can also pass the key directly: client = anthropic.Anthropic(api_key="sk-ant-..."), but environment variables are preferred for security.
What Python version does the Anthropic SDK support?
The anthropic package supports Python 3.8 and above. Python 3.9+ is recommended for full type hint support and best performance.
How do I handle long documents that exceed the context window?
For documents that exceed 200K tokens, split them into chunks and process sequentially, or use prompt caching with the cache_control parameter to cache a shared context. A 1M-token context window is available (in beta) on recent Sonnet models — see the Claude 1M Context Window guide for details.
Is the Python SDK thread-safe?
The synchronous Anthropic client is thread-safe. You can share a single client instance across threads. For async applications, use AsyncAnthropic and await each call.
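A common way to exploit that thread safety is a thread pool over one shared client. The sketch below shows the pattern with a stub (`fake_create` stands in for client.messages.create so the example runs without an API key):

```python
from concurrent.futures import ThreadPoolExecutor

# One shared (thread-safe) client object, many worker threads.
# fake_create stands in for client.messages.create for illustration.
def fake_create(prompt: str) -> str:
    return prompt.upper()

prompts = ["one", "two", "three"]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(fake_create, prompts))  # order matches `prompts`

print(results)  # → ['ONE', 'TWO', 'THREE']
```

For new code, AsyncAnthropic with asyncio.gather is usually the simpler route; reach for threads mainly when embedding into an existing synchronous codebase.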
How do I track API costs in Python?
Use response.usage.input_tokens and response.usage.output_tokens from each response. Multiply by the per-token price for your model. Log these to a database or monitoring service. The Claude API Cost guide has current per-token prices.
What's the difference between stream=True and client.messages.stream()?
client.messages.stream() is the recommended streaming interface — it returns a context manager with a text_stream iterator. The lower-level stream=True parameter on create() returns raw Server-Sent Events. Use client.messages.stream() for most use cases.
30+ Python recipes for Claude API
Agent SDK Cookbook ($49) goes deep on production patterns: streaming pipelines, multi-agent coordination, tool use chains, and error recovery.