Deploying Claude Agents to Production: Fly.io, Vercel, and Lambda
Claude agent deployments fail for three common reasons: timeouts (a single LLM call takes 5-30 seconds, and many serverless platforms time out at about 30 seconds), cold starts (agents with heavy initialization start too slowly on serverless), and missing environment variables in production. Matching the deployment target to your agent type avoids the first two; verifying secrets on every platform avoids the third. This guide covers the three main deployment patterns with complete configuration.
Deployment Target Decision Matrix
| Agent type | Recommended platform | Why |
|---|---|---|
| Long-running (5+ minutes) | Fly.io | No timeout limits, persistent processes |
| API endpoint (< 30s response) | Vercel | Zero-config, automatic scaling |
| Event-driven (webhooks, queues) | AWS Lambda | Pay-per-invocation, natural event model |
| Streaming responses | Vercel Edge | Low latency, streaming SSE support |
| High-volume, cost-sensitive | Fly.io + Redis queue | Full control, no per-invocation billing |
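The matrix can also be read as a simple decision procedure. The sketch below is one hypothetical encoding of it: the function name, its parameters, and the order of the checks are assumptions made for illustration, while the thresholds come from the table.

```python
def recommend_platform(
    expected_seconds: float,
    event_driven: bool = False,
    streaming: bool = False,
    high_volume: bool = False,
) -> str:
    """Map an agent's profile to a row of the decision matrix above."""
    if expected_seconds >= 300:  # long-running: 5+ minutes
        return "Fly.io"
    if high_volume:
        return "Fly.io + Redis queue"
    if event_driven:
        return "AWS Lambda"
    if streaming:
        return "Vercel Edge"
    if expected_seconds < 30:
        return "Vercel"
    # Between 30 seconds and 5 minutes the table has no row; a persistent
    # process is assumed to be the safer default here.
    return "Fly.io"


print(recommend_platform(expected_seconds=600))
print(recommend_platform(expected_seconds=10, streaming=True))
print(recommend_platform(expected_seconds=20, event_driven=True))
```

Real agents often match more than one row, so treat the ordering of the checks as a starting point rather than a rule.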
Fly.io: Long-Running Agents
Best for: agents that run for minutes, background processing, agents that need to hold state in memory.
Project structure
my-agent/
├── Dockerfile
├── fly.toml
├── requirements.txt
└── agent/
    ├── __init__.py
    ├── main.py
    └── tools.py
Dockerfile
FROM python:3.12-slim
WORKDIR /app
RUN apt-get update && apt-get install -y --no-install-recommends \
        curl \
    && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY agent/ ./agent/
# Port uvicorn listens on (the /health endpoint is served here too)
EXPOSE 8080
CMD ["python", "-m", "uvicorn", "agent.main:app", "--host", "0.0.0.0", "--port", "8080"]
FastAPI agent server
# agent/main.py
import os
import uuid

import anthropic
import httpx  # must be listed in requirements.txt
from fastapi import BackgroundTasks, FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

# Async client: a synchronous client would block the event loop during each LLM call.
client = anthropic.AsyncAnthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# In-memory job tracker (use Redis in production for multi-instance)
jobs: dict[str, dict] = {}


class AgentRequest(BaseModel):
    goal: str
    webhook_url: str | None = None


class JobStatus(BaseModel):
    job_id: str
    status: str  # "running" | "done" | "failed"
    result: str | None = None
    error: str | None = None


@app.get("/health")
async def health():
    return {"status": "ok"}


@app.post("/run")
async def run_agent(request: AgentRequest, background_tasks: BackgroundTasks):
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "running", "result": None, "error": None}
    background_tasks.add_task(execute_agent_job, job_id, request.goal, request.webhook_url)
    return {"job_id": job_id}


@app.get("/status/{job_id}")
async def get_status(job_id: str) -> JobStatus:
    if job_id not in jobs:
        raise HTTPException(status_code=404, detail="Job not found")
    return JobStatus(job_id=job_id, **jobs[job_id])


async def execute_agent_job(job_id: str, goal: str, webhook_url: str | None):
    """Background task that runs the full agent loop."""
    try:
        result = await run_agent_loop(goal)
        jobs[job_id] = {"status": "done", "result": result, "error": None}
        if webhook_url:
            async with httpx.AsyncClient() as http:
                await http.post(webhook_url, json={"job_id": job_id, "result": result})
    except Exception as e:
        jobs[job_id] = {"status": "failed", "result": None, "error": str(e)}


async def run_agent_loop(goal: str, max_turns: int = 30) -> str:
    messages = [{"role": "user", "content": goal}]
    for turn in range(max_turns):
        response = await client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=2048,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": response.content[0].text})
        if response.stop_reason == "end_turn":
            return response.content[0].text
    return "Reached max turns"
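A caller of this server submits a goal to /run and then polls /status/{job_id} until the job leaves the "running" state. The sketch below shows only the polling logic. `wait_for_job` is a hypothetical helper, `fetch_status` stands in for an HTTP GET against /status/{job_id}, and the interval and timeout defaults are assumptions.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], dict],
    poll_interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job status is "done" or "failed", or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()
        if job["status"] in ("done", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job still running after timeout")
        sleep(poll_interval)


# Simulated responses in the shape returned by the JobStatus model.
responses = iter([
    {"job_id": "abc", "status": "running", "result": None, "error": None},
    {"job_id": "abc", "status": "done", "result": "42", "error": None},
])
final = wait_for_job(lambda: next(responses), sleep=lambda _: None)
print(final["status"], final["result"])
```

In real use, `fetch_status` would wrap a request to the deployed app's /status endpoint. Passing it in as a callable keeps the polling logic testable without a network.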
fly.toml
app = "my-claude-agent"
primary_region = "nrt" # Tokyo — closest to Korea
[build]
dockerfile = "Dockerfile"
[env]
PORT = "8080"
[http_service]
internal_port = 8080
force_https = true
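# Keep one machine running: Fly judges idleness from HTTP traffic, so with
# auto-stop enabled it may stop a machine while a background job is still
# in progress, which also discards the in-memory jobs dict.
auto_stop_machines = false
auto_start_machines = true
min_machines_running = 1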
[[http_service.checks]]
grace_period = "10s"
interval = "30s"
method = "GET"
path = "/health"
timeout = "5s"
[[vm]]
memory = "1gb"
cpu_kind = "shared"
cpus = 1
Deploy
# Install flyctl
curl -L https://fly.io/install.sh | sh
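# Create the app without deploying yet (the server exits at startup if the key is missing)
fly launch --name my-claude-agent --no-deploy
# Set secrets
fly secrets set ANTHROPIC_API_KEY=sk-ant-...
# Deploy (also used for later updates)
fly deploy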
Vercel: API Endpoints with Streaming
Best for: agents that respond to HTTP requests within 30 seconds, streaming chat responses.
Streaming agent endpoint
// app/api/agent/route.ts (Next.js 15 App Router)
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY!,
});

export const maxDuration = 30; // Vercel Pro: up to 300s

export async function POST(req: Request) {
  const { messages, system } = await req.json();
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      try {
        const response = await client.messages.create({
          model: "claude-sonnet-4-5",
          max_tokens: 2048,
          system: system || "You are a helpful assistant.",
          messages,
          stream: true,
        });

        for await (const event of response) {
          if (
            event.type === "content_block_delta" &&
            event.delta.type === "text_delta"
          ) {
            const chunk = `data: ${JSON.stringify({ text: event.delta.text })}\n\n`;
            controller.enqueue(encoder.encode(chunk));
          }
          if (event.type === "message_stop") {
            controller.enqueue(encoder.encode("data: [DONE]\n\n"));
          }
        }
      } catch (error) {
        const errorMsg = `data: ${JSON.stringify({ error: String(error) })}\n\n`;
        controller.enqueue(encoder.encode(errorMsg));
      } finally {
        controller.close();
      }
    },
  });

  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    },
  });
}
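On the client side, the endpoint's output is a sequence of `data:` lines: JSON objects with a `text` (or `error`) field, followed by a `[DONE]` sentinel. The sketch below parses that framing in Python. `collect_text` is a hypothetical helper, and it assumes the lines have already been read from the HTTP response.

```python
import json
from typing import Iterable


def collect_text(lines: Iterable[str]) -> str:
    """Join the text chunks from the endpoint's SSE lines, stopping at [DONE]."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip the blank separator lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        if "error" in event:
            raise RuntimeError(event["error"])
        parts.append(event["text"])
    return "".join(parts)


sample = [
    'data: {"text": "Hello"}',
    "",
    'data: {"text": ", world"}',
    "",
    "data: [DONE]",
    "",
]
print(collect_text(sample))
```

A UI would normally render each chunk as it arrives instead of joining them at the end; the framing is the same either way.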
vercel.json for longer timeouts
{
  "functions": {
    "app/api/agent/route.ts": {
      "maxDuration": 300
    }
  }
}
Environment variables in Vercel
# Via CLI
vercel env add ANTHROPIC_API_KEY production
# Or in vercel.json (non-sensitive only)
{
  "env": {
    "NODE_ENV": "production",
    "AGENT_MAX_TURNS": "20"
  }
}
AWS Lambda: Event-Driven Agents
Best for: agents triggered by webhooks, queue messages, S3 uploads, or scheduled events.
Lambda function
# handler.py
import json
import os

import anthropic

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])


def handler(event, context):
    """Lambda handler for event-driven agent invocation."""
    # Parse trigger source
    from_sqs = False
    if "body" in event:
        # API Gateway trigger (body is None when the request has no payload)
        body = json.loads(event.get("body") or "{}")
        task = body.get("task", "")
    elif "Records" in event and event["Records"][0].get("eventSource") == "aws:sqs":
        # SQS trigger (SQS records use a lowercase "eventSource" key)
        from_sqs = True
        message = json.loads(event["Records"][0]["body"])
        task = message.get("task", "")
    else:
        task = event.get("task", "")

    if not task:
        return {"statusCode": 400, "body": json.dumps({"error": "No task provided"})}

    try:
        result = run_agent(task)
        return {
            "statusCode": 200,
            "body": json.dumps({"result": result}),
            "headers": {"Content-Type": "application/json"},
        }
    except Exception as e:
        if from_sqs:
            # Re-raise so the message returns to the queue and is retried;
            # a normal return would mark it as processed.
            raise
        return {
            "statusCode": 500,
            "body": json.dumps({"error": str(e)}),
        }


def run_agent(task: str, max_turns: int = 10) -> str:
    """Compact agent loop for Lambda execution."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": response.content[0].text})
        if response.stop_reason == "end_turn":
            return response.content[0].text
    return messages[-1]["content"]
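The handler's three branches correspond to three event shapes. The sketch below isolates that parsing step so it can be exercised locally without AWS or an API key. `extract_task` is a hypothetical helper, and the sample events are trimmed to the fields used here. SQS records delivered to Lambda carry a lowercase `eventSource` key, which is what the check relies on.

```python
import json


def extract_task(event: dict) -> str:
    """Return the task string from an API Gateway, SQS, or direct-invoke event."""
    if "body" in event:
        body = json.loads(event.get("body") or "{}")
        return body.get("task", "")
    records = event.get("Records") or []
    if records and records[0].get("eventSource") == "aws:sqs":
        return json.loads(records[0]["body"]).get("task", "")
    return event.get("task", "")


api_gateway_event = {"body": json.dumps({"task": "summarize report"})}
sqs_event = {
    "Records": [
        {"eventSource": "aws:sqs", "body": json.dumps({"task": "triage ticket"})}
    ]
}
direct_event = {"task": "nightly digest"}

for sample_event in (api_gateway_event, sqs_event, direct_event):
    print(extract_task(sample_event))
```

Keeping the parsing in its own function makes it easy to add a unit test per trigger type before deploying.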
SAM template for Lambda
# template.yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Globals:
  Function:
    Timeout: 120  # 2 minutes; adjust per your P95 latency
    MemorySize: 512
    Runtime: python3.12
    Environment:
      Variables:
        ANTHROPIC_API_KEY: !Sub '{{resolve:secretsmanager:anthropic-api-key}}'

Resources:
  AgentFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: ./
      Handler: handler.handler
      Events:
        # API Gateway's default integration timeout is 29 seconds,
        # so route longer tasks through the SQS queue instead.
        ApiEvent:
          Type: Api
          Properties:
            Path: /run
            Method: post
        SQSEvent:
          Type: SQS
          Properties:
            Queue: !GetAtt AgentQueue.Arn
            BatchSize: 1

  AgentQueue:
    Type: AWS::SQS::Queue
    Properties:
      VisibilityTimeout: 180  # Must be >= Lambda timeout
Deploy with SAM
sam build
sam deploy --guided
Environment Variables Across Platforms
The #1 production failure: ANTHROPIC_API_KEY not set in production.
# Verify in each environment
python -c "import os; print('API key set:', bool(os.environ.get('ANTHROPIC_API_KEY')))"
Fly.io: fly secrets set ANTHROPIC_API_KEY=sk-ant-...
Vercel: Settings → Environment Variables → Add for Production
Lambda: AWS Systems Manager Parameter Store or Secrets Manager
Never put API keys in:
- Dockerfile ENV instructions
- fly.toml [env] section
- vercel.json env object
- Git repositories
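A complementary safeguard is to fail fast at startup when a required variable is missing, so a misconfigured deploy surfaces immediately instead of on the first request. A minimal sketch, assuming hypothetical `missing_env` and `check_env` helpers; the variable list is illustrative.

```python
import os
from typing import Iterable, Mapping


def missing_env(required: Iterable[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


def check_env(required: Iterable[str], env: Mapping[str, str] = os.environ) -> None:
    """Raise at startup with a message that names every missing variable."""
    missing = missing_env(required, env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))


# Example with an explicit mapping instead of the real environment.
print(missing_env(["ANTHROPIC_API_KEY", "AGENT_MAX_TURNS"], {"AGENT_MAX_TURNS": "20"}))
```

Calling `check_env(["ANTHROPIC_API_KEY"])` at import time of the server or handler module would turn a missing key into a failed deploy, not a runtime error.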
Frequently Asked Questions
What's the best platform for a beginner deploying their first Claude agent?
Vercel. It needs zero configuration, deploys instantly with vercel --prod, and has a free tier generous enough for testing. The main limitation is the function timeout: 30 seconds on the free tier and 300 seconds on Pro.
Can I use Fly.io for free?
Fly.io has a free allowance: 3 shared-CPU VMs, 256MB RAM each. Enough for a low-traffic personal agent. Production agents with any traffic need the paid tier (~$5-15/month for a 1GB instance).
How do I handle Anthropic rate limits (429 errors) in production?
Implement exponential backoff with jitter (see the error handling guide). For Lambda: set the SQS visibility timeout to 3x your max agent runtime to allow retry without duplicate processing. For Fly.io: use a Redis queue with delayed retry.
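Exponential backoff with jitter can be sketched as follows. This is a generic illustration, not the error handling guide's implementation: `call_with_backoff` and its parameters are hypothetical, it uses the "full jitter" variant (a random delay between zero and the exponential cap), and `is_retryable` is left to the caller. With the Anthropic Python SDK, that check might be `isinstance(exc, anthropic.RateLimitError)`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying retryable errors with full-jitter exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(exc):
                raise
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise RuntimeError("max_attempts must be at least 1")


# Simulated call that fails twice before succeeding.
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated 429")
    return "ok"


delays = []
result = call_with_backoff(flaky, lambda e: isinstance(e, ConnectionError), sleep=delays.append)
print(result, len(delays))
```

The jitter spreads retries from many concurrent workers over time, so they do not all hit the API again at the same instant.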
Should agents be synchronous (return result) or asynchronous (return job ID)?
Async (job ID pattern) for anything that might take > 10 seconds. Sync is fine for quick tasks and streaming responses. The Fly.io example above shows the async pattern.
How do I monitor production agent costs?
The observability guide covers structured logging with token counts. For per-deployment tracking, tag your API calls with a metadata header (or use the logging wrapper) and aggregate by deployment environment in your cost dashboard.
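One way to make per-deployment tracking concrete is to emit a structured log line per API call and sum token counts by environment. The sketch below is an assumption about how that could look, not the observability guide's wrapper: the helper names and the environment labels are hypothetical, and the `input_tokens` and `output_tokens` fields mirror the `usage` object returned with each Messages API response. It records tokens only, since pricing varies by model.

```python
import json
from collections import defaultdict


def usage_record(deploy_env: str, model: str, input_tokens: int, output_tokens: int) -> str:
    """Build one JSON log line describing the token usage of a single API call."""
    return json.dumps({
        "event": "llm_usage",
        "deploy_env": deploy_env,
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
    })


def tokens_by_env(log_lines: list[str]) -> dict[str, int]:
    """Sum input and output tokens per deployment environment."""
    totals: dict[str, int] = defaultdict(int)
    for line in log_lines:
        record = json.loads(line)
        if record.get("event") == "llm_usage":
            totals[record["deploy_env"]] += record["input_tokens"] + record["output_tokens"]
    return dict(totals)


logs = [
    usage_record("fly-prod", "claude-sonnet-4-5", 1200, 300),
    usage_record("lambda-prod", "claude-sonnet-4-5", 800, 150),
    usage_record("fly-prod", "claude-sonnet-4-5", 500, 100),
]
print(tokens_by_env(logs))
```

In an agent loop, the two counts would come from `response.usage.input_tokens` and `response.usage.output_tokens` after each call.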
Related Guides
- Claude Agent Observability — Logging and cost tracking
- How to Handle Errors and Retries in Claude Agent SDK — Retry patterns
- Agentic Workflows: The Next Frontier — Architecture patterns