
Deploying Claude Agents to Production: Fly.io, Vercel, and Lambda

How to deploy Claude agents to production — Fly.io for long-running agents, Vercel for API routes with streaming, AWS Lambda for event-driven agents.

Claude agent deployments fail for three common reasons: timeouts (a single LLM call takes 5-30 seconds, and many serverless platforms time out at around 30 seconds), cold starts (agents with heavy initialization start too slowly for serverless), and missing environment variables in production. Choosing a deployment target that fits your agent type avoids the first two, and verifying secrets in each environment catches the third. This guide covers the three main deployment patterns with complete configuration.


Deployment Target Decision Matrix

Agent type | Recommended platform | Why
Long-running (5+ minutes) | Fly.io | No timeout limits, persistent processes
API endpoint (< 30s response) | Vercel | Zero-config, automatic scaling
Event-driven (webhooks, queues) | AWS Lambda | Pay-per-invocation, natural event model
Streaming responses | Vercel Edge | Low latency, streaming SSE support
High-volume, cost-sensitive | Fly.io + Redis queue | Full control, no per-invocation billing
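
The matrix can also be read as a top-down rule list. The sketch below is one possible encoding, not a definitive rule set: `AgentProfile` and `recommend_platform` are hypothetical names, the order of the checks is an assumption, and so is the fallback for runs between 30 seconds and 5 minutes.

```python
from dataclasses import dataclass


@dataclass
class AgentProfile:
    """Hypothetical description of an agent workload (not part of any SDK)."""
    expected_seconds: float      # typical end-to-end run time
    event_driven: bool = False   # triggered by webhooks, queues, or schedules
    streaming: bool = False      # client expects incremental output
    high_volume: bool = False    # steady heavy traffic, cost-sensitive


def recommend_platform(profile: AgentProfile) -> str:
    """Map a workload to a row of the decision matrix, checked top-down."""
    if profile.expected_seconds >= 300:
        return "Fly.io"
    if profile.high_volume:
        return "Fly.io + Redis queue"
    if profile.event_driven:
        return "AWS Lambda"
    if profile.streaming:
        return "Vercel Edge"
    if profile.expected_seconds < 30:
        return "Vercel"
    # Assumption: between 30s and 5 minutes with no other signal,
    # a persistent process is the safer default.
    return "Fly.io"
```

For example, `recommend_platform(AgentProfile(expected_seconds=10, streaming=True))` selects the streaming row of the table.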

Fly.io: Long-Running Agents

Best for: agents that run for minutes, background processing, agents that need to hold state in memory.

Project structure

my-agent/
├── Dockerfile
├── fly.toml
├── requirements.txt
└── agent/
    ├── __init__.py
    ├── main.py
    └── tools.py

Dockerfile

FROM python:3.12-slim

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY agent/ ./agent/

# Health check endpoint
EXPOSE 8080

CMD ["python", "-m", "uvicorn", "agent.main:app", "--host", "0.0.0.0", "--port", "8080"]

FastAPI agent server

# agent/main.py
import os
import asyncio
from fastapi import FastAPI, HTTPException, BackgroundTasks
from pydantic import BaseModel
import anthropic

app = FastAPI()
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# In-memory job tracker (use Redis in production for multi-instance)
jobs = {}


class AgentRequest(BaseModel):
    goal: str
    webhook_url: str | None = None


class JobStatus(BaseModel):
    job_id: str
    status: str  # "running" | "done" | "failed"
    result: str | None = None
    error: str | None = None


@app.get("/health")
async def health():
    return {"status": "ok"}


@app.post("/run")
async def run_agent(request: AgentRequest, background_tasks: BackgroundTasks):
    import uuid
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "running", "result": None, "error": None}

    background_tasks.add_task(execute_agent_job, job_id, request.goal, request.webhook_url)
    return {"job_id": job_id}


@app.get("/status/{job_id}")
async def get_status(job_id: str) -> JobStatus:
    if job_id not in jobs:
        raise HTTPException(status_code=404, detail="Job not found")
    job = jobs[job_id]
    return JobStatus(job_id=job_id, **job)


async def execute_agent_job(job_id: str, goal: str, webhook_url: str | None):
    """Background task that runs the full agent loop."""
    try:
        result = await run_agent_loop(goal)
        jobs[job_id] = {"status": "done", "result": result, "error": None}

        if webhook_url:
            import httpx
            async with httpx.AsyncClient() as http:
                await http.post(webhook_url, json={"job_id": job_id, "result": result})

    except Exception as e:
        jobs[job_id] = {"status": "failed", "result": None, "error": str(e)}


async def run_agent_loop(goal: str, max_turns: int = 30) -> str:
    messages = [{"role": "user", "content": goal}]

    for turn in range(max_turns):
        # The client is synchronous, so run each call in a worker thread.
        # Calling it directly would block the event loop and stall
        # /health and /status while the model is responding.
        response = await asyncio.to_thread(
            client.messages.create,
            model="claude-sonnet-4-5",
            max_tokens=2048,
            messages=messages,
        )
        text = response.content[0].text
        messages.append({"role": "assistant", "content": text})

        if response.stop_reason == "end_turn":
            return text

    return "Reached max turns"
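
The comment on the in-memory `jobs` dict suggests Redis for multi-instance deployments. A minimal sketch of what that swap might look like follows, assuming a client that exposes redis-py style `set(name, value, ex=...)` and `get(name)`. `JobStore`, `InMemoryClient`, the `job:` key prefix, and the one-day TTL are all illustrative choices, not part of the original server.

```python
import json


class InMemoryClient:
    """Stand-in for a Redis client, for local runs and tests only."""

    def __init__(self):
        self._data = {}

    def set(self, name, value, ex=None):
        # The expiry is ignored here; a real Redis server would enforce it.
        self._data[name] = value

    def get(self, name):
        return self._data.get(name)


class JobStore:
    """Job tracker that serializes each job as a JSON string under job:<id>."""

    def __init__(self, client, ttl_seconds: int = 86400):
        self.client = client
        self.ttl_seconds = ttl_seconds

    def save(self, job_id: str, status: str, result=None, error=None) -> None:
        payload = json.dumps({"status": status, "result": result, "error": error})
        self.client.set(f"job:{job_id}", payload, ex=self.ttl_seconds)

    def load(self, job_id: str):
        raw = self.client.get(f"job:{job_id}")
        return json.loads(raw) if raw is not None else None


# In production the client could be redis.Redis.from_url(...) pointed at a
# connection URL held in a secret; that variable name would be your own choice.
store = JobStore(InMemoryClient())
store.save("demo", "running")
```

Because every instance reads and writes the same keys, a `/status` request can be served by a different machine than the one running the job.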

fly.toml

app = "my-claude-agent"
primary_region = "nrt"  # Tokyo; choose the region closest to your users

[build]
  dockerfile = "Dockerfile"

[env]
  PORT = "8080"

[http_service]
  internal_port = 8080
  force_https = true
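  # Keep one machine running: idle machines are stopped based on HTTP
  # traffic, which would kill background jobs that are still in progress.
  auto_stop_machines = false
  auto_start_machines = true
  min_machines_running = 1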

  [[http_service.checks]]
    grace_period = "10s"
    interval = "30s"
    method = "GET"
    path = "/health"
    timeout = "5s"

[[vm]]
  memory = "1gb"
  cpu_kind = "shared"
  cpus = 1

Deploy

# Install flyctl
curl -L https://fly.io/install.sh | sh

# Initial deploy
fly launch --name my-claude-agent

# Set secrets
fly secrets set ANTHROPIC_API_KEY=sk-ant-...

# Deploy updates
fly deploy
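
Once the app is deployed, a caller submits a goal to `/run` and then polls `/status/{job_id}` until the job leaves the `running` state. The sketch below shows one way to write that polling loop with only the standard library. `wait_for_job` and `http_fetch_status` are hypothetical helpers, and the base URL you pass in depends on your own app name.

```python
import json
import time
import urllib.request
from typing import Callable


def http_fetch_status(base_url: str) -> Callable[[str], dict]:
    """Build a fetcher that calls GET {base_url}/status/{job_id}."""
    def fetch(job_id: str) -> dict:
        url = f"{base_url}/status/{job_id}"
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)
    return fetch


def wait_for_job(
    fetch_status: Callable[[str], dict],
    job_id: str,
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job is no longer 'running' or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(job_id)
        if status["status"] != "running":
            return status  # "done" or "failed"
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout}s")
        sleep(interval)
```

Passing `fetch_status` and `sleep` as parameters keeps the loop testable without a network; if the caller can receive webhooks, the `webhook_url` field avoids polling altogether.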

Vercel: API Endpoints with Streaming

Best for: agents that respond to HTTP requests within 30 seconds, streaming chat responses.

Streaming agent endpoint

// app/api/agent/route.ts (Next.js 15 App Router)
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY!,
});

export const maxDuration = 30; // Vercel Pro: up to 300s

export async function POST(req: Request) {
  const { messages, system } = await req.json();

  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      try {
        const response = await client.messages.create({
          model: "claude-sonnet-4-5",
          max_tokens: 2048,
          system: system || "You are a helpful assistant.",
          messages,
          stream: true,
        });

        for await (const event of response) {
          if (
            event.type === "content_block_delta" &&
            event.delta.type === "text_delta"
          ) {
            const chunk = `data: ${JSON.stringify({ text: event.delta.text })}\n\n`;
            controller.enqueue(encoder.encode(chunk));
          }

          if (event.type === "message_stop") {
            controller.enqueue(encoder.encode("data: [DONE]\n\n"));
          }
        }
      } catch (error) {
        const errorMsg = `data: ${JSON.stringify({ error: String(error) })}\n\n`;
        controller.enqueue(encoder.encode(errorMsg));
      } finally {
        controller.close();
      }
    },
  });

  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    },
  });
}
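
The route above emits one `data: {"text": ...}` line per text delta, a `data: [DONE]` line at the end, and a `data: {"error": ...}` line on failure. A client has to undo that framing. The sketch below parses the format from any iterable of lines; `iter_text_chunks` is a hypothetical helper written for this guide's wire format, not a general SSE parser.

```python
import json
from typing import Iterable, Iterator


def iter_text_chunks(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from the endpoint's 'data: ...' lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # blank separator lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # end-of-stream marker sent on message_stop
        event = json.loads(payload)
        if "error" in event:
            raise RuntimeError(event["error"])
        yield event["text"]


sample = [
    'data: {"text": "Hel"}',
    "",
    'data: {"text": "lo"}',
    "",
    "data: [DONE]",
    "",
]
print("".join(iter_text_chunks(sample)))
```

With httpx, the lines could come from `response.iter_lines()` inside a `client.stream("POST", url, json=...)` context, so text is displayed as it arrives rather than after the full response.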

vercel.json for longer timeouts

{
  "functions": {
    "app/api/agent/route.ts": {
      "maxDuration": 300
    }
  }
}

Environment variables in Vercel

# Via CLI
vercel env add ANTHROPIC_API_KEY production

# Or in vercel.json (non-sensitive only)
{
  "env": {
    "NODE_ENV": "production",
    "AGENT_MAX_TURNS": "20"
  }
}

AWS Lambda: Event-Driven Agents

Best for: agents triggered by webhooks, queue messages, S3 uploads, or scheduled events.

Lambda function

# handler.py
import json
import os
import anthropic

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])


def handler(event, context):
    """Lambda handler for event-driven agent invocation."""

    # Parse trigger source
    if "body" in event:
        # API Gateway trigger; body is None when the request has no payload
        body = json.loads(event.get("body") or "{}")
        task = body.get("task", "")
    elif "Records" in event and event["Records"][0].get("eventSource") == "aws:sqs":
        # SQS trigger (SQS records use the lowercase key "eventSource")
        message = json.loads(event["Records"][0]["body"])
        task = message.get("task", "")
    else:
        task = event.get("task", "")

    if not task:
        return {"statusCode": 400, "body": json.dumps({"error": "No task provided"})}

    try:
        result = run_agent(task)
        return {
            "statusCode": 200,
            "body": json.dumps({"result": result}),
            "headers": {"Content-Type": "application/json"}
        }
    except Exception as e:
        return {
            "statusCode": 500,
            "body": json.dumps({"error": str(e)})
        }


def run_agent(task: str, max_turns: int = 10) -> str:
    """Compact agent loop for Lambda execution."""
    messages = [{"role": "user", "content": task}]

    for _ in range(max_turns):
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=messages
        )

        messages.append({"role": "assistant", "content": response.content[0].text})

        if response.stop_reason == "end_turn":
            return response.content[0].text

    return messages[-1]["content"]
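
A fixed `max_turns` does not protect against the Lambda timeout itself: if a turn is still in flight when the function times out, the whole invocation is lost. One option is to check the remaining time before each turn using the context object's `get_remaining_time_in_millis()` method. The sketch below assumes a hypothetical `step` callable that performs one model turn and returns `(text, done)`; the 35-second reserve is an illustrative value based on the 5-30 second call latency mentioned earlier.

```python
from typing import Callable, Tuple


def has_time_for_another_turn(context, reserve_ms: int = 35_000) -> bool:
    """True if the invocation has more than reserve_ms left before timeout."""
    return context.get_remaining_time_in_millis() > reserve_ms


def run_with_budget(
    step: Callable[[], Tuple[str, bool]],
    context,
    max_turns: int = 10,
    reserve_ms: int = 35_000,
) -> str:
    """Run up to max_turns turns, stopping early when time runs short.

    step is a hypothetical callable: it performs one model turn and
    returns (text, done), where done means the agent has finished.
    """
    last_text = ""
    for _ in range(max_turns):
        if not has_time_for_another_turn(context, reserve_ms):
            break  # return the partial result instead of timing out
        last_text, done = step()
        if done:
            break
    return last_text
```

Returning the last completed turn lets the handler respond or re-queue the remaining work, rather than failing with nothing to show.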

SAM template for Lambda

# template.yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Globals:
  Function:
    Timeout: 120  # 2 minutes — adjust per your P95 latency
    MemorySize: 512
    Runtime: python3.12
    Environment:
      Variables:
        ANTHROPIC_API_KEY: !Sub '{{resolve:secretsmanager:anthropic-api-key}}'

Resources:
  AgentFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: ./
      Handler: handler.handler
      Events:
        ApiEvent:
          Type: Api
          Properties:
            Path: /run
            Method: post
        SQSEvent:
          Type: SQS
          Properties:
            Queue: !GetAtt AgentQueue.Arn
            BatchSize: 1

  AgentQueue:
    Type: AWS::SQS::Queue
    Properties:
      VisibilityTimeout: 180  # Must be >= Lambda timeout

Deploy with SAM

sam build
sam deploy --guided

Environment Variables Across Platforms

The #1 production failure: ANTHROPIC_API_KEY not set in production.

# Verify in each environment
python -c "import os; print('API key set:', bool(os.environ.get('ANTHROPIC_API_KEY')))"

Fly.io: fly secrets set ANTHROPIC_API_KEY=sk-ant-...
Vercel: Settings → Environment Variables → Add for Production
Lambda: AWS Systems Manager Parameter Store or Secrets Manager
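
The one-line check above can also run at process startup, so a missing key fails the deploy immediately instead of the first request. A minimal sketch follows; `require_env` and `MissingEnvError` are hypothetical names, and the list of required variables is whatever your agent needs.

```python
import os
from typing import Iterable, Mapping, Optional


class MissingEnvError(RuntimeError):
    """Raised at startup when required configuration is absent."""


def require_env(
    names: Iterable[str],
    environ: Optional[Mapping[str, str]] = None,
) -> dict:
    """Return the requested variables, or raise listing every missing one."""
    environ = os.environ if environ is None else environ
    names = list(names)
    # Treat empty or whitespace-only values as missing.
    missing = [name for name in names if not environ.get(name, "").strip()]
    if missing:
        raise MissingEnvError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: environ[name] for name in names}
```

Calling `require_env(["ANTHROPIC_API_KEY"])` at import time in `agent/main.py` or `handler.py` would surface the problem in deploy logs; the error message contains only variable names, never their values.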

Never put API keys in:


Frequently Asked Questions

What's the best platform for a beginner deploying their first Claude agent? Vercel. It needs no configuration, deploys with a single vercel --prod command, and has a free tier generous enough for testing. The main limitation is the function timeout: 30 seconds on the free tier and 300 seconds on Pro.

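Can I use Fly.io for free? Fly.io's legacy free allowance covered 3 shared-CPU VMs with 256MB RAM each, which is enough for a low-traffic personal agent. Allowances and pricing change, so check the current pricing page before relying on it. Production agents with any real traffic need paid resources (~$5-15/month for a 1GB instance).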

How do I handle Anthropic rate limits (429 errors) in production? Implement exponential backoff with jitter (see the error handling guide). For Lambda: set the SQS visibility timeout to 3x your max agent runtime to allow retry without duplicate processing. For Fly.io: use a Redis queue with delayed retry.
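
Exponential backoff with jitter can be sketched in a few lines. The version below uses "full jitter" (a random delay between zero and the exponential cap); `call_with_backoff` is a hypothetical helper, and the attempt count and delay values are illustrative defaults rather than recommendations from Anthropic.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call fn, retrying retryable errors with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(exc):
                raise
            # Cap doubles each attempt: 1s, 2s, 4s, ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random fraction of the cap so that
            # concurrent workers do not retry in lockstep.
            sleep(rng() * cap)
```

With the Anthropic Python SDK, `is_retryable` could be `lambda e: isinstance(e, anthropic.RateLimitError)`. The SDK client also accepts a `max_retries` option, which may be enough on its own for simple cases.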

Should agents be synchronous (return result) or asynchronous (return job ID)? Async (job ID pattern) for anything that might take > 10 seconds. Sync is fine for quick tasks and streaming responses. The Fly.io example above shows the async pattern.

How do I monitor production agent costs? The observability guide covers structured logging with token counts. For per-deployment tracking, tag your API calls with a metadata header (or use the logging wrapper) and aggregate by deployment environment in your cost dashboard.
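
As a rough illustration of the logging-wrapper approach, the sketch below turns a Messages API response into one structured log line tagged with a deployment name. It relies on the response's `model` field and its `usage.input_tokens` and `usage.output_tokens` counts; `usage_record`, `log_usage`, the logger name, and the `deployment` label are hypothetical.

```python
import json
import logging

logger = logging.getLogger("agent.usage")


def usage_record(response, deployment: str) -> dict:
    """Extract token counts from a Messages API response."""
    usage = response.usage
    return {
        "deployment": deployment,  # e.g. a label per platform and environment
        "model": response.model,
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
        "total_tokens": usage.input_tokens + usage.output_tokens,
    }


def log_usage(response, deployment: str) -> dict:
    """Emit one JSON log line per API call and return the record."""
    record = usage_record(response, deployment)
    logger.info(json.dumps(record))
    return record
```

Calling `log_usage(response, "fly-prod")` after each `client.messages.create(...)` gives a log stream that can be grouped by `deployment` and multiplied by current per-token prices in whatever dashboard you already use.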


AI Disclosure: Written with Claude Code; deployment patterns from production agent workloads.