Build an MCP Server in Python with FastAPI: Tools, Resources, Auth (2026)

Q: How do I version my server?

Two layers. The version argument to FastMCP("name", version="1.4.0") is advertised in the initialize handshake — bump on tool-shape changes. Separately, deploy URL paths (/mcp/v1, /mcp/v2) when you make breaking schema changes; let old clients stay on v1 while new clients adopt v2. Never silently change a tool's argument schema in place.

MCP servers expose tools (callable functions) and resources (readable context) to Claude. The cleanest 2026 stack is the official mcp Python SDK on top of FastAPI: you get streaming HTTP transport, Pydantic-typed tool schemas, bearer-token auth, and a single ASGI app you can ship to Fly.io, Render, or a Mac mini. This guide walks the full path — from pip install mcp to a production server wired into Claude Code via claude mcp add — with working code for tools, streaming resources, and OAuth-style authentication. Tested against MCP spec 2025-06-18 and mcp-python-sdk v1.x.

What is MCP, and when do you need a custom server?

The Model Context Protocol is a JSON-RPC contract Claude clients (Claude Code, claude.ai desktop, the API harness) speak to external programs. The program advertises tools and resources; the client routes user intent to those tools. You need a custom MCP server when the integration you want isn't already in the public registry — internal databases, proprietary APIs, custom workflows over your own filesystem. If a vendor MCP exists (GitHub, Linear, Stripe, Postgres), use that first; build only what's missing.

Stack choice: `mcp` SDK vs raw FastAPI vs HTTPX-only

Three plausible stacks, picked in order of escape hatch:

mcp SDK + FastAPI (recommended). The SDK's FastMCP wraps schema generation, transport negotiation (stdio + streaming HTTP), and JSON-RPC framing. Mounting it inside FastAPI gives you middleware, OpenAPI co-existence, and ASGI deploys.
Raw FastAPI, hand-rolling JSON-RPC. Only do this if you're forking the protocol — the spec is moving, and you'll spend a week catching up to each release.
HTTPX-only client wrappers are not servers; they belong inside a tool, not as a replacement.

Pick FastMCP. The escape hatch is always "add a plain FastAPI route alongside" — both can mount on the same Uvicorn worker.

Minimal working server

Install the SDK and FastAPI:

pip install "mcp[cli]>=1.2" fastapi uvicorn[standard] httpx pydantic

Then a single file gives you a real server:

# server.py
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("acme-tools", version="0.1.0")

@mcp.tool()
def ping(message: str = "hello") -> str:
    """Echo a message back. Useful as a smoke test."""
    return f"pong: {message}"

if __name__ == "__main__":
    # streaming HTTP transport on :8000/mcp
    mcp.run(transport="streamable-http", host="0.0.0.0", port=8000)

That's a complete server. The @mcp.tool() decorator introspects the function signature, generates a JSON schema for arguments, and registers the tool's docstring as its description — Claude reads both when deciding whether to call it.

Tool with structured arguments (Pydantic models)

Real tools rarely take a single string. Pydantic models give you validation and self-documenting schemas in one shot:

# tools/issue.py
from pydantic import BaseModel, Field
from typing import Literal
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("acme-tools")

class CreateIssueArgs(BaseModel):
    title: str = Field(..., min_length=3, max_length=200)
    body: str = Field("", description="Markdown body. Optional.")
    severity: Literal["low", "medium", "high", "critical"] = "medium"
    assignee: str | None = Field(None, pattern=r"^[a-z0-9-]+$")

@mcp.tool()
def create_issue(args: CreateIssueArgs) -> dict:
    """Create an issue in the internal tracker."""
    # call your tracker here
    return {
        "id": "ISS-1042",
        "url": f"https://tracker.internal/iss/1042",
        "severity": args.severity,
    }

Two practical notes: keep tool descriptions concrete ("creates an issue in the internal tracker"; not "manages issues"), and prefer Literal[...] over free-form strings — Claude picks valid enums far more reliably than it parses prose constraints.

Mid-article aside. Tool sprawl is the biggest hidden cost on a self-hosted MCP — every tool description is tokens on every turn. The Cost Optimization Masterclass walks through prompt-caching, tool pruning, and the 80/15/5 model routing rule that took our own server from $0.34/session to under $0.06.

Streaming resource: a GitHub repo file listing

Resources are read-only context Claude can pull in without invoking a tool. Streaming matters when the payload is large — Claude can start reasoning before the body finishes:

# resources/repo.py
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("acme-tools")

@mcp.resource("repo://{owner}/{name}/tree")
async def repo_tree(owner: str, name: str) -> str:
    """Stream a flat file listing for github.com/{owner}/{name}."""
    url = f"https://api.github.com/repos/{owner}/{name}/git/trees/HEAD?recursive=1"
    async with httpx.AsyncClient(timeout=30) as client:
        async with client.stream("GET", url) as r:
            r.raise_for_status()
            chunks: list[str] = []
            async for line in r.aiter_lines():
                chunks.append(line)
            return "\n".join(chunks)

Resource URIs are addressable — Claude (or you) can pin specific URIs in a session, and the SDK handles caching. The {owner}/{name} placeholders become required parameters automatically.

Authentication: bearer token + dynamic-client OAuth

A public MCP endpoint without auth is a footgun. The minimum viable layer is a bearer-token middleware; the production-grade layer is dynamic-client OAuth via authlib so each Claude installation provisions its own credentials.

# auth.py
import os, secrets
from fastapi import FastAPI, Request, HTTPException
from starlette.middleware.base import BaseHTTPMiddleware
from mcp.server.fastmcp import FastMCP

VALID_TOKENS = set(os.environ.get("MCP_BEARER_TOKENS", "").split(","))

class BearerAuth(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        if request.url.path.startswith("/mcp"):
            header = request.headers.get("authorization", "")
            if not header.startswith("Bearer "):
                raise HTTPException(401, "missing bearer token")
            token = header[7:].strip()
            if not secrets.compare_digest(token, "") and token not in VALID_TOKENS:
                raise HTTPException(403, "invalid token")
        return await call_next(request)

mcp = FastMCP("acme-tools")
app: FastAPI = mcp.streamable_http_app()  # underlying ASGI app
app.add_middleware(BearerAuth)

For OAuth proper, layer authlib.integrations.starlette_client.OAuth on top, expose /.well-known/oauth-authorization-server, and let the SDK negotiate. The MCP spec's auth profile is a thin wrapper over RFC 7591 dynamic client registration — authlib covers the heavy lifting. Keep the bearer path as a fallback for CI and headless agents.

Local testing with `mcp-inspector`

Before wiring into Claude, smoke-test with the official inspector:

npx @modelcontextprotocol/inspector \
  --transport streamable-http \
  --url http://localhost:8000/mcp \
  --header "Authorization: Bearer dev-token"

The inspector renders every advertised tool, lets you fire calls with arbitrary arguments, and shows raw JSON-RPC frames. If a tool's schema looks wrong here, it'll look wrong to Claude too — fix it before deploying.

Deploying: Docker + Fly.io

A two-file deploy gets you a real URL:

# Dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
ENV PORT=8000
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8000"]

# fly.toml
app = "acme-mcp"
primary_region = "nrt"

[http_service]
  internal_port = 8000
  force_https = true
  auto_stop_machines = "stop"
  min_machines_running = 0

[[http_service.checks]]
  path = "/healthz"
  interval = "30s"

Set secrets with fly secrets set MCP_BEARER_TOKENS=.... auto_stop_machines is fine for low-traffic internal servers; flip to min_machines_running = 1 once latency matters. Single-region is correct until you have a reason — MCP traffic is request/response, not real-time.

Wiring into Claude Code

Once the server is up and the bearer token is set, register it from the Claude Code CLI:

claude mcp add acme-tools \
  --transport http \
  --url https://acme-mcp.fly.dev/mcp \
  --header "Authorization: Bearer $MCP_TOKEN"

This writes to your user settings.json. For project-scoped registration, edit .claude/settings.json directly:

{
  "mcpServers": {
    "acme-tools": {
      "transport": "http",
      "url": "https://acme-mcp.fly.dev/mcp",
      "headers": { "Authorization": "Bearer ${MCP_TOKEN}" }
    }
  }
}

Claude Code resolves ${MCP_TOKEN} from the shell environment — never paste raw secrets into committed JSON.

Error handling: tool errors, request_id correlation

Tools should raise typed exceptions, not return error strings. The SDK turns raised ValueError and friends into structured isError: true responses Claude will reason about:

from mcp.server.fastmcp.exceptions import ToolError

@mcp.tool()
def fetch_user(user_id: str) -> dict:
    if not user_id.isalnum():
        raise ToolError(f"invalid user_id: {user_id!r}")
    ...

For request correlation across services, log the JSON-RPC id field — it round-trips through every frame. A FastAPI middleware that pulls request.state.mcp_request_id into your logger gives you per-call traces in Sentry or Loki without inventing your own scheme.

Performance: connection limits, async, caching

Three guardrails worth setting on day one:

Async everywhere downstream. A single sync httpx.get inside a tool blocks the worker for every other in-flight call. Use httpx.AsyncClient and await it.
Cap upstream concurrency. A shared asyncio.Semaphore(8) around outbound API calls prevents a chatty Claude session from torching your rate limit.
Add Redis caching when, and only when, you measure repeat calls. Resource fetches (file trees, schema introspection) are the high-hit-rate candidates; tool calls usually aren't. Premature caching hides bugs.

Uvicorn defaults to one worker — fine for a low-traffic internal server, raise to --workers 4 once you see queueing in logs.

Frequently Asked Questions

Stdio or HTTP transport?

Stdio is right for a server that runs as a subprocess of one client (Claude Desktop launching a local Python script). HTTP — specifically streamable-http — is right for anything multi-user, anything deployed, and anything you want to test with mcp-inspector over the network. If in doubt, ship HTTP; you can always wrap stdio later.

How do I authenticate users?

Bearer tokens for service accounts and CI. OAuth (RFC 7591 dynamic client registration, as profiled in the MCP spec) for end users — authlib plus the SDK's auth helpers handles registration, refresh, and scope. Don't invent a custom header scheme; clients won't speak it.

Can MCP servers call external APIs?

Yes — that's the most common shape. A tool is just a Python function; call any REST/GraphQL/database client you like inside it. Keep network calls async, time them out aggressively (10–30s), and surface upstream errors as ToolError with the upstream status code in the message.

What's the rate limit?

There is no protocol-level rate limit — you set it. Claude clients respect server-side 429s, so return them honestly: a Retry-After header on a starlette.responses.Response with status 429 is enough. For per-tool quotas, gate with asyncio.Semaphore plus a token bucket (e.g. aiolimiter).

How do I version my server?

Two layers. The version argument to FastMCP("name", version="1.4.0") is advertised in the initialize handshake — bump on tool-shape changes. Separately, deploy URL paths (/mcp/v1, /mcp/v2) when you make breaking schema changes; let old clients stay on v1 while new clients adopt v2. Never silently change a tool's argument schema in place.

Where to go next

Claude MCP best practices — patterns for tool design, prompt budgeting, and observability
Claude Code MCP (Korean track) — same material, Korean-language deep dive
Claude API error handling — turn upstream failures into tool errors Claude can recover from

A working MCP server is a 50-line file. A production MCP server is the same 50 lines plus auth, error correlation, async hygiene, and a versioned URL — none of which are hard, but all of which need to be there on day one. Start with FastMCP, ship the bearer path first, layer OAuth when real users arrive, and treat tool descriptions as the single highest-leverage prompt-engineering surface in your stack.