Claude doesn't have a dedicated "JSON mode" endpoint like OpenAI, but you can achieve near-100% reliable JSON output using three techniques: system prompt instruction, response prefilling, and tool use enforcement. For claude-sonnet-4-5, the system prompt approach alone achieves 97%+ JSON compliance — add prefilling and you're at 99.5%.
## Technique 1: System Prompt Instruction (Simplest)
The fastest path to JSON output:
```python
import anthropic
import json

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system="""You are a data extraction API. Always respond with valid JSON only.
No explanatory text, no markdown code fences, no prefixes.
Your entire response must be parseable by json.loads().""",
    messages=[{
        "role": "user",
        "content": "Extract: name, email, company from: 'Hi, I'm Sarah Chen, sarah@acme.io, Acme Corp'"
    }]
)

data = json.loads(response.content[0].text)
# {"name": "Sarah Chen", "email": "sarah@acme.io", "company": "Acme Corp"}
```
Works 97%+ of the time with claude-sonnet-4-5. The remaining 3% happens when Claude adds a brief explanation before the JSON. Fix that with prefilling.
## Technique 2: Response Prefilling (Most Reliable)

Force JSON by starting the assistant's response with `{`:
```python
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system="Extract structured data as JSON. Include all fields you find.",
    messages=[
        {"role": "user", "content": "Parse: Alice Johnson, Head of Engineering, alice@startup.io"},
        {"role": "assistant", "content": "{"}  # Prefill — Claude continues from here
    ]
)

# The response body is the JSON continuation (without the opening {)
raw = "{" + response.content[0].text
data = json.loads(raw)
When you prefill with `{`, Claude must continue with valid JSON. It cannot prepend explanatory text because the response has already started. This brings compliance to 99.5%+.
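The prefill-and-reassemble pattern is easy to wrap in a pair of small helpers (the helper names here are my own, not part of the SDK):

```python
import json

def prefilled_messages(user_text: str, prefill: str = "{") -> list:
    """Build a messages list whose final turn is an assistant prefill."""
    return [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": prefill},
    ]

def parse_prefilled(prefill: str, completion: str) -> dict:
    """Re-attach the prefill before parsing; the API response omits it."""
    return json.loads(prefill + completion)
```

For example, `parse_prefilled("{", '"name": "Alice"}')` returns `{"name": "Alice"}`. One gotcha: the prefill must not end with trailing whitespace, as the API rejects assistant turns that do.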
## Technique 3: Tool Use Enforcement (Best for Schemas)
Use tool use to enforce a specific JSON schema. Claude cannot return non-JSON when a tool call is required:
```python
tools = [{
    "name": "extract_contact",
    "description": "Extract contact information from text",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "email": {"type": "string", "format": "email"},
            "company": {"type": "string"},
            "role": {"type": "string"}
        },
        "required": ["name"]
    }
}]

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    tools=tools,
    tool_choice={"type": "tool", "name": "extract_contact"},  # Force this tool
    messages=[{"role": "user", "content": "Alice Johnson, Head of Engineering at Startup Inc"}]
)

# Get structured data from the tool call
tool_block = next(b for b in response.content if b.type == "tool_use")
data = tool_block.input  # Already a dict, no json.loads() needed
# {"name": "Alice Johnson", "role": "Head of Engineering", "company": "Startup Inc"}
```
Tool use with `tool_choice={"type": "tool", "name": "..."}` achieves 100% schema compliance — Claude cannot return anything except valid arguments for the forced tool.
## Choosing the Right Technique
| Technique | Compliance | Schema Control | Complexity | Cost |
|---|---|---|---|---|
| System prompt | 97% | Loose | Low | Base |
| + Prefilling | 99.5% | Loose | Low | Base |
| Tool use | 100% | Strict | Medium | +~15% tokens |
Use system prompt + prefill when you need quick JSON without caring about exact schema enforcement.
Use tool use when you need strict schema validation, required fields, and type checking.
## Production-Grade JSON Extraction
Combining all techniques with error handling:
```python
import anthropic
import json
from typing import List, Optional, Type, TypeVar
from pydantic import BaseModel

client = anthropic.Anthropic()

T = TypeVar('T', bound=BaseModel)

def extract_json(
    prompt: str,
    schema: Type[T],
    *,
    model: str = "claude-sonnet-4-5"
) -> T:
    """
    Extract structured data using tool use enforcement.
    Returns a validated Pydantic model.
    """
    tool_def = {
        "name": "structured_output",
        "description": "Return the extracted data in structured format",
        "input_schema": schema.model_json_schema()
    }
    response = client.messages.create(
        model=model,
        max_tokens=2048,
        tools=[tool_def],
        tool_choice={"type": "tool", "name": "structured_output"},
        messages=[{"role": "user", "content": prompt}]
    )
    tool_block = next(
        (b for b in response.content if b.type == "tool_use"),
        None
    )
    if not tool_block:
        raise ValueError(f"No tool call in response. Content: {response.content}")
    return schema.model_validate(tool_block.input)

# Example usage
class Invoice(BaseModel):
    invoice_number: str
    vendor: str
    amount_usd: float
    line_items: List[str]
    due_date: Optional[str] = None

invoice_text = """
Invoice #INV-2024-1847
From: Acme Corp
Total: $4,250.00
Items: Cloud hosting ($2,000), Support ($1,500), Setup ($750)
Due: 2024-03-15
"""

invoice = extract_json(invoice_text, Invoice)
print(f"Invoice: {invoice.invoice_number}")
print(f"Amount: ${invoice.amount_usd:,.2f}")
print(f"Items: {invoice.line_items}")
```
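When validation does fail, you can feed the error back to the model and retry. A sketch of that loop, with `call` and `validate` injected as callables so it stays decoupled from the API client (both parameter names are my own):

```python
def extract_with_retry(call, validate, prompt: str, max_attempts: int = 3):
    """
    call: function(prompt) -> dict, e.g. a thin wrapper around the tool-use request.
    validate: function(dict) -> validated object; raises ValueError on bad data.
    On failure, the error message is appended to the prompt for the next attempt.
    """
    last_err = None
    for _ in range(max_attempts):
        raw = call(prompt)
        try:
            return validate(raw)
        except ValueError as e:
            last_err = e
            prompt = f"{prompt}\n\nPrevious attempt failed validation: {e}. Return corrected JSON."
    raise last_err
```

Pydantic's `ValidationError` subclasses `ValueError`, so `Invoice.model_validate` can be passed directly as `validate`.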
## Handling Malformed Responses
Even with prefilling, raw text sometimes slips through. A robust parser:
````python
import json
import re

def parse_json_response(text: str) -> dict:
    """
    Parse JSON from a Claude response, handling common wrapping patterns.
    Priority: direct parse → strip fences → extract first object/array.
    """
    text = text.strip()

    # 1. Direct parse (ideal case)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # 2. Strip markdown code fences
    fence_match = re.search(r'```(?:json)?\s*([\s\S]+?)\s*```', text)
    if fence_match:
        try:
            return json.loads(fence_match.group(1))
        except json.JSONDecodeError:
            pass

    # 3. Extract the first JSON object or array
    obj_match = re.search(r'(\{[\s\S]+\}|\[[\s\S]+\])', text)
    if obj_match:
        try:
            return json.loads(obj_match.group(1))
        except json.JSONDecodeError:
            pass

    raise ValueError(f"Cannot extract JSON from response: {text[:200]}")
````
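A quick check of the third fallback, using a made-up response string that wraps the JSON in conversational prose:

```python
import json
import re

wrapped = 'Here is the extraction: {"name": "Sarah Chen", "company": "Acme Corp"} Let me know if you need more.'

# Same pattern as step 3 above: grab the span from the first { to the last }
m = re.search(r'(\{[\s\S]+\}|\[[\s\S]+\])', wrapped)
data = json.loads(m.group(1))
```

Because the pattern is greedy, it captures everything between the first `{` and the last `}` in the text; if the trailing prose itself contained a brace, the parse would fail and fall through to the final `ValueError`.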
## Prompt Caching for Repeated Schemas
If you're running many JSON extractions with the same schema, cache the system prompt:
```python
# Assumes `schema` is a Pydantic model class and `text_to_parse` is the document
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": f"""Extract data matching this JSON schema exactly:
{json.dumps(schema.model_json_schema(), indent=2)}
Return only the JSON object. No other text.""",
        "cache_control": {"type": "ephemeral"}  # Cache this system prompt
    }],
    messages=[{"role": "user", "content": text_to_parse}]
)
```
With prompt caching, the schema tokens are written to the cache once (at a ~25% premium over the base input rate) and then read at ~10% of the base rate on subsequent requests within the 5-minute cache window. For batch processing 100+ documents with the same schema, this cuts input token costs by roughly 80-90%.
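Back-of-envelope math shows where the savings come from. The prices below are placeholders for illustration only; substitute the current rates from Anthropic's pricing page:

```python
# Placeholder prices, $ per million input tokens (assumed, not current rates)
BASE_RATE = 3.00
CACHE_WRITE_RATE = BASE_RATE * 1.25   # cache writes carry a ~25% premium
CACHE_READ_RATE = BASE_RATE * 0.10    # cache hits cost ~10% of base

schema_tokens = 1_500   # system prompt + embedded JSON schema (assumed size)
num_docs = 100

# Without caching: every request pays full price for the schema tokens
uncached = num_docs * schema_tokens / 1e6 * BASE_RATE

# With caching: one cache write, then cache reads for the remaining requests
cached = (schema_tokens / 1e6 * CACHE_WRITE_RATE
          + (num_docs - 1) * schema_tokens / 1e6 * CACHE_READ_RATE)

savings = 1 - cached / uncached   # roughly 0.89 for these numbers
```

The savings fraction depends only on the write premium, read discount, and batch size, not on the absolute token price.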
## Frequently Asked Questions

### Does Claude have a native JSON mode like OpenAI?
As of 2026, Anthropic does not offer a dedicated JSON mode endpoint. The techniques in this guide (system prompt + prefilling + tool use) achieve equivalent or better results. Tool use with tool_choice gives you strict schema enforcement comparable to OpenAI's Structured Outputs, and stricter than its basic JSON mode, which guarantees valid JSON but not any particular schema.
### Which technique is fastest in production?
System prompt + prefilling has the same latency as a normal API call. Tool use adds ~10-15% latency due to the schema processing overhead. For high-throughput batch extraction, system prompt + prefilling is the better choice if schema strictness isn't critical.
### What model should I use for JSON extraction?
claude-haiku-3-5 for high-volume batch extraction where the schema is simple. claude-sonnet-4-5 for complex nested schemas or when accuracy matters. claude-opus-4 is rarely needed for JSON extraction — Sonnet is already excellent at this.
### How do I handle very large JSON outputs?
Increase max_tokens proportionally. A JSON object with 50 fields typically needs 500-1000 tokens. For streaming large JSON, use the streaming API but buffer the full text until the stream finishes before parsing — partial JSON is not valid JSON.
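The buffering rule can be illustrated with a simulated stream of text deltas (the chunk boundaries here are invented; in the real API they arrive as streaming events):

```python
import json

# Invented chunks standing in for streamed text deltas
deltas = ['{"invoice": "INV-', '2024-1847", "amou', 'nt": 4250.0}']

buffer = []
for chunk in deltas:
    buffer.append(chunk)
    # Don't call json.loads here: '{"invoice": "INV-' is not valid JSON

data = json.loads("".join(buffer))  # parse once, after the stream ends
```

Note that chunk boundaries can fall anywhere, including mid-key and mid-number, so no individual delta is safe to parse on its own.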
Related guides: Claude API Structured Output Guide · Claude API Tool Use Guide · Claude API Python Tutorial