Claude doesn't have a dedicated "JSON mode" endpoint like OpenAI, but you can achieve near-100% reliable JSON output using three techniques: system prompt instruction, response prefilling, and tool use enforcement. For claude-sonnet-4-5, the system prompt approach alone achieves 97%+ JSON compliance — add prefilling and you're at 99.5%.
## Technique 1: System Prompt Instruction (Simplest)
The fastest path to JSON output:
```python
import anthropic
import json

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system="""You are a data extraction API. Always respond with valid JSON only.
No explanatory text, no markdown code fences, no prefixes.
Your entire response must be parseable by json.loads().""",
    messages=[{
        "role": "user",
        "content": "Extract: name, email, company from: 'Hi, I'm Sarah Chen, sarah@acme.io, Acme Corp'"
    }]
)

data = json.loads(response.content[0].text)
# {"name": "Sarah Chen", "email": "sarah@acme.io", "company": "Acme Corp"}
```
Works 97%+ of the time with claude-sonnet-4-5. The remaining 3% happens when Claude adds a brief explanation before the JSON. Fix that with prefilling.
## Technique 2: Response Prefilling (Most Reliable)

Force JSON by starting the assistant's response with `{`:
```python
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system="Extract structured data as JSON. Include all fields you find.",
    messages=[
        {"role": "user", "content": "Parse: Alice Johnson, Head of Engineering, alice@startup.io"},
        {"role": "assistant", "content": "{"}  # Prefill — Claude continues from here
    ]
)

# The response body is the JSON continuation (without the opening {)
raw = "{" + response.content[0].text
data = json.loads(raw)
When you prefill with `{`, Claude must continue with valid JSON. It cannot prepend explanatory text because the response has already started. This brings compliance to 99.5%+.
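The prefill-and-reassemble pattern is easy to wrap in a pair of small helpers (the helper names here are my own, not part of the SDK):

```python
import json

def prefilled_messages(user_text: str, prefill: str = "{") -> list:
    """Build a messages list whose final turn is an assistant prefill."""
    return [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": prefill},
    ]

def parse_prefilled(prefill: str, completion: str) -> dict:
    """Re-attach the prefill before parsing; the API response omits it."""
    return json.loads(prefill + completion)
```

For example, `parse_prefilled("{", '"name": "Alice"}')` returns `{"name": "Alice"}`. One gotcha: the prefill must not end with trailing whitespace, as the API rejects assistant turns that do.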
## Technique 3: Tool Use Enforcement (Best for Schemas)
Use tool use to enforce a specific JSON schema. Claude cannot return non-JSON when a tool call is required:
```python
tools = [{
    "name": "extract_contact",
    "description": "Extract contact information from text",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "email": {"type": "string", "format": "email"},
            "company": {"type": "string"},
            "role": {"type": "string"}
        },
        "required": ["name"]
    }
}]

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    tools=tools,
    tool_choice={"type": "tool", "name": "extract_contact"},  # Force this tool
    messages=[{"role": "user", "content": "Alice Johnson, Head of Engineering at Startup Inc"}]
)

# Get structured data from the tool call
tool_block = next(b for b in response.content if b.type == "tool_use")
data = tool_block.input  # Already a dict, no json.loads() needed
# {"name": "Alice Johnson", "role": "Head of Engineering", "company": "Startup Inc"}
```
Tool use with `tool_choice={"type": "tool", "name": "..."}` achieves 100% schema compliance — Claude cannot return anything except valid arguments for the forced tool.
## Choosing the Right Technique
| Technique | Compliance | Schema Control | Complexity | Cost |
|---|---|---|---|---|
| System prompt | 97% | Loose | Low | Base |
| + Prefilling | 99.5% | Loose | Low | Base |
| Tool use | 100% | Strict | Medium | +~15% tokens |
Use system prompt + prefill when you need quick JSON without caring about exact schema enforcement.
Use tool use when you need strict schema validation, required fields, and type checking.
## Production-Grade JSON Extraction
Combining all techniques with error handling:
```python
import anthropic
import json
from typing import List, Optional, Type, TypeVar
from pydantic import BaseModel

client = anthropic.Anthropic()

T = TypeVar('T', bound=BaseModel)

def extract_json(
    prompt: str,
    schema: Type[T],
    *,
    model: str = "claude-sonnet-4-5"
) -> T:
    """
    Extract structured data using tool use enforcement.
    Returns a validated Pydantic model.
    """
    tool_def = {
        "name": "structured_output",
        "description": "Return the extracted data in structured format",
        "input_schema": schema.model_json_schema()
    }
    response = client.messages.create(
        model=model,
        max_tokens=2048,
        tools=[tool_def],
        tool_choice={"type": "tool", "name": "structured_output"},
        messages=[{"role": "user", "content": prompt}]
    )
    tool_block = next(
        (b for b in response.content if b.type == "tool_use"),
        None
    )
    if not tool_block:
        raise ValueError(f"No tool call in response. Content: {response.content}")
    return schema.model_validate(tool_block.input)

# Example usage
class Invoice(BaseModel):
    invoice_number: str
    vendor: str
    amount_usd: float
    line_items: List[str]
    due_date: Optional[str] = None

invoice_text = """
Invoice #INV-2024-1847
From: Acme Corp
Total: $4,250.00
Items: Cloud hosting ($2,000), Support ($1,500), Setup ($750)
Due: 2024-03-15
"""

invoice = extract_json(invoice_text, Invoice)
print(f"Invoice: {invoice.invoice_number}")
print(f"Amount: ${invoice.amount_usd:,.2f}")
print(f"Items: {invoice.line_items}")
```
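When validation does fail, you can feed the error back to the model and retry. A sketch of that loop, with `call` and `validate` injected as callables so it stays decoupled from the API client (both parameter names are my own):

```python
def extract_with_retry(call, validate, prompt: str, max_attempts: int = 3):
    """
    call: function(prompt) -> dict, e.g. a thin wrapper around the tool-use request.
    validate: function(dict) -> validated object; raises ValueError on bad data.
    On failure, the error message is appended to the prompt for the next attempt.
    """
    last_err = None
    for _ in range(max_attempts):
        raw = call(prompt)
        try:
            return validate(raw)
        except ValueError as e:
            last_err = e
            prompt = f"{prompt}\n\nPrevious attempt failed validation: {e}. Return corrected JSON."
    raise last_err
```

Pydantic's `ValidationError` subclasses `ValueError`, so `Invoice.model_validate` can be passed directly as `validate`.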
## Handling Malformed Responses
Even with prefilling, raw text sometimes slips through. A robust parser:
````python
import json
import re

def parse_json_response(text: str) -> dict:
    """
    Parse JSON from a Claude response, handling common wrapping patterns.
    Priority: direct parse → strip fences → extract first object/array.
    """
    text = text.strip()

    # 1. Direct parse (ideal case)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # 2. Strip markdown code fences
    fence_match = re.search(r'```(?:json)?\s*([\s\S]+?)\s*```', text)
    if fence_match:
        try:
            return json.loads(fence_match.group(1))
        except json.JSONDecodeError:
            pass

    # 3. Extract the first JSON object or array
    obj_match = re.search(r'(\{[\s\S]+\}|\[[\s\S]+\])', text)
    if obj_match:
        try:
            return json.loads(obj_match.group(1))
        except json.JSONDecodeError:
            pass

    raise ValueError(f"Cannot extract JSON from response: {text[:200]}")
````
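A quick check of the third fallback, using a made-up response string that wraps the JSON in conversational prose:

```python
import json
import re

wrapped = 'Here is the extraction: {"name": "Sarah Chen", "company": "Acme Corp"} Let me know if you need more.'

# Same pattern as step 3 above: grab the span from the first { to the last }
m = re.search(r'(\{[\s\S]+\}|\[[\s\S]+\])', wrapped)
data = json.loads(m.group(1))
```

Because the pattern is greedy, it captures everything between the first `{` and the last `}` in the text; if the trailing prose itself contained a brace, the parse would fail and fall through to the final `ValueError`.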
## Prompt Caching for Repeated Schemas
If you're running many JSON extractions with the same schema, cache the system prompt:
```python
# Assumes `schema` is a Pydantic model class and `text_to_parse` is the document
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": f"""Extract data matching this JSON schema exactly:
{json.dumps(schema.model_json_schema(), indent=2)}
Return only the JSON object. No other text.""",
        "cache_control": {"type": "ephemeral"}  # Cache this system prompt
    }],
    messages=[{"role": "user", "content": text_to_parse}]
)
```
With prompt caching, the schema tokens are written to the cache once (at a ~25% premium over the base input rate) and then read at ~10% of the base rate on subsequent requests within the 5-minute cache window. For batch processing 100+ documents with the same schema, this cuts input token costs by roughly 80-90%.
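Back-of-envelope math shows where the savings come from. The prices below are placeholders for illustration only; substitute the current rates from Anthropic's pricing page:

```python
# Placeholder prices, $ per million input tokens (assumed, not current rates)
BASE_RATE = 3.00
CACHE_WRITE_RATE = BASE_RATE * 1.25   # cache writes carry a ~25% premium
CACHE_READ_RATE = BASE_RATE * 0.10    # cache hits cost ~10% of base

schema_tokens = 1_500   # system prompt + embedded JSON schema (assumed size)
num_docs = 100

# Without caching: every request pays full price for the schema tokens
uncached = num_docs * schema_tokens / 1e6 * BASE_RATE

# With caching: one cache write, then cache reads for the remaining requests
cached = (schema_tokens / 1e6 * CACHE_WRITE_RATE
          + (num_docs - 1) * schema_tokens / 1e6 * CACHE_READ_RATE)

savings = 1 - cached / uncached   # roughly 0.89 for these numbers
```

The savings fraction depends only on the write premium, read discount, and batch size, not on the absolute token price.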
## Frequently Asked Questions

### Does Claude have a native JSON mode like OpenAI?
As of 2026, Anthropic does not offer a dedicated JSON mode endpoint. The techniques in this guide (system prompt + prefilling + tool use) achieve equivalent or better results. Tool use with tool_choice gives you strict schema enforcement comparable to OpenAI's Structured Outputs, and stricter than its basic JSON mode, which guarantees valid JSON but not any particular schema.
### Which technique is fastest in production?
System prompt + prefilling has the same latency as a normal API call. Tool use adds ~10-15% latency due to the schema processing overhead. For high-throughput batch extraction, system prompt + prefilling is the better choice if schema strictness isn't critical.
### What model should I use for JSON extraction?
claude-haiku-3-5 for high-volume batch extraction where the schema is simple. claude-sonnet-4-5 for complex nested schemas or when accuracy matters. claude-opus-4 is rarely needed for JSON extraction — Sonnet is already excellent at this.
### How do I handle very large JSON outputs?
Increase max_tokens proportionally. A JSON object with 50 fields typically needs 500-1000 tokens. For streaming large JSON, use the streaming API but buffer the full text until the stream finishes before parsing — partial JSON is not valid JSON.
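The buffering rule can be illustrated with a simulated stream of text deltas (the chunk boundaries here are invented; in the real API they arrive as streaming events):

```python
import json

# Invented chunks standing in for streamed text deltas
deltas = ['{"invoice": "INV-', '2024-1847", "amou', 'nt": 4250.0}']

buffer = []
for chunk in deltas:
    buffer.append(chunk)
    # Don't call json.loads here: '{"invoice": "INV-' is not valid JSON

data = json.loads("".join(buffer))  # parse once, after the stream ends
```

Note that chunk boundaries can fall anywhere, including mid-key and mid-number, so no individual delta is safe to parse on its own.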
Related guides: Claude API Structured Output Guide · Claude API Tool Use Guide · Claude API Python Tutorial