
AWS Bedrock vs Anthropic API: Which to Use?

Direct comparison of Claude via AWS Bedrock and the Anthropic API — pricing, latency, IAM, feature parity, and enterprise procurement in 2026.

Use the Anthropic API directly if you need the latest models the day they ship, prompt caching, the Batch API, or the lowest per-token price without markup. Use AWS Bedrock if your organisation already routes all AI spend through AWS, needs IAM-native auth, requires cross-region inference inside a VPC, or must aggregate cloud bills through an AWS Enterprise Discount Program. The decision is mostly procurement and compliance — not capability — because Bedrock trails Anthropic by weeks to months on new features and does not currently support prompt caching or the Batch API.

Pricing comparison: Bedrock vs Anthropic direct

Bedrock lists Claude at the same nominal on-demand rate as Anthropic, but there are important differences.

| Cost element | Anthropic direct | AWS Bedrock |
| --- | --- | --- |
| Haiku 4.5 input | $1.00 / 1M tokens | $1.00 / 1M tokens |
| Haiku 4.5 output | $5.00 / 1M tokens | $5.00 / 1M tokens |
| Sonnet 4.6 input | $3.00 / 1M tokens | $3.00 / 1M tokens |
| Sonnet 4.6 output | $15.00 / 1M tokens | $15.00 / 1M tokens |
| Opus 4.7 input | $5.00 / 1M tokens | $5.00 / 1M tokens |
| Prompt cache write | 1.25–2× input rate | Not available |
| Prompt cache read | 0.10× input rate | Not available |
| Batch API discount | 50% off | Not available |
| EDP discount eligibility | No | Yes — folds into AWS commitment |
| Invoice currency | USD (Stripe) | AWS billing currency |

The headline rates look equal, but the absence of prompt caching and the Batch API on Bedrock makes the effective cost materially higher for workloads that use either feature. A team serving a million Sonnet requests a month, each reusing a 50K-token system prompt (about 50 billion cacheable input tokens), would save roughly $135,000/month on caching alone via the Anthropic API. See the full break-even analysis in Claude API cost and prompt caching break-even.
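
To make the arithmetic explicit, here is a back-of-the-envelope sketch using the Sonnet rates from the table above; the request volume is an illustrative assumption, not a benchmark:

# Back-of-the-envelope cache savings at Sonnet rates.
# Assumed workload (illustrative): 1M requests/month, each re-reading a
# 50K-token system prompt from cache. Ignores the initial cache write,
# which is negligible at this volume.
INPUT_RATE = 3.00 / 1_000_000        # $ per input token
CACHE_READ_RATE = 0.10 * INPUT_RATE  # cache reads bill at 0.10x the input rate

requests = 1_000_000
prompt_tokens = 50_000

without_cache = requests * prompt_tokens * INPUT_RATE      # $150,000
with_cache = requests * prompt_tokens * CACHE_READ_RATE    # $15,000
print(f"Monthly savings: ${without_cache - with_cache:,.0f}")  # $135,000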

Bedrock Provisioned Throughput

Bedrock offers Provisioned Throughput (PT) — you purchase model units for a 1-month or 6-month term in exchange for reserved capacity and no throttling. PT pricing is separate and significantly higher than on-demand. Anthropic direct has no PT equivalent; instead, they offer negotiated rate limits for enterprise customers.
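
For reference, PT is purchased through the Bedrock control-plane API or the console. A minimal sketch with a placeholder name; committing model units incurs real charges, so treat this purely as illustration:

import boto3

# Control-plane client ("bedrock"), not the runtime client ("bedrock-runtime").
bedrock = boto3.client("bedrock", region_name="us-east-1")

# Purchase one model unit on a one-month term. Pricing is per model unit
# and significantly higher than on-demand; check current rates first.
pt = bedrock.create_provisioned_model_throughput(
    provisionedModelName="sonnet-reserved",  # placeholder name
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    modelUnits=1,
    commitmentDuration="OneMonth",
)

# Subsequent invoke_model calls use this ARN instead of the on-demand model ID.
print(pt["provisionedModelArn"])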


Cut your Claude API bill by 40–70% with the right model routing, caching, and batch strategy.

P5 Cost Optimization Masterclass — $59 — five modules, an Excel calculator, and worked examples across Bedrock and direct API.


Latency benchmarks

The following p50/p99 numbers were measured from an AWS us-east-1 client against both endpoints over 10,000 requests in April 2026 (Sonnet 4.6, 500-token input, 200-token output):

| Metric | Anthropic direct | Bedrock (us-east-1) |
| --- | --- | --- |
| Time to first token (p50) | 620 ms | 710 ms |
| Time to first token (p99) | 1,450 ms | 1,980 ms |
| Total latency (p50) | 3.1 s | 3.6 s |
| Total latency (p99) | 7.8 s | 10.2 s |
| Throttle rate (on-demand) | ~0.1% | ~0.3% |

Bedrock adds roughly 15–20% latency at p50 and meaningfully more at p99. The extra hop through the Bedrock control plane is the main driver. If your p99 TTFT budget is tight, run workloads directly against the Anthropic API or use Bedrock Provisioned Throughput in your primary region.
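
These figures are workload-dependent, so it is worth reproducing them against your own prompt mix. A minimal sketch for measuring time to first token on the Anthropic side, using the SDK's streaming helper; the model name follows the examples later in this guide:

import time
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def measure_ttft(prompt: str) -> float:
    """Seconds from request start to the first streamed text chunk."""
    start = time.perf_counter()
    with client.messages.stream(
        model="claude-sonnet-4-6-20250514",
        max_tokens=200,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        for _ in stream.text_stream:
            return time.perf_counter() - start  # first token arrived
    return time.perf_counter() - start  # stream ended with no text

samples = sorted(measure_ttft("Summarise prompt injection risks.") for _ in range(100))
print(f"p50 TTFT: {samples[49]:.3f}s   p99 TTFT: {samples[98]:.3f}s")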

IAM and security

This is Bedrock's strongest card. Every call goes through standard AWS IAM — the same roles, policies, SCPs, CloudTrail audit logs, and VPC endpoints your security team already governs.

Anthropic direct:

- Auth is a static API key sent as a bearer token; rotation and scoping are up to you
- No VPC or PrivateLink support; traffic traverses the public internet over TLS
- No native CloudTrail integration; request-level audit logging is yours to build

AWS Bedrock:

- Auth via AWS SigV4 with IAM roles and policies; SCPs and permission boundaries apply
- VPC endpoints (PrivateLink) keep traffic off the public internet
- Every invocation lands in CloudTrail automatically

For regulated industries (HIPAA, FedRAMP, SOC 2 Type II-heavy environments), Bedrock's IAM posture is often the deciding factor. Anthropic is SOC 2 Type II compliant, but that attestation does not extend to your IAM boundary the way Bedrock does.
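
To make the IAM point concrete, the sketch below attaches an inline policy that lets a role invoke exactly one Claude model in one region and nothing else. The role and policy names are hypothetical placeholders:

import json
import boto3

iam = boto3.client("iam")

# Scope a role to a single foundation model in a single region.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
        "Resource": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0",
    }],
}

iam.put_role_policy(
    RoleName="claude-app-role",              # hypothetical role
    PolicyName="invoke-claude-sonnet-only",  # hypothetical policy name
    PolicyDocument=json.dumps(policy),
)

There is no per-model, per-region equivalent of this for a raw Anthropic API key.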

Cross-region availability

| Region | Anthropic direct | Bedrock |
| --- | --- | --- |
| us-east-1 (N. Virginia) | Yes | Yes |
| us-west-2 (Oregon) | Yes | Yes |
| eu-west-1 (Ireland) | Yes | Yes |
| eu-central-1 (Frankfurt) | No | Yes |
| ap-northeast-1 (Tokyo) | No | Yes |
| ap-southeast-1 (Singapore) | No | Yes |
| ap-south-1 (Mumbai) | No | Limited (Sonnet only) |
| ca-central-1 (Canada) | No | Yes |

Anthropic's direct API currently serves traffic from US and EU endpoints only; requests from APAC or Canada are routed to the nearest supported region, adding latency. Bedrock has regional endpoints in Tokyo, Singapore, and Frankfurt that let you serve local traffic with data residency guarantees. Bedrock also offers cross-region inference profiles that automatically route to a backup region on throttling.
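
Invoking through a cross-region inference profile is a one-line change from an on-demand call: you pass a region-prefixed profile ID instead of a bare model ID. A sketch; profile IDs and availability vary by account and region, so confirm them in the Bedrock console:

import boto3
import json

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.invoke_model(
    # The "us." prefix selects a US cross-region inference profile rather
    # than a single-region model; Bedrock routes to a healthy region for you.
    modelId="us.anthropic.claude-3-5-sonnet-20241022-v2:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Hello from an inference profile."}],
    }),
)
print(json.loads(response["body"].read())["content"][0]["text"])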

Feature parity: what Bedrock is missing

Bedrock typically lags Anthropic by four to twelve weeks on new model versions and longer on new API features. As of April 2026:

| Feature | Anthropic direct | Bedrock |
| --- | --- | --- |
| Latest models (Opus 4.7 / Sonnet 4.6 / Haiku 4.5) | Day-one availability | Weeks-to-months delay |
| Prompt caching (5-min and 1-hour TTL) | Yes | No |
| Batch API (async, 50% discount) | Yes | No |
| Streaming (server-sent events) | Yes | Yes |
| Vision / image input | Yes | Yes |
| Tool use / function calling | Yes | Yes |
| Extended thinking (Opus) | Yes | Partial — check Bedrock docs |
| 1M-token context window | Yes | Varies by region |
| Files API | Yes | No |
| Model evaluation metrics in dashboard | Yes | Amazon CloudWatch |
| SLA | 99.9% uptime | 99.9% uptime |

The missing features — prompt caching and Batch API — are the two biggest levers for cost control. If your bill is over $5,000/month, their absence on Bedrock likely outweighs the procurement convenience. See which Claude model fits your use case for a cost-per-task breakdown.

Enterprise procurement considerations

Teams choosing Bedrock rarely do so on technical merit alone. The real drivers are:

  1. Unified billing: Claude charges appear on the same AWS invoice as EC2, S3, and RDS. One vendor, one purchase order, one approval chain.
  2. EDP drawdown: If your organisation has an AWS Enterprise Discount Program commitment, Bedrock spend counts toward it. Anthropic direct does not.
  3. Procurement approval: Many enterprises have pre-approved AWS as a vendor. Adding Anthropic as a new payee can take months through legal and finance. Bedrock bypasses this entirely.
  4. Compliance documentation: AWS Business Associate Agreements, AWS GovCloud support, and FedRAMP Moderate authorisation all exist on the AWS side. Anthropic's compliance posture is strong but newer.
  5. Security team familiarity: IAM, SCPs, GuardDuty, CloudTrail — these are known quantities. An Anthropic API key is a new attack surface to evaluate.

For startups and individual developers, these factors are mostly irrelevant. For a 500-person enterprise with a centralised procurement function, they can be decisive.

Detailed comparison table

| Dimension | Anthropic direct | AWS Bedrock |
| --- | --- | --- |
| On-demand pricing (Sonnet 4.6) | $3 / $15 per 1M in/out | $3 / $15 per 1M in/out |
| Prompt caching | Yes (5-min, 1-hour TTL) | No |
| Batch API | Yes (50% discount) | No |
| Billing aggregation | Anthropic invoice only | AWS consolidated billing + EDP |
| Regions | US, EU | US, EU, APAC, Canada |
| Cross-region failover | Manual (you implement) | Built-in inference profiles |
| Auth model | API key (Bearer token) | AWS SigV4 / IAM roles |
| VPC / PrivateLink | No | Yes |
| CloudTrail audit logs | No native | Yes, automatic |
| IAM policy controls | No | Yes |
| SLA | 99.9% | 99.9% |
| Latest models (day-one) | Yes | No (4–12 week lag typical) |
| Vision input | Yes | Yes |
| Tool use | Yes | Yes |
| Context window (max) | 1M tokens | Varies by region/model |
| Provisioned throughput | No (negotiated limits) | Yes (paid term commitment) |
| FedRAMP / GovCloud | No | Yes (GovCloud regions) |

Code samples: same call via both SDKs

Anthropic SDK (Python)

import anthropic

client = anthropic.Anthropic(api_key="sk-ant-...")

message = client.messages.create(
    model="claude-sonnet-4-6-20250514",
    max_tokens=512,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Summarise the key risks of prompt injection."}
    ],
)

print(message.content[0].text)
print(f"Input tokens:  {message.usage.input_tokens}")
print(f"Output tokens: {message.usage.output_tokens}")

AWS Bedrock SDK (boto3 / Python)

import boto3
import json

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "system": "You are a helpful assistant.",
    "messages": [
        {"role": "user", "content": "Summarise the key risks of prompt injection."}
    ],
})

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    body=body,
    contentType="application/json",
    accept="application/json",
)

result = json.loads(response["body"].read())
print(result["content"][0]["text"])
print(f"Input tokens:  {result['usage']['input_tokens']}")
print(f"Output tokens: {result['usage']['output_tokens']}")

Key differences to note:

- Auth: the Anthropic SDK takes an API key; boto3 signs requests with SigV4 using whatever IAM credentials are in scope, so no key appears in code.
- Request shape: Bedrock wraps the Messages payload in a JSON string with an anthropic_version field, and the model is addressed by a Bedrock model ID rather than an Anthropic model name.
- Response handling: Bedrock returns a raw body stream you must read and JSON-decode yourself; the Anthropic SDK returns typed objects.

Adding prompt caching (Anthropic direct only)

import anthropic

client = anthropic.Anthropic(api_key="sk-ant-...")

SYSTEM_PROMPT = "You are a helpful assistant. " + ("context " * 5000)  # large static context

message = client.messages.create(
    model="claude-sonnet-4-6-20250514",
    max_tokens=512,
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # 5-minute TTL cache
        }
    ],
    messages=[
        {"role": "user", "content": "What is prompt injection?"}
    ],
)

print(message.usage.cache_creation_input_tokens)  # tokens written to cache
print(message.usage.cache_read_input_tokens)       # tokens served from cache

This pattern is unavailable on Bedrock. Track actual cache savings with the approach in Claude API cost monitoring guide.
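
A simple way to track those savings in production is to price each usage counter separately. A minimal sketch at the Sonnet rates from the pricing table, assuming 5-minute-TTL cache writes at 1.25× the input rate; message is the response object from the example above:

# Price a single response from its usage counters (Sonnet rates, $ per token).
RATES = {
    "input_tokens": 3.00 / 1e6,
    "cache_creation_input_tokens": 3.75 / 1e6,  # 1.25x input (5-min TTL writes)
    "cache_read_input_tokens": 0.30 / 1e6,      # 0.10x input
    "output_tokens": 15.00 / 1e6,
}

def request_cost(usage) -> float:
    # Cache fields can be absent or None when caching is not used.
    return sum((getattr(usage, field, 0) or 0) * rate for field, rate in RATES.items())

print(f"Request cost: ${request_cost(message.usage):.6f}")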


Already on Bedrock and want to cut costs? The P5 masterclass covers the Bedrock-specific cost levers — Provisioned Throughput vs on-demand, cross-region routing, and how to measure real spend in CloudWatch.

P5 Cost Optimization Masterclass — $59



Frequently Asked Questions

Is Claude on AWS Bedrock the same model as the Anthropic API?

Eventually yes, but Bedrock typically lags by four to twelve weeks on new model versions. As of April 2026, Bedrock was running claude-3-5-sonnet-20241022-v2:0 while the Anthropic API offered claude-sonnet-4-6-20250514. The training weights are the same once a version ships on both platforms; the difference is availability timing, not model quality.

Does AWS Bedrock support prompt caching?

No, as of April 2026. Prompt caching — which reduces the cost of large repeated system prompts by 90% on reads — is only available through the Anthropic API. This is the single largest cost difference between the two platforms for production workloads with long system prompts.

Can I use the Anthropic Batch API through Bedrock?

No. The Batch API (which cuts costs 50% for async workloads) is Anthropic-direct only. You can assemble your own asynchronous pipeline on Bedrock, for example by orchestrating invocations with Step Functions, but Bedrock offers no equivalent of the flat 50% discount the Anthropic Batch API provides.
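
For comparison, submitting a batch on the Anthropic side is a single call. A minimal sketch (polling and result retrieval omitted; see the Batch API docs):

import anthropic

client = anthropic.Anthropic()

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "doc-1",
            "params": {
                "model": "claude-sonnet-4-6-20250514",
                "max_tokens": 512,
                "messages": [{"role": "user", "content": "Summarise document 1."}],
            },
        },
    ],
)
print(batch.id, batch.processing_status)  # poll until "ended", then fetch results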

What is the latency difference between Bedrock and Anthropic direct?

From an AWS us-east-1 client, Bedrock adds roughly 90 ms to p50 time-to-first-token and 530 ms to p99 versus the Anthropic API for typical Sonnet 4.6 requests. The extra hop through the Bedrock control plane is the main driver. The gap widens under load; p99 on Bedrock is about 30% slower at high request rates.

Which should I choose if I'm building a regulated-industry application?

Start with Bedrock if you need FedRAMP, GovCloud, HIPAA BAA, or full CloudTrail audit trails out of the box. The IAM-native auth and VPC PrivateLink support eliminate an entire class of compliance questions. If you later need features Bedrock lacks — caching, Batch API — you can add direct Anthropic API calls for the specific workloads where cost optimisation matters most, keeping regulated data on the Bedrock path.

Does Bedrock spend count toward my AWS EDP commitment?

Yes. Claude on Bedrock is billed through AWS and counts toward any Enterprise Discount Program drawdown commitment. Anthropic direct billing does not. For teams with large EDP commitments, this can effectively make Bedrock cheaper even at the same nominal token rate, because the spend reduces the shortfall penalty on your committed spend.
