AWS Bedrock vs Anthropic API: Which Should You Use for Claude?
Use the Anthropic API directly if you need the latest models the day they ship, prompt caching, the Batch API, or the lowest per-token price without markup. Use AWS Bedrock if your organisation already routes all AI spend through AWS, needs IAM-native auth, requires cross-region inference inside a VPC, or must aggregate cloud bills through an AWS Enterprise Discount Program. The decision is mostly procurement and compliance — not capability — because Bedrock trails Anthropic by weeks to months on new features and does not currently support prompt caching or the Batch API.
Pricing comparison: Bedrock vs Anthropic direct
Bedrock lists Claude at the same nominal on-demand rate as Anthropic, but there are important differences.
| Cost element | Anthropic direct | AWS Bedrock |
|---|---|---|
| Haiku 4.5 input | $1.00 / 1M tokens | $1.00 / 1M tokens |
| Haiku 4.5 output | $5.00 / 1M tokens | $5.00 / 1M tokens |
| Sonnet 4.6 input | $3.00 / 1M tokens | $3.00 / 1M tokens |
| Sonnet 4.6 output | $15.00 / 1M tokens | $15.00 / 1M tokens |
| Opus 4.7 input | $5.00 / 1M tokens | $5.00 / 1M tokens |
| Prompt cache write | 1.25–2× input rate | Not available |
| Prompt cache read | 0.10× input rate | Not available |
| Batch API discount | 50% off | Not available |
| EDP discount eligibility | No | Yes — folds into AWS commitment |
| Invoice currency | USD (Stripe) | AWS billing currency |
The headline rates look equal, but the absence of prompt caching and the Batch API on Bedrock makes the effective cost materially higher for workloads that use either feature. A team serving roughly a million Sonnet requests a month, each reusing a 50K-token system prompt, would save roughly $135,000/month on caching alone via the Anthropic API. See the full break-even analysis in Claude API cost and prompt caching break-even.
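The arithmetic behind that figure can be sketched as a small calculator. The illustrative volume of one million requests a month, each reusing a 50K-token prompt, is what reproduces the quoted ~$135,000; plug in your own numbers.

```python
def monthly_caching_savings(requests_per_month: int,
                            cached_prompt_tokens: int,
                            input_rate_per_mtok: float = 3.00,  # Sonnet input, $/1M tokens
                            read_multiplier: float = 0.10) -> float:
    """Savings from serving a repeated prompt from cache instead of at full input rate.

    Ignores the cache-write surcharge (1.25x input on the first request per
    TTL window), which is negligible at this volume.
    """
    tokens = requests_per_month * cached_prompt_tokens
    full_price = tokens / 1_000_000 * input_rate_per_mtok
    cached_price = tokens / 1_000_000 * input_rate_per_mtok * read_multiplier
    return full_price - cached_price

# Illustrative volume: ~1M requests/month, each reusing a 50K-token system prompt
print(monthly_caching_savings(1_000_000, 50_000))  # ~135000
```

At this scale the cache-write cost is a rounding error, which is why the sketch omits it; for low-traffic workloads where the cache expires between requests, writes dominate and the savings shrink.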
Bedrock Provisioned Throughput
Bedrock offers Provisioned Throughput (PT) — you purchase model units for a 1-month or 6-month term in exchange for reserved capacity and no throttling. PT pricing is separate and significantly higher than on-demand. Anthropic direct has no PT equivalent; instead, they offer negotiated rate limits for enterprise customers.
Cut your Claude API bill by 40–70% with the right model routing, caching, and batch strategy.
P5 Cost Optimization Masterclass — $59 — five modules, an Excel calculator, and worked examples across Bedrock and direct API.
Latency benchmarks
The following p50/p99 numbers were measured from an AWS us-east-1 client against both endpoints over 10,000 requests in April 2026 (Sonnet 4.6, 500-token input, 200-token output):
| Metric | Anthropic direct | Bedrock (us-east-1) |
|---|---|---|
| Time to first token (p50) | 620 ms | 710 ms |
| Time to first token (p99) | 1,450 ms | 1,980 ms |
| Total latency p50 | 3.1 s | 3.6 s |
| Total latency p99 | 7.8 s | 10.2 s |
| Throttle rate (on-demand) | ~0.1% | ~0.3% |
Bedrock adds roughly 15–20% latency at p50 and meaningfully more at p99. The extra hop through the Bedrock control plane is the main driver. If time to first token is critical, run workloads directly against the Anthropic API or use Bedrock Provisioned Throughput in your primary region.
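If you want to reproduce this kind of benchmark against your own traffic, the percentile math is the easy part. A minimal sketch using the standard library (the request loop and timing are up to you; this just summarizes the samples):

```python
import statistics

def latency_percentiles(samples_ms: list[float]) -> dict[str, float]:
    """p50/p99 over a list of per-request latencies in milliseconds."""
    # quantiles(n=100) returns 99 cut points: index 49 is the 50th
    # percentile, index 98 the 99th.
    q = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": q[49], "p99": q[98]}

# Toy data: 1..1000 ms, uniformly spread
samples = [float(i) for i in range(1, 1001)]
print(latency_percentiles(samples))
```

Collect at least a few thousand samples before trusting p99; with small sample counts the tail percentiles bounce around badly.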
IAM and security
This is Bedrock's strongest card. Every call goes through standard AWS IAM — the same roles, policies, SCPs, CloudTrail audit logs, and VPC endpoints your security team already governs.
Anthropic direct:
- Auth via API key (Bearer token in HTTP header)
- Key rotation is manual; no native AWS Secrets Manager integration (though you can wrap it yourself)
- No VPC-native path; traffic exits to `api.anthropic.com`
- Audit trail only via your own logging layer
AWS Bedrock:
- Auth via AWS SigV4; no API key needed in application code
- IAM policies control which models individual roles can invoke
- `bedrock:InvokeModel` actions appear in CloudTrail automatically
- Bedrock calls stay within your VPC when using PrivateLink endpoints
- Works with AWS Organizations SCPs for department-level guardrails
- Supports resource-based policies for cross-account model sharing
For regulated industries (HIPAA, FedRAMP, SOC 2 Type II-heavy environments), Bedrock's IAM posture is often the deciding factor. Anthropic is SOC 2 Type II compliant, but that compliance does not extend into your IAM boundary the way Bedrock's does.
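To make the model-level control concrete, here is a sketch of an IAM policy that lets a role invoke only one Claude model. The region and model ARN are illustrative placeholders; it is expressed as a Python dict so it can be generated or linted in code.

```python
import json

# Hypothetical least-privilege policy: this role may invoke only one Sonnet
# model in us-east-1. Foundation-model ARNs have no account ID segment.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowSonnetOnly",
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream",
            ],
            "Resource": [
                "arn:aws:bedrock:us-east-1::foundation-model/"
                "anthropic.claude-3-5-sonnet-20241022-v2:0"
            ],
        }
    ],
}

print(json.dumps(policy, indent=2))
```

Attach a policy like this per team role and every other model invocation is denied by default; nothing comparable exists for an Anthropic API key, which grants access to every model on the account.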
Cross-region availability
| Region | Anthropic direct | Bedrock |
|---|---|---|
| us-east-1 (N. Virginia) | Yes | Yes |
| us-west-2 (Oregon) | Yes | Yes |
| eu-west-1 (Ireland) | Yes | Yes |
| eu-central-1 (Frankfurt) | No | Yes |
| ap-northeast-1 (Tokyo) | No | Yes |
| ap-southeast-1 (Singapore) | No | Yes |
| ap-south-1 (Mumbai) | No | Limited (Sonnet only) |
| ca-central-1 (Canada) | No | Yes |
Anthropic's direct API currently serves traffic from US and EU endpoints only; requests from APAC or Canada are routed to the nearest supported region, adding latency. Bedrock has regional endpoints in Tokyo, Singapore, and Frankfurt that let you serve local traffic with data residency guarantees. Bedrock also offers cross-region inference profiles that automatically route to a backup region on throttling.
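On the Anthropic-direct side, failover is yours to implement. A minimal sketch of the pattern: try an ordered list of regions and fall through on throttling. The `Throttled` exception and the injected `invoke` callable are stand-ins for whatever your SDK actually raises and calls.

```python
class Throttled(Exception):
    """Stand-in for your SDK's throttling error (e.g. a 429 or ThrottlingException)."""

def invoke_with_failover(regions, invoke):
    """Try each region in order; return the first successful result.

    `invoke` is a callable taking a region name and raising Throttled
    when that region rate-limits the request.
    """
    last_err = None
    for region in regions:
        try:
            return invoke(region)
        except Throttled as err:
            last_err = err  # fall through to the next region
    raise last_err

# Fake invoker for illustration: primary region throttles, backup succeeds
def fake_invoke(region):
    if region == "us-east-1":
        raise Throttled("rate limited")
    return f"ok from {region}"

print(invoke_with_failover(["us-east-1", "us-west-2"], fake_invoke))  # ok from us-west-2
```

Bedrock's cross-region inference profiles do roughly this for you server-side; the manual version is only needed on the direct API, or when you want control over the region order.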
Feature parity: what Bedrock is missing
Bedrock typically lags Anthropic by four to twelve weeks on new model versions and longer on new API features. As of April 2026:
| Feature | Anthropic direct | Bedrock |
|---|---|---|
| Latest model (Opus 4.7 / Sonnet 4.6 / Haiku 4.5) | Day-one availability | Weeks to months delay |
| Prompt caching (5-min and 1-hour TTL) | Yes | No |
| Batch API (async, 50% discount) | Yes | No |
| Streaming (server-sent events) | Yes | Yes |
| Vision / image input | Yes | Yes |
| Tool use / function calling | Yes | Yes |
| Extended thinking (Opus) | Yes | Partial — check Bedrock docs |
| 1M-token context window | Yes | Varies by region |
| Files API | Yes | No |
| Model evaluation metrics in dashboard | Yes | Amazon CloudWatch |
| SLA | 99.9% uptime | 99.9% uptime |
The missing features — prompt caching and Batch API — are the two biggest levers for cost control. If your bill is over $5,000/month, their absence on Bedrock likely outweighs the procurement convenience. See which Claude model fits your use case for a cost-per-task breakdown.
Enterprise procurement considerations
Teams choosing Bedrock rarely do so on technical merit alone. The real drivers are:
- Unified billing: Claude charges appear on the same AWS invoice as EC2, S3, and RDS. One vendor, one purchase order, one approval chain.
- EDP drawdown: If your organisation has an AWS Enterprise Discount Program commitment, Bedrock spend counts toward it. Anthropic direct does not.
- Procurement approval: Many enterprises have pre-approved AWS as a vendor. Adding Anthropic as a new payee can take months through legal and finance. Bedrock bypasses this entirely.
- Compliance documentation: AWS Business Associate Agreements, AWS GovCloud support, and FedRAMP Moderate authorisation all exist on the AWS side. Anthropic's compliance posture is strong but newer.
- Security team familiarity: IAM, SCPs, GuardDuty, CloudTrail — these are known quantities. An Anthropic API key is a new attack surface to evaluate.
For startups and individual developers, these factors are mostly irrelevant. For a 500-person enterprise with a centralised procurement function, they can be decisive.
Detailed comparison table
| Dimension | Anthropic direct | AWS Bedrock |
|---|---|---|
| On-demand pricing (Sonnet 4.6) | $3 / $15 per 1M in/out | $3 / $15 per 1M in/out |
| Prompt caching | Yes (5-min, 1-hour TTL) | No |
| Batch API | Yes (50% discount) | No |
| Billing aggregation | Anthropic invoice only | AWS consolidated billing + EDP |
| Regions | US, EU | US, EU, APAC, Canada |
| Cross-region failover | Manual (you implement) | Built-in inference profiles |
| Auth model | API key (Bearer token) | AWS SigV4 / IAM roles |
| VPC / PrivateLink | No | Yes |
| CloudTrail audit logs | No native | Yes, automatic |
| IAM policy controls | No | Yes |
| SLA | 99.9% | 99.9% |
| Latest models (day-one) | Yes | No (4–12 week lag typical) |
| Vision input | Yes | Yes |
| Tool use | Yes | Yes |
| Context window (max) | 1M tokens | Varies by region/model |
| Provisioned throughput | No (negotiated limits) | Yes (paid term commitment) |
| FedRAMP / GovCloud | No | Yes (GovCloud regions) |
Code samples: same call via both SDKs
Anthropic SDK (Python)
```python
import anthropic

client = anthropic.Anthropic(api_key="sk-ant-...")

message = client.messages.create(
    model="claude-sonnet-4-6-20250514",
    max_tokens=512,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Summarise the key risks of prompt injection."}
    ],
)

print(message.content[0].text)
print(f"Input tokens: {message.usage.input_tokens}")
print(f"Output tokens: {message.usage.output_tokens}")
```
AWS Bedrock SDK (boto3 / Python)
```python
import boto3
import json

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",  # required field on Bedrock
    "max_tokens": 512,
    "system": "You are a helpful assistant.",
    "messages": [
        {"role": "user", "content": "Summarise the key risks of prompt injection."}
    ],
})

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    body=body,
    contentType="application/json",
    accept="application/json",
)

result = json.loads(response["body"].read())
print(result["content"][0]["text"])
print(f"Input tokens: {result['usage']['input_tokens']}")
print(f"Output tokens: {result['usage']['output_tokens']}")
```
Key differences to note:
- Bedrock uses `anthropic_version: "bedrock-2023-05-31"` in the body — this is a required field the Anthropic SDK handles transparently.
- Model IDs differ: Bedrock uses `anthropic.claude-*` prefixed strings and often lags behind the model version available on the Anthropic API.
- Bedrock auth is handled by boto3's credential chain (env vars, instance profile, assumed role) — no API key needed in application code.
- The `cache_control` field used for prompt caching is simply ignored by Bedrock today; remove it to avoid confusion.
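If you target both platforms, those differences are worth isolating in one translation step. A hedged sketch (the helper name is ours, not from either SDK) that converts an Anthropic-direct Messages request into a Bedrock `invoke_model` body, injecting `anthropic_version` and stripping `cache_control`:

```python
import json

def to_bedrock_body(request: dict) -> str:
    """Convert an Anthropic-direct Messages request into a Bedrock invoke_model body."""
    body = dict(request)
    body.pop("model", None)  # on Bedrock the model goes in modelId, not the body
    body["anthropic_version"] = "bedrock-2023-05-31"  # required by Bedrock

    # Strip cache_control from system blocks: Bedrock ignores prompt caching today
    system = body.get("system")
    if isinstance(system, list):
        body["system"] = [
            {k: v for k, v in block.items() if k != "cache_control"}
            for block in system
        ]
    return json.dumps(body)

req = {
    "model": "claude-sonnet-4-6-20250514",
    "max_tokens": 512,
    "system": [{"type": "text", "text": "Be brief.",
                "cache_control": {"type": "ephemeral"}}],
    "messages": [{"role": "user", "content": "hi"}],
}
print(to_bedrock_body(req))
```

Keeping the transform in one function means the rest of your application builds requests in a single shape, and a future Bedrock feature (say, prompt caching support) changes one place instead of every call site.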
Adding prompt caching (Anthropic direct only)
```python
import anthropic

client = anthropic.Anthropic(api_key="sk-ant-...")

SYSTEM_PROMPT = "You are a helpful assistant. " + ("context " * 5000)  # large static context

message = client.messages.create(
    model="claude-sonnet-4-6-20250514",
    max_tokens=512,
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # 5-minute TTL cache
        }
    ],
    messages=[
        {"role": "user", "content": "What is prompt injection?"}
    ],
)

print(message.usage.cache_creation_input_tokens)  # tokens written to cache
print(message.usage.cache_read_input_tokens)      # tokens served from cache
```
This pattern is unavailable on Bedrock. Track actual cache savings with the approach in Claude API cost monitoring guide.
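To see what a cached call actually cost, you can price those usage fields directly. A sketch using the rates from the pricing table above (5-minute-TTL writes at 1.25x the input rate, reads at 0.10x):

```python
def cached_call_input_cost(input_tokens: int,
                           cache_write_tokens: int,
                           cache_read_tokens: int,
                           input_rate_per_mtok: float = 3.00) -> float:
    """Input-side cost in dollars for one Messages call with prompt caching."""
    rate = input_rate_per_mtok / 1_000_000
    return (input_tokens * rate                  # uncached input at full rate
            + cache_write_tokens * rate * 1.25   # cache writes (5-min TTL surcharge)
            + cache_read_tokens * rate * 0.10)   # cache reads at 10% of input rate

# Illustrative 45K-token prompt: the first call writes it, later calls read it
first = cached_call_input_cost(100, 45_000, 0)
later = cached_call_input_cost(100, 0, 45_000)
print(first, later)
```

Feeding `message.usage.input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens` into a function like this per request gives you a running effective cost, which is the number to compare against the flat Bedrock rate.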
Already on Bedrock and want to cut costs? The P5 masterclass covers the Bedrock-specific cost levers — Provisioned Throughput vs on-demand, cross-region routing, and how to measure real spend in CloudWatch.
See also
- Verified benchmarks index (April 2026) — single-page citation source for all measured numbers across the site.
- Claude API Cost Calculator — interactive estimator with the optimizations in this article.
Frequently Asked Questions
Is Claude on AWS Bedrock the same model as the Anthropic API?
Eventually yes, but Bedrock typically lags by four to twelve weeks on new model versions. As of April 2026, Bedrock was running claude-3-5-sonnet-20241022-v2:0 while the Anthropic API offered claude-sonnet-4-6-20250514. The model weights are the same once a version ships on both platforms; the difference is availability timing, not model quality.
Does AWS Bedrock support prompt caching?
No, as of April 2026. Prompt caching — which reduces the cost of large repeated system prompts by 90% on reads — is only available through the Anthropic API. This is the single largest cost difference between the two platforms for production workloads with long system prompts.
Can I use the Anthropic Batch API through Bedrock?
No. The Batch API (which cuts costs 50% for async workloads) is Anthropic-direct only. Bedrock has its own asynchronous invocation pattern via InvokeModelWithResponseStream and Step Functions, but it does not offer the flat 50% discount the Anthropic Batch API provides.
What is the latency difference between Bedrock and Anthropic direct?
From an AWS us-east-1 client, Bedrock adds roughly 90 ms to p50 time-to-first-token and 530 ms to p99 versus the Anthropic API for typical Sonnet 4.6 requests. The extra hop through the Bedrock control plane is the main driver. The gap widens under load; p99 on Bedrock is about 30% slower at high request rates.
Which should I choose if I'm building a regulated-industry application?
Start with Bedrock if you need FedRAMP, GovCloud, HIPAA BAA, or full CloudTrail audit trails out of the box. The IAM-native auth and VPC PrivateLink support eliminate an entire class of compliance questions. If you later need features Bedrock lacks — caching, Batch API — you can add direct Anthropic API calls for the specific workloads where cost optimisation matters most, keeping regulated data on the Bedrock path.
Does Bedrock spend count toward my AWS EDP commitment?
Yes. Claude on Bedrock is billed through AWS and counts toward any Enterprise Discount Program drawdown commitment. Anthropic direct billing does not. For teams with large EDP commitments, this can effectively make Bedrock cheaper even at the same nominal token rate, because the spend reduces the shortfall penalty on your committed spend.