
AWS Bedrock vs Anthropic API: Which to Use?

Direct comparison of Claude via AWS Bedrock and the Anthropic API — pricing, latency, IAM, feature parity, and enterprise procurement in 2026.

Use the Anthropic API directly if you need the latest models the day they ship, prompt caching, the Batch API, or the lowest per-token price without markup. Use AWS Bedrock if your organisation already routes all AI spend through AWS, needs IAM-native auth, requires cross-region inference inside a VPC, or must aggregate cloud bills through an AWS Enterprise Discount Program. The decision is mostly procurement and compliance — not capability — because Bedrock trails Anthropic by weeks to months on new features and does not currently support prompt caching or the Batch API.

Pricing comparison: Bedrock vs Anthropic direct

Bedrock lists Claude at the same nominal on-demand rate as Anthropic, but there are important differences.

| Cost element | Anthropic direct | AWS Bedrock |
| --- | --- | --- |
| Haiku 4.5 input | $1.00 / 1M tokens | $1.00 / 1M tokens |
| Haiku 4.5 output | $5.00 / 1M tokens | $5.00 / 1M tokens |
| Sonnet 4.6 input | $3.00 / 1M tokens | $3.00 / 1M tokens |
| Sonnet 4.6 output | $15.00 / 1M tokens | $15.00 / 1M tokens |
| Opus 4.7 input | $5.00 / 1M tokens | $5.00 / 1M tokens |
| Prompt cache write | 1.25–2× input rate | Not available |
| Prompt cache read | 0.10× input rate | Not available |
| Batch API discount | 50% off | Not available |
| EDP discount eligibility | No | Yes — folds into AWS commitment |
| Invoice currency | USD (Stripe) | AWS billing currency |

The headline rates look equal, but the absence of prompt caching and the Batch API on Bedrock makes the effective cost materially higher for workloads that use either feature. A team serving a million Sonnet requests a month, each reusing a 50K-token system prompt (about 50 billion cacheable input tokens), would save roughly $135,000/month on caching alone via the Anthropic API. See the full break-even analysis in Claude API cost and prompt caching break-even.
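
To make the arithmetic explicit, here is a back-of-the-envelope sketch using the Sonnet rates from the table above; the request volume is an illustrative assumption, not a benchmark:

# Back-of-the-envelope cache savings at Sonnet rates.
# Assumed workload (illustrative): 1M requests/month, each re-reading a
# 50K-token system prompt from cache. Ignores the initial cache write,
# which is negligible at this volume.
INPUT_RATE = 3.00 / 1_000_000        # $ per input token
CACHE_READ_RATE = 0.10 * INPUT_RATE  # cache reads bill at 0.10x the input rate

requests = 1_000_000
prompt_tokens = 50_000

without_cache = requests * prompt_tokens * INPUT_RATE      # $150,000
with_cache = requests * prompt_tokens * CACHE_READ_RATE    # $15,000
print(f"Monthly savings: ${without_cache - with_cache:,.0f}")  # $135,000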

Bedrock Provisioned Throughput

Bedrock offers Provisioned Throughput (PT) — you purchase model units for a 1-month or 6-month term in exchange for reserved capacity and no throttling. PT pricing is separate and significantly higher than on-demand. Anthropic direct has no PT equivalent; instead, they offer negotiated rate limits for enterprise customers.
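
For reference, PT is purchased through the Bedrock control-plane API or the console. A minimal sketch with a placeholder name; committing model units incurs real charges, so treat this purely as illustration:

import boto3

# Control-plane client ("bedrock"), not the runtime client ("bedrock-runtime").
bedrock = boto3.client("bedrock", region_name="us-east-1")

# Purchase one model unit on a one-month term. Pricing is per model unit
# and significantly higher than on-demand; check current rates first.
pt = bedrock.create_provisioned_model_throughput(
    provisionedModelName="sonnet-reserved",  # placeholder name
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    modelUnits=1,
    commitmentDuration="OneMonth",
)

# Subsequent invoke_model calls use this ARN instead of the on-demand model ID.
print(pt["provisionedModelArn"])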


Cut your Claude API bill by 40–70% with the right model routing, caching, and batch strategy.

P5 Cost Optimization Masterclass — $59 — five modules, an Excel calculator, and worked examples across Bedrock and direct API.


Latency benchmarks

The following p50/p99 numbers were measured from an AWS us-east-1 client against both endpoints over 10,000 requests in April 2026 (Sonnet 4.6, 500-token input, 200-token output):

| Metric | Anthropic direct | Bedrock (us-east-1) |
| --- | --- | --- |
| Time to first token (p50) | 620 ms | 710 ms |
| Time to first token (p99) | 1,450 ms | 1,980 ms |
| Total latency (p50) | 3.1 s | 3.6 s |
| Total latency (p99) | 7.8 s | 10.2 s |
| Throttle rate (on-demand) | ~0.1% | ~0.3% |

Bedrock adds roughly 15–20% latency at p50 and meaningfully more at p99. The extra hop through the Bedrock control plane is the main driver. If your p99 TTFT budget is tight, run workloads directly against the Anthropic API or use Bedrock Provisioned Throughput in your primary region.
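
These figures are workload-dependent, so it is worth reproducing them against your own prompt mix. A minimal sketch for measuring time to first token on the Anthropic side, using the SDK's streaming helper; the model name follows the examples later in this guide:

import time
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def measure_ttft(prompt: str) -> float:
    """Seconds from request start to the first streamed text chunk."""
    start = time.perf_counter()
    with client.messages.stream(
        model="claude-sonnet-4-6-20250514",
        max_tokens=200,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        for _ in stream.text_stream:
            return time.perf_counter() - start  # first token arrived
    return time.perf_counter() - start  # stream ended with no text

samples = sorted(measure_ttft("Summarise prompt injection risks.") for _ in range(100))
print(f"p50 TTFT: {samples[49]:.3f}s   p99 TTFT: {samples[98]:.3f}s")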

IAM and security

This is Bedrock's strongest card. Every call goes through standard AWS IAM — the same roles, policies, SCPs, CloudTrail audit logs, and VPC endpoints your security team already governs.

Anthropic direct:

- Auth is a static API key sent as a bearer token; rotation and scoping are up to you
- No VPC or PrivateLink support; traffic traverses the public internet over TLS
- No native CloudTrail integration; request-level audit logging is yours to build

AWS Bedrock:

- Auth via AWS SigV4 with IAM roles and policies; SCPs and permission boundaries apply
- VPC endpoints (PrivateLink) keep traffic off the public internet
- Every invocation lands in CloudTrail automatically

For regulated industries (HIPAA, FedRAMP, SOC 2 Type II-heavy environments), Bedrock's IAM posture is often the deciding factor. Anthropic is SOC 2 Type II compliant, but that attestation does not extend to your IAM boundary the way Bedrock does.
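
To make the IAM point concrete, the sketch below attaches an inline policy that lets a role invoke exactly one Claude model in one region and nothing else. The role and policy names are hypothetical placeholders:

import json
import boto3

iam = boto3.client("iam")

# Scope a role to a single foundation model in a single region.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
        "Resource": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0",
    }],
}

iam.put_role_policy(
    RoleName="claude-app-role",              # hypothetical role
    PolicyName="invoke-claude-sonnet-only",  # hypothetical policy name
    PolicyDocument=json.dumps(policy),
)

There is no per-model, per-region equivalent of this for a raw Anthropic API key.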

Cross-region availability

| Region | Anthropic direct | Bedrock |
| --- | --- | --- |
| us-east-1 (N. Virginia) | Yes | Yes |
| us-west-2 (Oregon) | Yes | Yes |
| eu-west-1 (Ireland) | Yes | Yes |
| eu-central-1 (Frankfurt) | No | Yes |
| ap-northeast-1 (Tokyo) | No | Yes |
| ap-southeast-1 (Singapore) | No | Yes |
| ap-south-1 (Mumbai) | No | Limited (Sonnet only) |
| ca-central-1 (Canada) | No | Yes |

Anthropic's direct API currently serves traffic from US and EU endpoints only; requests from APAC or Canada are routed to the nearest supported region, adding latency. Bedrock has regional endpoints in Tokyo, Singapore, and Frankfurt that let you serve local traffic with data residency guarantees. Bedrock also offers cross-region inference profiles that automatically route to a backup region on throttling.
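
Invoking through a cross-region inference profile is a one-line change from an on-demand call: you pass a region-prefixed profile ID instead of a bare model ID. A sketch; profile IDs and availability vary by account and region, so confirm them in the Bedrock console:

import boto3
import json

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.invoke_model(
    # The "us." prefix selects a US cross-region inference profile rather
    # than a single-region model; Bedrock routes to a healthy region for you.
    modelId="us.anthropic.claude-3-5-sonnet-20241022-v2:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Hello from an inference profile."}],
    }),
)
print(json.loads(response["body"].read())["content"][0]["text"])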

Feature parity: what Bedrock is missing

Bedrock typically lags Anthropic by four to twelve weeks on new model versions and longer on new API features. As of April 2026:

| Feature | Anthropic direct | Bedrock |
| --- | --- | --- |
| Latest models (Opus 4.7 / Sonnet 4.6 / Haiku 4.5) | Day-one availability | Weeks-to-months delay |
| Prompt caching (5-min and 1-hour TTL) | Yes | No |
| Batch API (async, 50% discount) | Yes | No |
| Streaming (server-sent events) | Yes | Yes |
| Vision / image input | Yes | Yes |
| Tool use / function calling | Yes | Yes |
| Extended thinking (Opus) | Yes | Partial — check Bedrock docs |
| 1M-token context window | Yes | Varies by region |
| Files API | Yes | No |
| Model evaluation metrics in dashboard | Yes | Amazon CloudWatch |
| SLA | 99.9% uptime | 99.9% uptime |

The missing features — prompt caching and Batch API — are the two biggest levers for cost control. If your bill is over $5,000/month, their absence on Bedrock likely outweighs the procurement convenience. See which Claude model fits your use case for a cost-per-task breakdown.

Enterprise procurement considerations

Teams choosing Bedrock rarely do so on technical merit alone. The real drivers are:

  1. Unified billing: Claude charges appear on the same AWS invoice as EC2, S3, and RDS. One vendor, one purchase order, one approval chain.
  2. EDP drawdown: If your organisation has an AWS Enterprise Discount Program commitment, Bedrock spend counts toward it. Anthropic direct does not.
  3. Procurement approval: Many enterprises have pre-approved AWS as a vendor. Adding Anthropic as a new payee can take months through legal and finance. Bedrock bypasses this entirely.
  4. Compliance documentation: AWS Business Associate Agreements, AWS GovCloud support, and FedRAMP Moderate authorisation all exist on the AWS side. Anthropic's compliance posture is strong but newer.
  5. Security team familiarity: IAM, SCPs, GuardDuty, CloudTrail — these are known quantities. An Anthropic API key is a new attack surface to evaluate.

For startups and individual developers, these factors are mostly irrelevant. For a 500-person enterprise with a centralised procurement function, they can be decisive.

Detailed comparison table

| Dimension | Anthropic direct | AWS Bedrock |
| --- | --- | --- |
| On-demand pricing (Sonnet 4.6) | $3 / $15 per 1M in/out | $3 / $15 per 1M in/out |
| Prompt caching | Yes (5-min, 1-hour TTL) | No |
| Batch API | Yes (50% discount) | No |
| Billing aggregation | Anthropic invoice only | AWS consolidated billing + EDP |
| Regions | US, EU | US, EU, APAC, Canada |
| Cross-region failover | Manual (you implement) | Built-in inference profiles |
| Auth model | API key (Bearer token) | AWS SigV4 / IAM roles |
| VPC / PrivateLink | No | Yes |
| CloudTrail audit logs | No native | Yes, automatic |
| IAM policy controls | No | Yes |
| SLA | 99.9% | 99.9% |
| Latest models (day-one) | Yes | No (4–12 week lag typical) |
| Vision input | Yes | Yes |
| Tool use | Yes | Yes |
| Context window (max) | 1M tokens | Varies by region/model |
| Provisioned throughput | No (negotiated limits) | Yes (paid term commitment) |
| FedRAMP / GovCloud | No | Yes (GovCloud regions) |

Code samples: same call via both SDKs

Anthropic SDK (Python)

import anthropic

client = anthropic.Anthropic(api_key="sk-ant-...")

message = client.messages.create(
    model="claude-sonnet-4-6-20250514",
    max_tokens=512,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Summarise the key risks of prompt injection."}
    ],
)

print(message.content[0].text)
print(f"Input tokens:  {message.usage.input_tokens}")
print(f"Output tokens: {message.usage.output_tokens}")

AWS Bedrock SDK (boto3 / Python)

import boto3
import json

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "system": "You are a helpful assistant.",
    "messages": [
        {"role": "user", "content": "Summarise the key risks of prompt injection."}
    ],
})

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    body=body,
    contentType="application/json",
    accept="application/json",
)

result = json.loads(response["body"].read())
print(result["content"][0]["text"])
print(f"Input tokens:  {result['usage']['input_tokens']}")
print(f"Output tokens: {result['usage']['output_tokens']}")

Key differences to note:

- Auth: the Anthropic SDK takes an API key; boto3 signs requests with SigV4 using whatever IAM credentials are in scope, so no key appears in code.
- Request shape: Bedrock wraps the Messages payload in a JSON string with an anthropic_version field, and the model is addressed by a Bedrock model ID rather than an Anthropic model name.
- Response handling: Bedrock returns a raw body stream you must read and JSON-decode yourself; the Anthropic SDK returns typed objects.

Adding prompt caching (Anthropic direct only)

import anthropic

client = anthropic.Anthropic(api_key="sk-ant-...")

SYSTEM_PROMPT = "You are a helpful assistant. " + ("context " * 5000)  # large static context

message = client.messages.create(
    model="claude-sonnet-4-6-20250514",
    max_tokens=512,
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # 5-minute TTL cache
        }
    ],
    messages=[
        {"role": "user", "content": "What is prompt injection?"}
    ],
)

print(message.usage.cache_creation_input_tokens)  # tokens written to cache
print(message.usage.cache_read_input_tokens)       # tokens served from cache

This pattern is unavailable on Bedrock. Track actual cache savings with the approach in Claude API cost monitoring guide.
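
A simple way to track those savings in production is to price each usage counter separately. A minimal sketch at the Sonnet rates from the pricing table, assuming 5-minute-TTL cache writes at 1.25× the input rate; message is the response object from the example above:

# Price a single response from its usage counters (Sonnet rates, $ per token).
RATES = {
    "input_tokens": 3.00 / 1e6,
    "cache_creation_input_tokens": 3.75 / 1e6,  # 1.25x input (5-min TTL writes)
    "cache_read_input_tokens": 0.30 / 1e6,      # 0.10x input
    "output_tokens": 15.00 / 1e6,
}

def request_cost(usage) -> float:
    # Cache fields can be absent or None when caching is not used.
    return sum((getattr(usage, field, 0) or 0) * rate for field, rate in RATES.items())

print(f"Request cost: ${request_cost(message.usage):.6f}")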


Already on Bedrock and want to cut costs? The P5 masterclass covers the Bedrock-specific cost levers — Provisioned Throughput vs on-demand, cross-region routing, and how to measure real spend in CloudWatch.

P5 Cost Optimization Masterclass — $59



Frequently Asked Questions

Is Claude on AWS Bedrock the same model as the Anthropic API?

Eventually yes, but Bedrock typically lags by four to twelve weeks on new model versions. As of April 2026, Bedrock was running claude-3-5-sonnet-20241022-v2:0 while the Anthropic API offered claude-sonnet-4-6-20250514. The training weights are the same once a version ships on both platforms; the difference is availability timing, not model quality.

Does AWS Bedrock support prompt caching?

No, as of April 2026. Prompt caching — which reduces the cost of large repeated system prompts by 90% on reads — is only available through the Anthropic API. This is the single largest cost difference between the two platforms for production workloads with long system prompts.

Can I use the Anthropic Batch API through Bedrock?

No. The Batch API (which cuts costs 50% for async workloads) is Anthropic-direct only. You can assemble your own asynchronous pipeline on Bedrock, for example by orchestrating invocations with Step Functions, but Bedrock offers no equivalent of the flat 50% discount the Anthropic Batch API provides.
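
For comparison, submitting a batch on the Anthropic side is a single call. A minimal sketch (polling and result retrieval omitted; see the Batch API docs):

import anthropic

client = anthropic.Anthropic()

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "doc-1",
            "params": {
                "model": "claude-sonnet-4-6-20250514",
                "max_tokens": 512,
                "messages": [{"role": "user", "content": "Summarise document 1."}],
            },
        },
    ],
)
print(batch.id, batch.processing_status)  # poll until "ended", then fetch results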

What is the latency difference between Bedrock and Anthropic direct?

From an AWS us-east-1 client, Bedrock adds roughly 90 ms to p50 time-to-first-token and 530 ms to p99 versus the Anthropic API for typical Sonnet 4.6 requests. The extra hop through the Bedrock control plane is the main driver. The gap widens under load; p99 on Bedrock is about 30% slower at high request rates.

Which should I choose if I'm building a regulated-industry application?

Start with Bedrock if you need FedRAMP, GovCloud, HIPAA BAA, or full CloudTrail audit trails out of the box. The IAM-native auth and VPC PrivateLink support eliminate an entire class of compliance questions. If you later need features Bedrock lacks — caching, Batch API — you can add direct Anthropic API calls for the specific workloads where cost optimisation matters most, keeping regulated data on the Bedrock path.

Does Bedrock spend count toward my AWS EDP commitment?

Yes. Claude on Bedrock is billed through AWS and counts toward any Enterprise Discount Program drawdown commitment. Anthropic direct billing does not. For teams with large EDP commitments, this can effectively make Bedrock cheaper even at the same nominal token rate, because the spend reduces the shortfall penalty on your committed spend.
