AWS Bedrock Pricing: Every Model's Real Cost (April 2026)

AWS Bedrock gives you API access to models from Anthropic, Meta, Mistral, Cohere, and others through a single AWS endpoint. The model lineup expanded significantly in early 2026 with Claude Opus 4.6, Llama 4 Scout and Maverick, and Mistral Large 2. Bedrock's pricing matches the providers' direct API rates in most cases. The three billing modes remain: on-demand (pay per token), batch (50% off), and provisioned throughput (reserved capacity). This page covers every major model's current pricing on Bedrock, the new 2026 additions, and when each billing option makes sense.

On-Demand

Per-token pricing. Pay per use, no commitment.
  • Claude Sonnet 4.6: $3/$15 per 1M tokens
  • Llama 3.1 70B: $2.65/$3.50 per 1M tokens
  • Mistral Large: $4/$12 per 1M tokens
  • No minimum spend or commitment
  • Best for variable or unpredictable workloads

Batch Inference

50% off on-demand rates
  • Half the cost of on-demand
  • Results within 24 hours
  • Same models and quality
  • Best for offline processing
  • Submit via S3 bucket

Provisioned Throughput

Hourly rate for reserved capacity
  • Guaranteed throughput for production
  • 1-month or 6-month commitments
  • No per-token charges
  • Best for steady high-volume workloads
  • Custom model hosting available

Knowledge Bases / Agents

Usage-based: per query plus storage
  • Managed RAG pipeline
  • Vector storage included
  • Automatic document chunking
  • Agent orchestration built in
  • Additional charges on top of model costs

On-Demand Model Pricing Table

Here's what every major model costs on Bedrock's on-demand tier. These prices match (or are very close to) the providers' direct API pricing.

| Model | Input / 1M tokens | Output / 1M tokens | Context window |
|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K |
| Claude Opus 4.6 | $5.00 | $25.00 | 200K |
| Llama 3.1 8B | $0.30 | $0.60 | 128K |
| Llama 3.1 70B | $2.65 | $3.50 | 128K |
| Llama 3.1 405B | $5.32 | $16.00 | 128K |
| Llama 4 Scout (109B MoE) | $0.27 | $0.36 | 10M |
| Llama 4 Maverick (402B MoE) | $0.50 | $0.77 | 1M |
| Mistral Large | $4.00 | $12.00 | 128K |
| Mistral Large 2 | $2.00 | $6.00 | 128K |
| Cohere Command R+ | $2.50 | $10.00 | 128K |
| Cohere Embed v3 | $0.10 | — | 512 |
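To turn these per-token rates into a monthly budget, multiply expected token volume by the input and output rates separately. A minimal estimator, using rates from the table above (verify against current Bedrock pricing before budgeting):

```python
# (input, output) USD per 1M tokens, taken from the on-demand table above.
RATES = {
    "claude-sonnet-4.6": (3.00, 15.00),
    "llama-3.1-70b": (2.65, 3.50),
    "llama-4-scout": (0.27, 0.36),
}

def monthly_cost(model, requests_per_day, in_tokens, out_tokens, batch=False):
    """Estimate monthly on-demand spend; batch=True applies the 50% discount."""
    rate_in, rate_out = RATES[model]
    millions_per_month = requests_per_day * 30 / 1_000_000
    cost = millions_per_month * (in_tokens * rate_in + out_tokens * rate_out)
    return cost * (0.5 if batch else 1.0)

# Example: 10,000 requests/day at 2,000 input / 500 output tokens each
print(round(monthly_cost("claude-sonnet-4.6", 10_000, 2_000, 500), 2))
```

Note how output tokens dominate for Claude models: at a 5:1 output price ratio, trimming response length often saves more than switching models.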

Bedrock vs Direct API Pricing: Are You Actually Saving Money?

A common question in 2026 is whether Bedrock adds a markup over calling model providers directly. The short answer: for on-demand pricing, Bedrock's per-token rates match the providers' published prices almost exactly. Claude Sonnet 4.6 costs $3/$15 per million tokens on both Bedrock and the Anthropic API. Llama models are priced at Meta's published inference rates.

Where the math gets interesting is prompt caching and batch discounts. Anthropic's direct API offers prompt caching that reduces costs by 90% on repeated prefixes. Bedrock supports prompt caching for Claude models, but the feature availability can lag behind Anthropic's direct API by weeks. If prompt caching is critical to your cost structure, verify that the specific caching feature you need is live on Bedrock before committing.

Batch inference is 50% off on Bedrock, matching the discount available on direct APIs. The operational advantage of Bedrock batch is S3 integration: you drop your input files in an S3 bucket, Bedrock processes them, and results appear in another bucket. No webhook management, no polling. For teams already running data pipelines on AWS, this eliminates real integration work.
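The S3 workflow above maps to a single API call, `create_model_invocation_job` on the `bedrock` client. A minimal sketch; bucket names, role ARN, and model ID are placeholders, and the IAM role must allow Bedrock to read the input bucket and write the output bucket:

```python
def batch_job_request(job_name, model_id, role_arn, input_s3, output_s3):
    """Build the arguments for bedrock.create_model_invocation_job."""
    return {
        "jobName": job_name,
        "modelId": model_id,
        "roleArn": role_arn,  # must grant Bedrock access to both buckets
        "inputDataConfig": {"s3InputDataConfig": {"s3Uri": input_s3}},
        "outputDataConfig": {"s3OutputDataConfig": {"s3Uri": output_s3}},
    }

req = batch_job_request(
    "nightly-summaries",
    "anthropic.claude-sonnet-4-6",                    # placeholder model ID
    "arn:aws:iam::123456789012:role/bedrock-batch",   # placeholder role
    "s3://my-bucket/batch-input/",
    "s3://my-bucket/batch-output/",
)
# To actually submit (requires AWS credentials and permissions):
# import boto3
# boto3.client("bedrock").create_model_invocation_job(**req)
```

The input file is JSONL, one model request per line; Bedrock writes per-record results to the output prefix when the job completes.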

The hidden cost that catches teams is Knowledge Bases. The managed RAG pipeline uses OpenSearch Serverless under the hood, which has a minimum cost of roughly $700/month (4 OCUs). For a simple chatbot that only needs vector search over a few thousand documents, this floor cost makes Bedrock's managed RAG dramatically more expensive than running Pinecone ($50/month) or pgvector (free) alongside direct API calls.
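The arithmetic behind that claim, using this page's figures (the $700/month OpenSearch floor and the $0.01/retrieval rate are cited above; verify both against current AWS pricing):

```python
OPENSEARCH_FLOOR = 700.0   # $/month minimum for OpenSearch Serverless (4 OCUs)
QUERY_RATE = 0.01          # $ per Knowledge Base retrieval query
PINECONE_TIER = 50.0       # $/month, example managed vector-store alternative

def kb_monthly(queries: int) -> float:
    """Knowledge Base storage floor plus retrieval charges (model costs excluded)."""
    return OPENSEARCH_FLOOR + queries * QUERY_RATE

# 10,000 retrieval queries/month: the floor dominates
print(kb_monthly(10_000))  # 800.0 vs a $50 Pinecone bill
```

The floor cost is fixed, so managed RAG only approaches price parity at query volumes large enough that per-query charges, not storage, dominate the bill.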

Bedrock vs Direct API Access: When Bedrock Wins

Bedrock's pricing matches direct API pricing, so the decision comes down to operational benefits, not cost savings.

Bedrock wins when your team is already AWS-native. IAM authentication means no API key management. CloudWatch gives you usage metrics alongside your other AWS monitoring. VPC endpoints keep traffic off the public internet. These are significant operational advantages for enterprise teams.

Bedrock also wins for multi-model architectures. Instead of managing API keys, billing, and SDKs for Anthropic, Meta, Mistral, and Cohere separately, Bedrock gives you one endpoint. Model switching is a config change, not an integration project.
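The "config change" claim can be sketched with Bedrock's Converse API, which uses one request shape for every provider; only `modelId` differs. The model IDs below are illustrative, and availability varies by region:

```python
def converse_request(model_id: str, prompt: str) -> dict:
    """Build a provider-agnostic Converse API request."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 512},
    }

for model_id in (
    "anthropic.claude-sonnet-4-6",        # placeholder IDs; confirm exact
    "meta.llama3-1-70b-instruct-v1:0",    # identifiers in the Bedrock console
    "mistral.mistral-large-2407-v1:0",
):
    req = converse_request(model_id, "Summarize the attached contract.")
    # To invoke (requires AWS credentials):
    # import boto3
    # resp = boto3.client("bedrock-runtime").converse(**req)
```

Compare this to maintaining three SDKs with three different message formats: swapping models becomes a string in config rather than a code change.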

Direct APIs win when you need the providers' latest features fastest. New models and capabilities (like Anthropic's prompt caching or OpenAI's batch API improvements) often hit the direct API before they're available on Bedrock. If being on the latest version matters, direct access has less lag.

Hidden Costs & Gotchas

  • Bedrock's on-demand pricing for Claude matches Anthropic's direct API pricing. You're not paying a premium for the AWS wrapper, but you're also not getting a discount.
  • Legacy Claude models (3.5 Sonnet in Public Extended Access) now cost $6/$30 per 1M tokens, double the current Sonnet 4.6 price. If your code references old model IDs, you're overpaying.
  • Knowledge Bases adds charges for vector storage, document processing, and retrieval on top of the model inference cost. A simple RAG setup can cost $50-200/month in Bedrock-specific charges.
  • Provisioned throughput requires 1-month minimum commitments. If your traffic drops, you still pay for reserved capacity. Only commit after you have stable baseline traffic data.
  • Data transfer costs apply when moving data between AWS regions or out of AWS. These are standard AWS charges but easy to overlook when budgeting for AI.
  • Bedrock Agents adds orchestration charges per step. A multi-step agent workflow costs more than a single model invocation for the same output.
  • Model availability varies by AWS region. Not all models are available in all regions. Check your region before building a pipeline.

Which Plan Do You Need?

AWS-native team

On-demand Bedrock. If your infrastructure is already on AWS, Bedrock keeps everything in one ecosystem. IAM auth, CloudWatch metrics, and VPC endpoints work out of the box.

Multi-model application

On-demand Bedrock gives you Claude, Llama, Mistral, and Cohere through a single API endpoint. No need for separate API keys and billing from each provider.

High-volume production workload

Provisioned throughput. If you need guaranteed latency and throughput for a steady workload, provisioned capacity eliminates throttling risk at a predictable cost.
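A break-even sketch helps decide when the commitment pays off. The hourly rate below is a placeholder (AWS publishes model-specific provisioned rates); the token prices are the Claude Sonnet 4.6 on-demand figures from the table above:

```python
HOURLY_RATE = 40.0                             # $/hour, HYPOTHETICAL rate
PROVISIONED_MONTHLY = HOURLY_RATE * 24 * 30    # always-on reservation cost

def on_demand_monthly(m_in: float, m_out: float) -> float:
    """On-demand spend for m_in / m_out million input/output tokens per month."""
    return m_in * 3.00 + m_out * 15.00

# Only commit once steady on-demand spend clears the reserved cost:
steady_spend = on_demand_monthly(5_000, 1_000)
print(steady_spend, ">", PROVISIONED_MONTHLY, "->", steady_spend > PROVISIONED_MONTHLY)
```

Because the reservation bills around the clock, bursty traffic that is idle most of the day pushes the break-even point much higher than average-throughput math suggests.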

Cost-sensitive team

Consider the providers' direct APIs. Anthropic and OpenAI offer the same models at the same price, and some offer additional discounts (prompt caching, batch API) that may not be available on Bedrock.

The Bottom Line

Bedrock makes sense for teams already on AWS who want one API endpoint for multiple model providers. The pricing matches direct API costs for on-demand usage, and batch inference at 50% off is the same deal you'd get from the providers directly. The value-add is operational: IAM auth, CloudWatch, VPC endpoints, and managed RAG via Knowledge Bases. If you're not on AWS, there's no pricing reason to choose Bedrock over direct API access.

Disclosure: Pricing information is sourced from official websites and may change. We update this page regularly but always verify current pricing on the vendor's site before purchasing.

Related Resources

  • Anthropic API Pricing
  • OpenAI API Pricing
  • Cohere Pricing
  • OpenAI vs Anthropic API

Frequently Asked Questions

Is AWS Bedrock cheaper than using APIs directly?

No. On-demand pricing on Bedrock matches the model providers' direct pricing (e.g., Claude on Bedrock costs the same as Claude on Anthropic's API). Batch inference is 50% cheaper, which is the same discount Anthropic offers directly. The value of Bedrock is AWS integration, not lower prices.

What models are available on AWS Bedrock?

Bedrock offers Claude (Anthropic), Llama (Meta), Mistral, Amazon Titan, Cohere Command, and Stability AI models. The selection is broad but model versions may lag behind the providers' direct APIs by a few weeks.

What is Provisioned Throughput?

Provisioned Throughput reserves dedicated model capacity for your workload. You pay an hourly rate for guaranteed throughput and latency. It requires a 1-month or 6-month commitment. Worth it for high-volume production workloads where consistent performance matters.

How much do Bedrock Knowledge Bases cost?

Knowledge Base queries cost $0.01 per retrieval query plus the LLM costs for generating answers. The backend uses OpenSearch Serverless, which has a minimum of roughly $700/month (4 OCUs). For small projects, this floor cost makes managed RAG expensive compared to alternatives.

Should I use Bedrock or call model APIs directly?

Use Bedrock if you're already on AWS and want to keep data within your VPC, use IAM for access control, and avoid managing separate API keys. Call APIs directly if you want the latest model versions immediately, lower overhead for small projects, or you're not locked into AWS.

Does Bedrock support fine-tuning?

Yes, Bedrock supports fine-tuning for select models including Amazon Titan and some Llama variants. Fine-tuned models require Provisioned Throughput to serve, which adds to the cost. The fine-tuning job itself is billed separately based on the number of tokens processed during training.
