AWS Bedrock Pricing: Every Model's Real Cost (April 2026)

AWS Bedrock gives you API access to models from Anthropic, Meta, Mistral, Cohere, and others through a single AWS endpoint. The model lineup expanded significantly in early 2026 with Claude Opus 4.6, Llama 4 Scout and Maverick, and Mistral Large 2. Bedrock's pricing matches the providers' direct API rates in most cases. The three billing modes remain: on-demand (pay per token), batch (50% off), and provisioned throughput (reserved capacity). This page covers every major model's current pricing on Bedrock, the new 2026 additions, and when each billing option makes sense.

On-Demand

Per-token pricing. Pay per use, no commitment.
  • Claude Sonnet 4.6: $3/$15 per 1M tokens
  • Llama 3.1 70B: $2.65/$3.50 per 1M tokens
  • Mistral Large: $4/$12 per 1M tokens
  • No minimum spend or commitment
  • Best for variable or unpredictable workloads

Batch Inference

50% off on-demand rates
  • Half the cost of on-demand
  • Results within 24 hours
  • Same models and quality
  • Best for offline processing
  • Submit via S3 bucket

Provisioned Throughput

Hourly rate for reserved capacity
  • Guaranteed throughput for production
  • 1-month or 6-month commitments
  • No per-token charges
  • Best for steady high-volume workloads
  • Custom model hosting available

Knowledge Bases / Agents

Usage-based: per query plus storage
  • Managed RAG pipeline
  • Vector storage included
  • Automatic document chunking
  • Agent orchestration built in
  • Additional charges on top of model costs

On-Demand Model Pricing Table

Here's what every major model costs on Bedrock's on-demand tier. These prices match (or are very close to) the providers' direct API pricing.

| Model | Input / 1M tokens | Output / 1M tokens | Context window |
|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K |
| Claude Opus 4.6 | $5.00 | $25.00 | 200K |
| Llama 3.1 8B | $0.30 | $0.60 | 128K |
| Llama 3.1 70B | $2.65 | $3.50 | 128K |
| Llama 3.1 405B | $5.32 | $16.00 | 128K |
| Llama 4 Scout (109B MoE) | $0.27 | $0.36 | 10M |
| Llama 4 Maverick (402B MoE) | $0.50 | $0.77 | 1M |
| Mistral Large | $4.00 | $12.00 | 128K |
| Mistral Large 2 | $2.00 | $6.00 | 128K |
| Cohere Command R+ | $2.50 | $10.00 | 128K |
| Cohere Embed v3 | $0.10 | — | 512 |
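To turn these per-token rates into a monthly budget, multiply expected token volume by the input and output rates separately. A minimal estimator, using rates from the table above (verify against current Bedrock pricing before budgeting):

```python
# (input, output) USD per 1M tokens, taken from the on-demand table above.
RATES = {
    "claude-sonnet-4.6": (3.00, 15.00),
    "llama-3.1-70b": (2.65, 3.50),
    "llama-4-scout": (0.27, 0.36),
}

def monthly_cost(model, requests_per_day, in_tokens, out_tokens, batch=False):
    """Estimate monthly on-demand spend; batch=True applies the 50% discount."""
    rate_in, rate_out = RATES[model]
    millions_per_month = requests_per_day * 30 / 1_000_000
    cost = millions_per_month * (in_tokens * rate_in + out_tokens * rate_out)
    return cost * (0.5 if batch else 1.0)

# Example: 10,000 requests/day at 2,000 input / 500 output tokens each
print(round(monthly_cost("claude-sonnet-4.6", 10_000, 2_000, 500), 2))
```

Note how output tokens dominate for Claude models: at a 5:1 output price ratio, trimming response length often saves more than switching models.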

Bedrock vs Direct API Pricing: Are You Actually Saving Money?

A common question in 2026 is whether Bedrock adds a markup over calling model providers directly. The short answer: for on-demand pricing, Bedrock's per-token rates match the providers' published prices almost exactly. Claude Sonnet 4.6 costs $3/$15 per million tokens on both Bedrock and the Anthropic API. Llama models are priced at Meta's published inference rates.

Where the math gets interesting is prompt caching and batch discounts. Anthropic's direct API offers prompt caching that reduces costs by 90% on repeated prefixes. Bedrock supports prompt caching for Claude models, but the feature availability can lag behind Anthropic's direct API by weeks. If prompt caching is critical to your cost structure, verify that the specific caching feature you need is live on Bedrock before committing.

Batch inference is 50% off on Bedrock, matching the discount available on direct APIs. The operational advantage of Bedrock batch is S3 integration: you drop your input files in an S3 bucket, Bedrock processes them, and results appear in another bucket. No webhook management, no polling. For teams already running data pipelines on AWS, this eliminates real integration work.
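The S3 workflow above maps to a single API call, `create_model_invocation_job` on the `bedrock` client. A minimal sketch; bucket names, role ARN, and model ID are placeholders, and the IAM role must allow Bedrock to read the input bucket and write the output bucket:

```python
def batch_job_request(job_name, model_id, role_arn, input_s3, output_s3):
    """Build the arguments for bedrock.create_model_invocation_job."""
    return {
        "jobName": job_name,
        "modelId": model_id,
        "roleArn": role_arn,  # must grant Bedrock access to both buckets
        "inputDataConfig": {"s3InputDataConfig": {"s3Uri": input_s3}},
        "outputDataConfig": {"s3OutputDataConfig": {"s3Uri": output_s3}},
    }

req = batch_job_request(
    "nightly-summaries",
    "anthropic.claude-sonnet-4-6",                    # placeholder model ID
    "arn:aws:iam::123456789012:role/bedrock-batch",   # placeholder role
    "s3://my-bucket/batch-input/",
    "s3://my-bucket/batch-output/",
)
# To actually submit (requires AWS credentials and permissions):
# import boto3
# boto3.client("bedrock").create_model_invocation_job(**req)
```

The input file is JSONL, one model request per line; Bedrock writes per-record results to the output prefix when the job completes.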

The hidden cost that catches teams is Knowledge Bases. The managed RAG pipeline uses OpenSearch Serverless under the hood, which has a minimum cost of roughly $700/month (4 OCUs). For a simple chatbot that only needs vector search over a few thousand documents, this floor cost makes Bedrock's managed RAG dramatically more expensive than running Pinecone ($50/month) or pgvector (free) alongside direct API calls.
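The arithmetic behind that claim, using this page's figures (the $700/month OpenSearch floor and the $0.01/retrieval rate are cited above; verify both against current AWS pricing):

```python
OPENSEARCH_FLOOR = 700.0   # $/month minimum for OpenSearch Serverless (4 OCUs)
QUERY_RATE = 0.01          # $ per Knowledge Base retrieval query
PINECONE_TIER = 50.0       # $/month, example managed vector-store alternative

def kb_monthly(queries: int) -> float:
    """Knowledge Base storage floor plus retrieval charges (model costs excluded)."""
    return OPENSEARCH_FLOOR + queries * QUERY_RATE

# 10,000 retrieval queries/month: the floor dominates
print(kb_monthly(10_000))  # 800.0 vs a $50 Pinecone bill
```

The floor cost is fixed, so managed RAG only approaches price parity at query volumes large enough that per-query charges, not storage, dominate the bill.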

Bedrock vs Direct API Access: When Bedrock Wins

Bedrock's pricing matches direct API pricing, so the decision comes down to operational benefits, not cost savings.

Bedrock wins when your team is already AWS-native. IAM authentication means no API key management. CloudWatch gives you usage metrics alongside your other AWS monitoring. VPC endpoints keep traffic off the public internet. These are significant operational advantages for enterprise teams.

Bedrock also wins for multi-model architectures. Instead of managing API keys, billing, and SDKs for Anthropic, Meta, Mistral, and Cohere separately, Bedrock gives you one endpoint. Model switching is a config change, not an integration project.
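The "config change" claim can be sketched with Bedrock's Converse API, which uses one request shape for every provider; only `modelId` differs. The model IDs below are illustrative, and availability varies by region:

```python
def converse_request(model_id: str, prompt: str) -> dict:
    """Build a provider-agnostic Converse API request."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 512},
    }

for model_id in (
    "anthropic.claude-sonnet-4-6",        # placeholder IDs; confirm exact
    "meta.llama3-1-70b-instruct-v1:0",    # identifiers in the Bedrock console
    "mistral.mistral-large-2407-v1:0",
):
    req = converse_request(model_id, "Summarize the attached contract.")
    # To invoke (requires AWS credentials):
    # import boto3
    # resp = boto3.client("bedrock-runtime").converse(**req)
```

Compare this to maintaining three SDKs with three different message formats: swapping models becomes a string in config rather than a code change.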

Direct APIs win when you need the providers' latest features fastest. New models and capabilities (like Anthropic's prompt caching or OpenAI's batch API improvements) often hit the direct API before they're available on Bedrock. If being on the latest version matters, direct access has less lag.

Hidden Costs & Gotchas

  • Bedrock's on-demand pricing for Claude matches Anthropic's direct API pricing. You're not paying a premium for the AWS wrapper, but you're also not getting a discount.
  • Legacy Claude models (3.5 Sonnet in Public Extended Access) now cost $6/$30 per 1M tokens, double the current Sonnet 4.6 price. If your code references old model IDs, you're overpaying.
  • Knowledge Bases adds charges for vector storage, document processing, and retrieval on top of the model inference cost. A simple RAG setup can cost $50-200/month in Bedrock-specific charges.
  • Provisioned throughput requires 1-month minimum commitments. If your traffic drops, you still pay for reserved capacity. Only commit after you have stable baseline traffic data.
  • Data transfer costs apply when moving data between AWS regions or out of AWS. These are standard AWS charges but easy to overlook when budgeting for AI.
  • Bedrock Agents adds orchestration charges per step. A multi-step agent workflow costs more than a single model invocation for the same output.
  • Model availability varies by AWS region. Not all models are available in all regions. Check your region before building a pipeline.

Which Plan Do You Need?

AWS-native team

On-demand Bedrock. If your infrastructure is already on AWS, Bedrock keeps everything in one ecosystem. IAM auth, CloudWatch metrics, and VPC endpoints work out of the box.

Multi-model application

On-demand Bedrock gives you Claude, Llama, Mistral, and Cohere through a single API endpoint. No need for separate API keys and billing from each provider.

High-volume production workload

Provisioned throughput. If you need guaranteed latency and throughput for a steady workload, provisioned capacity eliminates throttling risk at a predictable cost.
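A break-even sketch helps decide when the commitment pays off. The hourly rate below is a placeholder (AWS publishes model-specific provisioned rates); the token prices are the Claude Sonnet 4.6 on-demand figures from the table above:

```python
HOURLY_RATE = 40.0                             # $/hour, HYPOTHETICAL rate
PROVISIONED_MONTHLY = HOURLY_RATE * 24 * 30    # always-on reservation cost

def on_demand_monthly(m_in: float, m_out: float) -> float:
    """On-demand spend for m_in / m_out million input/output tokens per month."""
    return m_in * 3.00 + m_out * 15.00

# Only commit once steady on-demand spend clears the reserved cost:
steady_spend = on_demand_monthly(5_000, 1_000)
print(steady_spend, ">", PROVISIONED_MONTHLY, "->", steady_spend > PROVISIONED_MONTHLY)
```

Because the reservation bills around the clock, bursty traffic that is idle most of the day pushes the break-even point much higher than average-throughput math suggests.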

Cost-sensitive team

Consider the providers' direct APIs. Anthropic and OpenAI offer the same models at the same price, and some offer additional discounts (prompt caching, batch API) that may not be available on Bedrock.

The Bottom Line

Bedrock makes sense for teams already on AWS who want one API endpoint for multiple model providers. The pricing matches direct API costs for on-demand usage, and batch inference at 50% off is the same deal you'd get from the providers directly. The value-add is operational: IAM auth, CloudWatch, VPC endpoints, and managed RAG via Knowledge Bases. If you're not on AWS, there's no pricing reason to choose Bedrock over direct API access.

Disclosure: Pricing information is sourced from official websites and may change. We update this page regularly but always verify current pricing on the vendor's site before purchasing.

Related Resources

  • Anthropic API Pricing
  • OpenAI API Pricing
  • Cohere Pricing
  • OpenAI vs Anthropic API

Frequently Asked Questions

Is AWS Bedrock cheaper than using APIs directly?

No. On-demand pricing on Bedrock matches the model providers' direct pricing (e.g., Claude on Bedrock costs the same as Claude on Anthropic's API). Batch inference is 50% cheaper, which is the same discount Anthropic offers directly. The value of Bedrock is AWS integration, not lower prices.

What models are available on AWS Bedrock?

Bedrock offers Claude (Anthropic), Llama (Meta), Mistral, Amazon Titan, Cohere Command, and Stability AI models. The selection is broad but model versions may lag behind the providers' direct APIs by a few weeks.

What is Provisioned Throughput?

Provisioned Throughput reserves dedicated model capacity for your workload. You pay an hourly rate for guaranteed throughput and latency. It requires a 1-month or 6-month commitment. Worth it for high-volume production workloads where consistent performance matters.

How much do Bedrock Knowledge Bases cost?

Knowledge Base queries cost $0.01 per retrieval query plus the LLM costs for generating answers. The backend uses OpenSearch Serverless, which has a minimum of roughly $700/month (4 OCUs). For small projects, this floor cost makes managed RAG expensive compared to alternatives.

Should I use Bedrock or call model APIs directly?

Use Bedrock if you're already on AWS and want to keep data within your VPC, use IAM for access control, and avoid managing separate API keys. Call APIs directly if you want the latest model versions immediately, lower overhead for small projects, or you're not locked into AWS.

Does Bedrock support fine-tuning?

Yes, Bedrock supports fine-tuning for select models including Amazon Titan and some Llama variants. Fine-tuned models require Provisioned Throughput to serve, which adds to the cost. The fine-tuning job itself is billed separately based on the number of tokens processed during training.
