AWS Bedrock Pricing: What Each Model and Mode Costs
AWS Bedrock gives you access to Claude, Llama, Mistral, Titan, and other models through a single API. The appeal is keeping everything in your AWS ecosystem. But the pricing has three modes (on-demand, provisioned, batch), and it's easy to pick the wrong one. Here's how the costs break down.
On-Demand
- ✓ Claude 3.5 Sonnet: $3/$15 per 1M tokens
- ✓ Claude 3.5 Haiku: $0.80/$4 per 1M tokens
- ✓ Llama 3.1 70B: $2.65/$3.50 per 1M tokens
- ✓ Mistral Large: $4/$12 per 1M tokens
- ✓ Amazon Titan Text Premier: $0.50/$1.50 per 1M tokens
- ✓ No commitment, pay only for what you use
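On-demand billing is straightforward arithmetic: tokens in each direction times the per-1M-token rate. A minimal sketch using the rates listed above:

```python
# Per-1M-token on-demand rates (input, output) from the list above.
RATES = {
    "claude-3.5-sonnet": (3.00, 15.00),
    "llama-3.1-70b": (2.65, 3.50),
    "mistral-large": (4.00, 12.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single on-demand invocation."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 2,000-token prompt with a 500-token response on Claude 3.5 Sonnet:
# 2,000 * $3/1M + 500 * $15/1M = $0.0135
```

At these per-request numbers, on-demand is the obvious default until volume makes the other modes worth modeling.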
Batch Inference
- ✓ Half the cost of on-demand pricing
- ✓ Results typically returned within 24 hours
- ✓ No real-time latency requirements
- ✓ Good for bulk processing and analysis
- ✓ Available for most models
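For bulk work, the 50% discount compounds quickly. A sketch of a batch-vs-on-demand comparison using the Claude 3.5 Sonnet rates from the on-demand table (the document count and per-document token sizes are made-up inputs):

```python
def bulk_job_cost(n_docs: int, in_tokens_per_doc: int, out_tokens_per_doc: int,
                  in_rate: float, out_rate: float, batch: bool = True) -> float:
    """Cost of a bulk run; rates are per 1M tokens. Batch is billed at 50% of on-demand."""
    on_demand = n_docs * (in_tokens_per_doc * in_rate + out_tokens_per_doc * out_rate) / 1_000_000
    return on_demand * 0.5 if batch else on_demand

# Classifying 100,000 documents (1,500 input / 100 output tokens each)
# at Claude 3.5 Sonnet rates: $600 on-demand vs $300 batch.
```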
Provisioned Throughput
- ✓ Guaranteed throughput and latency
- ✓ Custom model deployments (fine-tuned)
- ✓ 1-month commitment: higher hourly rate than the 6-month term
- ✓ 6-month commitment: approximately 40% savings
- ✓ Predictable performance for production
Knowledge Bases / Agents
- ✓ Knowledge Bases: $0.01 per query (retrieval)
- ✓ Agent invocations billed per LLM call
- ✓ Vector storage in OpenSearch Serverless
- ✓ S3 storage for source documents
- ✓ Managed RAG without custom infrastructure
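The OpenSearch Serverless floor dominates Knowledge Base costs at low volume. A rough monthly estimator using the figures above (the per-query LLM generation cost is a hypothetical average; yours depends on model and answer length):

```python
def kb_monthly_cost(queries: int, retrieval_rate: float = 0.01,
                    aoss_floor: float = 700.0, llm_cost_per_query: float = 0.01) -> float:
    """Rough monthly cost of a Bedrock Knowledge Base.
    aoss_floor: OpenSearch Serverless minimum (~$700/month at 4 OCUs).
    llm_cost_per_query: hypothetical average answer-generation cost."""
    return aoss_floor + queries * (retrieval_rate + llm_cost_per_query)

# At 10,000 queries/month, the fixed floor is still ~78% of the bill.
```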
Hidden Costs & Gotchas
- ⚠ Bedrock's on-demand model prices are the same as going directly to Anthropic or others. You're not getting a discount for using AWS as the middleman. The value is integration, not savings.
- ⚠ OpenSearch Serverless for Knowledge Bases has a minimum cost of roughly $700/month (4 OCUs). That's a steep floor for a small RAG project.
- ⚠ Provisioned Throughput has a 1-month minimum commitment. If you overprovision, you're paying for capacity you don't use. Get your traffic patterns nailed down before committing.
- ⚠ Data transfer charges apply when calling Bedrock from outside the same AWS region. Cross-region calls add $0.02-0.09/GB on top of token costs.
- ⚠ CloudWatch logging for Bedrock (which you'll want for debugging) adds another $0.50/GB of log data ingested.
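These add-ons are easy to estimate before they surprise you. A sketch combining the two per-GB charges above (the transfer rate defaults to the low end of the $0.02-0.09/GB cross-region range; use your region pair's actual rate):

```python
def hidden_monthly_costs(cross_region_gb: float, log_gb: float,
                         transfer_rate: float = 0.02, log_ingest_rate: float = 0.50) -> float:
    """Monthly charges that don't show up in token pricing:
    cross-region data transfer plus CloudWatch log ingestion."""
    return cross_region_gb * transfer_rate + log_gb * log_ingest_rate

# 100 GB of cross-region traffic plus 20 GB of logs: $2 + $10 = $12/month.
```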
Which Plan Do You Need?
AWS-native team using Claude or Llama
On-demand is the starting point. Prices match the model providers' direct APIs. The value is staying within your AWS VPC, using IAM for access control, and avoiding separate API key management.
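In practice, staying AWS-native means calls go through boto3 and IAM rather than a vendor SDK and API key. A sketch of the request shape for the Bedrock runtime Converse API (the model ID shown is one of Bedrock's published Claude identifiers; verify availability in your region):

```python
def build_converse_body(prompt: str, max_tokens: int = 512) -> dict:
    """Keyword arguments for the Bedrock runtime Converse API."""
    return {
        "modelId": "anthropic.claude-3-5-sonnet-20240620-v1:0",
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens},
    }

# With boto3 (credentials resolved via IAM, no API key needed):
#   boto3.client("bedrock-runtime").converse(**build_converse_body("Hello"))
```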
Batch processing or analytics workload
Batch inference at 50% off is hard to beat if you don't need real-time responses. Ideal for processing large document sets, classification jobs, or nightly analysis runs.
High-throughput production application
Provisioned Throughput guarantees latency and availability. The 6-month commitment saves about 40%. Do the math: if you're spending $5K+/month on on-demand, provisioned is likely cheaper.
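The break-even falls out of the hourly rate, since a model unit bills around the clock for the whole term. A sketch with a placeholder hourly rate (look up the real per-model-unit rate for your model and region on the Bedrock pricing page):

```python
HOURS_PER_MONTH = 730  # AWS's standard billing-month convention

def provisioned_monthly(hourly_rate: float, units: int = 1, six_month: bool = False) -> float:
    """Monthly Provisioned Throughput cost; the ~40% 6-month discount is from the table above."""
    rate = hourly_rate * (0.60 if six_month else 1.0)
    return rate * units * HOURS_PER_MONTH

def cheaper_than_on_demand(on_demand_monthly: float, hourly_rate: float, **kw) -> bool:
    return provisioned_monthly(hourly_rate, **kw) < on_demand_monthly

# At a hypothetical $10/hour per model unit: $7,300/month on a 1-month term,
# $4,380/month on a 6-month term -- cheaper than $5K/month of on-demand spend.
```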
Team building a RAG application
Knowledge Bases is the easiest path to managed RAG, but the OpenSearch Serverless floor ($700/month) is steep for small projects. Consider self-managed retrieval with Pinecone or Weaviate if you need lower costs.
The Bottom Line
Bedrock's main value isn't cheaper models. It's keeping everything in AWS. If your team already runs on AWS, Bedrock simplifies security, compliance, and infrastructure management. The pricing matches direct API pricing for most models, so you're paying for convenience and integration. Batch inference at 50% off is the best deal. Avoid Knowledge Bases unless you're ready for the $700/month OpenSearch minimum.
Frequently Asked Questions
Is AWS Bedrock cheaper than using APIs directly?
No. On-demand pricing on Bedrock matches the model providers' direct pricing (e.g., Claude on Bedrock costs the same as Claude on Anthropic's API). Batch inference is 50% cheaper, which is the same discount Anthropic offers directly. The value of Bedrock is AWS integration, not lower prices.
What models are available on AWS Bedrock?
Bedrock offers Claude (Anthropic), Llama (Meta), Mistral, Amazon Titan, Cohere Command, and Stability AI models. The selection is broad but model versions may lag behind the providers' direct APIs by a few weeks.
What is Provisioned Throughput?
Provisioned Throughput reserves dedicated model capacity for your workload. You pay an hourly rate for guaranteed throughput and latency. It requires a 1-month or 6-month commitment. Worth it for high-volume production workloads where consistent performance matters.
How much do Bedrock Knowledge Bases cost?
Knowledge Base queries cost $0.01 per retrieval query plus the LLM costs for generating answers. The backend uses OpenSearch Serverless, which has a minimum of roughly $700/month (4 OCUs). For small projects, this floor cost makes managed RAG expensive compared to alternatives.
Should I use Bedrock or call model APIs directly?
Use Bedrock if you're already on AWS and want to keep data within your VPC, use IAM for access control, and avoid managing separate API keys. Call APIs directly if you want the latest model versions immediately, lower overhead for small projects, or you're not locked into AWS.
Does Bedrock support fine-tuning?
Yes, Bedrock supports fine-tuning for select models including Amazon Titan and some Llama variants. Fine-tuned models require Provisioned Throughput to serve, which adds to the cost. The fine-tuning job itself is billed separately based on the number of tokens processed during training.
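Because training is billed on tokens processed, epochs multiply the bill. A sketch with a placeholder per-1K-token training rate (the $0.008 figure is hypothetical; check the Bedrock pricing page for your model's actual rate):

```python
def finetune_training_cost(dataset_tokens: int, epochs: int, price_per_1k_tokens: float) -> float:
    """Fine-tuning is billed on tokens processed during training: dataset size x epochs."""
    return dataset_tokens * epochs * price_per_1k_tokens / 1_000

# 5M training tokens for 3 epochs at a hypothetical $0.008/1K tokens:
# 5,000,000 * 3 * 0.008 / 1,000 = $120 (serving via Provisioned Throughput is extra)
```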