Claude Pricing Guide (April 2026): Every Model, Discount, and Cost Optimization
Anthropic's Claude lineup has grown significantly since the Claude 3 era. The current generation includes Opus 4.6, Sonnet 4.6, and Haiku 4.5, each at a different price point and capability tier. What makes Claude pricing tricky is how the discounts stack: the Batch API, prompt caching, and model routing together can reduce your bill by 80-90% compared to naive single-model usage. This guide covers every price point, every discount mechanism, and practical cost math for real production workloads.
Current Claude Model Pricing
All prices are per 1 million tokens. Verified against Anthropic's pricing page, April 2026.
Claude Haiku 4.5
- ✓ 200K context window
- ✓ 64K max output
- ✓ Extended thinking
- ✓ Fastest Claude model
- ✓ Best for classification, extraction, routing
- ✓ $1.00 input / $5.00 output per 1M tokens
- ✓ Batch: $0.50 / $2.50
Claude Sonnet 4.6
- ✓ 1M context window
- ✓ 64K max output
- ✓ Extended thinking
- ✓ Best quality-to-cost ratio
- ✓ Strong at coding, analysis, writing
- ✓ $3.00 input / $15.00 output per 1M tokens
- ✓ Batch: $1.50 / $7.50
Claude Opus 4.6
- ✓ 1M context window
- ✓ 128K max output
- ✓ Extended thinking
- ✓ Highest capability
- ✓ Complex reasoning, research tasks
- ✓ $5.00 input / $25.00 output per 1M tokens
- ✓ Batch: $2.50 / $12.50
The pricing structure follows a consistent pattern: output tokens cost exactly 5x input tokens across all models. This is important for cost estimation. If your application generates long outputs (code, articles, reports), output tokens will dominate your bill. If your application sends long context but generates short responses (classification, extraction, summarization), input tokens are the primary cost driver.
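Since the 5x ratio holds across the lineup, a few lines of arithmetic tell you which side of a workload dominates. A minimal estimator using the prices quoted in this guide:

```python
# Per-1M-token prices from the table above (USD). Output is always 5x input.
PRICES = {
    "haiku-4.5":  {"input": 1.00, "output": 5.00},
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
    "opus-4.6":   {"input": 5.00, "output": 25.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD for a single request at standard rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Long context, short answer: input dominates the bill.
print(estimate_cost("sonnet-4.6", 50_000, 500))   # 0.1575
# Short prompt, long output: output dominates.
print(estimate_cost("sonnet-4.6", 500, 8_000))    # 0.1215
```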
Batch API: 50% Off Everything
The Batch API is the simplest way to cut Claude costs in half. You submit a batch of requests, and Anthropic returns results within 24 hours (typically faster, often within 1-2 hours). Both input and output tokens are billed at 50% of standard rates.
| Model | Standard Input | Batch Input | Standard Output | Batch Output |
|---|---|---|---|---|
| Haiku 4.5 | $1.00 | $0.50 | $5.00 | $2.50 |
| Sonnet 4.6 | $3.00 | $1.50 | $15.00 | $7.50 |
| Opus 4.6 | $5.00 | $2.50 | $25.00 | $12.50 |
The Batch API works well for: content generation pipelines, bulk document analysis, data extraction and classification, evaluation runs, and any workload where you do not need real-time responses. It does not work for chatbots, interactive assistants, or anything requiring sub-second latency.
One detail that surprises developers: prompt caching works with the Batch API. The discounts stack. A cached-input batch request on Haiku costs $0.05/1M for cached input tokens, which is 20x cheaper than standard Haiku input pricing.
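As a sketch of how the two discounts combine in practice, here is a batch submission using the Anthropic Python SDK. The batches endpoint and the cache_control marker (covered in the next section) are real SDK features; the model ID and request shapes are illustrative assumptions:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

MODEL = "claude-haiku-4-5"      # assumed model ID; use the current official one

shared_context = "..."          # large reference document reused across requests

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"doc-{i}",
            "params": {
                "model": MODEL,
                "max_tokens": 256,
                # The cached segment: billed at cache-read rates after the
                # first request, stacking with the 50% batch discount.
                "system": [
                    {
                        "type": "text",
                        "text": shared_context,
                        "cache_control": {"type": "ephemeral"},
                    }
                ],
                "messages": [{"role": "user", "content": f"Classify document {i}."}],
            },
        }
        for i in range(100)
    ]
)
print(batch.id, batch.processing_status)
```

Poll `client.messages.batches.retrieve(batch.id)` until `processing_status` reads "ended", then fetch the output with `client.messages.batches.results(batch.id)`.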
Prompt Caching: Up to 90% Off Input Tokens
Prompt caching is Claude's most powerful cost-saving feature, and it is underused. The concept: you mark a section of your prompt (system instructions, reference documents, few-shot examples) as cacheable. The first request writes it to a cache. Subsequent requests read from that cache at 90% off the normal input price.
How It Works
You add a cache_control breakpoint to your message content. Everything up to and including that breakpoint is eligible for caching (a minimal request sketch follows the list below). There are two cache durations:
- 5-minute cache (ephemeral): Cache write costs 1.25x normal input. Cache read costs 0.1x normal input. Best for interactive sessions where the same system prompt is reused across multiple user messages.
- 1-hour cache: Cache write costs 2x normal input. Cache read costs 0.1x normal input. Best for batch processing pipelines where the same context is used across many requests over a longer period.
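Here is what the breakpoint looks like with the Anthropic Python SDK. The call shape and the cache_control field are real; the model ID is an assumption for illustration, and the 1-hour TTL shipped behind a beta flag, so verify it against current docs:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Reusable context; assume it clears the model's minimum cacheable size.
LONG_SYSTEM_PROMPT = "You are a support assistant for ... [pages of policies]"

response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed model ID for illustration
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Everything up to and including this block is cached.
            # "ephemeral" is the 5-minute cache: the first call pays the
            # 1.25x write price, subsequent calls pay the 0.1x read price.
            # For the 1-hour cache, add "ttl": "1h" (beta at launch).
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Where is my order?"}],
)

# usage reports cache_creation_input_tokens and cache_read_input_tokens,
# so you can confirm the cache is actually being hit.
print(response.usage)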
Caching Cost Math
All figures per 1M tokens of cached content. Because the write and read multipliers (1.25x and 0.1x for the 5-minute cache) are the same on every model, the breakeven point is identical across the lineup.

| Model | Normal Input | Cache Write (5-min) | Cache Read | Write + 1 Read vs. 2 Uncached |
|---|---|---|---|---|
| Haiku 4.5 | $1.00 | $1.25 | $0.10 | $1.35 vs. $2.00 |
| Sonnet 4.6 | $3.00 | $3.75 | $0.30 | $4.05 vs. $6.00 |
| Opus 4.6 | $5.00 | $6.25 | $0.50 | $6.75 vs. $10.00 |
Breakeven is nearly immediate: one cache write plus one read costs 1.35x a single uncached pass, versus 2x for sending the same prompt twice uncached, so the second use already saves money. For a chatbot with a 2,000-token system prompt handling 100 messages per session, prompt caching turns $0.006 of input cost per message (at Sonnet rates) into $0.0006 after the first request. Over a million messages per month, that is $5,400 in savings.
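The arithmetic behind that figure, spelled out (Sonnet prices from the table above; the one-time 1.25x write per session is negligible at this scale):

```python
SONNET_INPUT = 3.00 / 1_000_000   # dollars per input token
PROMPT_TOKENS = 2_000             # cached system prompt
MESSAGES_PER_MONTH = 1_000_000

uncached_per_msg = PROMPT_TOKENS * SONNET_INPUT    # $0.006
cached_per_msg = uncached_per_msg * 0.10           # $0.0006 (0.1x read rate)

savings = (uncached_per_msg - cached_per_msg) * MESSAGES_PER_MONTH
print(f"${savings:,.0f}/month")                    # $5,400/month
```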
Minimum Cache Sizes
The cached content must meet minimum token thresholds: Haiku 4.5 requires 1,024 tokens, Sonnet 4.6 and Opus 4.6 require 2,048 tokens. If your system prompt is shorter than these thresholds, pad it with reference documentation or few-shot examples. The caching savings almost always justify adding more context.
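If you are not sure whether a prompt clears the threshold, the SDK's token counter settles it before you send anything. A sketch, using this guide's minimums and assumed model IDs:

```python
import anthropic

client = anthropic.Anthropic()

# Minimum cacheable sizes as quoted in this guide.
CACHE_MIN = {"claude-haiku-4-5": 1024, "claude-sonnet-4-6": 2048}

def fits_cache(model: str, system_prompt: str) -> bool:
    """Check whether a system prompt meets the model's minimum cache size."""
    count = client.messages.count_tokens(
        model=model,
        system=system_prompt,
        # count_tokens requires at least one message; the placeholder adds
        # a handful of tokens, so treat the result as approximate.
        messages=[{"role": "user", "content": "placeholder"}],
    )
    return count.input_tokens >= CACHE_MIN[model]
```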
The Optimal Cost Stack
Combine all three optimizations for maximum savings. Route simple tasks to Haiku 4.5 (model routing). Use prompt caching for any repeated context (90% off input). Use the Batch API for non-real-time workloads (50% off everything). A pipeline using all three can run at $0.05/1M cached input tokens on Haiku batch, which is 100x cheaper than standard Opus input pricing.
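Because each discount is an independent multiplier, the stack is a one-line computation:

```python
haiku_input = 1.00   # $/1M tokens, standard rate
batch_mult = 0.5     # Batch API: 50% off everything
cache_mult = 0.1     # prompt cache read: 90% off input

stacked = haiku_input * batch_mult * cache_mult
print(f"${stacked:.2f}/1M cached batch input")    # $0.05/1M

opus_input = 5.00
print(f"{opus_input / stacked:.0f}x cheaper than standard Opus input")  # 100x
```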
Extended Thinking Costs
All current Claude models support extended thinking, where the model reasons internally before generating a visible response. This improves output quality on complex tasks but increases costs because thinking tokens are billed as output tokens.
Here is what this means in practice:
- A simple classification task might use 50 thinking tokens and 10 output tokens. The thinking overhead is negligible.
- A complex coding task might use 5,000 thinking tokens and 500 output tokens. You are paying for 5,500 output tokens, 11x the visible response.
- A deep research analysis might use 20,000+ thinking tokens. At Opus output rates ($25/1M), 20K thinking tokens cost $0.50 per request.
You can cap thinking costs by setting a thinking token budget, with max_tokens bounding the combined total of thinking plus visible output. This matters for production applications where unpredictable thinking lengths could spike your bill. A reasonable default: set max_tokens to 3-5x your expected visible output length.
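With the Anthropic Python SDK, the cap looks roughly like this. The thinking parameter and its budget_tokens field are the SDK's extended-thinking controls; the model ID is an assumption:

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6",      # assumed model ID for illustration
    max_tokens=6_000,             # hard cap: thinking + visible output
    thinking={
        "type": "enabled",
        "budget_tokens": 4_000,   # cap on internal reasoning tokens
    },
    messages=[{"role": "user", "content": "Prove or refute: ..."}],
)

# Thinking tokens are counted in usage.output_tokens and billed as output.
print(response.usage.output_tokens)
```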
Fast Mode Pricing (Opus 4.6 Only)
Opus 4.6 has a "fast" mode in research preview that provides significantly faster output at a steep premium: 6x standard pricing.
| Mode | Input / 1M | Output / 1M |
|---|---|---|
| Opus 4.6 (standard) | $5.00 | $25.00 |
| Opus 4.6 (fast) | $30.00 | $150.00 |
At $150/1M output tokens, fast mode is the most expensive LLM API available from any major provider. It makes sense for latency-sensitive applications where Opus-level reasoning is required and cost is secondary. For most workloads, Sonnet 4.6 at standard speed provides better value than Opus at fast speed.
Claude vs. Competitors: Price Comparison
How Claude stacks up against the other major LLM providers on price as of April 2026, sorted by descending input price.
| Model | Input / 1M | Output / 1M | Context |
|---|---|---|---|
| GPT-5 | $10.00 | $30.00 | 128K |
| Claude Opus 4.6 | $5.00 | $25.00 | 1M |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M |
| GPT-4o | $2.50 | $10.00 | 128K |
| GPT-4.1 | $2.00 | $8.00 | 1M |
| Gemini 2.5 Pro | $1.25 | $10.00 | 1M |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K |
| GPT-4o mini | $0.15 | $0.60 | 128K |
| Gemini 2.0 Flash | $0.075 | $0.30 | 1M |
Claude Opus 4.6 at $5/$25 is cheaper than GPT-5 at $10/$30 while offering a larger context window (1M vs. 128K). Sonnet 4.6 at $3/$15 is slightly more expensive than GPT-4o at $2.50/$10 on a per-token basis. Where Claude loses on price is the budget tier: Haiku at $1/$5 is significantly more expensive than GPT-4o mini at $0.15/$0.60 or Gemini 2.0 Flash at $0.075/$0.30.
The pricing gap at the budget tier matters for high-volume classification and extraction tasks. If you are processing millions of documents and do not need Claude-quality reasoning, GPT-4o mini or Gemini Flash cost 7-13x less per input token.
Real-World Cost Examples
Concrete cost estimates for common workloads, assuming standard pricing (no batch, no caching) and then with optimizations applied.
| Workload | Model | Standard Cost | With Batch + Cache |
|---|---|---|---|
| Chatbot (10K msgs/day, 2K cached system prompt + 500 in + 200 out avg) | Sonnet 4.6 | $3,150/mo | $1,530/mo (cache only) |
| Code review (1K PRs/day, 2K in + 1K out) | Sonnet 4.6 | $630/mo | $126/mo |
| Document classification (100K docs/day, 1K in + 50 out) | Haiku 4.5 | $3,750/mo | $375/mo |
| Research analysis (100 reports/day, 10K in + 5K out) | Opus 4.6 | $525/mo | $105/mo |
The "with optimizations" column assumes batch API (50% off) plus prompt caching on a 2K-token system prompt reused across all requests (which reduces that portion of input to 10% of standard). The actual savings depend on how much of your input is cacheable and whether your workload tolerates batch latency.
Model Selection: When to Use Which
Picking the right model for each task is the highest-impact cost decision. Here is a practical framework.
- Haiku 4.5 ($1/$5): Classification, entity extraction, data formatting, intent routing, simple Q&A, content filtering, and any task where the input is structured and the output is short. If you can write the expected output format in a few sentences, Haiku handles it.
- Sonnet 4.6 ($3/$15): Code generation, document analysis, multi-step reasoning, long-form writing, and general-purpose assistant tasks. This is the model most production applications should default to. It handles 90% of tasks at a fraction of Opus cost.
- Opus 4.6 ($5/$25): Complex research synthesis, novel problem-solving, tasks requiring deep domain reasoning, and situations where getting the right answer on the first try is critical (legal analysis, medical reasoning, financial modeling). Only use Opus when Sonnet demonstrably fails at the task.
A well-designed system routes each request to the cheapest capable model. A support chatbot might use Haiku for FAQ-style questions, Sonnet for troubleshooting conversations, and Opus only for escalated technical analysis. This model routing approach can reduce overall costs by 40-60% compared to running everything through Sonnet.
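What that router might look like in dispatch code. The task labels and tiers here are illustrative assumptions, not an Anthropic feature; production routers often use a cheap classifier pass (Haiku itself works) to assign the label:

```python
# Hypothetical task taxonomy for illustration; model IDs are assumed.
ROUTES = {
    "faq":          "claude-haiku-4-5",
    "troubleshoot": "claude-sonnet-4-6",
    "escalation":   "claude-opus-4-6",
}

def pick_model(task_type: str) -> str:
    """Route each request to the cheapest model rated for the task."""
    return ROUTES.get(task_type, "claude-sonnet-4-6")  # default to the mid tier
```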
Frequently Asked Questions
How much does the Claude API cost in 2026?
Haiku 4.5: $1/$5 per 1M input/output tokens. Sonnet 4.6: $3/$15. Opus 4.6: $5/$25. Output tokens always cost 5x input. The Batch API cuts all prices by 50%, and prompt caching reduces input costs by up to 90%.
What is the cheapest way to use Claude?
Combine three things: (1) Route simple tasks to Haiku 4.5 and complex ones to Sonnet 4.6. (2) Enable prompt caching for any repeated context like system prompts, saving 90% on cached input tokens. (3) Use the Batch API for non-real-time workloads, saving 50% on everything. Together, these optimizations can reduce costs by 80-90% compared to naive Opus usage.
How does prompt caching work?
Mark parts of your prompt as cacheable. The first call writes to cache at 1.25x cost (5-minute TTL) or 2x (1-hour TTL). All subsequent calls read from cache at 0.1x cost, a 90% discount. Minimum cache size: 1,024 tokens for Haiku, 2,048 for Sonnet and Opus. The cache pays for itself after 1-2 reads.
Is Claude cheaper than GPT-4o?
Not on raw per-token price. GPT-4o at $2.50/$10 is cheaper than Sonnet 4.6 at $3/$15. GPT-4o mini at $0.15/$0.60 is much cheaper than Haiku at $1/$5. Claude's advantage is output quality on complex tasks, which can mean fewer retries and lower effective cost per useful response.
Does extended thinking increase costs?
Yes. Thinking tokens are billed as output tokens at standard output rates. A complex task generating 5,000 thinking tokens plus 500 response tokens means you pay for 5,500 output tokens. Set a thinking budget (and max_tokens) to cap these costs in production.
What are Claude's context window sizes?
Opus 4.6 and Sonnet 4.6 have 1 million token context windows. Haiku 4.5 has 200,000 tokens. Max output: Opus gets 128K tokens, Sonnet and Haiku get 64K each. All models support extended thinking.
How does Claude's Batch API work?
Submit multiple requests as a batch and receive results within 24 hours (usually 1-2 hours). All token prices are 50% off. Prompt caching discounts stack on top. The Batch API supports all current Claude models and is ideal for any workload that does not need real-time responses.