Claude Pricing Guide (April 2026): Every Model, Discount, and Cost Optimization
Anthropic's Claude lineup has grown significantly since the Claude 3 era. The current generation includes Opus 4.6, Sonnet 4.6, and Haiku 4.5, each at a different price point and capability tier. What makes Claude pricing tricky is how the discounts stack: the Batch API, prompt caching, and model routing together can reduce your bill by 80-90% compared to naive single-model usage. This guide covers every price point, every discount mechanism, and practical cost math for real production workloads.
Current Claude Model Pricing
All prices are per 1 million tokens. Verified against Anthropic's pricing page, April 2026.
Claude Haiku 4.5
- ✓ 200K context window
- ✓ 64K max output
- ✓ Extended thinking
- ✓ Fastest Claude model
- ✓ Best for classification, extraction, routing
- ✓ $1.00 input / $5.00 output per 1M tokens
- ✓ Batch: $0.50 / $2.50
Claude Sonnet 4.6
- ✓ 1M context window
- ✓ 64K max output
- ✓ Extended thinking
- ✓ Best quality-to-cost ratio
- ✓ Strong at coding, analysis, writing
- ✓ $3.00 input / $15.00 output per 1M tokens
- ✓ Batch: $1.50 / $7.50
Claude Opus 4.6
- ✓ 1M context window
- ✓ 128K max output
- ✓ Extended thinking
- ✓ Highest capability
- ✓ Complex reasoning, research tasks
- ✓ $5.00 input / $25.00 output per 1M tokens
- ✓ Batch: $2.50 / $12.50
The pricing structure follows a consistent pattern: output tokens cost exactly 5x input tokens across all models. This is important for cost estimation. If your application generates long outputs (code, articles, reports), output tokens will dominate your bill. If your application sends long context but generates short responses (classification, extraction, summarization), input tokens are the primary cost driver.
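Since the 5x ratio holds across the lineup, a few lines of arithmetic tell you which side of a workload dominates. A minimal estimator using the prices quoted in this guide:

```python
# Per-1M-token prices from the table above (USD). Output is always 5x input.
PRICES = {
    "haiku-4.5":  {"input": 1.00, "output": 5.00},
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
    "opus-4.6":   {"input": 5.00, "output": 25.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD for a single request at standard rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Long context, short answer: input dominates the bill.
print(estimate_cost("sonnet-4.6", 50_000, 500))   # 0.1575
# Short prompt, long output: output dominates.
print(estimate_cost("sonnet-4.6", 500, 8_000))    # 0.1215
```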
Batch API: 50% Off Everything
The Batch API is the simplest way to cut Claude costs in half. You submit a batch of requests, and Anthropic returns results within 24 hours (typically faster, often within 1-2 hours). Both input and output tokens are billed at 50% of standard rates.
| Model | Standard Input | Batch Input | Standard Output | Batch Output |
|---|---|---|---|---|
| Haiku 4.5 | $1.00 | $0.50 | $5.00 | $2.50 |
| Sonnet 4.6 | $3.00 | $1.50 | $15.00 | $7.50 |
| Opus 4.6 | $5.00 | $2.50 | $25.00 | $12.50 |
The Batch API works well for: content generation pipelines, bulk document analysis, data extraction and classification, evaluation runs, and any workload where you do not need real-time responses. It does not work for chatbots, interactive assistants, or anything requiring sub-second latency.
One detail that surprises developers: prompt caching works with the Batch API. The discounts stack. A cached-input batch request on Haiku costs $0.05/1M for cached input tokens, which is 20x cheaper than standard Haiku input pricing.
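As a sketch of how the two discounts combine in practice, here is a batch submission using the Anthropic Python SDK. The batches endpoint and the cache_control marker (covered in the next section) are real SDK features; the model ID and request shapes are illustrative assumptions:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

MODEL = "claude-haiku-4-5"      # assumed model ID; use the current official one

shared_context = "..."          # large reference document reused across requests

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"doc-{i}",
            "params": {
                "model": MODEL,
                "max_tokens": 256,
                # The cached segment: billed at cache-read rates after the
                # first request, stacking with the 50% batch discount.
                "system": [
                    {
                        "type": "text",
                        "text": shared_context,
                        "cache_control": {"type": "ephemeral"},
                    }
                ],
                "messages": [{"role": "user", "content": f"Classify document {i}."}],
            },
        }
        for i in range(100)
    ]
)
print(batch.id, batch.processing_status)
```

Poll `client.messages.batches.retrieve(batch.id)` until `processing_status` reads "ended", then fetch the output with `client.messages.batches.results(batch.id)`.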
Prompt Caching: Up to 90% Off Input Tokens
Prompt caching is Claude's most powerful cost-saving feature, and it is underused. The concept: you mark a section of your prompt (system instructions, reference documents, few-shot examples) as cacheable. The first request writes it to a cache. Subsequent requests read from that cache at 90% off the normal input price.
How It Works
You add a cache_control breakpoint to your message content. Everything up to and including that breakpoint is eligible for caching (a minimal request sketch follows the list below). There are two cache durations:
- 5-minute cache (ephemeral): Cache write costs 1.25x normal input. Cache read costs 0.1x normal input. Best for interactive sessions where the same system prompt is reused across multiple user messages.
- 1-hour cache: Cache write costs 2x normal input. Cache read costs 0.1x normal input. Best for batch processing pipelines where the same context is used across many requests over a longer period.
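Here is what the breakpoint looks like with the Anthropic Python SDK. The call shape and the cache_control field are real; the model ID is an assumption for illustration, and the 1-hour TTL shipped behind a beta flag, so verify it against current docs:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Reusable context; assume it clears the model's minimum cacheable size.
LONG_SYSTEM_PROMPT = "You are a support assistant for ... [pages of policies]"

response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed model ID for illustration
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Everything up to and including this block is cached.
            # "ephemeral" is the 5-minute cache: the first call pays the
            # 1.25x write price, subsequent calls pay the 0.1x read price.
            # For the 1-hour cache, add "ttl": "1h" (beta at launch).
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Where is my order?"}],
)

# usage reports cache_creation_input_tokens and cache_read_input_tokens,
# so you can confirm the cache is actually being hit.
print(response.usage)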
Caching Cost Math
All figures per 1M tokens of cached content. Because the write and read multipliers (1.25x and 0.1x for the 5-minute cache) are the same on every model, the breakeven point is identical across the lineup.

| Model | Normal Input | Cache Write (5-min) | Cache Read | Write + 1 Read vs. 2 Uncached |
|---|---|---|---|---|
| Haiku 4.5 | $1.00 | $1.25 | $0.10 | $1.35 vs. $2.00 |
| Sonnet 4.6 | $3.00 | $3.75 | $0.30 | $4.05 vs. $6.00 |
| Opus 4.6 | $5.00 | $6.25 | $0.50 | $6.75 vs. $10.00 |
Breakeven is nearly immediate: one cache write plus one read costs 1.35x a single uncached pass, versus 2x for sending the same prompt twice uncached, so the second use already saves money. For a chatbot with a 2,000-token system prompt handling 100 messages per session, prompt caching turns $0.006 of input cost per message (at Sonnet rates) into $0.0006 after the first request. Over a million messages per month, that is $5,400 in savings.
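The arithmetic behind that figure, spelled out (Sonnet prices from the table above; the one-time 1.25x write per session is negligible at this scale):

```python
SONNET_INPUT = 3.00 / 1_000_000   # dollars per input token
PROMPT_TOKENS = 2_000             # cached system prompt
MESSAGES_PER_MONTH = 1_000_000

uncached_per_msg = PROMPT_TOKENS * SONNET_INPUT    # $0.006
cached_per_msg = uncached_per_msg * 0.10           # $0.0006 (0.1x read rate)

savings = (uncached_per_msg - cached_per_msg) * MESSAGES_PER_MONTH
print(f"${savings:,.0f}/month")                    # $5,400/month
```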
Minimum Cache Sizes
The cached content must meet minimum token thresholds: Haiku 4.5 requires 1,024 tokens, Sonnet 4.6 and Opus 4.6 require 2,048 tokens. If your system prompt is shorter than these thresholds, pad it with reference documentation or few-shot examples. The caching savings almost always justify adding more context.
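If you are not sure whether a prompt clears the threshold, the SDK's token counter settles it before you send anything. A sketch, using this guide's minimums and assumed model IDs:

```python
import anthropic

client = anthropic.Anthropic()

# Minimum cacheable sizes as quoted in this guide.
CACHE_MIN = {"claude-haiku-4-5": 1024, "claude-sonnet-4-6": 2048}

def fits_cache(model: str, system_prompt: str) -> bool:
    """Check whether a system prompt meets the model's minimum cache size."""
    count = client.messages.count_tokens(
        model=model,
        system=system_prompt,
        # count_tokens requires at least one message; the placeholder adds
        # a handful of tokens, so treat the result as approximate.
        messages=[{"role": "user", "content": "placeholder"}],
    )
    return count.input_tokens >= CACHE_MIN[model]
```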
The Optimal Cost Stack
Combine all three optimizations for maximum savings. Route simple tasks to Haiku 4.5 (model routing). Use prompt caching for any repeated context (90% off input). Use the Batch API for non-real-time workloads (50% off everything). A pipeline using all three can run at $0.05/1M cached input tokens on Haiku batch, which is 100x cheaper than standard Opus input pricing.
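Because each discount is an independent multiplier, the stack is a one-line computation:

```python
haiku_input = 1.00   # $/1M tokens, standard rate
batch_mult = 0.5     # Batch API: 50% off everything
cache_mult = 0.1     # prompt cache read: 90% off input

stacked = haiku_input * batch_mult * cache_mult
print(f"${stacked:.2f}/1M cached batch input")    # $0.05/1M

opus_input = 5.00
print(f"{opus_input / stacked:.0f}x cheaper than standard Opus input")  # 100x
```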
Extended Thinking Costs
All current Claude models support extended thinking, where the model reasons internally before generating a visible response. This improves output quality on complex tasks but increases costs because thinking tokens are billed as output tokens.
Here is what this means in practice:
- A simple classification task might use 50 thinking tokens and 10 output tokens. The thinking overhead is negligible.
- A complex coding task might use 5,000 thinking tokens and 500 output tokens. You are paying for 5,500 output tokens, 11x the visible response.
- A deep research analysis might use 20,000+ thinking tokens. At Opus output rates ($25/1M), 20K thinking tokens cost $0.50 per request.
You can cap thinking costs by setting a thinking token budget, with max_tokens bounding the combined total of thinking plus visible output. This matters for production applications where unpredictable thinking lengths could spike your bill. A reasonable default: set max_tokens to 3-5x your expected visible output length.
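With the Anthropic Python SDK, the cap looks roughly like this. The thinking parameter and its budget_tokens field are the SDK's extended-thinking controls; the model ID is an assumption:

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6",      # assumed model ID for illustration
    max_tokens=6_000,             # hard cap: thinking + visible output
    thinking={
        "type": "enabled",
        "budget_tokens": 4_000,   # cap on internal reasoning tokens
    },
    messages=[{"role": "user", "content": "Prove or refute: ..."}],
)

# Thinking tokens are counted in usage.output_tokens and billed as output.
print(response.usage.output_tokens)
```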
Fast Mode Pricing (Opus 4.6 Only)
Opus 4.6 has a "fast" mode in research preview that provides significantly faster output at a steep premium: 6x standard pricing.
| Mode | Input / 1M | Output / 1M |
|---|---|---|
| Opus 4.6 (standard) | $5.00 | $25.00 |
| Opus 4.6 (fast) | $30.00 | $150.00 |
At $150/1M output tokens, fast mode is the most expensive LLM API available from any major provider. It makes sense for latency-sensitive applications where Opus-level reasoning is required and cost is secondary. For most workloads, Sonnet 4.6 at standard speed provides better value than Opus at fast speed.
Claude vs. Competitors: Price Comparison
How Claude stacks up against the other major LLM providers on price as of April 2026, sorted by descending input price.
| Model | Input / 1M | Output / 1M | Context |
|---|---|---|---|
| GPT-5 | $10.00 | $30.00 | 128K |
| Claude Opus 4.6 | $5.00 | $25.00 | 1M |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M |
| GPT-4o | $2.50 | $10.00 | 128K |
| GPT-4.1 | $2.00 | $8.00 | 1M |
| Gemini 2.5 Pro | $1.25 | $10.00 | 1M |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K |
| GPT-4o mini | $0.15 | $0.60 | 128K |
| Gemini 2.0 Flash | $0.075 | $0.30 | 1M |
Claude Opus 4.6 at $5/$25 is cheaper than GPT-5 at $10/$30 while offering a larger context window (1M vs. 128K). Sonnet 4.6 at $3/$15 is slightly more expensive than GPT-4o at $2.50/$10 on a per-token basis. Where Claude loses on price is the budget tier: Haiku at $1/$5 is significantly more expensive than GPT-4o mini at $0.15/$0.60 or Gemini 2.0 Flash at $0.075/$0.30.
The pricing gap at the budget tier matters for high-volume classification and extraction tasks. If you are processing millions of documents and do not need Claude-quality reasoning, GPT-4o mini or Gemini Flash cost 7-13x less per input token.
Real-World Cost Examples
Concrete cost estimates for common workloads, assuming standard pricing (no batch, no caching) and then with optimizations applied.
| Workload | Model | Standard Cost | With Batch + Cache |
|---|---|---|---|
| Chatbot (10K msgs/day, 2K cached system prompt + 500 in + 200 out avg) | Sonnet 4.6 | $3,150/mo | $1,530/mo (cache only) |
| Code review (1K PRs/day, 2K in + 1K out) | Sonnet 4.6 | $630/mo | $126/mo |
| Document classification (100K docs/day, 1K in + 50 out) | Haiku 4.5 | $3,750/mo | $375/mo |
| Research analysis (100 reports/day, 10K in + 5K out) | Opus 4.6 | $525/mo | $105/mo |
The "with optimizations" column assumes batch API (50% off) plus prompt caching on a 2K-token system prompt reused across all requests (which reduces that portion of input to 10% of standard). The actual savings depend on how much of your input is cacheable and whether your workload tolerates batch latency.
Model Selection: When to Use Which
Picking the right model for each task is the highest-impact cost decision. Here is a practical framework.
- Haiku 4.5 ($1/$5): Classification, entity extraction, data formatting, intent routing, simple Q&A, content filtering, and any task where the input is structured and the output is short. If you can write the expected output format in a few sentences, Haiku handles it.
- Sonnet 4.6 ($3/$15): Code generation, document analysis, multi-step reasoning, long-form writing, and general-purpose assistant tasks. This is the model most production applications should default to. It handles 90% of tasks at a fraction of Opus cost.
- Opus 4.6 ($5/$25): Complex research synthesis, novel problem-solving, tasks requiring deep domain reasoning, and situations where getting the right answer on the first try is critical (legal analysis, medical reasoning, financial modeling). Only use Opus when Sonnet demonstrably fails at the task.
A well-designed system routes each request to the cheapest capable model. A support chatbot might use Haiku for FAQ-style questions, Sonnet for troubleshooting conversations, and Opus only for escalated technical analysis. This model routing approach can reduce overall costs by 40-60% compared to running everything through Sonnet.
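What that router might look like in dispatch code. The task labels and tiers here are illustrative assumptions, not an Anthropic feature; production routers often use a cheap classifier pass (Haiku itself works) to assign the label:

```python
# Hypothetical task taxonomy for illustration; model IDs are assumed.
ROUTES = {
    "faq":          "claude-haiku-4-5",
    "troubleshoot": "claude-sonnet-4-6",
    "escalation":   "claude-opus-4-6",
}

def pick_model(task_type: str) -> str:
    """Route each request to the cheapest model rated for the task."""
    return ROUTES.get(task_type, "claude-sonnet-4-6")  # default to the mid tier
```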
Frequently Asked Questions
How much does the Claude API cost in 2026?
Haiku 4.5: $1/$5 per 1M input/output tokens. Sonnet 4.6: $3/$15. Opus 4.6: $5/$25. Output tokens always cost 5x input. The Batch API cuts all prices by 50%, and prompt caching reduces input costs by up to 90%.
What is the cheapest way to use Claude?
Combine three things: (1) Route simple tasks to Haiku 4.5 and complex ones to Sonnet 4.6. (2) Enable prompt caching for any repeated context like system prompts, saving 90% on cached input tokens. (3) Use the Batch API for non-real-time workloads, saving 50% on everything. Together, these optimizations can reduce costs by 80-90% compared to naive Opus usage.
How does prompt caching work?
Mark parts of your prompt as cacheable. The first call writes to cache at 1.25x cost (5-minute TTL) or 2x (1-hour TTL). All subsequent calls read from cache at 0.1x cost, a 90% discount. Minimum cache size: 1,024 tokens for Haiku, 2,048 for Sonnet and Opus. The cache pays for itself after 1-2 reads.
Is Claude cheaper than GPT-4o?
Not on raw per-token price. GPT-4o at $2.50/$10 is cheaper than Sonnet 4.6 at $3/$15. GPT-4o mini at $0.15/$0.60 is much cheaper than Haiku at $1/$5. Claude's advantage is output quality on complex tasks, which can mean fewer retries and lower effective cost per useful response.
Does extended thinking increase costs?
Yes. Thinking tokens are billed as output tokens at standard output rates. A complex task generating 5,000 thinking tokens plus 500 response tokens means you pay for 5,500 output tokens. Set a thinking budget (and max_tokens) to cap these costs in production.
What are Claude's context window sizes?
Opus 4.6 and Sonnet 4.6 have 1 million token context windows. Haiku 4.5 has 200,000 tokens. Max output: Opus gets 128K tokens, Sonnet and Haiku get 64K each. All models support extended thinking.
How does Claude's Batch API work?
Submit multiple requests as a batch and receive results within 24 hours (usually 1-2 hours). All token prices are 50% off. Prompt caching discounts stack on top. The Batch API supports all current Claude models and is ideal for any workload that does not need real-time responses.