Anthropic API Pricing: What Every Claude Model Costs in 2026

Anthropic has shipped a lot of models since Claude 3. The naming is confusing, the pricing tiers stack in non-obvious ways, and the gap between the cheapest and most expensive models still available on the API is 60x (Haiku 3 at $0.25/$1.25 versus Opus 4.1 at $15/$75). This page breaks down every current Claude model's pricing, the batch and caching discounts that can cut your bill in half, and real cost math for production apps. All prices verified against Anthropic's official pricing page, April 2026.

[Chart: Anthropic API (Claude) pricing tiers as of April 2026. Data verified by PE Collective.]

Claude Haiku 4.5

$1 / $5 per 1M input / output tokens
  • Fastest current Claude model
  • 200K context window, 64K max output
  • Extended thinking support
  • Best for classification, extraction, routing
  • Batch API: $0.50/$2.50 per 1M tokens (50% off)

Claude Sonnet 4.6 (Most Popular)

$3 / $15 per 1M input / output tokens
  • Best speed-to-intelligence ratio
  • 1M context window, 64K max output
  • Extended thinking + adaptive thinking
  • The default model for most production apps
  • Batch API: $1.50/$7.50 per 1M tokens (50% off)

Claude Opus 4.6

$5 / $25 per 1M input / output tokens
  • Most intelligent Claude model
  • 1M context window, 128K max output
  • Extended thinking + adaptive thinking
  • Best for agents, complex coding, research
  • Batch API: $2.50/$12.50 per 1M tokens (50% off)

Complete Model Pricing Table (April 2026)

Anthropic offers ten models across four generations. The table below shows every model's input and output pricing, context window, max output, and status. The current flagship models (Opus 4.6, Sonnet 4.6, Haiku 4.5) are at the top. Legacy models are still available via the API, but Anthropic recommends migrating.

A few things stand out. First, Opus 4.6 at $5/$25 is dramatically cheaper than the older Opus 4.1 and Opus 4 at $15/$75: Anthropic cut the flagship price to a third while improving capability. Second, the Sonnet family has held steady at $3/$15 across four generations (Sonnet 3.7 through Sonnet 4.6). Third, Haiku 4.5 at $1/$5 is four times the price of the legacy Haiku 3 at $0.25/$1.25, but the capability jump is massive.

| Model | Input / 1M tokens | Output / 1M tokens | Context Window | Max Output | Status |
|---|---|---|---|---|---|
| Opus 4.6 | $5.00 | $25.00 | 1M tokens | 128K tokens | Current |
| Sonnet 4.6 | $3.00 | $15.00 | 1M tokens | 64K tokens | Current |
| Haiku 4.5 | $1.00 | $5.00 | 200K tokens | 64K tokens | Current |
| Sonnet 4.5 | $3.00 | $15.00 | 200K (1M beta) | 64K tokens | Legacy |
| Opus 4.5 | $5.00 | $25.00 | 200K tokens | 64K tokens | Legacy |
| Opus 4.1 | $15.00 | $75.00 | 200K tokens | 32K tokens | Legacy |
| Sonnet 4 | $3.00 | $15.00 | 200K (1M beta) | 64K tokens | Legacy |
| Opus 4 | $15.00 | $75.00 | 200K tokens | 32K tokens | Legacy |
| Haiku 3.5 | $0.80 | $4.00 | 200K tokens | 64K tokens | Legacy |
| Haiku 3 | $0.25 | $1.25 | 200K tokens | 4K tokens | Deprecated (Apr 2026) |

Batch API Pricing: 50% Off Everything

The Batch API processes requests asynchronously with results returned within 24 hours. The tradeoff for giving up real-time responses is a flat 50% discount on both input and output tokens. For workloads like data processing, model evaluations, content generation, or any pipeline that doesn't need instant results, this is the single biggest cost lever available.
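
If you're on the Python SDK, submitting a batch is a small change from a normal request. Here's a minimal sketch; the model ID string is an assumption based on this page's lineup, so check the models list for the exact identifier:

```python
# Minimal Message Batches sketch using the Anthropic Python SDK.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

posts = ["great product, slow shipping", "spam spam spam"]
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"post-{i}",
            "params": {
                "model": "claude-haiku-4-5",  # assumed model ID
                "max_tokens": 50,
                "messages": [{"role": "user", "content": f"Moderate: {p}"}],
            },
        }
        for i, p in enumerate(posts)
    ]
)
print(batch.id, batch.processing_status)  # poll until status is "ended"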

At batch pricing, Opus 4.6 drops from $5/$25 to $2.50/$12.50 per million tokens. Sonnet 4.6 drops from $3/$15 to $1.50/$7.50. Haiku 4.5 drops from $1/$5 to $0.50/$2.50. The batch discount stacks with prompt caching, meaning a cached batch request can cost as little as 5% of a standard non-cached request.

Here's a concrete example. Say you're running a content moderation pipeline processing 1 million user posts per day. Each post averages 200 input tokens and 50 output tokens. At standard Haiku 4.5 rates, that's $200/day input + $250/day output = $450/day. With batch pricing: $100 + $125 = $225/day. Add prompt caching for a shared system prompt across all requests and you could push that under $150/day.
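
The arithmetic behind that example, as a quick sanity check:

```python
# Daily cost of the moderation pipeline at standard vs. batch Haiku 4.5 rates.
POSTS = 1_000_000
IN_TOK, OUT_TOK = 200, 50        # tokens per post
IN_RATE, OUT_RATE = 1.00, 5.00   # $/MTok, Haiku 4.5 standard

def daily(in_rate, out_rate):
    return POSTS * IN_TOK / 1e6 * in_rate + POSTS * OUT_TOK / 1e6 * out_rate

print(daily(IN_RATE, OUT_RATE))          # 450.0 -> $450/day standard
print(daily(IN_RATE / 2, OUT_RATE / 2))  # 225.0 -> $225/day with batch
```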

| Model | Standard Input | Batch Input | Standard Output | Batch Output |
|---|---|---|---|---|
| Opus 4.6 | $5.00/MTok | $2.50/MTok | $25.00/MTok | $12.50/MTok |
| Sonnet 4.6 | $3.00/MTok | $1.50/MTok | $15.00/MTok | $7.50/MTok |
| Haiku 4.5 | $1.00/MTok | $0.50/MTok | $5.00/MTok | $2.50/MTok |
| Haiku 3.5 | $0.80/MTok | $0.40/MTok | $4.00/MTok | $2.00/MTok |
| Haiku 3 | $0.25/MTok | $0.125/MTok | $1.25/MTok | $0.625/MTok |

Prompt Caching: How the 90% Discount Works

Prompt caching is Anthropic's most powerful cost optimization feature. It stores previously processed portions of your prompt so subsequent requests can read from cache instead of reprocessing. The mechanics are straightforward: the first time you cache content, you pay a write premium. Every subsequent read costs just 10% of the base input price.

There are two cache durations. A 5-minute cache costs 1.25x the base input price to write and pays for itself after just one cache hit. A 1-hour cache costs 2x base input to write and pays for itself after two cache hits. Cache hits cost 0.1x base input regardless of duration.

For Sonnet 4.6, that means a 5-minute cache write costs $3.75/MTok, a 1-hour cache write costs $6.00/MTok, and any cache hit costs just $0.30/MTok. Compare that to the standard $3.00/MTok input price. If you're sending a 5,000-token system prompt on every request and you make 100 requests per hour, you pay for the system prompt once at 1.25x and then 99 times at 0.1x. The savings are enormous.
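
That claim is easy to verify with the multipliers above. A quick sketch of the 5,000-token, 100-requests-per-hour case:

```python
# Hourly cost of a 5,000-token system prompt, 100 requests/hour, Sonnet 4.6.
BASE = 3.00                      # $/MTok input
TOK, REQS = 5_000, 100

uncached = REQS * TOK / 1e6 * BASE                 # $1.50/hour
cached = (TOK / 1e6 * (1.25 * BASE)                # one 5-min cache write
          + (REQS - 1) * TOK / 1e6 * (0.10 * BASE))  # 99 cache hits
print(f"uncached ${uncached:.2f}/hr, cached ${cached:.2f}/hr")  # $1.50 vs ~$0.17
```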

The best use cases for caching: system prompts that stay constant across requests, RAG contexts where the same documents get referenced repeatedly, conversation histories where earlier turns don't change, and few-shot examples that you include in every call.

One thing to watch: cache multipliers stack with other pricing modifiers. Batch API discount applies first, then caching multipliers apply on top. Long context pricing for Sonnet 4.5 and Sonnet 4 also stacks. The math can get complicated, but the general direction is always cheaper.
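
In multiplier form, under the stacking order described above (batch discount first, then the cache multiplier):

```python
# Effective $/MTok for a cached batch read on Sonnet 4.6.
base = 3.00
after_batch = base * 0.50         # 50% batch discount applies first
after_cache = after_batch * 0.10  # cache-hit multiplier applies on top
print(round(after_cache, 2), round(after_cache / base, 2))  # 0.15 $/MTok -> 5% of base
```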

| Model | Base Input | 5-min Cache Write | 1-hr Cache Write | Cache Hit |
|---|---|---|---|---|
| Opus 4.6 | $5.00/MTok | $6.25/MTok | $10.00/MTok | $0.50/MTok |
| Sonnet 4.6 | $3.00/MTok | $3.75/MTok | $6.00/MTok | $0.30/MTok |
| Haiku 4.5 | $1.00/MTok | $1.25/MTok | $2.00/MTok | $0.10/MTok |

What This Actually Costs Per Month (Real Numbers)

Abstract per-token pricing is hard to reason about. Here are concrete cost breakdowns for common production use cases, using current April 2026 pricing.

Customer support chatbot (Sonnet 4.6)
Processing 10,000 support tickets per day. Average conversation: 2,000 input tokens (system prompt + user message + context), 500 output tokens. With a cached system prompt (1,500 tokens cached, 500 fresh per request): Input cost = 500 fresh tokens x 10K = 5M tokens at $3/MTok ($15) + 15M cached tokens at $0.30/MTok ($4.50). Output = 5M tokens at $15/MTok ($75). Daily total: ~$95. Monthly: ~$2,850.

Code review agent (Opus 4.6)
Reviewing 200 pull requests per day. Average: 8,000 input tokens (code + instructions), 2,000 output tokens (review comments). No caching (each PR is different). Daily: 1.6M input at $5/MTok ($8) + 400K output at $25/MTok ($10). Daily total: ~$18. Monthly: ~$540.

Document processing pipeline (Haiku 4.5 batch)
Classifying and extracting data from 50,000 documents per day. Average: 1,000 input tokens, 200 output tokens. Using Batch API. Daily: 50M input at $0.50/MTok ($25) + 10M output at $2.50/MTok ($25). Daily total: ~$50. Monthly: ~$1,500.

RAG-powered search (Sonnet 4.6 with caching)
1,000 queries per day. Each query: 500-token question + 3,000-token cached context + 1,000-token response. Cached input: 3M tokens at $0.30/MTok ($0.90). Fresh input: 500K tokens at $3/MTok ($1.50). Output: 1M tokens at $15/MTok ($15). Daily total: ~$17.40. Monthly: ~$522.
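
All four scenarios reduce to the same arithmetic. A small helper that reproduces the numbers above (rates in $/MTok):

```python
# Daily cost: fresh input + cached input + output, each at its own rate.
def daily(reqs, fresh_in, cached_in, out_tok, in_rate, out_rate, cache_rate=0.0):
    return (reqs * fresh_in / 1e6 * in_rate
            + reqs * cached_in / 1e6 * cache_rate
            + reqs * out_tok / 1e6 * out_rate)

print(daily(10_000, 500, 1_500, 500, 3.00, 15.00, 0.30))   # support bot  ~$94.50
print(daily(200, 8_000, 0, 2_000, 5.00, 25.00))            # code review  $18.00
print(daily(50_000, 1_000, 0, 200, 0.50, 2.50))            # batch docs   $50.00
print(daily(1_000, 500, 3_000, 1_000, 3.00, 15.00, 0.30))  # RAG search   ~$17.40
```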

Anthropic vs OpenAI: How the Pricing Compares

The most common comparison is with OpenAI. GPT-4o costs $2.50/$10 per million tokens versus Sonnet 4.6's $3/$15: on raw price, OpenAI wins by about 17% on input and 33% on output. But pricing alone doesn't tell the full story.

Claude's prompt caching is more aggressive than OpenAI's. A cached Sonnet 4.6 read costs $0.30/MTok (90% off base input) versus roughly $1.25/MTok for GPT-4o's cached input (OpenAI discounts cached tokens by 50%). The discount structures differ, and depending on your cache hit rate, Claude can be cheaper on effective cost per request despite the higher base price.

For the flagship tier, Opus 4.6 at $5/$25 competes with GPT-4o at $2.50/$10 and o1 at $15/$60. Opus is more expensive than GPT-4o but significantly cheaper than o1, while offering comparable reasoning performance. The right comparison depends on which OpenAI model your workload actually needs.

At the budget end, Haiku 4.5 at $1/$5 is substantially more expensive than GPT-4o-mini at $0.15/$0.60. If you have price-sensitive high-volume tasks where quality doesn't matter as much, OpenAI has the edge. But Haiku 4.5 with extended thinking can handle tasks that GPT-4o-mini can't touch.

The batch API discount is identical: both providers offer 50% off for async processing. Anthropic has the edge on caching flexibility (two cache durations vs one). OpenAI has the edge on raw price for their mini model tier.

| Model Tier | Anthropic | Anthropic Price | OpenAI | OpenAI Price | Winner |
|---|---|---|---|---|---|
| Budget | Haiku 4.5 | $1/$5 | GPT-4o-mini | $0.15/$0.60 | OpenAI (6-8x cheaper) |
| Workhorse | Sonnet 4.6 | $3/$15 | GPT-4o | $2.50/$10 | OpenAI (~20-30% cheaper) |
| Flagship | Opus 4.6 | $5/$25 | o1 | $15/$60 | Anthropic (3x cheaper) |
| Batch (workhorse) | Sonnet 4.6 batch | $1.50/$7.50 | GPT-4o batch | $1.25/$5 | OpenAI (~20% cheaper) |
| Cheapest ever | Haiku 3 (deprecated) | $0.25/$1.25 | GPT-4o-mini | $0.15/$0.60 | OpenAI |

Extended Thinking: What It Costs and When to Use It

Extended thinking lets Claude reason step-by-step internally before producing a response. The thinking tokens are billed as output tokens at output rates. All three current models support it (Opus 4.6, Sonnet 4.6, Haiku 4.5), and Opus 4.6 and Sonnet 4.6 additionally support adaptive thinking, which lets the model decide how much thinking a task needs.

The cost impact depends on your task. Simple classification might use zero thinking tokens. A complex math problem or multi-file code review might generate 3-5x more thinking tokens than visible output tokens. At Opus 4.6's $25/MTok output rate, a response with 500 visible output tokens and 2,000 thinking tokens costs $0.0625 versus $0.0125 without thinking. That's a 5x multiplier on output cost.
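
The same numbers in code:

```python
# Output-side cost of extended thinking at Opus 4.6 rates.
OUT_RATE = 25.00                # $/MTok output
visible, thinking = 500, 2_000  # tokens

with_thinking = (visible + thinking) / 1e6 * OUT_RATE  # $0.0625
without = visible / 1e6 * OUT_RATE                     # $0.0125
print(with_thinking, without, with_thinking / without) # 5.0x output cost
```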

When is it worth it? Extended thinking makes a material difference on tasks requiring multi-step reasoning: code generation with complex logic, mathematical proofs, legal document analysis, and research synthesis. For straightforward tasks like text classification, summarization, or simple Q&A, you're paying for thinking tokens that don't improve the output.

One practical tip: use adaptive thinking when available (Opus 4.6 and Sonnet 4.6). It lets the model skip expensive reasoning for simple requests and engage deep thinking only when needed. This naturally optimizes your thinking token spend without manual intervention.
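
In the Python SDK, extended thinking is enabled per request with a token budget. A hedged sketch: the model ID and the exact flag for adaptive thinking are assumptions, so check the Messages API reference for current parameters:

```python
# Enabling extended thinking with an explicit budget. Thinking tokens are
# billed as output tokens, and max_tokens must exceed budget_tokens.
import anthropic

client = anthropic.Anthropic()
msg = client.messages.create(
    model="claude-opus-4-6",  # assumed model ID
    max_tokens=16_000,
    thinking={"type": "enabled", "budget_tokens": 8_000},
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)
print(msg.usage.output_tokens)  # includes the thinking tokens
```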

Still on Claude 3.5? What Upgrading Saves You

Anthropic has deprecated several older models. Claude Haiku 3 (the cheapest model ever at $0.25/$1.25) retires on April 19, 2026. Claude Sonnet 3.7 is also deprecated. If you're running production workloads on either, migrate now.

The upgrade path depends on your priorities. Haiku 3 users should move to Haiku 4.5 ($1/$5). Yes, it's 4x more expensive. But it's dramatically more capable: 64K max output (vs 4K), extended thinking support, and higher quality across every task type. If cost is the primary concern, Haiku 3.5 at $0.80/$4 is a middle ground that's not deprecated yet.

For Opus users, the migration is actually a price cut. Opus 4 and Opus 4.1 both cost $15/$75. Opus 4.6 costs $5/$25 and is more capable. There's no reason to stay on the older Opus models. Switch and save 67% immediately.

Sonnet users have the simplest migration. Sonnet 3.7 through Sonnet 4.6 all cost $3/$15. The pricing hasn't changed across four generations. You can upgrade to Sonnet 4.6 for better quality at the same price. The newer models also get the full 1M context window at standard pricing, while older Sonnets needed a beta header for anything over 200K tokens.

Rate Limits by Usage Tier

Anthropic uses a tiered system that increases your rate limits as your spending grows. The tiers determine how many requests per minute (RPM), input tokens per minute, and output tokens per minute you can use. Understanding these limits matters for production planning because hitting a rate limit means 429 errors and requests you have to retry or shed.

Tier 1 starts at $0 minimum spend with basic limits. Each tier roughly doubles your limits. Tier 4 offers the highest standard limits. Enterprise customers can negotiate custom limits beyond Tier 4.

If you're building a production app that needs to handle traffic spikes, pay attention to the output tokens per minute limit. That's typically the bottleneck for chatbot-style applications where every request generates a substantial response. The input token limit matters more for batch processing and document ingestion workloads.

One strategy for managing rate limits without upgrading tiers: model routing. Send simple tasks to Haiku 4.5 (which has higher rate limits per dollar) and reserve Sonnet 4.6 or Opus 4.6 for complex tasks. This spreads your load across model-specific rate limit pools.

Will You Spend $20 or $2,000? How to Estimate Your Bill

Estimating API costs before you build is tricky because token usage varies by use case. Here's a framework that works for planning purposes.

Step 1: Estimate your average request size. A typical chatbot request is 1,000-3,000 input tokens (system prompt + user message + any context) and 300-1,000 output tokens. A document processing task might be 5,000-50,000 input tokens and 500-2,000 output tokens. A code generation task is typically 2,000-10,000 input tokens and 1,000-5,000 output tokens.

Step 2: Multiply by your daily request volume. 100 requests/day is a small app. 1,000/day is moderate. 10,000+/day is when you need to think seriously about optimization.

Step 3: Apply your model choice. Sonnet 4.6 at $3/$15 per MTok is the default. Switch to Haiku 4.5 for simple tasks, Opus 4.6 for complex ones.

Step 4: Apply discounts. Prompt caching typically saves 30-60% on input costs for apps with consistent system prompts. Batch API saves 50% on everything for async workloads. Combined, an optimized pipeline can run at 20-40% of naive pricing.

Step 5: Add a 20-30% buffer for extended thinking tokens, retries, and edge cases. Token usage is never as predictable as you hope.

A concrete example: a B2B SaaS app using Sonnet 4.6 for a customer-facing AI feature. 500 requests/day, 2,000 avg input tokens (1,500 cached system prompt + 500 fresh), 800 avg output tokens. Monthly (30 days, 15,000 requests): 22.5M cached input at $0.30/MTok ($6.75) + 7.5M fresh input at $3/MTok ($22.50) + 12M output at $15/MTok ($180). Total: ~$209/month. With a 25% buffer: ~$260/month.
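
The framework as a function, reproducing the example above:

```python
# Five-step monthly estimate: request size x volume x rates x discounts x buffer.
def monthly(reqs_per_day, fresh_in, cached_in, out_tok,
            in_rate, out_rate, cache_rate, buffer=0.25):
    reqs = reqs_per_day * 30
    base = (reqs * fresh_in / 1e6 * in_rate
            + reqs * cached_in / 1e6 * cache_rate
            + reqs * out_tok / 1e6 * out_rate)
    return base * (1 + buffer)

# B2B SaaS example: Sonnet 4.6 at $3/$15, cache hits at $0.30/MTok.
print(monthly(500, 500, 1_500, 800, 3.00, 15.00, 0.30))  # ~$261.56
```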

Token Counting and How Anthropic Bills You

Claude uses byte-pair encoding (BPE) tokenization, similar to other large language models. One token roughly equals 3-4 characters of English text, or about 0.75 words. A 1,000-word document typically tokenizes to around 1,300-1,400 tokens.

System prompts count as input tokens on every request. If your system prompt is 500 tokens and you send 100 requests, you pay for 50,000 tokens of system prompt alone. This is where prompt caching becomes valuable: cache the system prompt once at the write premium, then pay 90% less on every subsequent read.

Images are tokenized based on their dimensions. A 1024x1024 image costs roughly 1,600 tokens. Larger images get scaled down automatically, but you can resize them yourself before sending to save on token costs. PDFs are processed page by page, with each page treated similarly to an image.

Tool definitions also count as input tokens. If you define 10 tools with detailed schemas, that could add 2,000-5,000 tokens per request. Keep tool descriptions concise and only include the tools your request actually needs.

The Messages API response includes a usage field that reports exact input and output token counts for every request. Use this to build dashboards and track costs in real time rather than estimating after the fact.
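
A minimal sketch of reading it (the model ID is an assumption):

```python
# Log exact token counts from the usage field on every response.
import anthropic

client = anthropic.Anthropic()
msg = client.messages.create(
    model="claude-haiku-4-5",  # assumed model ID
    max_tokens=100,
    messages=[{"role": "user", "content": "Summarize this ticket in one line."}],
)
print(msg.usage.input_tokens, msg.usage.output_tokens)  # persist these per request
```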

One common mistake: assuming token counts are symmetric between input and output. Output tokens cost 5x more than input tokens across all Claude models. Take a request that sends 10,000 input tokens and receives 2,000 output tokens: on Sonnet 4.6 the input costs $0.03 and the output also costs $0.03, so the output portion accounts for roughly half the total cost despite being only 17% of the tokens. This asymmetry makes output token optimization (concise instructions, max_tokens limits, structured output formats) disproportionately valuable.

How to Cut Your Claude API Bill by 90%

The biggest cost lever is model selection. Most API users default to the most capable model, but 70-80% of typical production queries can be handled by Haiku 4.5 at $1/$5 per million tokens. Use Sonnet 4.6 for tasks requiring stronger reasoning, and reserve Opus 4.6 for your hardest problems. A tiered routing approach can cut costs by 50-70% compared to running everything through Opus.
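
A routing layer can be as simple as a heuristic in front of the API call. A sketch with illustrative thresholds and assumed model IDs:

```python
# Route by task difficulty: cheap model by default, escalate when needed.
def pick_model(prompt: str, needs_agentic_tools: bool) -> str:
    if needs_agentic_tools:
        return "claude-opus-4-6"    # agents, hardest problems (assumed ID)
    if len(prompt) > 8_000:         # illustrative threshold, not a rule
        return "claude-sonnet-4-6"  # multi-step reasoning (assumed ID)
    return "claude-haiku-4-5"       # classification, extraction (assumed ID)
```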

Prompt caching is the second-biggest lever. If you send the same system prompt, few-shot examples, or document context across multiple requests, enable caching. The first request pays a 25% premium to write the cache, but every subsequent request within the 5-minute TTL pays 90% less for those cached tokens. For a chatbot with a 2,000-token system prompt handling 1,000 requests per hour, that's 2M cached tokens per hour dropping from $3.00 to $0.30 per MTok on Sonnet 4.6, roughly $5.40/hour saved.
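
In the SDK, caching is a cache_control marker on the content block you want stored. A sketch assuming a Sonnet 4.6 model ID:

```python
# Cache a constant system prompt; requests within the TTL read it at
# roughly 10% of the base input rate.
import anthropic

SYSTEM_PROMPT = "You are a support agent for Acme. " * 100  # stand-in text

client = anthropic.Anthropic()
msg = client.messages.create(
    model="claude-sonnet-4-6",  # assumed model ID
    max_tokens=500,
    system=[{
        "type": "text",
        "text": SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral"},  # 5-minute cache
    }],
    messages=[{"role": "user", "content": "Where is my order?"}],
)
```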

Batch API processing offers a flat 50% discount on all token costs. The tradeoff is latency: results return within 24 hours instead of seconds. For any workload that doesn't need real-time responses (data extraction, content generation, bulk classification, evaluation), batching should be your default.

Token-level optimizations matter at scale. Shorter system prompts, compressed few-shot examples, and max_tokens limits all reduce costs. Setting max_tokens to 500 instead of 4,096 won't affect quality for short-answer tasks but prevents runaway output on edge cases. Similarly, avoid sending full conversation history when only the last few turns matter; summarize older context instead.

Monitor usage patterns weekly. The API's usage field gives exact token counts per request. Aggregate these by model, endpoint, and use case to identify where you're overspending. Common findings: a debug prompt that accidentally stayed in production, a retry loop that doubles costs on timeouts, or a document processing pipeline sending the same 50,000-token document 10 times instead of caching it.
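
If you log the usage numbers per request, the aggregation is a few lines:

```python
# Aggregate logged usage into spend per model. Rates in $/MTok; log rows are
# whatever you persist from each response's usage field.
from collections import defaultdict

RATES = {"claude-haiku-4-5": (1.00, 5.00), "claude-sonnet-4-6": (3.00, 15.00)}

def spend_by_model(log_rows):
    totals = defaultdict(float)
    for row in log_rows:
        in_rate, out_rate = RATES[row["model"]]
        totals[row["model"]] += (row["input_tokens"] / 1e6 * in_rate
                                 + row["output_tokens"] / 1e6 * out_rate)
    return dict(totals)
```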

| Strategy | Effort | Typical Savings | Best For |
|---|---|---|---|
| Model routing (Haiku/Sonnet/Opus) | Medium | 50-70% | Production apps with mixed query complexity |
| Prompt caching | Low | 30-50% | Repeated system prompts, RAG with shared context |
| Batch API | Low | 50% | Offline processing, evaluations, data extraction |
| Token reduction (shorter prompts) | Low | 10-25% | High-volume endpoints |
| Max tokens limits | Minimal | 5-15% | Short-answer tasks, classification |

Eight Pricing Details That Can Change Your Bill

  • Output tokens cost 5x input tokens across every Claude model. A chatbot that generates long responses will spend most of its budget on output, not input. Track your output-to-input ratio.
  • Prompt caching slashes repeat input costs by 90%. A 5-minute cache write costs 1.25x base input, but every subsequent cache hit costs just 0.1x. If you send the same system prompt on every request, caching pays for itself after one read.
  • There's also a 1-hour cache at 2x base input. Worth it if your cached content stays stable across longer sessions. Pays off after two cache reads.
  • The Batch API gives 50% off both input and output tokens. The tradeoff: results are asynchronous (up to 24 hours). Good for data processing, evaluations, and any job that doesn't need real-time responses.
  • Extended thinking burns tokens you don't see in the response. The model uses internal reasoning tokens before producing output. Budget 2-5x your visible output for thinking-heavy tasks like multi-step coding or math.
  • US-only data residency (inference_geo parameter) adds a 10% surcharge on all token categories for Opus 4.6 and newer models. Global routing (the default) uses standard pricing.
  • Fast mode for Opus 4.6 costs 6x standard rates ($30/$150 per 1M tokens). You get significantly faster output but it's a research preview and can't be combined with batch processing.
  • Legacy models are still available but don't assume they're cheaper. Claude Opus 4.1 and Opus 4 cost $15/$75 per 1M tokens, which is 3x the price of the newer, more capable Opus 4.6 at $5/$25.

Which Claude Model Should You Use? Quick Decision Guide

High-volume classification, routing, or extraction

Haiku 4.5 at $1/$5 per 1M tokens. It's fast, supports extended thinking, and handles structured tasks well. For batch jobs, the price drops to $0.50/$2.50. A pipeline processing 10M input tokens and 2M output tokens per day costs about $10/day with batch pricing.

Production apps: chatbots, coding assistants, document analysis

Sonnet 4.6 at $3/$15 per 1M tokens. The workhorse model. 1M context window, 64K max output, fast enough for real-time use. A typical SaaS feature processing 500 requests/day (averaging 2K input + 1K output tokens each) costs about $10.50/day, or roughly $315/month at standard rates before caching and batch discounts.

Agents, complex multi-step coding, deep research

Opus 4.6 at $5/$25 per 1M tokens. The price gap to Sonnet has narrowed significantly (was 5x with Opus 3, now less than 2x). If your task requires extended reasoning or autonomous tool use, Opus 4.6 is worth the premium.

Cost-sensitive apps that need decent quality

Haiku 3.5 at $0.80/$4 per 1M tokens or legacy Haiku 3 at $0.25/$1.25 (deprecated April 2026). Haiku 3 is the cheapest Claude model ever but it's being retired. Migrate to Haiku 4.5 before April 19, 2026.

The Bottom Line

Sonnet 4.6 at $3/$15 per 1M tokens is the default choice for most developers. It's the same price as the old Sonnet 3.5 but significantly more capable, with a 1M context window and adaptive thinking. The big pricing story in 2026 is that Opus got cheaper: Opus 4.6 costs $5/$25 versus the old Opus 3's $15/$75. That's a 3x price cut for the flagship model. Stack prompt caching (90% off repeat inputs) and batch processing (50% off everything) to push costs even lower. A well-optimized production app using Sonnet 4.6 with caching typically spends $30-100/month for moderate traffic.

Disclosure: Pricing information is sourced from official websites and may change. We update this page regularly but always verify current pricing on the vendor's site before purchasing.

Related Resources

  • Anthropic Claude API Review
  • OpenAI API Pricing Comparison
  • OpenAI vs Anthropic API
  • AWS Bedrock Pricing
  • Cohere API Pricing
  • Compare All LLM Token Prices
  • AI API Free Tiers Compared

Frequently Asked Questions

How much does the Anthropic Claude API cost in 2026?

Haiku 4.5: $1/$5 per 1M input/output tokens. Sonnet 4.6: $3/$15. Opus 4.6: $5/$25. Output tokens cost 5x input across all models. Batch processing cuts all prices by 50%.

Is Claude cheaper than GPT-4o in 2026?

Sonnet 4.6 ($3/$15) is more expensive than GPT-4o ($2.50/$10) on raw price. Haiku 4.5 ($1/$5) is more expensive than GPT-4o-mini ($0.15/$0.60). Claude wins on output quality for many coding and analysis tasks, so cost-per-useful-output can be lower despite higher per-token pricing.

What is prompt caching and how much does it save?

Prompt caching lets you reuse previously processed parts of your prompt across API calls. The first write costs 1.25x base input (5-minute cache) or 2x (1-hour cache). Every subsequent cache hit costs just 0.1x base input, which is a 90% discount. It pays for itself after one cache read on 5-minute caches.

What happened to Claude 3.5 Sonnet and Claude 3 Opus?

They're legacy models. Claude 3.5 Sonnet was superseded by Sonnet 3.7, which is itself now deprecated, and Claude 3 Opus is deprecated. The current models are Opus 4.6, Sonnet 4.6, and Haiku 4.5. Notably, Opus 4.6 at $5/$25 is 3x cheaper than the old Opus 3 at $15/$75, while being more capable.

What's the cheapest way to use Claude's API?

Combine three things: (1) Use Haiku 4.5 for simple tasks and Sonnet 4.6 for complex ones (model routing). (2) Enable prompt caching for any repeated context like system prompts (90% savings on cached tokens). (3) Use the Batch API for non-real-time workloads (50% off everything). A pipeline using all three optimizations can run 80-90% cheaper than naive Opus 4.6 calls.

What are the context window sizes for Claude models in 2026?

Opus 4.6 and Sonnet 4.6 have 1M token context windows (about 750K words) at standard pricing. Haiku 4.5 has a 200K token window. Max output tokens: Opus 4.6 gets 128K, Sonnet 4.6 and Haiku 4.5 get 64K each.

What is extended thinking and does it cost extra?

Extended thinking lets Claude reason internally before responding. All current models support it (Opus 4.6, Sonnet 4.6, Haiku 4.5). The thinking tokens count as output tokens and are billed at output rates. Budget 2-5x your visible output for thinking-heavy tasks.

How does Claude's batch API pricing work?

The Batch API gives 50% off both input and output tokens. Opus 4.6 drops from $5/$25 to $2.50/$12.50. Sonnet 4.6 drops from $3/$15 to $1.50/$7.50. Haiku 4.5 drops from $1/$5 to $0.50/$2.50. Results are returned asynchronously within 24 hours.

What does Claude's fast mode cost?

Fast mode for Opus 4.6 (research preview) costs 6x standard rates: $30/1M input, $150/1M output. It provides significantly faster output. Prompt caching and data residency multipliers stack on top. Fast mode can't be used with the Batch API.

Are there volume discounts for Claude API usage?

Standard tier pricing is fixed. Enterprise customers can negotiate volume discounts and custom rate limits by contacting Anthropic's sales team. Rate limits increase across four standard usage tiers as your spending grows, with custom limits above that for enterprise.
