LLM pricing changes fast. New models launch, old ones get price cuts, and the per-token math can make or break your AI project's economics. This page tracks current API pricing for every major model as of April 2026, with real cost comparisons for common workloads.
Bookmark this page. We update it monthly.
Quick Pricing Reference: All Models at a Glance
This table covers the primary API-accessible models from each provider. Prices are per 1 million tokens unless noted otherwise.
| Model | Provider | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
|---|---|---|---|---|
| GPT-4.1 | OpenAI | $2.00 | $8.00 | 1M |
| GPT-4.1 mini | OpenAI | $0.40 | $1.60 | 1M |
| GPT-4.1 nano | OpenAI | $0.10 | $0.40 | 1M |
| GPT-4o | OpenAI | $2.50 | $10.00 | 128K |
| o3 | OpenAI | $2.00 | $8.00 | 200K |
| o3-mini | OpenAI | $1.10 | $4.40 | 200K |
| o4-mini | OpenAI | $1.10 | $4.40 | 200K |
| Claude Opus 4 | Anthropic | $15.00 | $75.00 | 200K |
| Claude Sonnet 4 | Anthropic | $3.00 | $15.00 | 200K |
| Claude Haiku 3.5 | Anthropic | $0.80 | $4.00 | 200K |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 1M |
| Gemini 2.5 Flash | Google | $0.15 | $0.60 | 1M |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | 1M |
| Llama 4 Maverick | Meta (via providers) | $0.20 | $0.60 | 1M |
| Llama 4 Scout | Meta (via providers) | $0.10 | $0.25 | 10M |
| Mistral Large 2 | Mistral | $2.00 | $6.00 | 128K |
| Mistral Small | Mistral | $0.10 | $0.30 | 32K |
| Cohere Command R+ | Cohere | $2.50 | $10.00 | 128K |
| Cohere Command R | Cohere | $0.15 | $0.60 | 128K |
Prices reflect standard API rates. Volume discounts, committed-use agreements, and batch processing can reduce costs by 25-50% depending on the provider.
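For back-of-envelope budgeting, the per-request math is a single multiply-and-divide. A minimal sketch in Python using a few of the standard rates from the table above (rates are hard-coded for illustration, not fetched live, and will drift as prices change):

```python
# USD per 1M tokens, (input, output), from the pricing table above.
PRICES = {
    "gpt-4.1":          (2.00, 8.00),
    "gpt-4.1-mini":     (0.40, 1.60),
    "claude-sonnet-4":  (3.00, 15.00),
    "gemini-2.5-flash": (0.15, 0.60),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at standard (non-discounted) rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: an 8,000-token document summarized into 1,000 tokens on GPT-4.1
print(round(request_cost("gpt-4.1", 8_000, 1_000), 4))  # 0.024
```

Multiply by monthly request volume to reproduce any of the workload tables later in this page.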
OpenAI Pricing Breakdown
OpenAI runs the largest model portfolio in the market. The April 2026 lineup spans from nano-class models at $0.10/1M input tokens up to the full o3 reasoning model. For most production workloads, the GPT-4.1 family has replaced GPT-4o as the default recommendation.
GPT-4.1 Family
GPT-4.1 is OpenAI's workhorse. It handles coding, analysis, and long-context tasks with a 1M token context window. The mini variant cuts cost by 80% with surprisingly small quality tradeoffs on structured tasks. The nano variant is built for high-volume, latency-sensitive workloads where you need sub-100ms responses.
Cost example: processing 10,000 customer support tickets (average 500 tokens input, 200 tokens output each) means 5M input tokens and 2M output tokens. That costs roughly $26 with GPT-4.1, $5.20 with GPT-4.1 mini, and $1.30 with GPT-4.1 nano.
Reasoning Models (o3, o4-mini)
OpenAI's reasoning models think before answering. They consume more tokens internally (chain-of-thought tokens are billed as output), which means actual costs run 2-5x higher than the per-token price suggests. Use these for complex analysis, math, and multi-step reasoning. Not cost-effective for simple classification or extraction tasks.
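The hidden-token effect is easy to model. A rough estimator, treating the 2-5x inflation as a configurable multiplier on visible output (the multiplier is an assumption; actual reasoning token counts vary per request and aren't predictable in advance):

```python
def reasoning_cost(input_tokens: int, visible_output_tokens: int,
                   in_rate: float, out_rate: float,
                   reasoning_multiplier: float = 3.0) -> float:
    """Estimated USD cost for a reasoning model where hidden chain-of-thought
    tokens are billed as output. reasoning_multiplier is an assumption
    (this page cites 2-5x effective output inflation)."""
    billed_output = visible_output_tokens * reasoning_multiplier
    return (input_tokens * in_rate + billed_output * out_rate) / 1_000_000

# o3 at $2/$8: a 1,000-in / 500-out request, assuming 3x output inflation
naive = (1_000 * 2 + 500 * 8) / 1_000_000        # $0.006 by sticker price
actual = reasoning_cost(1_000, 500, 2, 8)        # $0.014 with hidden tokens
```

This is why a reasoning model at the same per-token price as GPT-4.1 still costs meaningfully more in practice.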
For detailed OpenAI pricing tiers and rate limits, see our OpenAI API Pricing page.
Anthropic Pricing Breakdown
Anthropic prices on a three-tier system: Haiku (fast and cheap), Sonnet (balanced), and Opus (maximum capability). The gap between tiers is significant: Opus costs 5x more than Sonnet on both input and output.
When Opus Is Worth the Premium
Claude Opus 4 is the most expensive mainstream LLM at $15/$75 per million tokens. That price point only makes sense for tasks where quality differences directly impact revenue: legal document analysis, complex code generation, research synthesis, and agentic workflows where errors cascade. For most applications, Sonnet 4 handles the work at 80% of the quality for 20% of the cost.
Haiku: The Budget Workhorse
Claude Haiku 3.5 at $0.80/$4.00 fills the high-quality budget slot. It outperforms GPT-4.1 mini on many benchmarks while costing roughly double. The tradeoff is worth it when you need Anthropic's safety characteristics or superior instruction following on nuanced tasks.
Full Anthropic tier comparison at our Anthropic API Pricing page.
Google Gemini Pricing Breakdown
Google's pricing strategy is aggressive. Gemini 2.5 Flash at $0.15/$0.60 per million tokens undercuts nearly everything except open-source models, and it includes a 1M token context window. The free tier (see our AI Free Tiers 2026 guide) is generous enough for prototyping and low-volume production use.
Gemini 2.5 Pro
At $1.25/$10.00, Gemini 2.5 Pro offers strong reasoning and coding performance. The input pricing undercuts Claude Sonnet and GPT-4.1, but output tokens are priced at $10 per million, making generation-heavy workloads expensive. Use Gemini Pro when your prompts have high input-to-output ratios (document analysis, summarization of long texts).
Flash Models
Gemini 2.5 Flash and 2.0 Flash are the price-performance leaders. At $0.10-$0.15 per million input tokens, they compete directly with open-source model hosting costs while requiring zero infrastructure management.
Open-Source Model Costs
Llama 4, Mistral, and other open-weight models don't have a single price. Your cost depends on how you host them.
Hosted API Pricing
Providers like Together AI, Fireworks, Groq, and AWS Bedrock host open-source models and charge per token. Typical rates for Llama 4 Maverick range from $0.15-$0.30 per million input tokens depending on the provider and commitment level. Self-hosting on your own GPUs can be cheaper at scale but requires significant DevOps investment.
Self-Hosting Economics
Running Llama 4 Maverick (400B+ total parameters, mixture-of-experts) requires multiple high-end GPUs. A typical setup costs $3-8/hour on cloud GPU instances. At sustained throughput of roughly 35+ requests per minute (around the 50,000 requests/day mark), self-hosting breaks even with API pricing. Below that, hosted APIs are cheaper.
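The break-even claim can be sketched numerically. Assuming a $5/hour GPU instance running 24/7 and roughly $0.0022 per hosted request (an 8,000-in / 1,000-out request at Maverick's hosted rates of $0.20/$0.60 per 1M; both figures are illustrative assumptions), the crossover lands near 50,000 requests/day:

```python
def breakeven_requests_per_day(gpu_cost_per_hour: float,
                               api_cost_per_request: float) -> float:
    """Daily request volume at which a 24/7 GPU deployment matches hosted
    API spend. Ignores DevOps labor, idle capacity, and redundancy."""
    daily_gpu_cost = gpu_cost_per_hour * 24
    return daily_gpu_cost / api_cost_per_request

# $5/hr GPU vs ~$0.0022/request hosted (8k in * $0.20/1M + 1k out * $0.60/1M)
print(round(breakeven_requests_per_day(5.0, 0.0022)))  # ~54,545 requests/day
```

Cheaper GPUs or bigger requests pull the break-even point lower; smaller requests push it higher.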
Cost Comparison by Workload
Raw per-token pricing tells only part of the story. Actual costs depend on your workload pattern.
Chatbot / Conversational AI
Average conversation: 2,000 tokens input (system prompt + history), 500 tokens output per turn, 5 turns per session.
| Model | Cost per Session | Cost per 10K Sessions/Month |
|---|---|---|
| GPT-4.1 | $0.04 | $400 |
| GPT-4.1 mini | $0.008 | $80 |
| Claude Sonnet 4 | $0.068 | $675 |
| Gemini 2.5 Flash | $0.003 | $30 |
| Llama 4 Maverick (hosted) | $0.0035 | $35 |
Document Processing Pipeline
Average document: 8,000 tokens input, 1,000 tokens output (summary + extraction).
| Model | Cost per Document | Cost per 50K Docs/Month |
|---|---|---|
| GPT-4.1 | $0.024 | $1,200 |
| GPT-4.1 nano | $0.001 | $60 |
| Claude Haiku 3.5 | $0.010 | $520 |
| Gemini 2.5 Flash | $0.002 | $90 |
| Gemini 2.0 Flash | $0.001 | $60 |
Code Generation / Analysis
Average request: 3,000 tokens input (code + instructions), 2,000 tokens output.
| Model | Cost per Request | Cost per 100K Requests/Month |
|---|---|---|
| GPT-4.1 | $0.022 | $2,200 |
| Claude Sonnet 4 | $0.039 | $3,900 |
| Claude Opus 4 | $0.195 | $19,500 |
| Gemini 2.5 Pro | $0.024 | $2,375 |
| Mistral Large 2 | $0.018 | $1,800 |
Cost Optimization Strategies
The cheapest model isn't always the best value. Here's how to optimize spend without sacrificing quality.
1. Tiered Model Routing
Route requests to different models based on complexity. Use a cheap classifier (GPT-4.1 nano or Gemini 2.0 Flash) to assess request difficulty, then route simple requests to budget models and complex ones to premium models. This typically cuts costs 40-60% compared to using a single model for everything.
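A minimal routing sketch, with a length heuristic standing in for the real classifier call (the model names, the heuristic, and the threshold are illustrative placeholders, not a recommendation):

```python
def classify_difficulty(prompt: str) -> str:
    """Stub for the cheap classifier call (e.g. GPT-4.1 nano or
    Gemini 2.0 Flash). Faked here with a length heuristic so the
    sketch is self-contained."""
    return "complex" if len(prompt) > 500 else "simple"

def route(prompt: str) -> str:
    """Pick a model tier based on assessed difficulty."""
    tier = classify_difficulty(prompt)
    model = "gpt-4.1" if tier == "complex" else "gpt-4.1-nano"
    return model  # in production: return call_model(model, prompt)

print(route("What is 2 + 2?"))  # gpt-4.1-nano
```

In production the classifier is itself an LLM call, so make sure its cost (and latency) stays small relative to the savings it unlocks.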
2. Prompt Caching
Both OpenAI and Anthropic offer prompt caching for system prompts and repeated context. Cached input tokens cost 50-90% less than fresh tokens. If your system prompt is 2,000+ tokens, caching pays for itself immediately. Anthropic's prompt caching reduces cached input to $0.30/1M on Sonnet (90% discount).
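The savings are easy to quantify. A sketch using the Sonnet rates quoted above ($3.00/1M fresh vs $0.30/1M cached), ignoring any cache-write surcharge your provider may add:

```python
def caching_savings(prompt_tokens: int, requests: int,
                    in_rate: float = 3.00, cached_rate: float = 0.30) -> float:
    """USD saved when a shared prefix is served from cache instead of
    re-billed at the full input rate. Defaults are Sonnet rates; ignores
    cache-write premiums and cache expiry."""
    full = prompt_tokens * requests * in_rate / 1_000_000
    cached = prompt_tokens * requests * cached_rate / 1_000_000
    return full - cached

# A 2,000-token system prompt across 100,000 requests/month
print(round(caching_savings(2_000, 100_000), 2))  # 540.0
```

At that volume the shared prefix drops from $600/month to $60/month, which is why caching is usually the first optimization to ship.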
3. Batch Processing
OpenAI's Batch API charges 50% less for non-real-time workloads. If your use case can tolerate 24-hour turnaround (nightly report generation, weekly analysis runs), batch processing is the simplest cost reduction available.
4. Context Window Management
Stuffing the full context window costs money. A 100K token input to Claude Sonnet costs $0.30 per request. Trim your context to what's actually needed. Use RAG to retrieve only relevant chunks instead of passing entire documents.
5. Output Token Optimization
Output tokens cost 3-8x more than input tokens across the providers above. Request concise outputs. Use structured output formats (JSON) to avoid verbose prose. Set max_tokens limits to prevent runaway generation.
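Setting max_tokens also puts a hard ceiling on per-request spend, which is worth computing up front. A sketch at GPT-4.1 rates (the token counts are illustrative):

```python
def worst_case_cost(input_tokens: int, max_tokens: int,
                    in_rate: float, out_rate: float) -> float:
    """Upper bound on a single request's USD cost when max_tokens caps
    output. Without a cap, runaway generation bills the model's full
    output budget."""
    return (input_tokens * in_rate + max_tokens * out_rate) / 1_000_000

# GPT-4.1 ($2/$8): capping output at 500 tokens bounds any request at $0.01
print(worst_case_cost(3_000, 500, 2.00, 8.00))  # 0.01
```

Multiply the bound by your request volume and you have a guaranteed monthly spending ceiling, independent of what the model actually generates.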
Provider Comparison: Beyond Price
Price alone doesn't determine value. Consider these factors when choosing a provider.
Rate Limits and Availability
OpenAI offers the highest default rate limits. Anthropic's rate limits are more restrictive but increase with usage tier. Google provides generous free-tier limits but can throttle aggressively during peak hours. For production workloads, check each provider's rate limit documentation and request increases proactively.
Quality vs. Cost Tradeoff
The best model for your use case isn't always the most expensive one. Run evaluations on your specific tasks. We've seen cases where GPT-4.1 mini outperforms GPT-4.1 on well-structured extraction tasks, and where Claude Haiku beats Gemini Pro on nuanced classification. Price per token matters less than price per correct output.
Ecosystem and Tooling
OpenAI has the largest ecosystem: fine-tuning, assistants API, function calling, and extensive third-party integrations. Anthropic offers the best developer experience for complex system prompts and tool use. Google integrates tightly with GCP services (Vertex AI, BigQuery). Your existing infrastructure should influence your choice.
Pricing Trends: Where Costs Are Heading
LLM API prices have dropped roughly 10x over the past two years. GPT-4-class performance that cost $30/1M input tokens in early 2024 now costs $2-3/1M. Several forces continue pushing prices down.
- Hardware improvements: New GPU architectures (NVIDIA Blackwell, AMD MI350) increase inference throughput, reducing per-token serving costs.
- Model efficiency: Mixture-of-experts architectures (used by Llama 4, Gemini, Mistral) activate only a fraction of total parameters per request, cutting compute costs.
- Competition: Google, Meta, and Mistral are subsidizing pricing to gain market share. This benefits buyers.
- Open-source pressure: Free open-weight models set a floor on how much providers can charge for closed models of similar quality.
Expect another 2-3x price reduction over the next 12 months for equivalent quality levels.
Which Model Should You Choose?
Here's the decision framework we recommend.
- Lowest possible cost, acceptable quality: Gemini 2.0 Flash or GPT-4.1 nano
- Best price-performance balance: GPT-4.1 mini or Gemini 2.5 Flash
- Production quality, reasonable cost: GPT-4.1 or Claude Sonnet 4
- Maximum quality, cost secondary: Claude Opus 4 or o3
- High volume, cost-sensitive: Llama 4 Maverick (self-hosted) or Cohere Command R
- Privacy/compliance requirements: Self-hosted Llama 4 or Mistral via VPC deployment
For provider-specific deep dives, see our OpenAI pricing guide, Anthropic pricing guide, and Cohere pricing guide.