LLM pricing changes fast. New models launch, old ones get price cuts, and the per-token math can make or break your AI project's economics. This page tracks current API pricing for every major model as of April 2026, with real cost comparisons for common workloads.
Bookmark this page. We update it monthly.
Quick Pricing Reference: All Models at a Glance
This table covers the primary API-accessible models from each provider. Prices are per 1 million tokens unless noted otherwise.
| Model | Provider | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
|---|---|---|---|---|
| GPT-4.1 | OpenAI | $2.00 | $8.00 | 1M |
| GPT-4.1 mini | OpenAI | $0.40 | $1.60 | 1M |
| GPT-4.1 nano | OpenAI | $0.10 | $0.40 | 1M |
| GPT-4o | OpenAI | $2.50 | $10.00 | 128K |
| o3 | OpenAI | $2.00 | $8.00 | 200K |
| o3-mini | OpenAI | $1.10 | $4.40 | 200K |
| o4-mini | OpenAI | $1.10 | $4.40 | 200K |
| Claude Opus 4 | Anthropic | $15.00 | $75.00 | 200K |
| Claude Sonnet 4 | Anthropic | $3.00 | $15.00 | 200K |
| Claude Haiku 3.5 | Anthropic | $0.80 | $4.00 | 200K |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 1M |
| Gemini 2.5 Flash | Google | $0.15 | $0.60 | 1M |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | 1M |
| Llama 4 Maverick | Meta (via providers) | $0.20 | $0.60 | 1M |
| Llama 4 Scout | Meta (via providers) | $0.10 | $0.25 | 10M |
| Mistral Large 2 | Mistral | $2.00 | $6.00 | 128K |
| Mistral Small | Mistral | $0.10 | $0.30 | 32K |
| Cohere Command R+ | Cohere | $2.50 | $10.00 | 128K |
| Cohere Command R | Cohere | $0.15 | $0.60 | 128K |
Prices reflect standard API rates. Volume discounts, committed-use agreements, and batch processing can reduce costs by 25-50% depending on the provider.
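For back-of-envelope budgeting, the per-request math is a single multiply-and-divide. A minimal sketch in Python using a few of the standard rates from the table above (rates are hard-coded for illustration, not fetched live, and will drift as prices change):

```python
# USD per 1M tokens, (input, output), from the pricing table above.
PRICES = {
    "gpt-4.1":          (2.00, 8.00),
    "gpt-4.1-mini":     (0.40, 1.60),
    "claude-sonnet-4":  (3.00, 15.00),
    "gemini-2.5-flash": (0.15, 0.60),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at standard (non-discounted) rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: an 8,000-token document summarized into 1,000 tokens on GPT-4.1
print(round(request_cost("gpt-4.1", 8_000, 1_000), 4))  # 0.024
```

Multiply by monthly request volume to reproduce any of the workload tables later in this page.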
OpenAI Pricing Breakdown
OpenAI runs the largest model portfolio in the market. The April 2026 lineup spans from nano-class models at $0.10/1M input tokens up to the full o3 reasoning model. For most production workloads, the GPT-4.1 family has replaced GPT-4o as the default recommendation.
GPT-4.1 Family
GPT-4.1 is OpenAI's workhorse. It handles coding, analysis, and long-context tasks with a 1M token context window. The mini variant cuts cost by 80% with surprisingly small quality tradeoffs on structured tasks. The nano variant is built for high-volume, latency-sensitive workloads where you need sub-100ms responses.
Cost example: processing 10,000 customer support tickets (average 500 tokens input, 200 tokens output each) means 5M input tokens and 2M output tokens. That costs roughly $26 with GPT-4.1, $5.20 with GPT-4.1 mini, and $1.30 with GPT-4.1 nano.
Reasoning Models (o3, o4-mini)
OpenAI's reasoning models think before answering. They consume more tokens internally (chain-of-thought tokens are billed as output), which means actual costs run 2-5x higher than the per-token price suggests. Use these for complex analysis, math, and multi-step reasoning. Not cost-effective for simple classification or extraction tasks.
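The hidden-token effect is easy to model. A rough estimator, treating the 2-5x inflation as a configurable multiplier on visible output (the multiplier is an assumption; actual reasoning token counts vary per request and aren't predictable in advance):

```python
def reasoning_cost(input_tokens: int, visible_output_tokens: int,
                   in_rate: float, out_rate: float,
                   reasoning_multiplier: float = 3.0) -> float:
    """Estimated USD cost for a reasoning model where hidden chain-of-thought
    tokens are billed as output. reasoning_multiplier is an assumption
    (this page cites 2-5x effective output inflation)."""
    billed_output = visible_output_tokens * reasoning_multiplier
    return (input_tokens * in_rate + billed_output * out_rate) / 1_000_000

# o3 at $2/$8: a 1,000-in / 500-out request, assuming 3x output inflation
naive = (1_000 * 2 + 500 * 8) / 1_000_000        # $0.006 by sticker price
actual = reasoning_cost(1_000, 500, 2, 8)        # $0.014 with hidden tokens
```

This is why a reasoning model at the same per-token price as GPT-4.1 still costs meaningfully more in practice.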
For detailed OpenAI pricing tiers and rate limits, see our OpenAI API Pricing page.
Anthropic Pricing Breakdown
Anthropic prices on a three-tier system: Haiku (fast and cheap), Sonnet (balanced), and Opus (maximum capability). The gap between tiers is significant: Opus costs 5x more than Sonnet on both input and output.
When Opus Is Worth the Premium
Claude Opus 4 is the most expensive mainstream LLM at $15/$75 per million tokens. That price point only makes sense for tasks where quality differences directly impact revenue: legal document analysis, complex code generation, research synthesis, and agentic workflows where errors cascade. For most applications, Sonnet 4 handles the work at 80% of the quality for 20% of the cost.
Haiku: The Budget Workhorse
Claude Haiku 3.5 at $0.80/$4.00 fills the high-quality budget slot. It outperforms GPT-4.1 mini on many benchmarks while costing roughly double. The tradeoff is worth it when you need Anthropic's safety characteristics or superior instruction following on nuanced tasks.
Full Anthropic tier comparison at our Anthropic API Pricing page.
Google Gemini Pricing Breakdown
Google's pricing strategy is aggressive. Gemini 2.5 Flash at $0.15/$0.60 per million tokens undercuts nearly everything except open-source models, and it includes a 1M token context window. The free tier (see our AI Free Tiers 2026 guide) is generous enough for prototyping and low-volume production use.
Gemini 2.5 Pro
At $1.25/$10.00, Gemini 2.5 Pro offers strong reasoning and coding performance. The input pricing undercuts Claude Sonnet and GPT-4.1, but output tokens are priced at $10 per million, making generation-heavy workloads expensive. Use Gemini Pro when your prompts have high input-to-output ratios (document analysis, summarization of long texts).
Flash Models
Gemini 2.5 Flash and 2.0 Flash are the price-performance leaders. At $0.10-$0.15 per million input tokens, they compete directly with open-source model hosting costs while requiring zero infrastructure management.
Open-Source Model Costs
Llama 4, Mistral, and other open-weight models don't have a single price. Your cost depends on how you host them.
Hosted API Pricing
Providers like Together AI, Fireworks, Groq, and AWS Bedrock host open-source models and charge per token. Typical rates for Llama 4 Maverick range from $0.15-$0.30 per million input tokens depending on the provider and commitment level. Self-hosting on your own GPUs can be cheaper at scale but requires significant DevOps investment.
Self-Hosting Economics
Running Llama 4 Maverick (400B+ total parameters, mixture-of-experts) requires multiple high-end GPUs. A typical setup costs $3-8/hour on cloud GPU instances. At sustained throughput of roughly 35+ requests per minute (around the 50,000 requests/day mark), self-hosting breaks even with API pricing. Below that, hosted APIs are cheaper.
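The break-even claim can be sketched numerically. Assuming a $5/hour GPU instance running 24/7 and roughly $0.0022 per hosted request (an 8,000-in / 1,000-out request at Maverick's hosted rates of $0.20/$0.60 per 1M; both figures are illustrative assumptions), the crossover lands near 50,000 requests/day:

```python
def breakeven_requests_per_day(gpu_cost_per_hour: float,
                               api_cost_per_request: float) -> float:
    """Daily request volume at which a 24/7 GPU deployment matches hosted
    API spend. Ignores DevOps labor, idle capacity, and redundancy."""
    daily_gpu_cost = gpu_cost_per_hour * 24
    return daily_gpu_cost / api_cost_per_request

# $5/hr GPU vs ~$0.0022/request hosted (8k in * $0.20/1M + 1k out * $0.60/1M)
print(round(breakeven_requests_per_day(5.0, 0.0022)))  # ~54,545 requests/day
```

Cheaper GPUs or bigger requests pull the break-even point lower; smaller requests push it higher.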
Cost Comparison by Workload
Raw per-token pricing tells only part of the story. Actual costs depend on your workload pattern.
Chatbot / Conversational AI
Average conversation: 2,000 tokens input (system prompt + history), 500 tokens output per turn, 5 turns per session.
| Model | Cost per Session | Cost per 10K Sessions/Month |
|---|---|---|
| GPT-4.1 | $0.04 | $400 |
| GPT-4.1 mini | $0.008 | $80 |
| Claude Sonnet 4 | $0.068 | $675 |
| Gemini 2.5 Flash | $0.003 | $30 |
| Llama 4 Maverick (hosted) | $0.0035 | $35 |
Document Processing Pipeline
Average document: 8,000 tokens input, 1,000 tokens output (summary + extraction).
| Model | Cost per Document | Cost per 50K Docs/Month |
|---|---|---|
| GPT-4.1 | $0.024 | $1,200 |
| GPT-4.1 nano | $0.001 | $60 |
| Claude Haiku 3.5 | $0.010 | $520 |
| Gemini 2.5 Flash | $0.002 | $90 |
| Gemini 2.0 Flash | $0.001 | $60 |
Code Generation / Analysis
Average request: 3,000 tokens input (code + instructions), 2,000 tokens output.
| Model | Cost per Request | Cost per 100K Requests/Month |
|---|---|---|
| GPT-4.1 | $0.022 | $2,200 |
| Claude Sonnet 4 | $0.039 | $3,900 |
| Claude Opus 4 | $0.195 | $19,500 |
| Gemini 2.5 Pro | $0.024 | $2,375 |
| Mistral Large 2 | $0.018 | $1,800 |
Cost Optimization Strategies
The cheapest model isn't always the best value. Here's how to optimize spend without sacrificing quality.
1. Tiered Model Routing
Route requests to different models based on complexity. Use a cheap classifier (GPT-4.1 nano or Gemini 2.0 Flash) to assess request difficulty, then route simple requests to budget models and complex ones to premium models. This typically cuts costs 40-60% compared to using a single model for everything.
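A minimal routing sketch, with a length heuristic standing in for the real classifier call (the model names, the heuristic, and the threshold are illustrative placeholders, not a recommendation):

```python
def classify_difficulty(prompt: str) -> str:
    """Stub for the cheap classifier call (e.g. GPT-4.1 nano or
    Gemini 2.0 Flash). Faked here with a length heuristic so the
    sketch is self-contained."""
    return "complex" if len(prompt) > 500 else "simple"

def route(prompt: str) -> str:
    """Pick a model tier based on assessed difficulty."""
    tier = classify_difficulty(prompt)
    model = "gpt-4.1" if tier == "complex" else "gpt-4.1-nano"
    return model  # in production: return call_model(model, prompt)

print(route("What is 2 + 2?"))  # gpt-4.1-nano
```

In production the classifier is itself an LLM call, so make sure its cost (and latency) stays small relative to the savings it unlocks.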
2. Prompt Caching
Both OpenAI and Anthropic offer prompt caching for system prompts and repeated context. Cached input tokens cost 50-90% less than fresh tokens. If your system prompt is 2,000+ tokens, caching pays for itself immediately. Anthropic's prompt caching reduces cached input to $0.30/1M on Sonnet (90% discount).
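The savings are easy to quantify. A sketch using the Sonnet rates quoted above ($3.00/1M fresh vs $0.30/1M cached), ignoring any cache-write surcharge your provider may add:

```python
def caching_savings(prompt_tokens: int, requests: int,
                    in_rate: float = 3.00, cached_rate: float = 0.30) -> float:
    """USD saved when a shared prefix is served from cache instead of
    re-billed at the full input rate. Defaults are Sonnet rates; ignores
    cache-write premiums and cache expiry."""
    full = prompt_tokens * requests * in_rate / 1_000_000
    cached = prompt_tokens * requests * cached_rate / 1_000_000
    return full - cached

# A 2,000-token system prompt across 100,000 requests/month
print(round(caching_savings(2_000, 100_000), 2))  # 540.0
```

At that volume the shared prefix drops from $600/month to $60/month, which is why caching is usually the first optimization to ship.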
3. Batch Processing
OpenAI's Batch API charges 50% less for non-real-time workloads. If your use case can tolerate 24-hour turnaround (nightly report generation, weekly analysis runs), batch processing is the simplest cost reduction available.
4. Context Window Management
Stuffing the full context window costs money. A 100K token input to Claude Sonnet costs $0.30 per request. Trim your context to what's actually needed. Use RAG to retrieve only relevant chunks instead of passing entire documents.
5. Output Token Optimization
Output tokens cost 3-8x more than input tokens across the providers above. Request concise outputs. Use structured output formats (JSON) to avoid verbose prose. Set max_tokens limits to prevent runaway generation.
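Setting max_tokens also puts a hard ceiling on per-request spend, which is worth computing up front. A sketch at GPT-4.1 rates (the token counts are illustrative):

```python
def worst_case_cost(input_tokens: int, max_tokens: int,
                    in_rate: float, out_rate: float) -> float:
    """Upper bound on a single request's USD cost when max_tokens caps
    output. Without a cap, runaway generation bills the model's full
    output budget."""
    return (input_tokens * in_rate + max_tokens * out_rate) / 1_000_000

# GPT-4.1 ($2/$8): capping output at 500 tokens bounds any request at $0.01
print(worst_case_cost(3_000, 500, 2.00, 8.00))  # 0.01
```

Multiply the bound by your request volume and you have a guaranteed monthly spending ceiling, independent of what the model actually generates.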
Provider Comparison: Beyond Price
Price alone doesn't determine value. Consider these factors when choosing a provider.
Rate Limits and Availability
OpenAI offers the highest default rate limits. Anthropic's rate limits are more restrictive but increase with usage tier. Google provides generous free-tier limits but can throttle aggressively during peak hours. For production workloads, check each provider's rate limit documentation and request increases proactively.
Quality vs. Cost Tradeoff
The best model for your use case isn't always the most expensive one. Run evaluations on your specific tasks. We've seen cases where GPT-4.1 mini outperforms GPT-4.1 on well-structured extraction tasks, and where Claude Haiku beats Gemini Pro on nuanced classification. Price per token matters less than price per correct output.
Ecosystem and Tooling
OpenAI has the largest ecosystem: fine-tuning, assistants API, function calling, and extensive third-party integrations. Anthropic offers the best developer experience for complex system prompts and tool use. Google integrates tightly with GCP services (Vertex AI, BigQuery). Your existing infrastructure should influence your choice.
Pricing Trends: Where Costs Are Heading
LLM API prices have dropped roughly 10x over the past two years. GPT-4-class performance that cost $30/1M input tokens in early 2024 now costs $2-3/1M. Several forces continue pushing prices down.
- Hardware improvements: New GPU architectures (NVIDIA Blackwell, AMD MI350) increase inference throughput, reducing per-token serving costs.
- Model efficiency: Mixture-of-experts architectures (used by Llama 4, Gemini, Mistral) activate only a fraction of total parameters per request, cutting compute costs.
- Competition: Google, Meta, and Mistral are subsidizing pricing to gain market share. This benefits buyers.
- Open-source pressure: Free open-weight models set a floor on how much providers can charge for closed models of similar quality.
Expect another 2-3x price reduction over the next 12 months for equivalent quality levels.
Which Model Should You Choose?
Here's the decision framework we recommend.
- Lowest possible cost, acceptable quality: Gemini 2.0 Flash or GPT-4.1 nano
- Best price-performance balance: GPT-4.1 mini or Gemini 2.5 Flash
- Production quality, reasonable cost: GPT-4.1 or Claude Sonnet 4
- Maximum quality, cost secondary: Claude Opus 4 or o3
- High volume, cost-sensitive: Llama 4 Maverick (self-hosted) or Cohere Command R
- Privacy/compliance requirements: Self-hosted Llama 4 or Mistral via VPC deployment
For provider-specific deep dives, see our OpenAI pricing guide, Anthropic pricing guide, and Cohere pricing guide.