
AI Token Pricing 2026 - Cost Per 1M Tokens for Every Major LLM

By Rome Thorndike · April 6, 2026 · 14 min read

Every LLM provider quotes prices in tokens. Most developers have a rough sense that tokens are "kind of like words." That rough understanding leads to rough cost estimates, which lead to budget surprises.

This guide explains token pricing precisely: what tokens are, how much they cost across every major provider, and how to calculate your actual spend before you commit to a model.

Tokens: The Basics

A token is a chunk of text that the model processes as a single unit. For English text:

  • 1 token is roughly 4 characters or about 0.75 words
  • 1,000 tokens is roughly 750 words
  • 1 million tokens is roughly 750,000 words (about 10 novels)
  • A typical email (200 words) is about 270 tokens
  • A 10-page report (3,000 words) is about 4,000 tokens

Code tokenizes differently than prose. Python code typically produces 1.5-2x more tokens per line than English text because of syntax characters, indentation, and variable names. JSON and XML are particularly token-hungry formats.
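For quick budgeting you don't need an exact tokenizer; the rules of thumb above translate directly into a rough estimator. (The `chars_per_token` values here are the approximations from this section, not exact tokenizer output — use your provider's tokenizer for precise counts.)

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4-characters-per-token rule for English prose.

    Code, JSON, and XML run heavier per character; pass a lower
    chars_per_token (e.g. 2.5) for those formats.
    """
    return max(1, round(len(text) / chars_per_token))

# 4,000 characters of English prose -> roughly 1,000 tokens
print(estimate_tokens("a" * 4000))  # → 1000
```

This is only a planning tool; real token counts vary with language, formatting, and the model's tokenizer.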

Per-1M-Token Pricing: Every Major Model

All prices current as of April 2026. Input tokens are what you send; output tokens are what the model generates.

Frontier Models (Highest Capability)

Model            Input / 1M Tokens   Output / 1M Tokens   Output:Input Ratio
Claude Opus 4    $15.00              $75.00               5.0x
o3 (OpenAI)      $2.00               $8.00                4.0x
GPT-4.1          $2.00               $8.00                4.0x
GPT-4o           $2.50               $10.00               4.0x
Gemini 2.5 Pro   $1.25               $10.00               8.0x

Mid-Tier Models (Best Price-Performance)

Model               Input / 1M Tokens   Output / 1M Tokens   Output:Input Ratio
Claude Sonnet 4     $3.00               $15.00               5.0x
Mistral Large 2     $2.00               $6.00                3.0x
Cohere Command R+   $2.50               $10.00               4.0x
o4-mini (OpenAI)    $1.10               $4.40                4.0x

Budget Models (High Volume / Low Cost)

Model                       Input / 1M Tokens   Output / 1M Tokens   Output:Input Ratio
GPT-4.1 mini                $0.40               $1.60                4.0x
Claude Haiku 3.5            $0.80               $4.00                5.0x
Gemini 2.5 Flash            $0.15               $0.60                4.0x
Gemini 2.0 Flash            $0.10               $0.40                4.0x
GPT-4.1 nano                $0.10               $0.40                4.0x
Mistral Small               $0.10               $0.30                3.0x
Cohere Command R            $0.15               $0.60                4.0x
Llama 4 Maverick (hosted)   $0.20               $0.60                3.0x
Llama 4 Scout (hosted)      $0.10               $0.25                2.5x

Why Output Tokens Cost More

Every provider charges more for output tokens than input tokens. The ratio ranges from 2.5x (Llama 4 Scout) to 8x (Gemini 2.5 Pro). This isn't arbitrary pricing. Generating output requires the model to run its full forward pass for each token sequentially, while input tokens can be processed in parallel batches.

This asymmetry has a practical implication: controlling output length is the single most effective cost optimization. With identical input, a response capped at 500 output tokens incurs a quarter of the output charges of one that runs to 2,000 tokens, and output tokens carry the highest per-token rate.
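To see the asymmetry in numbers, here's a minimal sketch using GPT-4.1's rates from the table above. In this example, quadrupling the output length triples the total request cost:

```python
# GPT-4.1 pricing from the table above, USD per 1M tokens.
IN_PRICE, OUT_PRICE = 2.00, 8.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single request in USD."""
    return (input_tokens * IN_PRICE + output_tokens * OUT_PRICE) / 1_000_000

concise = request_cost(1_000, 500)    # same prompt, concise output → $0.006
verbose = request_cost(1_000, 2_000)  # same prompt, verbose output → $0.018
```

The exact multiplier depends on how large your input is relative to your output; the smaller the input, the closer the ratio gets to the raw 4x difference in output tokens.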

Real-World Cost Calculations

Abstract per-million pricing is hard to reason about. Here are concrete examples.

Example 1: Email Classification System

You're classifying incoming customer emails by topic and priority.

  • System prompt: 500 tokens (fixed per request)
  • Average email: 270 tokens
  • Output (classification + reasoning): 50 tokens
  • Volume: 5,000 emails/day

Model              Daily Cost   Monthly Cost
GPT-4.1            $9.70        $291
GPT-4.1 mini       $1.94        $58
GPT-4.1 nano       $0.49        $15
Gemini 2.5 Flash   $0.73        $22
Claude Haiku 3.5   $4.08        $122

For a simple classification task, the difference between the cheapest and most expensive reasonable option is 20x. If classification accuracy is similar across models (test this with your data), there's no reason to pay $291/month when $15/month works.
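Example 1's figures can be reproduced with a few lines. Prices come from the tables above; the token counts and volume are the ones listed in the example:

```python
# (input, output) prices in USD per 1M tokens, from the tables above.
PRICES = {
    "gpt-4.1":      (2.00, 8.00),
    "gpt-4.1-mini": (0.40, 1.60),
    "gpt-4.1-nano": (0.10, 0.40),
}

def daily_cost(model: str, in_tokens: int, out_tokens: int, requests_per_day: int) -> float:
    """Daily cost in USD for a workload with fixed per-request token counts."""
    p_in, p_out = PRICES[model]
    return requests_per_day * (in_tokens * p_in + out_tokens * p_out) / 1_000_000

# 500-token system prompt + 270-token email in, 50 tokens out, 5,000 emails/day
print(daily_cost("gpt-4.1", 500 + 270, 50, 5_000))  # → 9.7 (USD/day)
```

Swap in any model's prices from the tables above to compare options for your own token counts before committing.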

Example 2: Content Generation Platform

You're generating blog post drafts from outlines.

  • System prompt: 1,000 tokens
  • User input (outline + instructions): 500 tokens
  • Output (draft article): 4,000 tokens (~3,000 words)
  • Volume: 200 articles/day

Model              Daily Cost   Monthly Cost
GPT-4.1            $7.00        $210
Claude Sonnet 4    $12.90       $387
Gemini 2.5 Pro     $8.38        $251
GPT-4.1 mini       $1.40        $42
Gemini 2.5 Flash   $0.53        $16

Content generation is output-heavy, so the output token price dominates. Claude Sonnet's $15/1M output pricing makes it expensive for this workload despite competitive input pricing.

Example 3: RAG-Based Q&A System

Knowledge base chatbot retrieving context from documents.

  • System prompt: 800 tokens
  • Retrieved context: 3,000 tokens (average 4 chunks)
  • User question: 50 tokens
  • Answer: 300 tokens
  • Volume: 2,000 queries/day

Model                       Daily Cost   Monthly Cost
GPT-4.1                     $20.20       $606
Claude Sonnet 4             $32.10       $963
GPT-4.1 mini                $4.04        $121
Gemini 2.5 Flash            $1.52        $46
Llama 4 Maverick (hosted)   $1.90        $57

RAG systems are input-heavy (large context windows), so input token pricing matters more. Prompt caching can dramatically reduce costs here since the system prompt repeats every request.
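Here's a sketch of what caching does to the input side of this workload. The $0.50/1M cached-input rate below is an assumption for illustration; check your provider's current cached-token pricing, since discount structures differ by provider and model:

```python
def daily_input_cost(in_price: float, cached_price: float,
                     cached_tokens: int, fresh_tokens: int,
                     requests: int) -> float:
    """Daily input cost in USD. Prices are USD per 1M tokens;
    token counts are per request."""
    per_request = (cached_tokens * cached_price + fresh_tokens * in_price) / 1_000_000
    return per_request * requests

# Example 3 numbers: the 800-token system prompt repeats every request and is
# cacheable; the retrieved context (3,000) + question (50) are fresh each time.
uncached = daily_input_cost(2.00, 2.00, 800, 3_050, 2_000)  # no cache discount
cached   = daily_input_cost(2.00, 0.50, 800, 3_050, 2_000)  # assumed $0.50/1M cached rate
```

In this setup only the system prompt benefits; caching the retrieved context as well (where your retrieval patterns allow it) is where the larger savings come from.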

Per-1K Tokens vs. Per-1M Tokens

Some documentation quotes prices per 1K tokens, others per 1M tokens. The industry has largely standardized on per-1M-token pricing, but you'll still encounter per-1K pricing in older docs.

Conversion: divide the per-1M price by 1,000 to get per-1K pricing. GPT-4.1 at $2.00/1M input = $0.002/1K input.
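The conversion is a one-liner if you want it in code:

```python
def per_1k(price_per_1m: float) -> float:
    """Convert a per-1M-token price to a per-1K-token price."""
    return price_per_1m / 1_000

print(per_1k(2.00))  # → 0.002
```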

Hidden Costs to Account For

Reasoning Token Overhead

OpenAI's o3 and o4-mini models generate internal reasoning tokens that you're billed for as output tokens but never see. A simple question might generate 500 visible output tokens but consume 3,000 reasoning tokens internally. Your actual cost is 7x what you'd estimate from the visible output alone.
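A quick sanity check of that overhead, using o4-mini's output rate from the table above. The 3,000 reasoning tokens are the illustrative figure from this section, not a fixed number — reasoning depth varies with the question and the model's effort setting:

```python
OUT_PRICE = 4.40  # o4-mini output, USD per 1M tokens (from the table above)

visible_tokens = 500      # what you see in the response
reasoning_tokens = 3_000  # hidden, but billed as output (illustrative figure)

naive  = visible_tokens * OUT_PRICE / 1_000_000
actual = (visible_tokens + reasoning_tokens) * OUT_PRICE / 1_000_000

print(actual / naive)  # → 7.0 — actual cost is 7x the naive estimate
```

Most provider APIs report reasoning token counts in the usage metadata of each response; log them so your cost estimates reflect what you're actually billed.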

Retry and Error Costs

Rate limits, timeouts, and malformed outputs mean you'll retry some percentage of requests. Budget for 5-15% overhead depending on your error handling logic and the provider's reliability.

Embedding Costs

If you're building RAG systems, you also pay for embedding your documents. OpenAI's text-embedding-3-small costs $0.02/1M tokens. See our Best Embedding Models 2026 guide for a full comparison.

Fine-Tuning Costs

Training tokens for fine-tuning cost 2-8x more than inference tokens. OpenAI charges $25.00/1M training tokens for GPT-4.1 mini. Factor this into your total cost of ownership if you plan to fine-tune.
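A back-of-the-envelope training-cost check using the $25.00/1M figure above. The dataset size and epoch count are assumptions to adjust for your own case:

```python
TRAIN_PRICE = 25.00          # USD per 1M training tokens (GPT-4.1 mini, from the text)

dataset_tokens = 2_000_000   # assumption: total tokens in your training set
epochs = 3                   # assumption: passes over the data

training_cost = dataset_tokens * epochs * TRAIN_PRICE / 1_000_000
print(training_cost)  # → 150.0 (USD, one-time)
```

Compare that one-time figure against the ongoing inference savings a fine-tuned smaller model would unlock; it often pays back quickly at high volume, and not at all at low volume.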

Cost Optimization Checklist

Apply these in order of impact:

  1. Right-size your model. Test cheaper models first. You might not need GPT-4.1 when GPT-4.1 mini or Gemini Flash produces acceptable results for your task.
  2. Minimize output tokens. Request JSON instead of prose. Set max_tokens. Use structured output schemas to prevent verbose generation.
  3. Enable prompt caching. OpenAI applies prompt caching automatically to repeated prompt prefixes; Anthropic requires you to mark cacheable blocks explicitly with cache_control. Either way, keep your system prompt stable and at the start of the request to benefit.
  4. Batch when possible. OpenAI's Batch API is 50% cheaper. Anthropic offers similar discounts for batched requests.
  5. Compress context. Use RAG to retrieve only relevant sections. Summarize long contexts before passing them to the model.
  6. Monitor and alert. Set spending alerts in your provider dashboard. Review token usage weekly to catch inefficiencies early.

For a comprehensive comparison of all model pricing, see our LLM Pricing Comparison (April 2026) guide.

About the Author

Rome Thorndike is the founder of the Prompt Engineer Collective, a community of over 1,300 prompt engineering professionals, and author of The AI News Digest, a weekly newsletter with 2,700+ subscribers. Rome brings hands-on AI/ML experience from Microsoft, where he worked with Dynamics and Azure AI/ML solutions, and later led sales at Datajoy (acquired by Databricks).
