OpenAI API Pricing: What Every Model Actually Costs (April 2026)

OpenAI's model lineup has expanded significantly. GPT-4.1 replaced GPT-4o as the recommended production model. GPT-5 is the new flagship. The o-series reasoning models now include o3 and o4-mini. And the GPT-4.1 Nano at $0.10 per million input tokens is one of the cheapest capable models available anywhere. This page breaks down every current model's pricing, the batch and caching discounts that cut costs by 50-90%, and real cost math for production workloads. All prices verified April 2026.

[Chart: OpenAI API pricing tiers as of April 2026. Data verified by PE Collective.]

GPT-4.1 Nano

$0.10 / $0.40 per 1M input / output tokens
  • Cheapest capable model in OpenAI's lineup
  • Good for classification, extraction, routing
  • 1M token context window
  • Fast response times
  • Best cost-per-token ratio for simple tasks

GPT-4.1 (Most Popular)

$2.00 / $8.00 per 1M input / output tokens
  • Recommended production model (replaced GPT-4o)
  • Strong coding, instruction following, long context
  • 1M token context window
  • Function calling and structured outputs
  • Best balance of cost, speed, and quality

o4-mini

$1.10 / $4.40 per 1M input / output tokens
  • Best-value reasoning model
  • Chain-of-thought reasoning built in
  • 200K context window
  • Outperforms o3-mini on most benchmarks
  • Good for math, science, and complex logic

GPT-5

$1.25 / $10.00 per 1M input / output tokens
  • Most capable model overall
  • Agentic workflows and complex reasoning
  • 128K context window
  • Best for research and hard problems
  • 90% cache read discount

Complete Model Pricing Table (April 2026)

OpenAI now offers more than a dozen models across three families: GPT (general-purpose), o-series (reasoning), and specialized models. Here's the full breakdown of every current model worth knowing about.

Model | Input / 1M tokens | Output / 1M tokens | Cached Input | Context Window | Best For
GPT-4.1 Nano | $0.10 | $0.40 | $0.025 | 1M | Classification, routing, extraction
GPT-4o-mini | $0.15 | $0.60 | $0.075 | 128K | Legacy simple tasks
GPT-5 Mini | $0.25 | $2.00 | $0.025 | 128K | Budget general-purpose
GPT-4.1 Mini | $0.40 | $1.60 | $0.10 | 1M | Mid-tier production tasks
o4-mini | $1.10 | $4.40 | $0.275 | 200K | Budget reasoning
o3-mini | $1.10 | $4.40 | $0.55 | 200K | Legacy reasoning
GPT-5 | $1.25 | $10.00 | $0.125 | 128K | Flagship, agentic workflows
GPT-4.1 | $2.00 | $8.00 | $0.50 | 1M | Production workhorse
o3 | $2.00 | $8.00 | $0.50 | 200K | Advanced reasoning
GPT-4o | $2.50 | $10.00 | $1.25 | 128K | Legacy production
o1 | $15.00 | $60.00 | $7.50 | 200K | Legacy deep reasoning

GPT-4.1 Family: The New Production Standard

GPT-4.1 launched as OpenAI's recommended replacement for GPT-4o. It's cheaper ($2/$8 vs $2.50/$10), has a 1 million token context window (vs 128K), and scores better on instruction-following and coding benchmarks. If your code still references gpt-4o, switching to gpt-4.1 saves money while improving quality.

The GPT-4.1 family includes three variants. GPT-4.1 ($2/$8) is the main production model, strong at coding, long-context processing, and structured outputs. GPT-4.1 Mini ($0.40/$1.60) handles mid-tier tasks at 5x lower cost. GPT-4.1 Nano ($0.10/$0.40) is the budget option, ideal for classification and extraction where you don't need sophisticated reasoning.

All three variants share the 1 million token context window. This is significant for document processing and RAG applications that previously needed to chunk documents into 128K segments. With GPT-4.1, you can process entire codebases, legal documents, or research papers in a single request.

Cache read pricing for the GPT-4.1 family is 75% off standard input pricing. GPT-4.1's cached input rate is $0.50 per million tokens (vs $2.00 standard). For applications that send the same system prompt, few-shot examples, or document context across requests, this stacks with the already-low base price to make GPT-4.1 very cost-effective.

Is GPT-5 Worth 5x the Price of GPT-4.1?

GPT-5 is OpenAI's most capable general-purpose model. At $1.25 per million input tokens, it's actually cheaper than GPT-4.1 on input, but output tokens cost $10 per million (vs $8 for GPT-4.1). The higher output cost is where GPT-5 gets expensive, especially for tasks that generate long responses.

GPT-5 shines in agentic workflows, complex multi-step reasoning, and tasks that require broad world knowledge. It also gets the best cache discount in OpenAI's lineup: 90% off cached input reads, bringing the effective cached input rate to $0.125 per million tokens.

GPT-5 Mini ($0.25/$2.00) offers a budget path into GPT-5's capabilities. It's positioned between GPT-4.1 Nano and GPT-4.1 Mini in price but uses GPT-5's architecture. For applications that need GPT-5-class reasoning on a budget, this is worth testing.

The practical question: when should you use GPT-5 vs GPT-4.1? GPT-4.1 wins on coding tasks, instruction following, and long-context processing. GPT-5 wins on complex reasoning, agentic tool use, and tasks requiring nuanced judgment. Most production apps should default to GPT-4.1 and only route to GPT-5 for their hardest queries.

o-Series Reasoning Models: o3 and o4-mini

OpenAI's o-series models use chain-of-thought reasoning: they "think" before answering, generating internal reasoning tokens that you pay for as output tokens. This means the true cost per request is higher than the visible output would suggest. A short o3 answer might use 2,000 visible output tokens but 10,000 reasoning tokens behind the scenes.

o4-mini ($1.10/$4.40) is the best value reasoning model. It replaced o3-mini and outperforms it on most benchmarks while costing the same. Use it for math, science, coding problems, and any task that benefits from step-by-step reasoning. The cached input rate is $0.275 per million tokens.

o3 ($2.00/$8.00) is the flagship reasoning model. It costs roughly 2x o4-mini and offers stronger performance on the hardest problems: PhD-level math, complex logic, and multi-step scientific reasoning. For most applications, o4-mini is sufficient. Save o3 for tasks where o4-mini's accuracy drops below your threshold.

One important detail: reasoning models don't support all features. As of April 2026, o-series models support function calling and structured outputs, but some parameters like temperature and top_p behave differently. Check OpenAI's docs for the latest compatibility matrix before building reasoning model pipelines.

The cost trap with reasoning models is unpredictable token usage. A query that uses 500 reasoning tokens on one request might use 5,000 on a slightly different phrasing. Set max_completion_tokens to cap your worst-case costs, and monitor average reasoning token usage per query type.
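The billing math is easy to get wrong because reasoning tokens never appear in the visible response. A minimal sketch of the calculation, using the April 2026 o4-mini rates from the table above (the token counts are illustrative):

```python
# Sketch: per-request cost for an o-series model. Reasoning tokens are
# invisible in the response but billed at the output rate, so they are
# added to the visible completion tokens here.

O4_MINI = {"input": 1.10, "output": 4.40}  # $ per 1M tokens, April 2026

def o_series_cost(prompt_tokens, visible_tokens, reasoning_tokens, prices=O4_MINI):
    """Total request cost in dollars, counting hidden reasoning tokens as output."""
    output_tokens = visible_tokens + reasoning_tokens
    return (prompt_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000

# 3,000 input tokens, 1,000 visible output tokens, 5,000 reasoning tokens:
cost = o_series_cost(3_000, 1_000, 5_000)
print(f"${cost:.4f} per request")  # $0.0297 per request
```

Note that five of the six output-side cents here come from reasoning tokens the user never sees, which is why capping max_completion_tokens matters.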

Batch API: 50% Off Everything

OpenAI's Batch API processes requests asynchronously and returns results within 24 hours. The tradeoff for that latency is a flat 50% discount on all token costs, input and output, across every model.

At batch pricing, GPT-4.1 drops to $1/$4 per million tokens. GPT-4.1 Nano drops to $0.05/$0.20. o4-mini drops to $0.55/$2.20. These are some of the lowest prices available for capable language models.

Batch processing works by uploading a JSONL file of requests. Each line is a standard API request. You submit the batch, wait for completion (usually 1-12 hours in practice, 24-hour SLA), then download results. No code changes to your prompts, just a different submission endpoint.
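The request format can be sketched as follows. The custom_id field and the /v1/chat/completions URL follow OpenAI's documented batch format; the model name and prompts are placeholders:

```python
import json

# Sketch: building a Batch API input, one JSON request per line of a
# .jsonl file. Each line is a standard chat completions request wrapped
# with a custom_id for matching results later.

def batch_line(custom_id, prompt, model="gpt-4.1-nano"):
    return json.dumps({
        "custom_id": custom_id,            # your key for pairing results
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 100,
        },
    })

docs = ["Invoice #1042 ...", "Invoice #1043 ..."]
jsonl = "\n".join(
    batch_line(f"doc-{i}", f"Extract the invoice number: {doc}")
    for i, doc in enumerate(docs)
)
# Write jsonl to a file, upload it with purpose="batch", then create the
# batch job with completion_window="24h". See OpenAI's Batch API docs
# for the upload and polling calls.
```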

Best use cases for batch: data extraction from documents, content generation, bulk classification, evaluation/grading, and dataset creation. Anything where you don't need real-time responses. If you're currently running these workloads through the standard API, switching to batch is the single biggest cost reduction available.

You can combine batch pricing with prompt caching for even deeper discounts. A cached batch request through GPT-4.1 pays $0.25 per million cached input tokens and $4 per million output tokens. That's 87.5% off standard input pricing.

Prompt Caching: How to Get 50-90% Off Repeated Input

OpenAI automatically caches repeated prompt prefixes and charges a reduced rate for cache reads on subsequent requests. The discount varies significantly by model family, making this a key factor in model selection for cost-sensitive applications.

GPT-5 family models get the deepest cache discount: 90% off input tokens for cache reads. This means GPT-5's effective cached input rate is just $0.125 per million tokens, close to GPT-4.1 Nano's standard input rate of $0.10. If your application sends the same long context repeatedly and you're using GPT-5, caching makes it remarkably affordable.

GPT-4.1 family models get 75% off cached reads. GPT-4.1's cached rate drops to $0.50 per million, GPT-4.1 Mini to $0.10, and GPT-4.1 Nano to $0.025. For applications with stable system prompts and shared context (RAG, chatbots, document processing), these cached rates make the GPT-4.1 family very competitive.

GPT-4o family models get 50% off cached reads, the weakest discount tier, and o-series discounts vary by model: o4-mini's cached rate is $0.275 (75% off), o3's is $0.50 (75% off), o3-mini's is $0.55 (50% off), and GPT-4o's is $1.25. If you're choosing between GPT-4.1 ($0.50 cached) and GPT-4o ($1.25 cached), the caching difference alone is 2.5x.

Cache writes happen automatically when your prompt prefix exceeds a minimum length (typically 1,024 tokens). The cache has a TTL that varies by model; check OpenAI's docs for current values. Cache hits require an exact prefix match, so structure your prompts with stable content first (system prompt, few-shot examples, shared context) and variable content last (user query).
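The blended input rate under caching can be sketched with a small helper. The discount percentages come from the tiers above; the 40% hit rate is an assumption for illustration:

```python
# Sketch: effective input cost per 1M tokens, blending the standard and
# cached rates by the share of input tokens that hit the cache.
# Discounts per the tiers above: GPT-5 family 90%, GPT-4.1 family 75%,
# GPT-4o family 50%.

def effective_input_rate(base_rate, cache_discount, cached_fraction):
    cached_rate = base_rate * (1 - cache_discount)
    return cached_fraction * cached_rate + (1 - cached_fraction) * base_rate

# GPT-4.1 at $2.00/1M with 75% cache discount and 40% of input cached
# (a plausible share for a long system prompt):
print(round(effective_input_rate(2.00, 0.75, 0.40), 2))  # 1.4
```

The same helper shows why the discount tier matters when picking a model: at the same 40% hit rate, GPT-4o's 50% discount only brings $2.50 down to $2.00.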

OpenAI vs Anthropic: Which API Costs Less for Your Workload?

The two leading API providers have different pricing philosophies. OpenAI offers more model tiers with a wider price range. Anthropic offers fewer models but competitive pricing on its core lineup. Here's how they compare head-to-head on comparable models.

Use Case | OpenAI Model | OpenAI Price | Anthropic Model | Anthropic Price
Budget tasks | GPT-4.1 Nano | $0.10/$0.40 | Haiku 4.5 | $1/$5
Production workhorse | GPT-4.1 | $2/$8 | Sonnet 4.6 | $3/$15
Flagship | GPT-5 | $1.25/$10 | Opus 4.6 | $5/$25
Budget reasoning | o4-mini | $1.10/$4.40 | — | —
Advanced reasoning | o3 | $2/$8 | Opus Extended Thinking | $5/$25

What This Actually Costs Per Month (Real Numbers)

Abstract per-token pricing is hard to reason about. Here's what real applications actually cost based on typical usage patterns.

Customer support chatbot (10K conversations/day): Average conversation uses 2,000 input tokens (system prompt + history + user message) and 500 output tokens. With GPT-4.1: (10,000 × 2,000 × $2/1M) + (10,000 × 500 × $8/1M) = $40 + $40 = $80/day ($2,400/month). With GPT-4.1 Nano: $2 + $2 = $4/day ($120/month). With prompt caching on GPT-4.1 (assuming 75% cache hits on the system prompt): roughly $1,800/month.

Document processing pipeline (1,000 docs/day, 10K tokens each): Input-heavy workload, 10M input tokens, 500K output tokens per day. With GPT-4.1: (10M × $2/1M) + (0.5M × $8/1M) = $20 + $4 = $24/day ($720/month). With batch processing (50% off): $12/day ($360/month). With GPT-4.1 Nano in batch mode: $0.50 + $0.10 = $0.60/day ($18/month).

AI coding assistant (500 requests/day, complex): Average request: 5,000 input tokens (code context + instructions), 2,000 output tokens. With GPT-4.1: (500 × 5,000 × $2/1M) + (500 × 2,000 × $8/1M) = $5 + $8 = $13/day ($390/month). With GPT-5 for hard problems: $3.13 + $10 = $13.13/day ($394/month). Surprisingly close. GPT-5's cheaper input but pricier output roughly balances out.

Reasoning-heavy analytics (200 queries/day): Average: 3,000 input tokens, 1,000 visible output tokens, 5,000 reasoning tokens. With o4-mini: (200 × 3,000 × $1.10/1M) + (200 × 6,000 × $4.40/1M) = $0.66 + $5.28 = $5.94/day ($178/month). With o3: $1.20 + $9.60 = $10.80/day ($324/month). The reasoning tokens are where the cost lives.

High-volume SaaS with model routing (100K requests/day): Route 70% to GPT-4.1 Nano (70K × 1,500 input × $0.10/1M + 70K × 400 output × $0.40/1M = $10.50 + $11.20 = $21.70/day), 25% to GPT-4.1 (25K × 3,000 × $2/1M + 25K × 1,000 × $8/1M = $150 + $200 = $350/day), and 5% to o4-mini (5K × 3,000 × $1.10/1M + 5K × 6,000 × $4.40/1M = $16.50 + $132 = $148.50/day). Total: ~$520/day (~$15,600/month). Without routing (everything on GPT-4.1): ~$900/day ($27,000/month). Routing saves over $11,000/month.

These examples assume standard API pricing. Apply batch discounts (50% off) to any non-real-time portion and caching discounts to repeated context for further savings. A well-optimized app typically spends 30-50% less than these baseline estimates.
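The per-example arithmetic above reduces to one helper. The prices are the April 2026 rates from the table, and the scenarios mirror the chatbot and document-pipeline examples:

```python
# Sketch: the daily-cost arithmetic behind the examples above.
# Prices in $ per 1M tokens (April 2026).

PRICES = {
    "gpt-4.1":      {"input": 2.00, "output": 8.00},
    "gpt-4.1-nano": {"input": 0.10, "output": 0.40},
}

def daily_cost(requests, in_tokens, out_tokens, model, batch=False):
    p = PRICES[model]
    cost = (requests * in_tokens * p["input"]
            + requests * out_tokens * p["output"]) / 1_000_000
    return cost * 0.5 if batch else cost  # Batch API: flat 50% off

# Support chatbot: 10K conversations/day, 2,000 in / 500 out tokens each.
print(round(daily_cost(10_000, 2_000, 500, "gpt-4.1"), 2))       # 80.0
print(round(daily_cost(10_000, 2_000, 500, "gpt-4.1-nano"), 2))  # 4.0
# Document pipeline: 1,000 docs/day, 10K in / 500 out, via batch.
print(round(daily_cost(1_000, 10_000, 500, "gpt-4.1", batch=True), 2))  # 12.0
```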

Rate Limits: How Much You Need to Spend to Unlock Them

OpenAI uses a tiered rate limit system based on your cumulative API spend. Higher tiers unlock more requests per minute (RPM) and tokens per minute (TPM). This matters for production apps that need consistent throughput.

Tier | Qualification | GPT-4.1 RPM | GPT-4.1 TPM | o4-mini RPM
Free | $0 spend | 3 | 40,000 | 3
Tier 1 | $5 cumulative | 500 | 200,000 | 500
Tier 2 | $50 cumulative | 5,000 | 2,000,000 | 5,000
Tier 3 | $100 cumulative | 5,000 | 4,000,000 | 5,000
Tier 4 | $250 cumulative | 10,000 | 10,000,000 | 10,000
Tier 5 | $1,000 cumulative | 10,000 | 30,000,000 | 10,000

How to Cut Your OpenAI Bill by 60%

The biggest cost lever is model routing. Most production queries don't need GPT-4.1; they can be handled by GPT-4.1 Nano at 20x lower cost. Build a simple classifier (which itself runs on Nano) that routes requests to the cheapest model that can handle each query type. This alone can cut costs by 60-80%.
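A minimal routing sketch, assuming each request already carries (or is classified into) a task label. The labels and model assignments here are illustrative, not a prescribed taxonomy:

```python
# Sketch: route each request to the cheapest model that can handle it.
# In practice the task label would come from a Nano-powered classifier;
# here it's passed in directly.

ROUTES = {
    "classification": "gpt-4.1-nano",
    "extraction":     "gpt-4.1-nano",
    "generation":     "gpt-4.1",
    "reasoning":      "o4-mini",
}

def pick_model(task_label: str) -> str:
    # Fall back to the production workhorse for unrecognized labels.
    return ROUTES.get(task_label, "gpt-4.1")

print(pick_model("extraction"))  # gpt-4.1-nano
print(pick_model("reasoning"))   # o4-mini
```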

Prompt caching is the second lever. If you send the same system prompt, few-shot examples, or document context on every request, you're overpaying. Structure prompts so stable content comes first (cached) and variable content comes last. GPT-4.1's 75% cache discount means your 2,000-token system prompt costs $0.001 per request instead of $0.004.

Batch processing is the third lever. Any workload that doesn't need real-time responses (data processing, content generation, evaluations, nightly reports) should use the Batch API for a flat 50% discount. Combined with caching, batch GPT-4.1 processes cached input at $0.25 per million tokens.

Output token optimization matters disproportionately because output costs 4-8x more than input. Tell the model to be concise. Set max_tokens to prevent runaway responses. Use structured output mode to get clean JSON instead of prose. For classification tasks, request a single word or short label instead of an explanation.

Monitor and iterate. OpenAI's API returns token usage in every response. Build a dashboard that tracks cost per request by model, endpoint, and use case. Review weekly. Common findings: a debug system prompt that inflates every request by 500 tokens, a retry loop that doubles costs on timeouts, or a model assignment that hasn't been updated since you launched.

One underused optimization: response format constraints. For classification tasks, use response_format with a JSON schema that only allows your label values. This prevents the model from generating explanations you don't need, saving output tokens. For a 10-class classifier, structured output typically uses 5-20 output tokens instead of 50-200 with free-form responses. At scale, this 10x reduction in output tokens translates directly to a 10x reduction in output costs.
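A sketch of such a constraint, using the json_schema response format shape from OpenAI's structured-outputs feature. The label set is a made-up example:

```python
# Sketch: a response_format payload that constrains a classifier to an
# enum of labels, so the model can't pad the answer with explanations.
# The schema wrapper follows OpenAI's structured-outputs format; the
# label values are illustrative.

LABELS = ["billing", "technical", "account", "other"]

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "ticket_label",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {"label": {"type": "string", "enum": LABELS}},
            "required": ["label"],
            "additionalProperties": False,
        },
    },
}
# Pass response_format in the chat.completions.create call; the reply
# becomes a tiny JSON object like {"label": "billing"} instead of prose.
```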

Will You Spend $50 or $5,000? How to Estimate Your Bill

Step 1: Count your requests per day by category. Separate simple tasks (classification, extraction, short answers) from complex tasks (reasoning, code generation, long-form content). This determines your model mix.

Step 2: Estimate average tokens per request. A typical chatbot message is 1,000-3,000 input tokens and 300-1,000 output tokens. A document processing request is 5,000-50,000 input tokens and 500-2,000 output tokens. A reasoning query is 2,000-5,000 input tokens, 500-2,000 visible output tokens, plus 3,000-10,000 reasoning tokens (billed as output).

Step 3: Calculate daily cost. For each category: (requests × avg_input_tokens × model_input_price/1M) + (requests × avg_output_tokens × model_output_price/1M). Add categories together. Multiply by 30 for monthly.

Step 4: Apply discounts. Subtract 50% if using batch processing. Subtract the cache discount percentage for the portion of input tokens that repeat across requests (system prompt, shared context). A typical chatbot with a 1,500-token system prompt gets 30-50% of its input tokens cached.

Step 5: Add a 20-30% buffer for token estimation variance, retry costs, and usage growth. Real production costs almost always exceed initial estimates because prompts grow, edge cases trigger longer responses, and usage increases after launch.
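The five steps can be collapsed into one estimator. The two categories below (a simple-task tier on Nano pricing and a complex-task tier on GPT-4.1 pricing) are illustrative numbers, not a benchmark:

```python
# Sketch: steps 1-5 as a single function. Prices are $ per 1M tokens;
# cache_discount and cached_fraction apply to input tokens only.

def monthly_estimate(categories, buffer=0.25):
    """Sum daily cost per category, project to a month, add a buffer."""
    daily = 0.0
    for c in categories:
        in_cost = c["requests"] * c["in_tokens"] * c["in_price"] / 1e6
        out_cost = c["requests"] * c["out_tokens"] * c["out_price"] / 1e6
        # Step 4a: cache discount on the repeated share of input tokens.
        in_cost *= 1 - c.get("cached_fraction", 0) * c.get("cache_discount", 0)
        cost = in_cost + out_cost
        if c.get("batch"):   # Step 4b: flat 50% off for batch workloads.
            cost *= 0.5
        daily += cost
    return daily * 30 * (1 + buffer)  # Step 5: 20-30% buffer

simple = {"requests": 5_000, "in_tokens": 1_500, "out_tokens": 400,
          "in_price": 0.10, "out_price": 0.40}
complex_ = {"requests": 1_000, "in_tokens": 3_000, "out_tokens": 1_000,
            "in_price": 2.00, "out_price": 8.00}
print(round(monthly_estimate([simple, complex_])))  # 583
```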

Quick reference: A startup running 5,000 requests/day through GPT-4.1 Nano for simple tasks and 1,000 requests/day through GPT-4.1 for complex tasks typically spends $200-500/month. A mid-size company running 50,000+ requests/day across multiple models typically spends $2,000-10,000/month depending on model mix and optimization.

Token Counting and How OpenAI Bills You

OpenAI uses byte-pair encoding (BPE) tokenization. One token is roughly 3-4 characters of English text, or about 0.75 words. A 1,000-word document typically tokenizes to 1,300-1,400 tokens. You can use OpenAI's tiktoken library to count tokens before sending requests.
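For quick budgeting without a tokenizer dependency, the rule of thumb above can be turned into a rough estimator (use tiktoken when you need exact counts; this heuristic is only for back-of-envelope math):

```python
# Sketch: approximate token count for English text using the two rules
# of thumb above (~4 characters per token, ~0.75 words per token).
# This is a budgeting heuristic, not a tokenizer.

def approx_tokens(text: str) -> int:
    by_chars = len(text) / 4
    by_words = len(text.split()) / 0.75
    return round((by_chars + by_words) / 2)

# A 1,000-word document should land near the 1,300-1,400 token range
# the article cites; real counts depend on vocabulary and formatting.
```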

System prompts count as input tokens on every request. A 500-token system prompt across 10,000 requests costs 5 million input tokens, $10 on GPT-4.1 before any user messages. This is why prompt caching matters: that same system prompt cached costs $2.50 instead of $10.

Function/tool definitions also count as input tokens. Defining 10 tools with detailed JSON schemas can add 2,000-5,000 tokens per request. Only include the tools each request actually needs, and keep descriptions concise.

The API response includes a usage object with exact prompt_tokens, completion_tokens, and (for reasoning models) reasoning_tokens counts. Log these for every request to build accurate cost tracking. Don't estimate; measure.

Image tokens depend on the image's dimensions and the detail parameter. A 1024x1024 image at high detail costs roughly 765 tokens. A low-detail image costs a fixed 85 tokens. If you're processing images that don't need pixel-level analysis, set detail: low to save 90% on image token costs.
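The image-token arithmetic can be sketched as follows. This follows the widely documented GPT-4o formula (85 base tokens plus 170 per 512px tile after scaling); verify against OpenAI's current docs before relying on it for newer models:

```python
import math

# Sketch: vision token count. Low detail is a flat 85 tokens. High
# detail scales the image down (longest side capped at 2048px, then
# shortest side to 768px, never upscaling) and charges per 512px tile.

def image_tokens(width, height, detail="high"):
    if detail == "low":
        return 85
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles

print(image_tokens(1024, 1024))          # 765, the figure quoted above
print(image_tokens(1024, 1024, "low"))   # 85
```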

Still on GPT-4? What Migration Saves You

If your application still uses GPT-4 ($30/$60), GPT-4 Turbo ($5/$15), or GPT-3.5 Turbo ($0.50/$1.50), you're overpaying significantly. Here's the migration path for each.

GPT-4 → GPT-4.1: Drop-in replacement for most use cases. GPT-4.1 is 15x cheaper ($2/$8 vs $30/$60) with better performance. The main change is the model string: swap gpt-4 for gpt-4.1 in your code and test. Context window jumps from 8K to 1M tokens.

GPT-4 Turbo → GPT-4.1: Also a near-drop-in replacement. GPT-4.1 is 2.5x cheaper ($2/$8 vs $5/$15) with a larger context window (1M vs 128K). Function calling and structured outputs work the same way. JSON mode is supported.

GPT-4o → GPT-4.1: The smallest price gap, but still saves 20% ($2/$8 vs $2.50/$10). GPT-4.1 adds the 1M context window and better instruction following. Vision capabilities are maintained. The only potential issue: GPT-4o behaves slightly differently on some creative writing tasks, so test your specific prompts.

GPT-3.5 Turbo → GPT-4.1 Nano: GPT-4.1 Nano is 5x cheaper ($0.10/$0.40 vs $0.50/$1.50) and significantly more capable. The quality jump is substantial. GPT-4.1 Nano handles tasks that GPT-3.5 Turbo struggled with, at a fraction of the cost.

o1 → o3 or o4-mini: o1 at $15/$60 is expensive. o3 ($2/$8) is 7.5x cheaper on input and o4-mini ($1.10/$4.40) is 13.6x cheaper. Both outperform o1 on most benchmarks. The migration requires testing because reasoning behavior differs between models, but the cost savings are dramatic.

Before migrating any model, run your evaluation suite against the new model to catch behavior differences. Pricing savings mean nothing if output quality drops. Most migrations are smooth, but edge cases in creative writing, specific formatting requirements, or fine-tuned model behavior can surface during testing.

8 Ways OpenAI Charges More Than You Expect

  • Output tokens cost 4-8x more than input tokens depending on the model. GPT-5 has the steepest ratio at 8x. Verbose responses can blow up costs fast.
  • Reasoning models (o3, o4-mini) use internal reasoning tokens billed as output tokens. A simple-looking o3 response might consume 5-10x more tokens than the visible output. You pay for the thinking, not just the answer.
  • GPT-4.1 replaced GPT-4o as the recommended model, but GPT-4o is still available. If your codebase still references GPT-4o, you're paying $2.50/$10 instead of $2/$8 for similar quality. Update your model strings.
  • Cache read pricing varies wildly by model family. GPT-5 family gets 90% off cached reads. GPT-4.1 family gets 75% off. GPT-4o family only gets 50% off. Pick models with better cache discounts if you send repeated context.
  • Rate limits on the free tier are tight: 3 RPM for reasoning models, 500 RPM for GPT-4o-mini. You need $5+ cumulative spend to unlock Tier 1 limits.
  • The Batch API gives 50% off but results arrive within 24 hours, not seconds. Fine for offline processing, useless for real-time apps.
  • Fine-tuned model pricing is separate and higher. Training costs $25/1M tokens for GPT-4o, and inference on fine-tuned models costs more than base models.
  • Image inputs (vision) are tokenized based on dimensions. A 1024x1024 image costs roughly 765 tokens. Sending high-res images without downscaling wastes tokens.

Which Model Should You Use? Quick Decision Guide

Simple chatbot or classification pipeline

GPT-4.1 Nano at $0.10/$0.40 per 1M tokens. It's 20x cheaper than GPT-4.1 and handles extraction, routing, and simple conversations. At this price point, even high-volume apps cost under $50/month.

Production AI application

GPT-4.1 at $2/$8 per 1M tokens. This is OpenAI's recommended production model, with better instruction following and coding than GPT-4o at a lower price. Route simple tasks to GPT-4.1 Nano or GPT-4.1 Mini to save more.

Complex reasoning or research

o4-mini at $1.10/$4.40 for most reasoning tasks. Only upgrade to o3 ($2/$8) when o4-mini's accuracy isn't sufficient. Reserve GPT-5 ($1.25/$10) for your hardest agentic workflows.

Budget-conscious startup or hobby project

GPT-4.1 Nano ($0.10/$0.40) for everything, upgrading to GPT-4.1 Mini ($0.40/$1.60) when Nano's quality falls short. Combined with batch processing (50% off), you can run a serious app for under $20/month.

The Bottom Line

GPT-4.1 Nano at $0.10 per million input tokens is the cheapest capable model in OpenAI's lineup and handles most simple tasks. GPT-4.1 is the new production workhorse, better and cheaper than GPT-4o. For reasoning, o4-mini at $1.10/$4.40 undercuts o3 while matching it on most tasks. GPT-5 is the ceiling for hard problems. The biggest cost-saving move: route requests to the cheapest model that can handle each task, and use batch processing for anything that doesn't need real-time responses.

Disclosure: Pricing information is sourced from official websites and may change. We update this page regularly but always verify current pricing on the vendor's site before purchasing.

Related Resources

  • OpenAI API Review
  • Anthropic API Pricing
  • OpenAI vs Anthropic API
  • AWS Bedrock Pricing
  • Cohere API Pricing
  • Compare All LLM Token Prices
  • AI API Free Tiers Compared

Frequently Asked Questions

How much does the OpenAI API cost?

It depends on the model. GPT-4.1 Nano is the cheapest at $0.10 per 1M input tokens. GPT-4.1 (the recommended production model) costs $2 per 1M input tokens. GPT-5 costs $1.25/$10. o4-mini costs $1.10/$4.40. Output tokens cost 4-8x more than input across all models.

What's the cheapest OpenAI model?

GPT-4.1 Nano at $0.10 per million input tokens and $0.40 per million output tokens. It's designed for classification, extraction, and simple tasks. With batch processing (50% off), it drops to $0.05/$0.20, effectively free for most use cases.

Should I use GPT-4o or GPT-4.1?

GPT-4.1. It replaced GPT-4o as OpenAI's recommended production model. It's cheaper ($2/$8 vs $2.50/$10), has a larger context window (1M vs 128K tokens), and performs better on instruction following and coding tasks. GPT-4o is still available but considered legacy.

What are reasoning tokens and why do they increase costs?

The o-series models (o3, o4-mini) use internal chain-of-thought reasoning before producing a visible answer. These reasoning tokens are billed as output tokens but aren't shown in the response. A simple o3 answer might use 500 visible output tokens but 5,000 reasoning tokens, meaning you pay for 5,500 output tokens total.

How does OpenAI pricing compare to Anthropic's Claude?

OpenAI is generally cheaper at the low end. GPT-4.1 Nano ($0.10/$0.40) is 10x cheaper than Claude Haiku 4.5 ($1/$5). At the production tier, GPT-4.1 ($2/$8) is cheaper than Sonnet 4.6 ($3/$15). At the flagship tier, GPT-5 ($1.25/$10) is cheaper than Opus 4.6 ($5/$25) on input but comparable on output. Quality differences vary by task.

What's the Batch API and when should I use it?

The Batch API processes requests asynchronously (results within 24 hours) at a 50% discount on all token costs. Use it for any workload that doesn't need real-time responses: data extraction, content generation, evaluations, nightly processing. It's the single biggest cost reduction available.

How does prompt caching work?

OpenAI automatically caches repeated prompt prefixes and charges a reduced rate on subsequent requests. The discount varies: GPT-5 family gets 90% off cached reads, GPT-4.1 family gets 75% off, GPT-4o family gets 50% off, and o-series discounts vary by model (see the pricing table). Structure prompts with stable content first and variable content last to maximize cache hits.

What happened to GPT-4 and GPT-4 Turbo?

They're legacy models. GPT-4 ($30/$60 per 1M tokens) and GPT-4 Turbo ($5/$15) are still available but deprecated. GPT-4.1 ($2/$8) and GPT-4o ($2.50/$10) replaced them at dramatically lower prices with better performance. There's no reason to use GPT-4 or GPT-4 Turbo in new projects.

Is the OpenAI API free?

OpenAI offers a free tier with limited rate limits (3 RPM for reasoning models, 500 RPM for GPT-4o-mini). It's enough for testing and development but not production use. Paid tiers start at $5 cumulative spend and unlock significantly higher rate limits.

How do I reduce my OpenAI API costs?

Three high-impact moves: (1) Route simple requests to GPT-4.1 Nano instead of GPT-4.1, 20x cheaper. (2) Use the Batch API for non-real-time workloads, 50% off everything. (3) Enable prompt caching by structuring prompts with stable content first, up to 90% off repeated input tokens. Together these can cut costs by 70-85%.
