Claude Models

Tags: claude, anthropic, models, reference, vendor-comparison
Deep comparison of Claude Opus 4.8 vs Claude Sonnet 4.6 vs Claude Haiku 4.5 — capabilities, benchmarks, pricing, context windows, extended thinking, adaptive thinking, and model selection guide.
Key Takeaways
  • Three current model tiers: Opus 4.8 ($5/$25, 1M ctx), Sonnet 4.6 ($3/$15, 1M ctx), Haiku 4.5 ($1/$5, 200K ctx)
  • Opus 4.8 (May 2026) is the current flagship. Its $5/$25 pricing is 67% below Opus 4.1 ($15/$75), a cut that the legacy table shows Opus 4.6 and 4.7 already carried
  • Extended thinking on Sonnet + Haiku; Adaptive thinking on Opus + Sonnet (model decides when to think deeply)
  • All models support text + image input, tool use, streaming, structured outputs

Current Models — June 2026

Feature | Claude Opus 4.8 | Claude Sonnet 4.6 | Claude Haiku 4.5
Description | Most capable for complex reasoning and agentic coding | Best speed + intelligence balance | Fastest with near-frontier quality
API ID | claude-opus-4-8 | claude-sonnet-4-6 | claude-haiku-4-5-20251001
Input Pricing | $5 / 1M tokens | $3 / 1M tokens | $1 / 1M tokens
Output Pricing | $25 / 1M tokens | $15 / 1M tokens | $5 / 1M tokens
Context Window | 1M tokens (~555K words) | 1M tokens (~750K words) | 200K tokens (~150K words)
Max Output | 128K tokens | 64K tokens | 64K tokens
Extended Thinking | No (uses Adaptive) | Yes | Yes
Adaptive Thinking | Yes | Yes | No
Latency | Moderate | Fast | Fastest
Knowledge Cutoff | Jan 2026 | Jan 2026 (training) / Aug 2025 (reliable) | Jul 2025 (training) / Feb 2025 (reliable)
Vision (Image Input) | Yes | Yes | Yes
Tool Use | Yes | Yes | Yes
Streaming | Yes | Yes | Yes
Prompt Caching | Yes | Yes | Yes
Batch API (50% off) | $2.50 / $12.50 | $1.50 / $7.50 | $0.50 / $2.50

Pricing History: Claude Opus 4.1 was $15/$75 per 1M tokens. The 67% cut to $5/$25 arrived in later Opus releases (Opus 4.6 and 4.7 are both listed at that price in the legacy table), and Opus 4.8 (May 2026) keeps the same pricing. Claude Sonnet 4 has moved from current to deprecated. See model deprecations.

Legacy & Deprecated Models

Model | Status | API Price | Context | Migration Target
Claude Opus 4.7 | Legacy | $5/$25 | 1M | Opus 4.8
Claude Opus 4.6 | Legacy | $5/$25 | 1M | Opus 4.8
Claude Sonnet 4.5 | Legacy | $3/$15 | 200K | Sonnet 4.6
Claude Opus 4.1 | Legacy | $15/$75 | 200K | Opus 4.8
Claude Opus 4 | Deprecated (June 15, 2026) | $15/$75 | 200K | Opus 4.8
Claude Sonnet 4 | Deprecated (June 15, 2026) | $3/$15 | 200K | Sonnet 4.6
Claude Haiku 3.5 | Retired (except Bedrock/Vertex) | $0.80/$4 | 200K | Haiku 4.5

Extended vs Adaptive Thinking

Both features let models “think” before responding — spending additional tokens on internal reasoning.

Feature | How It Works | Available On | Best For
Extended Thinking | You set a budget of N tokens. The model spends up to N tokens on internal reasoning before answering, so the budget caps the thinking spend. | Opus 4.6, Sonnet 4.5+, Haiku 4.5 | When you want a predictable thinking budget
Adaptive Thinking | The model decides when and how much to think. There is no budget to set; the model judges complexity, so cost is less predictable. | Opus 4.8, Sonnet 4.6 | Complex multi-step reasoning without manual tuning

Practical guide:

  • Use Extended Thinking for coding, math, or when you need consistent latency/cost
  • Use Adaptive Thinking for open-ended analysis where the task complexity varies
  • Opus 4.8’s Adaptive Thinking is the default — the model handles the judgment
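As a rough sketch of how the choice between the two modes might look in client code, the helper below builds the `thinking` field of a request from the support matrix in the Current Models table. The field shapes (`{"type": "enabled", "budget_tokens": N}` for a budgeted request and `{"type": "adaptive"}` for adaptive thinking) are assumptions about the Messages API and should be checked against the current API reference; `thinking_param` and `THINKING_SUPPORT` are hypothetical names.

```python
# Thinking support per model, copied from the Current Models table.
THINKING_SUPPORT = {
    "claude-opus-4-8": {"extended": False, "adaptive": True},
    "claude-sonnet-4-6": {"extended": True, "adaptive": True},
    "claude-haiku-4-5-20251001": {"extended": True, "adaptive": False},
}


def thinking_param(model, budget_tokens=None):
    """Return a `thinking` request field for `model`.

    Pass budget_tokens to ask for Extended Thinking with a fixed cap;
    omit it to ask for Adaptive Thinking.
    """
    support = THINKING_SUPPORT[model]
    if budget_tokens is not None:
        if not support["extended"]:
            raise ValueError(f"{model} does not take a thinking budget")
        # Assumed field shape for a budgeted (extended) thinking request.
        return {"type": "enabled", "budget_tokens": budget_tokens}
    if not support["adaptive"]:
        raise ValueError(f"{model} needs an explicit budget_tokens")
    # Assumed field shape for an adaptive thinking request.
    return {"type": "adaptive"}
```

The returned dictionary would be passed as the `thinking` argument of a message request. Raising on unsupported combinations, rather than silently switching modes, keeps cost surprises visible at development time.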

Model Selection Guide

What matters most?
├─ Maximum quality, complex reasoning → Claude Opus 4.8
│ Use when: agentic coding, deep analysis, long-form writing, R&D
│ Cost: $5/$25 per 1M. Batch: $2.50/$12.50
├─ Balanced speed + intelligence → Claude Sonnet 4.6
│ Use when: production workloads, chatbots, content generation, moderate coding
│ Cost: $3/$15 per 1M. Batch: $1.50/$7.50
├─ Speed + cost efficiency → Claude Haiku 4.5
│ Use when: high-throughput classification, simple Q&A, routing/triage
│ Cost: $1/$5 per 1M. Batch: $0.50/$2.50
└─ Maximum speed (premium) → Fast Mode (Opus 4.8)
  Use when: latency-critical, interactive apps, real-time demos
  Cost: $30/$150 per 1M (6x standard)
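The decision tree above can be approximated as a lookup from task label to model ID. This is a minimal sketch: the task labels are taken from the "Use when" lines, while `pick_model`, `TASK_TO_MODEL`, and the choice of Sonnet 4.6 as the fallback are illustrative assumptions. Fast Mode is left out because the guide does not say how it is enabled.

```python
# Model IDs from the Current Models table.
OPUS = "claude-opus-4-8"
SONNET = "claude-sonnet-4-6"
HAIKU = "claude-haiku-4-5-20251001"

# Task labels taken from the "Use when" lines of the selection guide.
TASK_TO_MODEL = {
    "agentic coding": OPUS,
    "deep analysis": OPUS,
    "long-form writing": OPUS,
    "chatbot": SONNET,
    "content generation": SONNET,
    "moderate coding": SONNET,
    "classification": HAIKU,
    "simple q&a": HAIKU,
    "routing": HAIKU,
}


def pick_model(task: str, default: str = SONNET) -> str:
    """Return the model ID for a task label, falling back to `default`."""
    return TASK_TO_MODEL.get(task.strip().lower(), default)
```

Defaulting to the balanced tier means an unrecognized task never lands on the most expensive model by accident; callers can pass a different `default` if quality matters more than cost.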

Pricing at Scale — Cost Comparison

Scenario | Opus 4.8 | Sonnet 4.6 | Haiku 4.5
1K conversations/day, avg 5K tokens in/2K out | $25 + $50 = $75/day | $15 + $30 = $45/day | $5 + $10 = $15/day
100K conversations/day with Batch API (input tokens only, 5K in per conversation) | $1,250/day ($37.5K/mo) | $750/day ($22.5K/mo) | $250/day ($7.5K/mo)
Single complex analysis (50K in, 10K out) | $0.50 | $0.30 | $0.10
With prompt caching (cached reads billed at 10% of input price) | ~70% token cost savings | ~70% token cost savings | ~70% token cost savings

For all models, prompt caching bills repeated content (system prompts, long documents) at 10% of the input price, and the Batch API gives 50% off for asynchronous processing. Combined, cached input tokens cost 5% of the standard rate, a saving of up to 95% on that portion of a high-volume workload.
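The arithmetic behind these scenarios can be reproduced with a small estimator. This is a sketch under stated assumptions: prices come from the pricing table, cached input is billed at 10% of the input price, the batch discount is a flat 50% on the whole bill, and any surcharge for writing to the cache is ignored. The function name `daily_cost` and its parameters are hypothetical.

```python
from decimal import Decimal

# USD per 1M tokens (input, output), from the pricing table.
PRICES = {
    "opus-4.8": (Decimal("5"), Decimal("25")),
    "sonnet-4.6": (Decimal("3"), Decimal("15")),
    "haiku-4.5": (Decimal("1"), Decimal("5")),
}
MILLION = Decimal(1_000_000)


def daily_cost(model, conversations, tokens_in, tokens_out,
               batch=False, cached_fraction="0"):
    """Estimate USD cost for `conversations` requests of a given size.

    cached_fraction is the share of input tokens served from the prompt
    cache, given as a decimal string such as "0.8".
    """
    price_in, price_out = PRICES[model]
    total_in = Decimal(conversations * tokens_in)
    total_out = Decimal(conversations * tokens_out)
    cached = total_in * Decimal(cached_fraction)
    fresh = total_in - cached
    cost = (
        fresh * price_in
        + cached * price_in * Decimal("0.1")  # cached reads at 10% of input price
        + total_out * price_out
    ) / MILLION
    if batch:
        cost *= Decimal("0.5")  # Batch API: 50% off
    return cost
```

For example, `daily_cost("sonnet-4.6", 1000, 5000, 2000)` gives 45, matching the Sonnet 4.6 cell for 1K conversations per day. Using `Decimal` instead of floats keeps the dollar figures exact.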

Context Window Deep-Dive

Opus 4.8 and Sonnet 4.6 have a 1M-token context window; Haiku 4.5 has a 200K-token window.

What 1M tokens enables:
├─ Read War & Peace (~580K words) in a single prompt on Sonnet 4.6 (it slightly exceeds Opus 4.8's ~555K-word capacity)
├─ Process entire codebases (200K+ lines of code)
├─ Load 10+ research papers simultaneously
├─ Full conversation history across days of chat
└─ Skip RAG entirely for <500K-token datasets
What 200K tokens enables:
├─ Read entire books (average novel ~80K words)
├─ Full API documentation
├─ Large transcripts and meeting notes
└─ Comprehensive RAG context

Tokenizer Note: Claude Opus 4.8 uses a new tokenizer vs previous models. For identical text, Opus 4.8 may use up to 35% more tokens. Factor this into cost comparisons — the lower per-token price partially offsets increased tokenization.
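One way to account for the tokenizer difference when planning prompts is to estimate tokens from a word count using each model's own ratio. The sketch below derives that ratio from the context window and approximate word capacity in the Current Models table; real token counts depend on the text, so treat the result as a planning estimate. The names `estimate_tokens` and `fits` are hypothetical.

```python
# (context window in tokens, approximate word capacity), from the Current Models table.
CONTEXT = {
    "opus-4.8": (1_000_000, 555_000),
    "sonnet-4.6": (1_000_000, 750_000),
    "haiku-4.5": (200_000, 150_000),
}


def estimate_tokens(words, model):
    """Estimate token count from a word count using the table's implied ratio."""
    window, word_capacity = CONTEXT[model]
    return round(words * window / word_capacity)


def fits(words, model, reserve_for_output=0):
    """Check whether a document plus an output reserve fits the context window."""
    window, _ = CONTEXT[model]
    return estimate_tokens(words, model) + reserve_for_output <= window
```

An average novel of about 80K words comes out near 107K tokens on Haiku 4.5, which leaves room in the 200K window for a sizeable output reserve. The same text is estimated at roughly 35% more tokens on Opus 4.8 than on Sonnet 4.6, in line with the tokenizer note.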

Comparing Across Models

For a broader comparison across Claude, GPT, Gemini, DeepSeek, and other providers, see the Models Decision Guide.