
Models Decision Guide

How to choose the right model for your specific task. Reasoning, speed, cost, and capabilities compared.


Find the Right Model


Claude Opus 4.8

Anthropic
Context: 1M
Speed: slow
Price: $5/25 per 1M
Tags: reasoning, writing, analysis, coding

Complex reasoning, long documents

Claude Sonnet 4.6

Anthropic
Context: 1M
Speed: fast
Price: $3/15 per 1M
Tags: writing, coding, analysis

Default choice for most tasks

Claude Haiku 4.5

Anthropic
Context: 200K
Speed: ultra-fast
Price: $1/5 per 1M
Tags: routing, classification, speed

Fast & cheap for simple tasks

GPT-5.5

OpenAI
Context: 1M
Speed: fast
Price: $5/30 per 1M
Tags: writing, coding, vision, reasoning

Flagship reasoning & coding

GPT-5.4

OpenAI
Context: 1M
Speed: fast
Price: $2.5/15 per 1M
Tags: writing, coding, analysis

Best value for most production workloads

GPT-5.4 mini

OpenAI
Context: 400K
Speed: ultra-fast
Price: $0.75/4.5 per 1M
Tags: speed, budget, routing, coding

Cost-efficient coding & agents

o3

OpenAI
Context: 128K
Speed: very-slow
Tags: reasoning

Hardest problems (math, logic)

o1

OpenAI
Context: 128K
Speed: slow
Price: $15/60 per 1M
Tags: reasoning

Hard reasoning at lower cost than o3

Gemini 3.1 Pro

Google
Context: 1M
Speed: medium
Price: $2/12 per 1M
Tags: long-context, vision, research

Massive documents & vision

DeepSeek V4 Flash

DeepSeek
Context: 1M
Speed: fast
Price: $0.14/0.28 per 1M
Tags: speed, budget, routing

Extreme budget, minimal quality loss

DeepSeek V4 Pro

DeepSeek
Context: 1M
Speed: medium
Price: $0.435/0.87 per 1M
Tags: reasoning, budget, coding

Frontier quality at fraction of cost

Llama 4

Meta
Context: varies
Speed: varies
Price: $0/0 per 1M
Tags: open-source, privacy

Private, self-hosted


Quick Decision Tree

1. What's your primary constraint?
├─ Cost → DeepSeek V4 Flash or GPT-5.5 Instant
├─ Speed → Claude Haiku or GPT-5.5 Instant
├─ Reasoning/Quality → Claude Opus or o3
├─ Long context → Gemini 3.1 Pro (1M tokens)
└─ Privacy/On-prem → Llama 4 self-hosted
2. What's your use case?
├─ Writing/Analysis → Claude Sonnet (default)
├─ Code generation → Claude Sonnet or GPT-4o
├─ Reasoning/Math → o3 or Claude Opus
├─ Vision/Images → GPT-4o or Gemini 3.1
├─ Document processing → Gemini 3.1 Pro (1M context)
└─ Real-time → Perplexity or Claude with web search
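
The two questions in the tree can be combined in a small lookup. The sketch below is only an illustration: the key names (`cost`, `quality`, `documents`, and so on) and the rule of listing models named by both branches first are assumptions, not part of the guide.

```python
# Hypothetical lookup tables mirroring the two branches of the tree above.
BY_CONSTRAINT = {
    "cost": ["DeepSeek V4 Flash", "GPT-5.5 Instant"],
    "speed": ["Claude Haiku", "GPT-5.5 Instant"],
    "quality": ["Claude Opus", "o3"],
    "long_context": ["Gemini 3.1 Pro"],
    "privacy": ["Llama 4 (self-hosted)"],
}

BY_USE_CASE = {
    "writing": ["Claude Sonnet"],
    "code": ["Claude Sonnet", "GPT-4o"],
    "math": ["o3", "Claude Opus"],
    "vision": ["GPT-4o", "Gemini 3.1"],
    "documents": ["Gemini 3.1 Pro"],
    "real_time": ["Perplexity", "Claude with web search"],
}


def suggest_models(constraint: str, use_case: str) -> list[str]:
    """Return models named by both branches first, then the rest, without repeats."""
    by_constraint = BY_CONSTRAINT[constraint]
    by_use_case = BY_USE_CASE[use_case]
    both = [m for m in by_constraint if m in by_use_case]
    rest = [m for m in by_constraint + by_use_case if m not in both]
    # dict.fromkeys keeps first-seen order while dropping duplicates
    return list(dict.fromkeys(both + rest))


print(suggest_models("quality", "math"))
```

A model that appears under both your constraint and your use case is usually the first one worth testing.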

Model Categories

Tier 1: Reasoning Models (Slow but Brilliant)

Claude Opus 4.8 (Anthropic)

  • Context: 1M tokens (read a whole book)
  • Speed: Slow (think for 30+ seconds)
  • Cost: $5/$25 per 1M tokens (input/output)
  • Best for: Complex reasoning, multi-step logic, deep analysis
  • Why: Most capable model available. Use when Sonnet struggles.
  • When NOT to use: Simple tasks, real-time applications, cost-sensitive work

o3 (OpenAI)

  • Context: 128K tokens
  • Speed: Very slow (extended thinking)
  • Cost: Premium pricing
  • Best for: Extremely hard problems (math, coding competitions, logic puzzles)
  • Why: Breakthrough reasoning capability
  • When NOT to use: General tasks (overkill), any time-sensitive work

o1 (OpenAI)

  • Context: 128K tokens
  • Speed: Slow but faster than o3
  • Cost: $15 per 1M input, $60 per 1M output
  • Best for: Difficult reasoning without needing extended thinking
  • Why: Good reasoning at reasonable speed
  • When NOT to use: Simple tasks, real-time

Tier 2: Default Models (Fast & Smart)

Claude Sonnet 4.6 (Anthropic) ⭐ Recommended Default

  • Context: 1M tokens
  • Speed: Fast (2-5 seconds)
  • Cost: $3/$15 per 1M tokens (input/output)
  • Best for: Almost everything - writing, code, analysis
  • Why: Best balance of speed, quality, cost
  • When to use: Your first choice for any task

GPT-4o (OpenAI)

  • Context: 128K tokens
  • Speed: Fast (2-5 seconds)
  • Cost: $2-8 per 1M tokens
  • Best for: All-around work, especially vision/images
  • Why: Extremely reliable, good at everything
  • When to use: When you need vision, or want OpenAI’s reliability

Gemini 3.1 Pro (Google)

  • Context: 1M tokens (!!)
  • Speed: Medium (5-10 seconds)
  • Cost: $2/$12 per 1M tokens (input/output)
  • Best for: Document analysis, long-context research
  • Why: Reads entire books in one prompt, with strong multimodal support
  • When to use: When context window matters more than speed

Tier 3: Speed-Focused (Fast & Cheap)

Claude Haiku 4.5 (Anthropic)

  • Context: 200K tokens
  • Speed: Ultra-fast (under 1 second)
  • Cost: $1/$5 per 1M tokens (input/output)
  • Best for: Classification, routing, summaries, high volume
  • Why: Surprisingly capable despite being the smallest
  • When to use: When speed is critical or volume is high

GPT-4 Turbo (OpenAI)

  • Context: 128K tokens
  • Speed: Fast
  • Cost: $0.01-0.03 per 1K tokens
  • Best for: Production systems, high volume
  • Why: Reliable, cheap
  • When to use: Cost-sensitive production

DeepSeek V4 Flash (DeepSeek)

  • Context: 1M tokens
  • Speed: Fast
  • Cost: $0.14/$0.28 per 1M tokens (!!)
  • Vision: ❌ Text only. Use DeepSeek VL for image tasks.
  • Best for: Budget-conscious work, routing, high volume
  • Why: Shockingly cheap and good quality

DeepSeek VL (DeepSeek)

  • Context: 128K tokens
  • Speed: Medium
  • Cost: ~$0.55/$2.19 per 1M tokens
  • Vision: ✅ Images
  • Best for: Vision tasks within DeepSeek ecosystem
  • Why: DeepSeek’s dedicated vision model. Use when you need image understanding at DeepSeek pricing.
  • When to use: Cost is the #1 constraint

Tier 4: Specialized

GPT-4 Vision (OpenAI)

  • Best for: Image analysis, OCR, visual understanding
  • Why it is listed here: Superseded; GPT-4o is better and cheaper now
  • When to use: Legacy systems

Claude 3 Opus (Anthropic, previous version)

  • Superseded by newer Opus releases (currently Opus 4.8)
  • When to use: Nowhere; use Opus 4.8 instead

Open-Source Models (Llama, Mistral, DeepSeek)

  • Best for: Privacy, on-premise, fine-tuning
  • Why: Full control, no API costs
  • When to use: When data can’t leave your infrastructure
  • How: Run locally with Ollama or LM Studio

Decision Matrix By Use Case

Task | Model | Why | Cost
Customer support chatbot | Haiku | Fast, cheap | $2-5/month
Blog post writing | Sonnet | Quality + speed balance | $1-3/month
Code generation | Sonnet or GPT-4o | Both excellent | $2-5/month
Complex reasoning | Opus or o3 | Need the power | $50-200/month
Document analysis (100 pages) | Gemini 3.1 Pro | 1M context fits long documents | $2-10/month
Real-time Q&A | Perplexity | Web search built-in | Free to $20/month
Vision/image tasks | GPT-4o | Best at images | $2-5/month
Routing/classification | Haiku | Speed + cheap | $1-2/month
Data extraction | Sonnet + structured output | Reliable parsing | $2-5/month
High volume (1000+ requests/day) | Haiku or V4 Flash | Need cheap inference | $10-50/month

Cost Comparison for Common Scenarios

Scenario 1: Personal Research Assistant

Use case: 10 questions/day, 2000 input tokens avg, 500 output tokens

Model | Monthly Cost | Speed | Quality
Claude Sonnet | $0.90 | Fast | Excellent
GPT-4o | $0.60 | Fast | Excellent
Gemini 3.1 Pro | $0.60 | Medium | Excellent
DeepSeek V4 | $0.33 | Fast | Good

Recommendation: Sonnet or GPT-4o (negligible difference)
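
To run this kind of estimate for your own workload, the arithmetic is requests per month times tokens per request times the per-token price, summed for input and output. The sketch below assumes a 30-day month and takes Claude Sonnet 4.6 at the $3/$15 per 1M listed in the specifications table on this page. The function name and parameters are illustrative, and the result depends on those assumptions, so it is not guaranteed to match the estimates in the table above.

```python
def monthly_cost(requests_per_day: int, input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float, days: int = 30) -> float:
    """Monthly spend in dollars; prices are dollars per 1M tokens."""
    requests = requests_per_day * days
    input_cost = requests * input_tokens / 1_000_000 * input_price
    output_cost = requests * output_tokens / 1_000_000 * output_price
    return input_cost + output_cost


# Scenario 1 workload (10 questions/day, 2000 input and 500 output tokens each)
# priced at the assumed $3 input / $15 output per 1M tokens.
sonnet = monthly_cost(10, 2000, 500, input_price=3.0, output_price=15.0)
print(f"${sonnet:.2f} per month")
```

Swap in the prices and traffic for any other model or scenario to compare them on equal terms.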

Scenario 2: High-Volume Classification (10,000 req/day)

Use case: 500 input tokens, 50 output tokens per request

Model | Monthly Cost | Speed | Quality
Claude Haiku | $45 | Ultra-fast | Good
GPT-4 Turbo | $15 | Fast | Good
DeepSeek Flash | $7 | Fast | Good

Recommendation: DeepSeek Flash (about 6x cheaper than Haiku at these estimates)

Scenario 3: Complex Reasoning (50 req/day)

Use case: 3000 input tokens, 2000 output tokens per request

Model | Monthly Cost | Speed | Quality
Claude Opus | $675 | Slow | Excellent
Claude Sonnet | $225 | Fast | Excellent
o3 | $2000 | Very slow | Best-in-class
o1 | $450 | Slow | Excellent

Recommendation: Sonnet (best balance), o3 (if you need the best and can wait)


Which Model for Your Project?

If You’re Building a Startup/Product

Start with: Claude Sonnet + Haiku combo

  • Sonnet for complex tasks
  • Haiku for high-volume/cheap tasks
  • Reason: Cost-effective, reliable, good quality

Scale with: Opus if you hit reasoning limits

If You’re Prototyping/Learning

Start with: GPT-4o or Claude Sonnet (free tier)

  • Both have good free credits
  • Reason: Simplest to get started

Explore: Try multiple models on the same task to see tradeoffs

If Cost Is Your #1 Constraint

Use: DeepSeek V4 Flash + Sonnet combo

  • Flash for everything possible
  • Sonnet when Flash isn’t good enough
  • Reason: 10x cheaper overall
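
The "Flash first, Sonnet when Flash isn't good enough" pattern can be sketched as a fallback loop. Everything here is hypothetical scaffolding: `call` stands in for whatever client function sends a prompt to a named model, and `good_enough` is whatever quality check you trust (a schema validation, a length check, a grader). Neither is a real library API.

```python
from typing import Callable, Sequence


def answer_with_fallback(prompt: str, models: Sequence[str],
                         call: Callable[[str, str], str],
                         good_enough: Callable[[str], bool]) -> tuple[str, str]:
    """Try models in order (cheapest first) and stop at the first acceptable reply."""
    if not models:
        raise ValueError("need at least one model")
    reply = ""
    for model in models:
        reply = call(model, prompt)  # hypothetical client call
        if good_enough(reply):
            break
    # If nothing passed the check, this is the last (strongest) model's reply.
    return model, reply
```

The saving depends on how often the cheap model passes the check, since every escalation pays for both calls.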

If You Need Long Context (1000+ page documents)

Use: Gemini 3.1 Pro (1M context)

  • Reason: Its 1M-token window is built for this much text

If You Need On-Premise/Privacy

Use: Llama 4 (run locally with Ollama)

  • Cost: Free (just electricity)
  • Tradeoff: Slower, less capable
  • Reason: Data never leaves your machine

If Speed Matters Most

Use: Haiku with caching

  • Response time: under 1 second
  • Or: o1 if you need reasoning (slower but better)
  • Reason: Trade quality for speed when needed

Optimization Strategies

Strategy 1: Routing (Mixture of Models)

Use a cheap model to decide which model to use:

Input: User question
Haiku: "Is this question simple? (yes/no)"
├─ yes → Use Haiku for answer (cheap)
└─ no → Use Sonnet for answer (better)

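Savings: With 80% of questions answered by Haiku and 20% by Sonnet, expect roughly a 30% cost reduction versus Sonnet-only, depending on the price gap and the overhead of the routing call.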
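
A minimal sketch of the routing flow, plus a helper for estimating the blended cost per question. The `is_simple` callback stands in for the cheap yes/no call to Haiku; the word-count heuristic is only a placeholder so the example runs offline, and all names are illustrative.

```python
from typing import Callable


def route(question: str, is_simple: Callable[[str], bool]) -> str:
    """Pick a model name; `is_simple` stands in for the cheap yes/no classifier call."""
    return "haiku" if is_simple(question) else "sonnet"


def blended_cost(simple_share: float, cheap_cost: float, strong_cost: float,
                 router_cost: float = 0.0) -> float:
    """Average cost per question when every question also pays for the routing call."""
    return router_cost + simple_share * cheap_cost + (1 - simple_share) * strong_cost


# Placeholder heuristic (an assumption); a real router would ask the cheap model.
def short_question(question: str) -> bool:
    return len(question.split()) < 12


print(route("What is the capital of France?", short_question))
```

Plug your own per-question costs into `blended_cost` and compare the result with the strong model's cost alone. The saving shrinks as the router's own cost grows or as fewer questions qualify as simple.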

Strategy 2: Caching

If you analyze the same document repeatedly:

First request: Analyze document X (full cost)
Second request: Analyze document X (90% cheaper - cached)

Savings: With caching, 2nd-10th requests are 90% cheaper
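
The overall saving across a run of requests is smaller than 90% because the first request is still full price. A small sketch of that arithmetic, assuming the whole request is cacheable and repeats cost 10% of the original (in practice the discount usually applies to the repeated input tokens only):

```python
def cached_run_cost(full_cost: float, requests: int, cached_discount: float = 0.9) -> float:
    """Total cost when the first request is full price and repeats get the discount."""
    if requests <= 0:
        return 0.0
    repeat_cost = full_cost * (1 - cached_discount)
    return full_cost + (requests - 1) * repeat_cost


# Ten analyses of the same document at an assumed $1.00 per uncached request.
total = cached_run_cost(full_cost=1.00, requests=10)
saving = 1 - total / (1.00 * 10)
print(f"total ${total:.2f}, overall saving {saving:.0%}")
```

The more often the same document is reused, the closer the overall saving gets to the per-request discount.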

Strategy 3: Batch Processing

Don’t ask questions one-at-a-time:

❌ Bad: 1000 questions, each call costs $0.01 = $10
✅ Good: Batch 100 questions per call, 10 calls = $0.10

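Savings: Up to 100x on APIs that bill per call. With per-token billing, batching saves only the prompt overhead that would otherwise repeat in every call.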
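
When billing is per token, the gain from batching comes from sending the shared instructions once per call instead of once per question. The sketch below models input tokens only. The 800-token shared prompt, the 50 tokens per question, and the $3 per 1M price are illustrative assumptions.

```python
def token_cost(questions: int, per_call: int, shared_prompt_tokens: int,
               tokens_per_question: int, price_per_million: float) -> float:
    """Input-token cost when each call repeats one shared prompt plus its questions."""
    calls = -(-questions // per_call)  # ceiling division
    total_tokens = calls * shared_prompt_tokens + questions * tokens_per_question
    return total_tokens / 1_000_000 * price_per_million


one_at_a_time = token_cost(1000, 1, shared_prompt_tokens=800,
                           tokens_per_question=50, price_per_million=3.0)
batched = token_cost(1000, 100, shared_prompt_tokens=800,
                     tokens_per_question=50, price_per_million=3.0)
print(f"one at a time: ${one_at_a_time:.3f}, batched: ${batched:.3f}")
```

The longer the shared prompt is relative to each question, the more batching saves. Output tokens are unaffected.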


Vision / Image Input Support

Not all models can process images. Here’s which ones can and what they support:

Model | Vision | Type | Notes
Claude Sonnet 4.6 | ✅ | Images | Strong image analysis, charts, documents
Claude Opus 4.8 | ✅ | Images | Best for detailed visual reasoning
GPT-5.5 | ✅ | Images | Solid multimodal
GPT-5.5 Instant | ❌ | Text only | Fastest, no vision
o3 | ❌ | Text only | Reasoning-only model
Gemini 3.1 Pro | ✅ | Images + Video | Best multimodal support
DeepSeek V4 | ❌ | Text only | Use DeepSeek VL for vision
DeepSeek V4 Flash | ❌ | Text only | Use DeepSeek VL for vision
DeepSeek VL | ✅ | Images | DeepSeek’s dedicated vision model
Llama 4 | ✅ | Images | Open-source multimodal

Key takeaway: If your task involves analyzing images, charts, or screenshots, pick a model with ✅. DeepSeek V4 and V4 Flash are excellent for text-only tasks but can’t process images at all - use DeepSeek VL if you need vision in the DeepSeek ecosystem.
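
If model selection happens in code, the table above can be kept as a small capability map and checked before a request is sent. This is a sketch: the dictionary copies the Type column above, and the function name is illustrative.

```python
# Input types per model, copied from the vision table above.
VISION_SUPPORT = {
    "Claude Sonnet 4.6": {"image"},
    "Claude Opus 4.8": {"image"},
    "GPT-5.5": {"image"},
    "GPT-5.5 Instant": set(),
    "o3": set(),
    "Gemini 3.1 Pro": {"image", "video"},
    "DeepSeek V4": set(),
    "DeepSeek V4 Flash": set(),
    "DeepSeek VL": {"image"},
    "Llama 4": {"image"},
}


def models_accepting(*media: str) -> list[str]:
    """Models whose supported inputs include every requested media type."""
    needed = set(media)
    return [name for name, kinds in VISION_SUPPORT.items() if needed <= kinds]


print(models_accepting("image", "video"))
```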


What Changed Recently (May 2026)

  • o3 released (best reasoning ever, very expensive)
  • Claude Opus 4.8 released (supersedes Opus 4.7, improved reasoning)
  • DeepSeek V4 released (open-source reasoning, cheaper than ever)
  • Gemini 3.1 released (1M context window, multimodal)
  • GPT-5.5 released (minor improvements over 4o)

Implication: 2026 is dominated by reasoning models and long-context models.


Common Mistakes

❌ Using Opus for everything - Overkill 95% of the time
✅ Use Sonnet by default, Opus when needed

❌ Ignoring cost - Can add up fast with high volume
✅ Calculate your actual usage, optimize routing

❌ Assuming newer = better - Sometimes not true
✅ Test models on your actual task

❌ Using same model for everything - Suboptimal
✅ Use a mix (Haiku for cheap, Sonnet for quality)


Model Specifications & Capabilities

All current models with pricing, context windows, and vision support. Source of truth: src/data/models.ts.

Model Company Context Input/Output Vision Notes
Claude Opus 4.8 Anthropic 1M $5/$25 per 1M Most capable Claude (May 2026). Best for complex reasoning and agentic coding. Adaptive thinking. Fast Mode $10/$50.
Claude Opus 4.8 (Thinking) Anthropic 1M $5/$25 per 1M Top-ranked on Design Arena. Thinking mode enabled.
Claude Opus 4.7 Anthropic 1M $5/$25 per 1M Previous flagship (superseded by Opus 4.8, May 2026).
Claude Opus 4.6 Anthropic 1M $5/$25 per 1M Previous gen flagship. Still highly capable.
Claude Opus 4.6 (Thinking) Anthropic 1M $5/$25 per 1M Previous gen with thinking mode. Strong on design benchmarks.
Claude Opus 4.5 Anthropic 200K $5/$25 per 1M Earlier generation. Still available for certain use cases.
Claude Sonnet 4.6 Anthropic 1M $3/$15 per 1M Best balance of speed & quality. Default pick.
Claude Haiku 4.5 Anthropic 200K $1/$5 per 1M Ultra-fast, cheapest Claude.
GPT-5.5 OpenAI 1M $5/$30 per 1M Flagship. Reasoning levels none→xhigh. Strong all-around.
GPT-5.4 OpenAI 1M $2.50/$15 per 1M Affordable professional tier. Near-flagship capability.
GPT-5.4 mini OpenAI 400K $0.75/$4.50 per 1M Strong mini for coding & agents. Fast.
GPT-5.4 nano OpenAI 400K $0.20/$1.25 per 1M Fastest, cheapest. Ideal for high-throughput.
GPT-4.1 OpenAI 128K $2/$8 per 1M Previous gen. Superseded by GPT-5.4 mini.
o3 OpenAI 128K $2/$8 per 1M Dedicated reasoning model. Spends tokens on hidden thinking. 87% cheaper than o1.
o1 OpenAI 128K $15/$60 per 1M Earlier reasoning model. Superseded by o3.
Gemini 3.1 Pro Google 1M $2/$12 per 1M Flagship Gemini. Best context window, excellent multimodal. Prompts >200K billed $4/$18.
Gemini 3.5 Flash Google 1M $1.50/$9 per 1M Fast Gemini. $0.15/M cached input (90% off). Free tier on AI Studio.
DeepSeek V4 Flash DeepSeek 1M $0.14/$0.28 per 1M Cost leader. MIT license. FREE on OpenCode.
DeepSeek V4 Pro DeepSeek 1M $0.435/$0.87 per 1M Premium tier. Thinking mode default. 75% price cut now permanent (announced May 22, 2026).
DeepSeek R1 DeepSeek 1M $0.435/$0.87 per 1M Deprecated as standalone; folded into V4 Flash thinking mode (deepseek-reasoner). Open-weight.
DeepSeek V4 DeepSeek 128K $0.55/$2.19 per 1M Previous gen. Superseded by V4 Flash and Pro.
Llama 4 Meta varies Free (self-host) Open weights. Llama 4 Community License. Run locally.
Llama 4 Scout Meta 10M Free (self-host) MoE variant. 10M context window, 109B total params.
Muse Spark Meta varies API-only (preview) Meta's first proprietary (closed-weight) frontier model, Apr 2026. Powers Meta AI; private-preview API. NOT open-weight.
Grok 4.3 xAI 1M $1.25/$2.50 per 1M xAI flagship (Apr 2026). Real-time X data. Legacy Grok 3/4 aliases route here.
Grok 3 Pro xAI 128K $3/$15 per 1M Previous gen. Routes to Grok 4.3.
Kimi K2.6 Moonshot AI 256K ~$0.60/$2.50 per 1M Latest Kimi. Top-5 on Design Arena. Agent swarm capabilities. ($0.16/M cached input.)
Kimi K2.5 (Thinking) Moonshot AI 256K ~$0.55/$2.19 per 1M Previous gen with thinking mode.
GLM 5.1 Zhipu AI 200K ~$0.98/$3.08 per 1M Zhipu's flagship. Top-5 on Design Arena. Open-weight.
GLM 5 Turbo Zhipu AI 128K ~$0.30/$1.00 per 1M Fast inference variant of GLM 5.
GLM 5 Zhipu AI 128K ~$0.60/$1.92 per 1M Base GLM 5 model. Strong multilingual performance.
GLM 4.7 Zhipu AI 128K ~$0.30/$1.00 per 1M Mid-cycle update between GLM 4 and GLM 5.
GLM 4 Zhipu AI 128K ~$0.20/$0.80 per 1M Previous gen. Still solid for Chinese-language tasks.
Qwen 3.6 Alibaba 128K ~$0.33/$1.95 per 1M Alibaba's flagship. Strong across all benchmarks. (DashScope direct pricing.)
MiniMax M2.7 MiniMax 128K ~$0.30/$1.20 per 1M Independent Chinese AI lab. Strong long-context performance.
MiMo V2.5 Xiaomi 128K ~$1/$3 per 1M Xiaomi's multimodal model.
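
Some rows carry tiered pricing, which changes long-context cost estimates. The Gemini 3.1 Pro row lists $2/$12 per 1M, with prompts over 200K tokens billed at $4/$18. A sketch of that rule follows. It assumes the higher rate applies to the whole request once the prompt crosses the threshold, which the table does not spell out.

```python
def gemini_31_pro_cost(prompt_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request, assuming the whole request moves to the higher tier."""
    long_prompt = prompt_tokens > 200_000
    input_price, output_price = (4.0, 18.0) if long_prompt else (2.0, 12.0)
    return (prompt_tokens * input_price + output_tokens * output_price) / 1_000_000


print(f"${gemini_31_pro_cost(100_000, 1_000):.3f}")
print(f"${gemini_31_pro_cost(500_000, 1_000):.3f}")
```

Under this assumption, splitting a document so each prompt stays under 200K tokens can be cheaper than sending it in one piece.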

Model Capability Matrix

How models perform across key tasks, rated on a 1-5 scale based on benchmark scores and real-world performance.

Strength scale: Best, Strong, Good, Fair, Limited

Models compared (short name, family): Opus (Claude), Sonnet (Claude), GPT-5.5 (GPT), Instant (GPT), Gemini (Gemini), DS V4 (DeepSeek), DS VL (DeepSeek), o3 (OpenAI), Llama 4 (Llama), K2.6 (Moonshot), GLM 5.1 (Zhipu), Muse (Meta), DS Pro (DeepSeek), Opus 4.6 (Claude), Grok 3 (xAI), Qwen 3.6 (Alibaba), Scout (Llama), G 3 Mini (Gemini), M2.7 (MiniMax)

Coding: Generate and refactor code

Model | Score | Notes
Claude 4 Opus | 5/5 | HumanEval 96.2%. Best-in-class code generation
Claude Sonnet 4.6 | 4/5 | HumanEval 93.7%. Strong daily driver
GPT-5.5 | 5/5 | HumanEval 95.1%. Excellent for most tasks
GPT-5.5 Instant | 4/5 | HumanEval 92.8%. Fast, good quality
Gemini 3.1 Pro | 4/5 | HumanEval 94.0%. Strong, especially with long context
DeepSeek V4 | 4/5 | HumanEval 91.5%. Surprisingly capable for price
DeepSeek VL | 4/5 | HumanEval 90%+. Strong coder with vision understanding
o3 | 5/5 | SWE-bench 71.7%. Top-tier for complex coding
Llama 4 405B | 3/5 | HumanEval 90.2%. Good open-source option
Kimi K2.6 | 4/5 | Strong coder, agentic capabilities
GLM 5.1 | 4/5 | Strong multilingual coder
Muse Spark | 4/5 | Strong coder, Llama lineage
DeepSeek V4 Pro | 4/5 | Strong coder, premium variant
Claude Opus 4.6 | 5/5 | HumanEval ~95%. Excellent coder, slightly behind 4.7
Grok 3 Pro | 4/5 | Strong coder, real-time data access
Qwen 3.6 | 4/5 | Strong multilingual coder
Llama 4 Scout | 3/5 | Decent coder, MoE efficiency
Gemini 3 Mini | 4/5 | Good coder for its size
MiniMax M2.7 | 4/5 | Strong coder

Math: Mathematical reasoning

Model | Score | Notes
Claude 4 Opus | 5/5 | MATH 96.8%. Excellent mathematical reasoning
Claude Sonnet 4.6 | 4/5 | MATH 94.2%. Strong, suitable for most needs
GPT-5.5 | 5/5 | MATH 95.5%. Very strong math capability
GPT-5.5 Instant | 4/5 | MATH 92.1%. Fast, good for basic math
Gemini 3.1 Pro | 5/5 | MATH 96.0%. Excellent math performance
DeepSeek V4 | 4/5 | MATH 93.8%. Strong for the price
DeepSeek VL | 4/5 | Good math, similar to DeepSeek V4
o3 | 5/5 | MATH 97.9%. Best-in-class math
Llama 4 405B | 3/5 | MATH 89.6%. Decent open-source option
Kimi K2.6 | 4/5 | Solid math reasoning
GLM 5.1 | 4/5 | Good math reasoning
Muse Spark | 4/5 | Good math reasoning
DeepSeek V4 Pro | 4/5 | Good math reasoning
Claude Opus 4.6 | 5/5 | MATH ~95%. Strong math capabilities
Grok 3 Pro | 4/5 | Good math reasoning
Qwen 3.6 | 4/5 | Good math reasoning
Llama 4 Scout | 3/5 | Adequate math reasoning
Gemini 3 Mini | 4/5 | Solid math reasoning
MiniMax M2.7 | 4/5 | Good math reasoning

Reasoning: Complex multi-step reasoning

Model | Score | Notes
Claude 4 Opus | 5/5 | GPQA 84.6%. Deep, nuanced reasoning
Claude Sonnet 4.6 | 4/5 | GPQA 79.8%. Strong reasoning for most tasks
GPT-5.5 | 4/5 | GPQA 82.1%. Capable multi-step reasoning
GPT-5.5 Instant | 3/5 | GPQA 78.0%. Good, but trades depth for speed
Gemini 3.1 Pro | 4/5 | GPQA 81.5%. Solid reasoning, improved with 3.1
DeepSeek V4 | 4/5 | GPQA 76.4%. Remarkably capable for cost
DeepSeek VL | 4/5 | Solid reasoning with visual context
o3 | 5/5 | GPQA 87.3%. State-of-the-art reasoning
Llama 4 405B | 3/5 | GPQA 73.1%. Competitive open-source
Kimi K2.6 | 4/5 | Strong reasoning with thinking mode
GLM 5.1 | 4/5 | Solid reasoning capabilities
Muse Spark | 4/5 | Competitive reasoning
DeepSeek V4 Pro | 4/5 | Solid reasoning capabilities
Claude Opus 4.6 | 5/5 | Deep reasoning, thinking mode available
Grok 3 Pro | 4/5 | Solid multi-step reasoning
Qwen 3.6 | 4/5 | Solid reasoning
Llama 4 Scout | 3/5 | Competitive reasoning for size
Gemini 3 Mini | 3/5 | Adequate reasoning
MiniMax M2.7 | 4/5 | Solid reasoning

Writing: Prose, analysis, long-form

Model | Score | Notes
Claude 4 Opus | 5/5 | Best prose, nuance, and voice
Claude Sonnet 4.6 | 5/5 | Excellent writing for daily use
GPT-5.5 | 4/5 | Very good, slightly less nuanced
GPT-5.5 Instant | 3/5 | Adequate, optimized for speed
Gemini 3.1 Pro | 4/5 | Strong, especially analytical writing
DeepSeek V4 | 3/5 | Decent, lags behind top models
DeepSeek VL | 3/5 | Decent, vision-enhanced writing
o3 | 3/5 | Reasoning-focused, not writing-optimized
Llama 4 405B | 3/5 | Solid for open-source
Kimi K2.6 | 4/5 | Good long-form writing
GLM 5.1 | 4/5 | Strong multilingual writing
Muse Spark | 3/5 | Adequate prose generation
DeepSeek V4 Pro | 3/5 | Decent writing quality
Claude Opus 4.6 | 5/5 | Excellent prose quality
Grok 3 Pro | 3/5 | Adequate, not writing-optimized
Qwen 3.6 | 4/5 | Strong multilingual writing
Llama 4 Scout | 3/5 | Solid for open-source
Gemini 3 Mini | 3/5 | Decent, speed-optimized
MiniMax M2.7 | 3/5 | Adequate writing quality

Vision: Image understanding

Model | Score | Notes
Claude 4 Opus | 4/5 | Good image understanding
Claude Sonnet 4.6 | 4/5 | Strong vision capability
GPT-5.5 | 4/5 | Multimodal, strong image analysis
GPT-5.5 Instant | 3/5 | Basic vision support
Gemini 3.1 Pro | 5/5 | Best-in-class multimodal
DeepSeek V4 | 0/5 | Text-only model. Use DeepSeek VL for vision.
DeepSeek VL | 4/5 | DeepSeek's dedicated vision model. Strong image understanding.
o3 | 3/5 | Text-only reasoning model
Llama 4 405B | 3/5 | Basic multimodal support
Kimi K2.6 | 3/5 | Basic vision support
GLM 5.1 | 3/5 | Basic vision support
Muse Spark | 3/5 | Basic multimodal support
DeepSeek V4 Pro | 0/5 | Text-only. Use DeepSeek VL for vision.
Claude Opus 4.6 | 4/5 | Good image understanding
Grok 3 Pro | 3/5 | Basic vision support
Qwen 3.6 | 3/5 | Basic vision support
Llama 4 Scout | 3/5 | Basic multimodal support
Gemini 3 Mini | 4/5 | Good vision for speed-optimized
MiniMax M2.7 | 3/5 | Basic vision support

Long Context: Processing large documents

Model | Score | Context | Notes
Claude 4 Opus | 5/5 | 400K | Excellent long-doc processing
Claude Sonnet 4.6 | 4/5 | 200K | Very capable with long docs
GPT-5.5 | 4/5 | 128K | Solid long context
GPT-5.5 Instant | 3/5 | 128K | Same window, faster processing
Gemini 3.1 Pro | 5/5 | 1M | Industry-leading context window
DeepSeek V4 | 3/5 | 128K | Standard context window
DeepSeek VL | 3/5 | 128K | Same context window as V4
o3 | 3/5 | 128K | Focuses on depth, not span
Llama 4 405B | 3/5 | 128K | Standard for open-source
Kimi K2.6 | 5/5 | 256K | Excellent long-context, agent swarm
GLM 5.1 | 3/5 | 128K | Standard context window
Muse Spark | 3/5 | 128K | Standard for open-weight
DeepSeek V4 Pro | 3/5 | 128K | Standard context window
Claude Opus 4.6 | 4/5 | 200K | Solid long-doc processing
Grok 3 Pro | 3/5 | 128K | Standard context window
Qwen 3.6 | 3/5 | 128K | Standard context window
Llama 4 Scout | 5/5 | 10M | Massive context window, best in class
Gemini 3 Mini | 3/5 | 128K | Standard context window
MiniMax M2.7 | 4/5 | 128K | Strong long-context performance

Agentic: Tool use, multi-step tasks

Model | Score | Notes
Claude 4 Opus | 5/5 | Excellent tool use and reasoning
Claude Sonnet 4.6 | 5/5 | SWE-bench 49%. Best-in-class agentic coding
GPT-5.5 | 4/5 | Strong function calling
GPT-5.5 Instant | 3/5 | Fast but less reliable
Gemini 3.1 Pro | 4/5 | Good tool use, improving
DeepSeek V4 | 3/5 | Basic function calling
DeepSeek VL | 3/5 | Basic function calling with vision
o3 | 4/5 | Reasoning-first agentic
Llama 4 405B | 3/5 | Improving with each release
Kimi K2.6 | 5/5 | Up to 100 specialized agents in swarm
GLM 5.1 | 3/5 | Basic agentic capabilities
Muse Spark | 4/5 | Good tool use
DeepSeek V4 Pro | 3/5 | Basic function calling
Claude Opus 4.6 | 5/5 | Excellent tool use
Grok 3 Pro | 3/5 | Basic function calling
Qwen 3.6 | 3/5 | Basic agentic capabilities
Llama 4 Scout | 3/5 | Basic tool use
Gemini 3 Mini | 3/5 | Basic agentic capabilities
MiniMax M2.7 | 3/5 | Basic agentic capabilities

Speed: Response latency

Model | Score | Notes
Claude 4 Opus | 2/5 | Slowest, but most thoughtful
Claude Sonnet 4.6 | 3/5 | Moderate speed
GPT-5.5 | 4/5 | Fast for frontier quality
GPT-5.5 Instant | 5/5 | Fastest in class, <1s responses
Gemini 3.1 Pro | 4/5 | Consistently fast
DeepSeek V4 | 4/5 | Good speed for the price
DeepSeek VL | 3/5 | Slower than V4 due to vision processing
o3 | 1/5 | Slow deliberative reasoning
Llama 4 405B | 3/5 | Varies by deployment
Kimi K2.6 | 3/5 | Moderate speed
GLM 5.1 | 4/5 | Fast inference
Muse Spark | 4/5 | Fast inference
DeepSeek V4 Pro | 4/5 | Fast inference
Claude Opus 4.6 | 2/5 | Slower, thoughtful responses
Grok 3 Pro | 4/5 | Fast inference
Qwen 3.6 | 4/5 | Fast inference
Llama 4 Scout | 3/5 | MoE, moderate speed
Gemini 3 Mini | 5/5 | Fastest Gemini variant
MiniMax M2.7 | 4/5 | Fast inference

Cost Efficiency: Value per dollar

Model | Score | Price | Notes
Claude 4 Opus | 2/5 | $15/$75 per 1M | Most expensive per token
Claude Sonnet 4.6 | 3/5 | $3/$15 per 1M | Reasonable for quality
GPT-5.5 | 3/5 | $2/$8 per 1M | Competitive pricing
GPT-5.5 Instant | 4/5 | $0.05/$0.20 per 1M | Very cheap, fast
Gemini 3.1 Pro | 4/5 | $2/$12 per 1M | Good value for long context
DeepSeek V4 | 5/5 | $0.55/$2.19 per 1M | 10-50x cheaper than peers
DeepSeek VL | 4/5 | (not listed) | Competitive pricing for vision tasks
o3 | 1/5 | $10-60 per 1M output | Most expensive reasoning
Llama 4 405B | 5/5 | Free (self-host) | Open-source, no API costs
Kimi K2.6 | 4/5 | (not listed) | Competitive pricing
GLM 5.1 | 4/5 | (not listed) | Competitive pricing
Muse Spark | 5/5 | Free (self-host) | Open-weight, no API costs
DeepSeek V4 Pro | 4/5 | (not listed) | Good value for quality
Claude Opus 4.6 | 2/5 | $15/$75 per 1M | Expensive but capable
Grok 3 Pro | 2/5 | $3/$15 per 1M | Premium pricing
Qwen 3.6 | 4/5 | (not listed) | Competitive pricing
Llama 4 Scout | 5/5 | Free (self-host) | Open-weight, no API costs
Gemini 3 Mini | 4/5 | $1/$6 per 1M | Affordable for quality
MiniMax M2.7 | 4/5 | (not listed) | Competitive pricing

Score | Meaning
5 (dark green) | Best in class. Top performer for this task.
4 (light green) | Strong. Excellent for most use cases.
3 (yellow) | Good. Capable but not top-tier.
2 (orange) | Fair. Works for simple cases.
1 (red) | Limited. Not recommended for this task.

Scores combine public benchmarks and real-world usage as of May 2026. “Speed” measures output latency, not throughput. For detailed benchmark numbers, see the Benchmarks page.
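
One way to turn the matrix into a shortlist is to weight the tasks you care about and rank models by weighted average score. The sketch below copies four rows of scores for three of the models from the matrix. The weighting scheme itself is an assumption, not something the matrix prescribes.

```python
# Scores copied from the capability matrix for three of the models.
SCORES = {
    "Claude Sonnet 4.6": {"coding": 4, "writing": 5, "speed": 3, "cost": 3},
    "GPT-5.5": {"coding": 5, "writing": 4, "speed": 4, "cost": 3},
    "DeepSeek V4": {"coding": 4, "writing": 3, "speed": 4, "cost": 5},
}


def rank(weights: dict[str, float]) -> list[tuple[str, float]]:
    """Sort models by weighted average score, highest first."""
    total = sum(weights.values())
    ranked = [
        (name, sum(scores[task] * w for task, w in weights.items()) / total)
        for name, scores in SCORES.items()
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# A cost-sensitive workload: cost counts three times as much as speed.
for name, score in rank({"cost": 3, "speed": 1}):
    print(f"{name}: {score:.2f}")
```

Treat the ranking as a starting point for testing on your own task, not as a verdict.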


Where to Start

  1. Pick a default: Claude Sonnet (recommended) or GPT-4o
  2. Use free credits to test on your actual task
  3. Measure cost: How many tokens? How many requests?
  4. Optimize: Add Haiku for cheap tasks, Opus only when Sonnet fails
  5. Monitor: Track costs monthly
