# LLM Model Comparison
| Model | Context | Strengths | My notes |
|---|---|---|---|
| Claude Sonnet 4.6 | 200k | Long-context reasoning, coding | Default for agentic tasks. |
| Claude Opus 4.6 | 200k | Deep reasoning, nuance | Use when Sonnet gets it wrong. |
| Claude Haiku 4.5 | 200k | Fast, cheap | Great for classification + routing. |
| GPT-4-class | 128k+ | General, strong tools | — |
| Open-source (Llama/Mistral/Qwen) | varies | On-prem, privacy | Start here when data can’t leave. |
## How I pick
1. Does this need to run on-prem or handle sensitive data?
   - yes → open-source + local inference
   - no → continue
2. Is latency the bottleneck (chatbot UX)?
   - yes → small/fast model + RAG
   - no → continue
3. Does the task need deep multi-step reasoning?
   - yes → frontier model
   - no → mid-tier frontier model
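The three questions above can be sketched as a tiny router. This is a hedged illustration: the function name and the returned labels are placeholders for whatever models you actually deploy, not real API identifiers.

```python
# Minimal sketch of the decision tree above. Labels are placeholders,
# not real model/API identifiers.

def pick_model(on_prem: bool, latency_critical: bool, deep_reasoning: bool) -> str:
    if on_prem:               # step 1: sensitive data can't leave → local inference
        return "open-source (Llama/Mistral/Qwen)"
    if latency_critical:      # step 2: chatbot UX → small/fast model + RAG
        return "small/fast model + RAG"
    if deep_reasoning:        # step 3: multi-step reasoning → frontier model
        return "frontier model"
    return "mid-tier frontier model"

print(pick_model(on_prem=False, latency_critical=True, deep_reasoning=False))
```

In practice the router's inputs come from request metadata (tenant policy, endpoint, task type) rather than hand-set booleans, but the ordering of the checks is the point: privacy first, then latency, then capability.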
## Cost sanity checks

Before rolling anything to prod, multiply:

daily requests × avg input tokens × input $/Mtok
+ daily requests × avg output tokens × output $/Mtok

If the number scares you, add caching, a router, or a smaller model for the cheap path.
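The multiplication above, as a one-liner you can plug your own numbers into. The example figures (50k requests/day, 2k input / 500 output tokens, $3/$15 per Mtok) are made up for illustration, not real pricing.

```python
# Back-of-envelope daily cost for the formula above.
# Prices are per million tokens ($/Mtok); all example numbers are made up.

def daily_cost(requests, in_tok, out_tok, in_per_mtok, out_per_mtok):
    return requests * (in_tok * in_per_mtok + out_tok * out_per_mtok) / 1_000_000

# e.g. 50k requests/day, 2k input + 500 output tokens, $3 in / $15 out per Mtok
print(f"${daily_cost(50_000, 2_000, 500, 3.0, 15.0):,.2f}/day")
# → $675.00/day
```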