LLM Primer (1 page)

📖 3 min read referencecheatsheetprimer

One-page LLM primer covering what LLMs are, how they work, and when to use each major model - Claude, GPT, Gemini, and DeepSeek.

What is an LLM? (one sentence)

A Large Language Model is software that predicts the next word by learning patterns from billions of examples.

Term	Definition
Token	A piece of text (word, punctuation, subword). Models process tokens, not letters.
Parameter	A “dial” the model uses to predict. More parameters = more nuance.
Context	The conversation history the model can see (usually 4K–400K tokens).
Hallucination	When a model generates plausible-sounding but false information.
Temperature	Controls randomness (0 = deterministic, 1.0 = creative).
Top-p sampling	Limits predictions to the most likely options for quality control.

Model	Best For	Context	Cost	Training Data
Claude Opus 4.8	Writing, reasoning, analysis	1M tokens	$5/$ 25 per 1M input/output	Jan 2026
GPT-5.5	Speed, all-purpose	1M tokens	$5/$ 30 per 1M	Dec 2025
Gemini 3.1 Pro	Long documents, research	1M tokens	$2/$ 12 per 1M (free tier available)	Jan 2025
Claude Sonnet 4.6	Balanced, coding	1M tokens	$3/$ 15 per 1M	Jan 2026
DeepSeek V4	Cost-conscious teams	256K tokens	10–50x cheaper	Late 2024

Give better instructions. Works for most tasks.

Feed the model your own data so it can answer questions about your documents.

Train the model on your examples so it learns your style/domain.

“LLMs understand language” → They pattern-match. Understanding is human projection.
“LLMs are general intelligences” → They’re narrow: good at text, bad at reasoning under uncertainty.
“ChatGPT is always right” → No. Verify important facts. They hallucinate.
“LLMs will replace humans” → No. They’re tools. Humans + LLMs > either alone.
“Training new models is cheap” → No. Billions in compute. Fine-tuning is accessible.