LLM Primer

What Large Language Models Are & How They Work

What is an LLM?

A neural network trained to predict the next token (a word or word fragment) in a sequence of text.

Input: Text

Process: Predict next token (word/subword)

Output: Text

It's not magic: it's statistical pattern matching learned from billions of examples.
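
As a rough illustration of the Input / Process / Output loop above, here is a minimal sketch of next-token prediction. It is a toy bigram counter, not a neural network: whitespace-split words stand in for tokens, and the corpus and the function names (train_bigram, generate) are made up for this example.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count which token follows which; whitespace-split words stand in for tokens."""
    tokens = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, start, max_new_tokens=5):
    """Repeatedly append the most frequent next token (greedy decoding)."""
    output = [start]
    for _ in range(max_new_tokens):
        candidates = follows.get(output[-1])
        if not candidates:
            break  # no known continuation for this token
        output.append(candidates.most_common(1)[0][0])
    return " ".join(output)

# Hypothetical training text, chosen only for illustration.
corpus = "the cat sat on the mat . the cat sat on the sofa ."
model = train_bigram(corpus)
print(generate(model, "the", max_new_tokens=3))
```

A real LLM replaces the count table with billions of learned weights, but the loop is the same: read the text so far, pick a next token, append it, repeat.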

Token: A chunk of text, ~4 characters of English on average

Parameter: A "weight" in the network (billions of them)

Context: Max tokens the model can read at once (the context window)

Temperature: Randomness control (0 = near-deterministic, higher = more random; some APIs allow up to 2)

Hallucination: Confident-sounding output that is false or unsupported
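
To make the temperature entry above concrete, here is a small sketch of temperature-scaled softmax, the usual way raw token scores are turned into sampling probabilities. The scores are made-up numbers, and treating temperature 0 as "always pick the top score" is an assumption of this sketch; real APIs may handle that case differently.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; temperature rescales the scores first."""
    if temperature <= 0:
        # Assumption: temperature 0 means greedy, all probability on the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (0, 0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Low temperatures concentrate probability on the top-scoring token; high temperatures flatten the distribution, so less likely tokens get sampled more often.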

Step 1: Read trillions of text tokens

Step 2: Predict next token billions of times

Step 3: Adjust weights when predictions are wrong

Step 4: Repeat until predictions stop improving

Cost to train a large model from scratch: $50M+ | Time: 6+ months
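
The four training steps above can be sketched at toy scale. This is a minimal illustration, assuming a tiny table of weights instead of a deep network, a made-up sentence as training data, and plain gradient descent on cross-entropy loss; the names train and predict_next are hypothetical.

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(tokens, epochs=200, learning_rate=0.5):
    vocab = sorted(set(tokens))
    index = {tok: i for i, tok in enumerate(vocab)}
    size = len(vocab)
    # weights[i][j]: score for token j following token i (the "parameters")
    weights = [[0.0] * size for _ in range(size)]
    pairs = [(index[a], index[b]) for a, b in zip(tokens, tokens[1:])]
    losses = []
    for _ in range(epochs):                       # Step 4: repeat
        total_loss = 0.0
        for current, target in pairs:             # Step 1: read tokens
            probs = softmax(weights[current])     # Step 2: predict next token
            total_loss += -math.log(probs[target])
            for j in range(size):                 # Step 3: adjust weights
                grad = probs[j] - (1.0 if j == target else 0.0)
                weights[current][j] -= learning_rate * grad
        losses.append(total_loss / len(pairs))
    return vocab, weights, losses

def predict_next(vocab, weights, token):
    """Return the highest-scoring next token after `token`."""
    row = weights[vocab.index(token)]
    return vocab[max(range(len(row)), key=lambda j: row[j])]

# Hypothetical training text, chosen only for illustration.
tokens = "the cat sat on the mat . the cat sat on the sofa .".split()
vocab, weights, losses = train(tokens)
print("first epoch loss:", round(losses[0], 3))
print("last epoch loss:", round(losses[-1], 3))
print("after 'cat':", predict_next(vocab, weights, "cat"))
```

The average loss falls as the weights are nudged toward the observed next tokens. Real training runs the same loop over trillions of tokens and billions of weights, which is where the cost and time come from.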

Prompting: Craft input (free, instant)

RAG (retrieval-augmented generation): Retrieve relevant external docs and add them to the prompt ($, instant)

Fine-tune: Continue training the model on your own data ($$$, days)

Start with prompting. Use RAG for facts. Fine-tune for behavior.
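
As a sketch of what "add external docs" means for RAG, the example below retrieves the most relevant document for a question and pastes it into the prompt. It assumes a simple word-overlap score in place of the embedding search a production system would normally use; the documents, the prompt wording, and the function names are all invented for illustration, and the call to an actual model is left out.

```python
import re

def words(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    q = words(question)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:top_k]

def build_rag_prompt(question, documents, top_k=1):
    """Put the retrieved documents ahead of the question in one prompt string."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, top_k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical knowledge base, made up for this example.
docs = [
    "The refund window is 30 days from delivery.",
    "Support is available on weekdays from 9 to 5.",
    "Shipping to Canada takes 5 to 7 business days.",
]
print(build_rag_prompt("How long is the refund window?", docs))
```

The model's weights never change here. The facts arrive through the prompt, which is why RAG is cheap to update compared with fine-tuning.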

Need facts? → Use RAG

Want different tone? → Fine-tune

Trying a new task? → Prompt engineering

Stuck? → Add examples to the prompt (few-shot prompting)
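
Adding examples can be as simple as placing a few solved cases ahead of the new input. The sketch below assembles such a prompt as a plain string; the "Input:" / "Output:" layout, the sample reviews, and the function name build_few_shot_prompt are assumptions for illustration, not a required format.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, solved examples, and the new input into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # left open for the model to complete
    return "\n".join(lines)

# Hypothetical labeled examples, made up for this sketch.
examples = [
    ("The battery died after a week.", "negative"),
    ("Setup took two minutes and it just works.", "positive"),
]
prompt = build_few_shot_prompt(
    "Classify each review as positive or negative.",
    examples,
    "The screen is gorgeous.",
)
print(prompt)
```

The examples show the model the expected pattern, so it tends to continue in the same format without any retraining.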