1. Clarity Over Power
A simple, understandable system that works predictably beats a powerful black box. When you understand what a system does, you can trust it, debug it, and improve it. In production, clarity wins.
Foundational principles for how to think about, build, and deploy AI systems. These aren’t rules - they’re guideposts for judgment calls.
1. Clarity Over Power
A simple, understandable system that works predictably beats a powerful black box. When you understand what a system does, you can trust it, debug it, and improve it. In production, clarity wins.
2. Measure What Matters
Metrics drive behavior. If you optimize for accuracy but care about speed, you’ll get accurate but slow. Define what “success” looks like before building. Measure it honestly. Wrong metrics → wrong outcomes.
3. Start Simple, Add Complexity Cautiously
Simple prompt > complex multi-step system. RAG > fine-tuning (usually). Single model > multi-agent crew. Each layer of complexity is a new place for bugs. Earn the right to complexity by proving simpler solutions fail first.
4. Data Quality Beats Data Quantity
1000 good examples > 1M bad examples. Garbage in, garbage out. Whether training, fine-tuning, or RAG, curate your data ruthlessly. Spending a day cleaning data saves a week debugging a broken system.
5. Test on Real Problems
Benchmark performance on academic tests, but validate on real user problems. Models that score well on benchmarks sometimes fail on actual use cases. Real feedback beats synthetic metrics.
6. Costs Compound
36,500/year. Cheap per-unit costs add up fast at scale. Optimize for total cost, not per-token cost. Monitor spend like you monitor quality.
7. Humans in the Loop
For high-stakes decisions (hiring, medical, legal), keep humans involved. AI amplifies good judgment and bad judgment equally. Let AI do the heavy lifting (analyze 1000 candidates), let humans decide.
8. Transparency About Limitations
Be honest: “This model hallucinates sometimes,” “This only works in English,” “We haven’t tested this on X.” Users who know limits make better decisions than users fooled by false precision. Honesty builds trust.
9. Validate, Then Validate Again
Run your system on examples you care about before deploying. Real examples, edge cases, adversarial inputs. Performance on a test set ≠ performance on real users. Validate early and often.
10. Version Your Prompts Like Code
Treat prompts as artifacts that evolve. “Version 1” → “Version 2 (better at math)” → “Version 3 (faster)”. Keep a history. Know what changed and why. Document successful patterns.
11. Logs Are Your Best Friend
Log what the model said, why it was used, what the user did next. Logs let you debug when things fail. “The model returned garbage” is useless. “The model returned [garbage] because the prompt was [prompt] given input [input]” is actionable.
12. Fail Gracefully
System is unsure? Say “I’m not sure” instead of confidently hallucinating. Model has no relevant documents? Say “I don’t have that information” instead of making it up. Uncertainty acknowledged beats false certainty.
13. Meet Users Where They Are
Don’t require users to learn AI jargon. “Increase randomness” is unclear. “Get more creative/varied responses” is clear. Design for users who don’t know what temperature is. Make the simple case work.
14. Explain the Output
“Here’s the answer” is less useful than “Here’s the answer because [reasoning].” Show your work. If using RAG, cite sources. If using multi-step reasoning, show the steps. Users trust transparent outputs.
15. Give Users Control
Let users adjust temperature, context size, model, retrieval depth. Different users have different needs. Reasonable defaults + power-user knobs = happy users at all levels.
16. Feedback is Data
When users say the output is wrong, don’t dismiss them. Collect that feedback. If 10 users report the same failure, that’s a signal to investigate. User complaints are free R&D.
17. Consider Harms Broadly
Building a hiring tool? Consider: bias in training data, feedback loops that amplify bias, which groups are harmed most. Consider not just accuracy but fairness, explainability, recourse for affected users.
18. Know Your Blindspots
You’re probably biased toward your own use case. Get feedback from people different from you. People with disabilities, different languages, different cultures catch problems you won’t. Diversity in testing prevents failures.
19. Deprecate Responsibly
If you’re sunsetting a service, give users time and tools to migrate. Don’t just turn it off. People rely on systems you build. Responsibility includes graceful exits.
20. Privacy ≠ Obscurity
Encrypting data in transit is table stakes, not the finish line. Know where data lives, who can access it, how long it’s kept. Assume breaches happen and design accordingly.
21. Iterate on What Works
If a prompt gets 90% accuracy, spend time on that 10%, not completely rewriting. If a system works well for 80% of users, understand the 20% before redesigning for everyone. Iterate from success.
22. Document Why, Not Just What
Code comments should explain WHY decisions were made, not WHAT the code does (the code itself says that). “We use Sonnet instead of Opus here because latency matters more than quality” is useful. “Calls Sonnet” is not.
23. Share Learnings
When you solve a hard problem, write it down. When you hit a dead end, document it. When you discover a pattern that works, tell someone. Knowledge multiplies when shared.
24. Admit When You're Wrong
Models are imperfect. Your understanding is imperfect. When something doesn’t work, debug honestly. “This approach failed because X, so we tried Y” is how you get better. Ego-driven debugging leads to nowhere.
25. Today's Hacks Become Tomorrow's Debt
Quick fix now = tech debt later. Document hacks. Plan to revisit them. Sometimes quick fixes are right (ship to validate). But don’t let them become the permanent architecture.
26. Tools Get Better; Rebuild Periodically
If you built your RAG system in 2024, revisit it in 2026. Better models, better embeddings, better retrieval strategies exist now. Dust off old projects. What didn’t work before might work now.
27. The Cost of NOT Doing Something is Real
Perfect system in 6 months < pretty-good system in 2 weeks. 50% accurate automation saves more work than 0% automation (manual work). Don’t let perfect be the enemy of good.