How I think about integrating AI into production systems.
Each design review starts with a realistic customer problem in a professional domain and works through a rigorous system design analysis. These are artifacts of a design process: the kind of reasoning I'd bring to a real architecture review. Architecture decisions depend on context — your stack, your team, your constraints. This is how I think through the problem, not a prescription for how you should think through yours.
Every design review follows the same structure.
- System Context & Constraints — What's the environment? What are the stakes, scale, and latency budget?
- What I Would Not Do — Before designing anything, what's the trap? What looks appealing but breaks?
- Metrics & Success Criteria — What does "working" mean? Offline evaluation vs. online signals.
- Data Strategy — What data exists? Quality, lineage, feature engineering trade-offs.
- Architecture & Data Flow — Components, interfaces, token budgets, serving path.
- Failure Modes & Detection — How does this go wrong? How would you know? What fails silently?
- Mitigations & Deployment — Fallbacks, human-in-the-loop, A/B testing, canary rollouts, rollback strategy.
- Cost Model — Real numbers, not "it depends." What does this cost at scale?
- Security & Compliance — Data privacy, adversarial robustness, governance.
- What Would Change My Mind — Explicit uncertainty. Conditions under which this advice is wrong.
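The cost-model step above is mostly arithmetic. As a minimal sketch of the kind of back-of-envelope estimate each review includes — all per-token rates and traffic volumes here are hypothetical placeholders, not any provider's actual pricing:

```python
# Back-of-envelope cost model for an LLM-backed feature.
# Rates and volumes below are hypothetical; substitute your
# provider's actual per-token prices and your measured traffic.

def monthly_llm_cost(
    requests_per_day: int,
    input_tokens: int,        # avg prompt size per request
    output_tokens: int,       # avg completion size per request
    price_in_per_1k: float,   # $ per 1K input tokens
    price_out_per_1k: float,  # $ per 1K output tokens
    days: int = 30,
) -> float:
    per_request = (input_tokens / 1000) * price_in_per_1k \
                + (output_tokens / 1000) * price_out_per_1k
    return requests_per_day * per_request * days

# Example: 50K requests/day, 2K-token prompts, 300-token replies,
# hypothetical rates of $0.003 / $0.015 per 1K tokens.
cost = monthly_llm_cost(50_000, 2_000, 300, 0.003, 0.015)
print(f"${cost:,.0f}/month")  # → $15,750/month
```

Even a sketch like this forces the question "what does this cost at scale?" into a concrete number you can argue about.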
Every design review includes concrete artifacts: architecture diagrams, failure taxonomies, cost tables.
| # | Title | Domain | Date |
|---|---|---|---|
| 001 | AI-Assisted Ticket Triage & Agent Response Suggestion for B2B SaaS Support | Customer Support | 2026-02-16 |
| 002 | AI-Assisted Clinical Document Summarization for Hospital Admissions | Healthcare | 2026-02-23 |
| 003 | AI-Powered Semantic Product Search for a DTC E-Commerce Brand | E-Commerce | 2026-03-01 |
- AI is a system component, not a feature. Features get bolted on; components get designed — with interfaces, budgets, failure modes, and observability.
- Production failures are more important than demo success. Demos hide decisions. Production exposes them.
- Silent failure is more dangerous than loud failure. A system that fails visibly can be fixed. A system that fails silently erodes trust invisibly.
- Engineering judgment matters more as systems become probabilistic. Deterministic systems can be tested exhaustively. Probabilistic systems require judgment about acceptable risk.
- Data quality determines system ceiling. No architecture, model, or prompt strategy compensates for upstream data problems.
- Technical debt in AI systems compounds silently. It hides in data dependencies, feedback loops, and pipeline complexity — invisible until it paralyzes you.
- Judgment must be local to constraints. No universal claims about "the industry" or "most teams." Every refusal is scoped to a specific system context.
- Data is foundational, not incidental. Source, quality, lineage, and privacy are design decisions — not afterthoughts.
- Build and Run are distinct disciplines. Building AI systems and running AI systems require different skills, artifacts, and failure mode awareness.
- AI systems require continuous learning. Build, Run, Learn is a loop, not a sequence. Systems that don't learn from production feedback degrade.
These beliefs are expanded in my Mental Models for Production AI series.
Have a different take? See a failure mode I missed? Think the cost model is off?
Open an issue — I'd like to hear how you'd approach it differently.
This work is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).
You are free to share and adapt this material for any purpose, including commercial, as long as you give appropriate credit.