{
"name": "Forge",
"version": "1.0.0",
"description": "A system for keeping your AI on track. Write specs precise enough that AI agents build what you want without guessing. Works for solo developers, teams, and organizations. Works for code, policy, legal, and anything else where precision matters.",
"author": "Larry Diffey",
"website": {
"concepts": {
"vocabulary-precision": {
"title": "Vocabulary Precision",
"subtitle": "Every word is a constraint on what the model produces",
"body": "## The Moment You'll Feel This\n\nYou tell the AI \"the system should handle errors gracefully.\" It builds something. You deploy. A week later, you discover that \"gracefully\" meant something different to the AI than it did to you. To you, it meant \"show the user a friendly message and retry.\" To the AI, it meant \"catch the exception and continue silently.\" The bug was invisible because the code did exactly what \"gracefully\" means in its training data. Not what you meant.\n\nThat's what imprecise language costs. Not a syntax error. A silent architectural disagreement that looks correct until it isn't.\n\n## How It Works\n\nLLMs are probabilistic. They sample from a distribution shaped by your input. The more precise your language, the tighter that distribution, the more accurate the output. This isn't abstract theory. It's the difference between an agent that asks the right questions and one that silently makes wrong assumptions across hundreds of decisions.\n\n## How It Works\n\nUse words that name the **mechanism** and the **property**, not just the outcome. Two words, \"interstitial rollback,\" communicated a concept that would otherwise take multiple sentences: an edit-level snapshot system that captures state between edits, not just at explicit save points. The model understood immediately. That's vocabulary precision doing its job.\n\n| Instead of... | Use... | Why |\n|---|---|---|\n| \"separate things so they don't affect each other\" | \"fault containment via process isolation\" | Names what you're actually doing and why |\n| \"the system handles different sources\" | \"source-agnostic pipeline\" | Tells the agent not to branch on source type |\n| \"we check if things are working\" | \"heartbeat-based liveness detection\" | Names the mechanism, not the vibe |\n| \"it should be fast\" | \"hot-path latency budget: sub-millisecond\" | Quantifies it. Now it's testable. 
|\n| \"keep it safe\" | \"capability-based security at the runtime boundary\" | Says where and how, not just \"be safe\" |\n\nThis isn't about using big words for their own sake. It's about finding the word that carries the most information per token for the concept you're trying to constrain. Sometimes that's a technical term. Sometimes it's a simple word used precisely. The skill is knowing which words narrow the distribution and which ones leave it wide open.\n\n## Where It Applies\n\n**In WHY blocks:** \"Fault containment via process isolation\" constrains understanding far more than \"we put it in a separate process so crashes don't break everything.\"\n\n**In architecture docs:** \"Pixel-source-agnostic data plane\" is five words that prevent an agent from writing branching logic based on source type. Try getting that across with casual language.\n\n**In prompts:** \"Evaluate the shared-fate failure modes introduced by co-locating capture and stream assembly\" gets you targeted analysis. \"Evaluate the failure modes\" gets you a book report.\n\n## The Compound Effect\n\nThis isn't a one-time optimization. Every session that reads the spec benefits from precise language. Across a corpus of hundreds of pages read by multiple models across multiple validation rounds, the difference between precise and casual language is enormous. Fewer correction cycles, less wasted tokens, more stable outcomes. It compounds."
},
"decision-tracking": {
"title": "Decision Tracking",
"subtitle": "If the agent can't tell what's decided, it guesses",
"body": "## The Moment You'll Feel This\n\nYou start a new AI session on Monday. You ask it to add a feature to the auth module. It restructures something you decided on three weeks ago because it doesn't know that decision was final. You catch it in code review, revert, re-explain, lose an afternoon. Next week, a different session reopens a question you resolved in March. Nobody remembers why you chose that approach. The reasoning is gone. You make the same decision again, or worse, a different one. Your codebase starts disagreeing with itself.\n\n## The System\n\nDecisions pile up across dozens of sessions. Without explicit tracking, nobody knows which decisions are final, which are still open, which were open but got resolved. An agent encountering an unresolved question in the docs has two choices: guess or stop. Guessing is the failure mode this entire methodology exists to prevent.\n\n## The System\n\nEvery architecture domain doc gets two sections at the end:\n\n**Decided:** numbered list of finalized decisions. Each one has a title, description, rationale (WHY block where alternatives exist), and a reference back to the session where it was decided.\n\n**Open Questions:** numbered list of unresolved questions. Each one carries a status:\n\n- `Discovery: Phase N`: will be answered during implementation\n- `Post-V1`: explicitly deferred, not relevant right now\n- `RESOLVED: Decision N`: answered, linked to the decision that resolved it\n- `Blocked on: [dependency]`: can't be answered until something else is decided\n\nOpen questions are not defects. They're the spec being honest about what it doesn't know yet. Some things genuinely can't be decided until you're in the code. \"Which serialization library is faster for our workload\" is a benchmark question, not a spec question. Forcing a decision there just produces a guess wearing a suit.\n\n## Inline Ambiguity Markers\n\nThe Open Questions section lives at the end of the doc. 
The ambiguity those questions represent lives at specific points in the spec body. `[OQ-N]` markers bridge that gap:\n\n```markdown\nThe observer fires a foundation model call [OQ-7] with compressed context...\n```\n\nThe agent hits `[OQ-7]`, checks the Open Questions section, sees \"Discovery: Phase 4,\" and knows not to implement this yet. Without the marker, it reads a paragraph that sounds definitive and builds on an unresolved assumption.\n\n## The Maturity Signal\n\nThe Decided/Open ratio tells you how ready a domain is:\n\n| Decided | Open | What it means |\n|---|---|---|\n| 45 | 2 | Ready. Ship it. |\n| 63 | 14 | Mostly there. Check the open ones. |\n| 15 | 22 | Not ready. Don't build from this yet. |\n\nA domain with 22 open questions isn't ready, no matter how confident the prose sounds."
},
"why-blocks": {
"title": "WHY Blocks",
"subtitle": "Stop agents from \"optimizing\" away your constraints",
"body": "## The Moment You'll Feel This\n\nYou spent two days figuring out that a certain component needs to run in a separate process. You have good reasons. You move on. A month later, a fresh AI session reads your spec and \"optimizes\" it by co-locating that component with the main process. Less latency, cleaner architecture. The PR looks great. The optimization is technically sound. You merge it. Two weeks later, a crash in that component takes down your entire system instead of just one stream. The separation existed for crash isolation. The AI didn't know. You didn't write it down. The \"optimization\" was a regression disguised as an improvement.\n\n## The Problem\n\nAgents optimize. That's what they do. Without rationale, they'll \"improve\" your design by removing constraints they don't understand. Your spec says capture runs in a separate process. The agent sees an obvious optimization: co-locate it with the stream assembler, eliminate an IPC hop. Less latency, cleaner architecture.\n\nExcept it's wrong. The separation exists for crash isolation. Platform capture APIs are the most likely crash source. If they crash in the data plane, every stream dies. In a separate process, only that one stream dies. The IPC cost is microseconds. The crash isolation is worth it.\n\nWithout the WHY block, the optimization looks correct. With it, it's obviously wrong. The agent has no way to know the difference unless you tell it.\n\n## What They Look Like\n\n```markdown\n47. **Capture runs in the wrapper process, not the data plane.**\n - **WHY:** Crash isolation. Platform capture APIs are the most\n likely crash source. Failure in data plane = shared fate.\n Failure in wrapper = isolated.\n - **WHY NOT:** Data plane co-location (eliminates IPC hop but\n creates shared-fate failure across all capture sources).\n```\n\nThe **WHY NOT** is just as important. 
An agent that doesn't know you considered and rejected an alternative may independently arrive at the same idea.\n\n## When to Add Them\n\nNot everywhere. The test: would a competent agent, reading only this decision without the WHY block, plausibly choose differently? If yes, add one. If no, skip it.\n\n\"Go uses gofmt\" needs no WHY. There's no alternative. \"Capture runs in the wrapper process\" absolutely needs one because co-location is a reasonable alternative that happens to be wrong for this system.\n\n100% coverage is the wrong goal. You'd end up writing `**WHY:** N/A` on self-evident decisions, which just trains agents to ignore WHY blocks.\n\n## The Feedback Loop\n\nThis is where it gets good. An agent guesses wrong during implementation. The guess goes into FINDINGS.md. You review it. If the spec should have prevented that guess, the wrong guess becomes a WHY block. Next agent doesn't guess on that one. WHY coverage grows organically from real mistakes, not from trying to predict every possible wrong choice upfront."
},
"tonic-errors": {
"title": "TONIC Errors",
"subtitle": "When the agent picks the popular choice instead of yours",
"body": "## The Moment You'll Feel This\n\nYou specified protobuf for inter-service communication. The AI sees \"protobuf\" and installs a gRPC framework, because protobuf and gRPC go together in 90% of the projects in its training data. Your project doesn't use gRPC. You use raw protobuf serialization over Unix sockets. But the AI didn't know that because you only said what to use, not what NOT to use. You discover the mistake three days later when nothing connects. The agent's choice was reasonable. It was also wrong. And it was entirely preventable.\n\n## The Pattern\n\nTONIC: Technically Obvious, Not Intended Choice. It's when an agent makes a technically reasonable decision that happens to be wrong for your project because the docs didn't explicitly forbid the default.\n\n## The Original Sin\n\nThe spec says two subsystems communicate via protobuf over Unix domain sockets. An agent sees \"protobuf\" and reaches for `tonic`, the standard Rust gRPC library. Makes sense. Protobuf and gRPC go together like peanut butter and jelly.\n\nExcept the Rust side doesn't use gRPC at all. It uses raw `prost` serialization over UDS. No HTTP/2. No gRPC framing. The agent's choice is reasonable given what it knows about the Rust ecosystem, but it violates the project's specific decision.\n\nThis class of error is so common it got a name.\n\n## The Pattern\n\nTONIC errors happen whenever:\n1. A reasonable default exists in the ecosystem\n2. Your project made a specific, non-default choice for good reasons\n3. The docs state the correct choice but don't forbid the default\n4. An agent trained on the whole internet gravitates toward the default\n\n## The Fix\n\nState what to use AND what NOT to use:\n\n```\nRust uses prost only for protobuf. Do NOT use tonic or any gRPC library.\nWHY: No gRPC on the Rust side; UDS framing only. gRPC adds HTTP/2 overhead.\n```\n\nThis goes in the conventions doc. Every canonical dependency should list its forbidden alternatives. 
If you chose `chi` over `gin` for your Go HTTP router, say so. If you picked `pgx` instead of `gorm`, say so and say why. The agent doesn't need to agree with your reasoning. It needs to follow your decision.\n\n## The TONIC Table\n\nIn practice, the most effective way to prevent TONIC errors is a dedicated table in your conventions doc. Three columns: Correct Choice, TONIC Risk (what the agent will reach for), and Why We Chose Differently:\n\n| Correct | TONIC Risk | Why |\n|---|---|---|\n| `prost` (raw protobuf) | `tonic` (gRPC) | No gRPC on this side; UDS framing only |\n| Direct API calls | LangChain | Scoped role doesn't need orchestration framework |\n| `chi` router | `gin` | chi is stdlib-compatible; gin uses custom context |\n| CatBoost | XGBoost | Native categorical feature handling without encoding |\n\nEach entry reads like a preemptive conversation: \"I know you're going to reach for this. Don't. Here's why.\" The specificity is what makes it work. These aren't generic warnings; they're predictions of exact mistakes an agent will make based on its training data.\n\nIn real projects, the TONIC table grows organically. You start with the choices you know will trip up agents. Cold validation runs surface more. Cold code runs surface even more. The table ends up with 15-25 entries and becomes one of the most referenced artifacts in the spec."
},
"constitution": {
"title": "The Constitution",
"subtitle": "What your system cannot do, period",
"body": "## The Moment You'll Feel This\n\nSomeone on the team, or an AI session, writes a shortcut that bypasses your single write path. Data goes directly into a secondary store instead of through the pipeline. Everything works fine in dev. In production, the primary store and the secondary store disagree. You don't discover it for two weeks because nothing failed. Nothing errored. The data just silently diverged. The assumption that all writes go through the pipeline was load-bearing. It wasn't in the conventions doc because it's bigger than a convention. It's an architectural law. Without a constitution, it was invisible.\n\n## What It Is\n\nConventions tell an agent how to write code: naming patterns, library choices, error handling idioms. The constitution tells it what the system **cannot do**, regardless of how clever the optimization looks.\n\nThis distinction came from practice. Conventions change. Libraries get swapped, patterns evolve, async strategies shift. But certain architectural principles are load-bearing. Violating them doesn't just produce inconsistent code; it undermines the structural integrity of everything.\n\n## It's Derived, Not Constructed\n\nYou don't sit down on day one and write a constitution. It emerges from your design discussions. As you ideate and formalize, certain principles reveal themselves as load-bearing. They keep coming up. Multiple decisions depend on them. Violating them would break everything, not just one module. When you notice that pattern, you extract the principle into the constitution.\n\nThe constitution is a trailing signal, not a leading document. You discover your immutable laws; you don't invent them upfront.\n\n## What Goes In\n\nConstitutional articles are hard boundaries. Not best practices. Not aspirational goals. If crossing this boundary would require redesigning multiple subsystems, it's constitutional. 
If it would just produce a local bug, it belongs in conventions or as a WHY block.\n\n```markdown\n## Article 1: Source of Truth\nClickHouse is the sole source of truth for historical data. All writes\nenter through the pipeline (Kafka > Consumer > ClickHouse). No component\nwrites directly to secondary stores (Weaviate, Neo4j). Secondary stores\nare derived views, rebuildable from ClickHouse at any time.\n\n## Article 2: Observer Independence\nObserver agents cannot spawn other agents. No agent-to-agent communication\nloops. Each observer deposits typed observations into a shared substrate.\nThis is architecturally incompatible with unbounded agent cascades.\n```\n\n## Document Exceptions Inline\n\nWhen an immutable law has a legitimate exception, document it right there with the article, not in a footnote or separate document. If Article 3 says \"all state is persistent\" but your WASM build is stateless by design, put that exception inline with the rationale. An agent checking compliance sees the exception immediately instead of violating the article and then discovering the exception exists somewhere else.\n\n## Keep It Short\n\n10-15 articles max. If your constitution is growing past that, you're absorbing decisions that belong in domain documents. The power of the constitution is brevity. An agent reads the whole thing in seconds and carries it as context for every subsequent decision.\n\n## The Hierarchy\n\n```\nConstitution What the system CANNOT do\n |\nConventions HOW to write the code\n |\nDomain Docs What the system IS\n |\nEngineering Plan Build sequence and phases\n```\n\nWhen documents conflict, higher wins. Always."
},
"conventions": {
"title": "Conventions",
"subtitle": "The single highest-ROI artifact you can create",
"body": "## The Moment You'll Feel This\n\nYou have three AI sessions running across a week. Monday's session uses Express. Wednesday's uses Fastify because it's \"better.\" Friday's session introduces Koa because it saw it in a tutorial. Your codebase now has three HTTP frameworks. None of them are wrong individually. Together they're a maintenance nightmare. Or: you told the AI to use PostgreSQL. It used pg for the driver. The next session used Prisma. The third used Sequelize. Same database, three different access patterns, three different migration strategies, incompatible query styles everywhere.\n\n## The Fix\n\nIf you adopt one thing from Forge, make it a conventions doc. This is where WHY blocks prevent wrong architectural decisions; conventions prevent inconsistent implementation decisions. Lock your choices so every AI session follows the same rules.\n\n## What to Cover\n\n- **Canonical dependencies:** Not \"use a UUID library\" but \"use `uuid` crate with `v7` feature; do not substitute.\" Every entry lists the forbidden alternatives and why.\n- **Language conventions:** Naming, error handling, module structure, async patterns.\n- **Contested idioms:** Every language has areas where experienced developers disagree. Go's panic policy, Rust's unwrap tolerance, Python's class vs dataclass. If two experienced devs would argue about it, your conventions doc takes a side. With a WHY block.\n- **Cross-language conventions:** IPC format, timestamp representation, UUID format, error code taxonomy.\n- **Terminology enforcement:** A table mapping your project's terms to forbidden synonyms. 
Without this, one model calls it a \"Widget\" and another calls it a \"Component\" and you get syntactically incompatible code.\n\n## Terminology Enforcement\n\nThis one's easy to overlook and surprisingly impactful:\n\n| Correct | Don't Use |\n|---|---|\n| Contract | Agreement, Deal, SLA |\n| Quark | Widget, Component, Module |\n\nWithout this table, Claude might call it a Widget while ChatGPT calls it a Component. The code means the same thing but the identifiers don't match. Across a large codebase, that's a nightmare.\n\n## Why It Works Immediately\n\nDrop a conventions doc into your project. Point your AI at it before the next session. Consistency improves on the first run. No graph, no cold validation, no multi-model testing needed. Just a list of choices that every session reads and follows. That's why it's the highest-ROI artifact."
},
"cold-validation": {
"title": "Cold Validation",
"subtitle": "The core mechanism that makes everything else work",
"body": "Cold validation is the heart of Forge. Everything else exists to support this: a fresh AI session with no prior context reads your spec and tries to act on it. Every question it asks is proof the spec is ambiguous. Every guess it makes is a defect you can fix.\n\n## Why \"Cold\"\n\nThe cold session has never seen your conversations, your reasoning, your design discussions. It knows only what the documents say. That's the point. It simulates the actual use case: a coding agent encountering your spec for the first time.\n\nYou, sitting in your hot session with all that context, can't see the ambiguity. You know what you meant. The cold session doesn't. It only knows what you wrote. When it asks a question, that's the gap between what you meant and what you said.\n\n## The Hot/Cold Feedback Loop\n\nThis is how it actually works in practice:\n\n1. Cold session reads the spec, starts producing a plan, asks questions inline\n2. You take those questions to the hot session (the one with all your context)\n3. The hot session answers the questions AND updates the docs\n4. You paste the answers back to the cold session\n5. It continues. More questions. More fixes.\n6. Repeat until the questions dry up\n\nThe plan the cold session produces is disposable. You throw it away. The value is in the questions. Every question that surfaces a doc fix is the methodology working.\n\n## The Rule You'll Fight About\n\n**A cold question is proof of a doc defect.** If the cold session asked, the docs failed to communicate.\n\nThe hot session will resist this. It has full context and thinks the answer is obvious. \"The docs already say that.\" \"Any reasonable reader would understand.\" But the cold session is a reasonable reader, and it didn't understand. That's the proof.\n\nThis is the hardest habit to build. The hot session genuinely believes it doesn't need a doc fix. It does. Every time. 
For coding projects aiming for determinism, this is absolute.\n\n## Cross-Model\n\nAfter one model stops asking questions, do the whole thing again with a different model. Different architectures have different blind spots. Claude finds things ChatGPT misses and vice versa. Where both models converge on the same interpretation, your spec is clear. Where they diverge, it's still ambiguous.\n\n## The Convergence Signal\n\nAs you iterate, the questions change character. Early rounds hit you with big architectural problems: \"your auth model contradicts your data flow.\" After a few rounds of fixes, the questions narrow: \"should this timeout be 30 or 60 seconds?\" That narrowing is the convergence signal. The architecture is holding. The model is picking at edges, not questioning foundations.\n\nWhen you go from \"your pipeline has a fundamental consistency problem\" to \"what's the exact retry backoff interval,\" that's a huge relief. The hard problems are solved. What's left is precision work.\n\nYou're done when the cold session reads your docs and just builds. No questions. No confusion. Multiple models, same result."
},
"cold-code-runs": {
"title": "Cold Code Runs",
"subtitle": "The code is a probe, not a product",
"body": "A cold code run is the same idea as cold validation, pushed one step further. Instead of producing a plan, the agent builds the project. The code is disposable. You throw it away. The value is in what the code reveals about the spec.\n\n## What You're Looking For\n\nAfter the code run, you evaluate it against the spec:\n\n- Did it use the right libraries from the conventions doc?\n- Did it reach for a forbidden alternative (TONIC error)?\n- Did it \"optimize\" away something protected by a WHY block?\n- Did it guess at implementation details the spec should have specified?\n- Did it violate the constitution?\n\nEvery deviation falls into one of three buckets:\n\n**Spec defect:** The agent guessed because the spec didn't specify. This is the most common and most valuable finding. Fix the spec.\n\n**Model error:** The agent ignored a clear instruction. Less common, less actionable. But if a model consistently misreads a particular phrasing, rewording helps.\n\n**Platform blocker:** Can't compile for Windows on Linux. Skip it and note it. Not a methodology problem.\n\n## The Feedback Loop\n\nFix the docs based on what the code run revealed. If the changes were significant, re-run the structural validation and maybe another cold validation round. Then run the code again with a different model.\n\nDifferent models make different mistakes. Claude might honor WHY blocks perfectly but reach for the wrong library. Codex might nail the libraries but restructure your module layout. The cross-model comparison surfaces the last layer of spec ambiguity.\n\n## Why Throw Away the Code?\n\nBecause the code you keep should come from a spec that's already been validated by everything you're going to throw away. Every disposable code run fixes the spec. The final implementation benefits from all of those fixes. The code you ship was never guessed on, never ambiguous, never built from an unclear spec. That's the payoff."
},
"cascading-consistency": {
"title": "Cascading Consistency",
"subtitle": "One stale cross-reference, one module built wrong",
"body": "Architecture documents aren't independent. They form a web of cross-references. A decision in one domain has implications across multiple others. When you change something in one document and forget to update the others, you get an inconsistent spec. An inconsistent spec gives conflicting instructions depending on which document the agent reads first.\n\n## What Happens Without It\n\nYou decide to change how two services communicate. You update the domain doc for Service A. You forget to update the domain docs for Services B, C, and D that reference the same communication protocol. Now Service A's doc says gRPC and Service D's doc says REST. An agent reading Service D's doc builds a REST client. Another agent reading Service A's doc builds a gRPC server. Nothing works.\n\nThis actually happened. A single decision about wrapper component behavior triggered updates across 7 documents, 15 individual fixes. Without systematic cascading, those inconsistencies would have been baked into the code.\n\n## Three Levels\n\nPick what fits your project:\n\n**Manual (up to 5 domains):** After each change, mentally trace the dependencies and update affected docs. You can hold this in your head at small scale.\n\n**Dependency matrix (up to 15 domains):** A document that tracks what depends on what. Inline cascade tags in your domain docs flag the impact points:\n\n```markdown\n### Contract Structure\n**Cascade: Domains 01, 04, 06, 07, 09, 11, 12, 13**\n```\n\n**Dependency graph (15+ domains):** At this scale, manual checking misses things every time. A graph makes dependencies queryable: \"If I change Domain 5, what breaks?\" becomes a query with a definitive answer instead of a best guess.\n\n## The Principle\n\nAfter every change, propagate across all cross-references. The tooling scales with the project, but the principle is the same at every size."
},
"multi-model-validation": {
"title": "Multi-Model Validation",
"subtitle": "Different blind spots surface different gaps",
"body": "A single model validates against its own biases. Two models reading the same spec exploit cognitive diversity. It's the same reason code review works better with multiple reviewers.\n\n## How It Works\n\nTwo (or more) independent models read the same spec. They each produce implementation plans or ask questions. Where they agree, the spec is clear. Where they disagree, the spec is ambiguous. The disagreement is the signal.\n\nIn practice: Claude reads the spec and produces a plan. Then Codex/GPT reads the same spec independently. You compare. Structural differences (different crate organization, different error handling, different library choices) reveal spec ambiguity. Fix the spec, run again. When both models produce the same architecture from the same docs, the spec is unambiguous.\n\n## What Different Models Catch\n\nThis isn't theoretical. In actual multi-model validation:\n\n- One model excelled at internal consistency and structural completeness\n- The other caught higher-level reasoning about system boundaries and contradictions across documents\n- One found that old discussion language hadn't been fully superseded in the architecture docs\n- The other never thought to check discussions against specs\n\nNeither model found everything. Together they found more than either alone.\n\n## When to Bother\n\nFor a side project or a small app, a single cold session validation is probably enough. Multi-model validation pays off when the cost of an architectural error is high: large systems, anything with compliance requirements, anything where rebuilding isn't cheap. If wrong architecture costs you weeks, spending an extra session with a different model is trivially cheap insurance."
},
"edge-cases": {
"title": "Edge Case Analysis",
"subtitle": "What breaks at the seams between domains",
"body": "Edge cases don't come from individual components. They come from the interactions between components. A user lifecycle state machine works fine in isolation. Combined with order processing, payment callbacks, and event timing, a dozen edge cases appear at the seams.\n\n## Response Levels\n\nNot every edge case needs code. Classify them:\n\n| Level | When | What to do |\n|---|---|---|\n| **CODE** | Common (>1%), predictable | Handle it. It's an exit criterion. |\n| **DEGRADE** | Uncommon (<1%), detectable | Degrade gracefully. Monitor. |\n| **ALERT** | Rare, detectable | Log it, alert someone, don't crash |\n| **ACCEPT** | Theoretical, impractical to prevent | Document it. Don't code for it. |\n\nThe key insight: don't add hot-path latency for ACCEPT-level scenarios occurring less than once per million operations. Document them, move on.\n\n## The Process\n\n1. Find the cross-domain interaction surfaces (the graph helps at scale, but you can do this manually)\n2. For each surface: what goes wrong when timing, resources, state, or data are unexpected?\n3. Classify by response level\n4. Index each edge case to affected domains\n5. CODE-level cases become exit criteria for implementation\n\n## For Existing Codebases\n\nLoad the code, load the edge cases, query for gaps: which CODE-level edge cases have no error handling? One-time audit. Fix the gaps. Done."
},
"dependency-graph": {
"title": "Dependency Graph",
"subtitle": "What breaks if I change this?",
"body": "At 15+ architecture domains, manual cascade checking fails. You will miss things. Not because you're careless, but because 18 domains have 306 potential pairwise relationships and no human tracks that.\n\nThe graph makes it queryable. \"If I change Domain 5, what else needs updating?\" is a query that returns in seconds, not a mental exercise that misses half the answer.\n\n## What Goes In\n\n```\nNodes: Domain, Decision, Open Question, Concept\nEdges: DECIDED_IN, OPEN_IN, REFERENCES_CONCEPT,\n REFERENCES (domain-to-domain), CASCADES_TO,\n OWNED_BY, REFERENCED_BY\n```\n\nThe docs are the source of truth. The graph is derived. It gets rebuilt from docs whenever they change.\n\n## What It Answers\n\n- **Cascade analysis:** \"Change Domain 5, what breaks?\"\n- **Concept impact:** \"Which specific decisions reference 'Contract'?\"\n- **Gap detection:** \"Which high-use concepts aren't tracked?\"\n- **Untagged dependencies:** \"Which domain pairs reference each other but lack cascade tags?\"\n- **Validation:** Decision count mismatches, WHY block coverage, orphaned domains\n\n## When You Need It\n\n- **5 domains:** Nice to have. You can still do this in your head.\n- **10 domains:** It catches things you miss.\n- **15+ domains:** You need it. Full stop.\n\n## It's Optional Tooling\n\nThe methodology is the feedback loop. The graph accelerates it. You can use Neo4j, you can use an in-memory Python graph library, you can use a whiteboard. The principle is the same: make dependencies queryable, don't rely on memory."
},
"semantic-grooming": {
"title": "Semantic Grooming",
"subtitle": "Making docs machine-readable without changing what they say",
"body": "Architecture docs accumulate formatting inconsistencies as they evolve. WHY blocks appear as `**WHY:**`, `WHY:`, `> **WHY**`, `Rationale:`, or just buried inline. Concept references use the canonical term in one doc and a synonym in another. Cross-references use three different formats.\n\nHumans can read through the inconsistency. Parsers can't. And if you're using graph tooling, the graph is only as good as the parser's ability to find things.\n\n## What Grooming Does\n\n- **WHY block normalization:** Consistent `**WHY:**` prefix everywhere\n- **Concept tagging:** Canonical glossary terms used consistently\n- **Cross-reference standardization:** \"Domain NN, Decision NN\" format everywhere\n- **Cascade tag verification:** Every section that defines a widely-referenced concept gets a cascade tag\n\n## What It Doesn't Do\n\nIt does not add decisions, change architecture, or modify content. It makes existing content findable. The doc reads the same before and after grooming; the parser just finds everything it's supposed to.\n\n## When to Groom\n\nAfter any major addition (new domain, new cross-cutting concern) and before any implementation run that depends on graph validation. If you're not using the graph, grooming still helps. Consistent formatting makes cold sessions more reliable too."
},
"spec-drift": {
"title": "Spec Drift",
"subtitle": "The spec will get out of sync with reality. That's fine.",
"body": "Once code is in production, changes come from both directions. Features and refactors are spec-driven; they go through the pipeline. Bug fixes, performance patches, production fires are code-driven; they don't. The spec drifts. This is normal and inevitable.\n\nNo team has ever validated every fix against the spec in real-time. The exceptions are environments like defense contracting where a single change costs six figures and ships a year later. That's not a model for anyone else.\n\n## Batch Reconciliation\n\nDocument fixes normally. Commits, PRs, tickets, whatever you already do. Periodically, batch-assess accumulated changes against the spec. An AI agent reads the spec and the list of changes since the last reconciliation. It finds where the code now contradicts the spec: violated assumptions, invalidated WHY blocks, decisions that no longer reflect reality.\n\nThe output: a list of spec updates needed. New Decided entries, updated WHY blocks, resolved Open Questions, new TONIC entries discovered in practice.\n\n## What to Skip\n\nTrivial changes don't need reconciliation. The test: did this change alter behavior that the spec describes? If no, skip it. Copy fixes, cosmetic UI updates, dependency patches with no behavioral change. Don't waste time reconciling things that don't matter.\n\n## Why This Works Now\n\nTraditional teams never reconcile specs because it's too expensive. Reading hundreds of pages against months of changes is work nobody volunteers for. With a machine-readable spec and an AI agent, it's a session. The spec was written for machines to read. Let machines check it."
},
"semantic-review": {
"title": "Semantic Review",
"subtitle": "Catching the words that make agents guess",
"body": "## The Moment You'll Feel This\n\nYou run a cold validation. The cold session asks 30 questions. You spend a day fixing docs and answering them. Half of the questions were about language, not architecture. \"This says 'should.' Is that required or optional?\" \"You wrote 'handle appropriately.' What does appropriate mean here?\" \"This says 'periodically.' How often?\" You realize you burned cold session tokens on problems a text search could have caught. The architecture was fine. The words were sloppy.\n\n## What It Catches\n\nThe graph catches structural problems: orphaned domains, phase conflicts, missing WHY blocks. The semantic review catches something else entirely: imprecise language that widens the probability distribution for any agent reading the spec.\n\nWords like \"should,\" \"appropriate,\" \"handle gracefully,\" \"as needed.\" Anything where two reasonable agents would interpret the same sentence differently.\n\n## How It Works\n\nScan your spec's binding contexts (Decided sections, conventions, spec body, constitutional articles) against an ambiguous language dictionary. Flag occurrences. Evaluate in context. Fix what's genuinely ambiguous.\n\nSkip Open Questions (ambiguity is expected there), WHY rationale text (explaining reasoning is fine), and code blocks.\n\n## The Dictionary\n\n31 categories. Hundreds of terms. Hedge words, weak modals, vague quantifiers, subjective adjectives, ambiguous temporal words, false-certainty words, scope-escape phrases, passive voice that hides actors, and more. It's comprehensive enough that running it against your spec will produce findings even on docs you thought were tight.\n\n## The \"Unit Test\" Test\n\nIf a requirement can't be written as a failing test case before the code is written, the language is almost certainly ambiguous. 
You can't write a test for \"the UI should feel snappy.\" You can write one for \"Interaction to Next Paint (INP) must be under 200 ms.\" If you can't express it as a test assertion, rewrite the requirement until you can.\n\n## Flag, Don't Auto-Reject\n\nContext matters. \"Should\" is precisely defined in RFC 2119 but vague everywhere else. \"Approximately\" is fine when paired with a tolerance. The review flags occurrences for human evaluation, not automatic rejection."
},
"backforging": {
"title": "BackForging",
"subtitle": "Reverse-engineering a spec from code that already exists",
"body": "## The Moment You'll Feel This\n\nYou inherit a codebase. Or you look at your own project from six months ago. There's a function that does something unusual. You can see what it does. You can't see why. Was this a deliberate choice? A workaround for a bug that's since been fixed? A mistake nobody noticed? You don't know. The person who wrote it doesn't remember. The git blame leads to a commit message that says \"refactor.\" You have to decide: change it and risk breaking something, or leave it and carry the mystery forward. Multiply that by a hundred decisions across a codebase and you understand why projects calcify. Nobody changes anything because nobody knows what's load-bearing.\n\n## The Key Reframe\n\nYou have a codebase. Maybe you built it, maybe you inherited it, maybe it was vibe-coded and it's getting unwieldy. You want to bring it under Forge discipline without rebuilding from scratch.\n\n## The Key Reframe\n\nThe WHY blocks you write during BackForging aren't archaeology. You're not trying to reconstruct what the original developer was thinking. You're answering a different question: **why should this continue to be this way?**\n\nEven if it's your own project from six months ago, you've forgotten most of the \"why.\" That's fine. The BackForging agent doesn't know either. Neither of you is doing history. Both of you are evaluating whether each decision has a good reason to persist right now.\n\nIf it can be justified, it's a WHY block. If it can't, it's an Open Question for the rebuild. Maybe it was right once and circumstances changed. Maybe it was never right. Either way, the process surfaced it.\n\n## The Process\n\n1. Agent reads the codebase, produces draft Forge artifacts (constitution, conventions, domain docs, glossary)\n2. You review, fill in WHY blocks the agent couldn't infer, challenge assumptions\n3. Cold validation against the BackForged spec. Does the cold session's understanding match the actual system?\n4. 
Where it diverges: either the spec is wrong (fix it) or the code has a problem (flag it for the rebuild)\n5. Iterate until the spec can faithfully describe the system's intent\n\n## The Hidden Value\n\nThe most valuable output isn't the spec itself. It's the decisions that can't be justified. Every unjustifiable decision is accidental complexity, a stale choice, cargo cult code, or a genuine open question. All four become Open Questions that a rebuild can address consciously instead of inheriting blindly.\n\n## Business Logic Audit\n\nA codebase carries business rules that nobody questions because they're buried in if-statements. The approval chain that requires three sign-offs. The pricing calculation that applies a discount on Tuesdays. BackForging externalizes them. Once they're visible as Decided entries with WHY blocks, the people who should be evaluating them actually can.\n\n**Maturity note:** The BackForging protocol is reasoned extrapolation from Forge principles, not battle-tested process like the rest of the methodology. It's sound in theory. Expect to refine it through practice."
}
},
"concept_groups": {
"properties": [
"vocabulary-precision",
"decision-tracking",
"why-blocks",
"tonic-errors",
"constitution",
"conventions",
"semantic-review",
"backforging"
],
"methods": [
"cold-validation",
"cold-code-runs",
"cascading-consistency",
"multi-model-validation",
"edge-cases",
"dependency-graph",
"semantic-grooming",
"spec-drift"
]
},
"guides": {
"greenfield": {
"title": "Starting Fresh",
"subtitle": "Building a new project with Forge from the first conversation",
"body": "Starting fresh is the best case for Forge. No existing code, no accumulated assumptions, no spec debt. The spec gets written before the code, and the code is generated from the spec.\n\n## How It Actually Works\n\nForget the image of sitting down and writing a 300-page spec before touching code. That's not how this works. You have conversations. You discuss ideas with AI models. You explore the problem space. You argue about approaches. At natural breakpoints, when a topic reaches a conclusion, you tell the model: \"Forge it.\"\n\nThat's it. Two words. The model has the methodology loaded and knows what that means: take the current state of the conversation and formalize it into architecture docs with Decided entries, Open Questions, WHY blocks, cascade tags, the works. You don't need to explain how. The methodology is the instruction set.\n\nThen you keep going. More ideation, more \"Forge it\" at the next breakpoint. The spec builds at the speed of conversation.\n\n## The Sequence\n\n### 1. Conversations First\n\nStart with design discussions. Capture them verbatim, not summarized. You lose nuance when you summarize, and six months from now you'll want the nuance back. Tools, markdown files, voice transcripts, whatever works. Just save the full exchange.\n\n### 2. Read Everything\n\nBefore formalizing, re-read your conversations. You're about to be the authority on this project. Take notes. Start new discussions on things that are unclear or wrong. This is not where you want to be lazy.\n\n### 3. \"Forge It\"\n\nTell the model to create the architecture docs. It produces domain docs, constitution, conventions, glossary. This is interactive: you review, challenge, have real-time conversations to clarify. \"In this section we're talking about X but won't that cause Y?\" might change the architecture. Good. That's the process.\n\n### 4. 
Validate\n\nOnce the spec is in rough draft state, run it through structural validation (graph analysis and dictionary lint if you're using them). Then start cold validation sessions: fresh AI session, no prior context, reads the spec and produces a plan. Its questions reveal doc gaps. Fix them. Repeat. Do it again with a different model.\n\n### 5. Cold Code Run\n\nFresh session builds from the spec. Throw the code away. The value is in what the code reveals about the spec. Fix the spec. Run again with a different model. When models converge, the spec is ready.\n\n### 6. Implement For Real\n\nNow the code you write is from a spec that's been validated by cold reading, stress-tested by cross-model comparison, and refined through multiple rounds. Code generation becomes translation, not design.\n\n## Velocity\n\nThe methodology doesn't slow you down. It captures decisions at the speed of conversation and validates them with automated feedback loops. How fast you go depends on you and the complexity of the domain. A project in familiar territory can move fast. A project that requires learning new technology takes longer. The methodology is the same either way; the thinking is what takes the time, and Forge makes sure none of that thinking gets lost."
},
"backforge": {
"title": "BackForging an Existing Project",
"subtitle": "Turn your codebase into a spec that can rebuild it better",
"body": "You have working code. Maybe it was built quickly, maybe it evolved over time, maybe you inherited it. The architecture is implicit in the code but not documented anywhere. You want to stabilize it, or you want to rebuild it on a different stack, or you just want to understand what you actually have.\n\n## The Reframe\n\nBackForging isn't documentation. You're not writing a description of what was built. You're producing a spec for what should be built going forward.\n\nThe WHY blocks are the key difference. You're not asking \"why did someone build it this way?\" because chances are, even if it's your own project, you don't remember. Instead you're asking: **\"Why should it continue to be this way?\"** Present justification, not historical reconstruction.\n\nIf a decision can be justified now, it's a WHY block. If it can't be justified, it's an Open Question. Maybe it was right once and circumstances changed. Maybe it was never right. Either way, you found it.\n\n## The Process\n\n### 1. Let an Agent Read the Codebase\n\nGive your AI access to the code and ask it to produce the Forge artifacts: constitution, conventions, domain docs, glossary. Don't write it from memory. Let the agent extract it. It will find patterns you've forgotten and inconsistencies you didn't know existed.\n\n### 2. Review with Fresh Eyes\n\nThe agent's output is a first draft. Read it critically. Is this what the code does, or what it should do? Are there inconsistencies it surfaced? Are there decisions buried in the code that should be explicit? Challenge everything.\n\n### 3. Write the Present-Tense WHY Blocks\n\nThe agent extracted the \"what.\" You evaluate the \"should it stay this way?\" For each decision: is there a good reason for this to persist? If yes, write the WHY. If no, flag it as an Open Question.\n\nIf you find yourself writing \"WHY: unknown, this exists but nobody knows why\" for a business rule, that's not failure. 
That's a rule running in production with no justification. Now someone can decide if it stays or goes.\n\n### 4. Validate\n\nCold session reads the BackForged spec. Does its understanding match the actual system? Where it diverges, either the spec is wrong or the code has a problem you should know about.\n\n## The Bigger Play\n\nOnce the spec exists, the code becomes disposable. You could rebuild the project in a different language, with different libraries, on a different stack. The spec is the portable artifact. Update the conventions doc for the target stack, run the normal Forge validation pipeline, and rebuild.\n\nA PHP project BackForged into a proper spec could be rebuilt in Node, Go, or Rust without losing the architectural decisions, the domain boundaries, or the business logic.\n\n## The Business Logic Audit\n\nThis is the sleeper benefit. Codebases carry business logic that nobody questions because it's buried in if-statements. The three-level approval chain. The Tuesday discount. The 15-minute timeout for free-tier users. BackForging externalizes all of it into readable Decided entries. Once visible, the people who should be evaluating those rules actually can. They can't read code. They can read a spec.\n\n**Maturity note:** The BackForging protocol is reasoned extrapolation from Forge principles. It hasn't been through the same grind of repeated practice that produced the rest of the methodology. It's sound in theory. Expect to iterate on it."
},
"miniforge": {
"title": "MiniForge",
"subtitle": "A five-minute health check for your vibe-coded project.",
"body": "You built something with AI. It works. You're not sure if it's good enough. You don't want to read an entire methodology to find out.\n\nMiniForge is a single prompt you drop into a session with your project. It reads your code, asks you five questions about who uses it and how sensitive it is, and produces a lightweight document that locks your choices, flags the obvious issues, and tells you honestly whether you need more structure.\n\n## What It Does\n\n1. **Reads your codebase** and identifies every library, framework, and tool you're using\n2. **Asks five questions** about your audience, sensitivity, reliability needs, team size, and production plans\n3. **Produces a single file** (`miniforge.md`) that goes in your project root\n\n## What You Get\n\n**Conventions (locked choices):** Everything you're already using, formatted so future AI sessions don't switch your stack. \"Use Stripe, do not switch to PayPal.\" Extracted automatically from your code.\n\n**Hard rules (3-5 things that must not change):** Based on your answers about who uses the project and what data it handles. \"User data stays on the server.\" \"All API calls require authentication.\"\n\n**Quick wins (fix these now):** A scan for common vibe-coding issues:\n- API keys or secrets in code (not in environment variables)\n- No error handling on API calls\n- No input validation on forms\n- SQL injection or XSS vulnerabilities\n- Hardcoded values that should be configurable\n- No rate limiting on public endpoints\n- Missing authentication on endpoints that need it\n- Dependencies with known security vulnerabilities\n\nFor each issue: exactly what to fix and where.\n\n**Risk assessment:**\n- **GREEN:** Your project is fine for its intended use. Keep the miniforge.md file for consistency.\n- **YELLOW:** Issues that will cause problems if the project grows. 
Fix the quick wins, consider adding more Forge practices.\n- **RED:** Issues that could cause data loss, security breaches, or reliability failures. Fix immediately.\n\n## How to Do It\n\nPaste the MiniForge prompt (available in the methodology docs) into a session with your codebase loaded. Answer the five questions. Get your miniforge.md file. The whole thing takes about five minutes.\n\nOr if you want the even shorter version:\n\n```\nLook at my codebase. Tell me what libraries I'm using (so I can lock them),\nwhat security issues you see (so I can fix them), and whether this project\nis okay for production use or needs more engineering work. Be honest.\n```\n\nThat's not MiniForge, it's just a health check. But it's better than nothing.\n\n## What Happens After\n\n**GREEN:** Keep your miniforge.md. You're done unless the project grows significantly.\n\n**YELLOW:** Fix the quick wins. Read the [Vibe to Forge](/guides/vibe-to-forge) guide for the next steps. You don't need the full methodology; you need the specific practices MiniForge recommended.\n\n**RED:** Fix the quick wins immediately. Then seriously consider whether this project needs a fuller Forge process. The miniforge.md document tells you exactly where the risks are.\n\n## Who This Is For\n\nAnyone who built something with AI and wants an honest answer about whether it's good enough. You don't need to be an engineer. You don't need to understand the Forge methodology. You don't need to commit to anything. Five minutes, one file, an honest assessment."
},
"vibe-to-forge": {
"title": "Vibe to Forge",
"subtitle": "Your project outgrew vibe coding. Here's the minimum viable upgrade.",
"body": "You built something with AI. You typed prompts, it wrote code, and now you have a working app. Maybe you're not an engineer. Maybe you are, but you moved fast and skipped the planning. Either way, it works. And now you're noticing problems.\n\n## Sound Familiar?\n\n- Every time you ask the AI to add something, it breaks something else\n- You paste the same context into every new conversation because it doesn't remember what you built last week\n- The AI keeps changing things you already decided on. You told it to use Stripe but it switched to PayPal\n- Different parts of your app do the same thing in different ways\n- You're afraid to touch certain parts because you don't know what they connect to\n- Someone else looked at your code and said \"why is it doing it this way?\" and you had no answer\n\nIf any of those hit home, your project has outgrown vibe coding. That's not a failure. It means you built something real enough to need structure.\n\n## You Don't Need to Be an Engineer (But It's Fine If You Are)\n\nPeople using AI to build things come from everywhere. Senior engineers with decades of experience. Product managers who started coding last month. Domain experts who know their field inside out but have never written a for-loop. Students. Founders. Hobbyists. Retirees building the thing they always wanted to build.\n\nForge works across that entire range. If you're a senior engineer, you'll recognize the patterns (ADRs, architectural decision tracking, dependency graphs) and apply them with your existing judgment. If you're new to building software, you don't need to know those terms. You just need to be able to explain what your app does and why you made certain choices. The AI handles the structure. You handle the decisions.\n\nThe conventions doc doesn't require you to know what \"contested idioms\" means. It requires you to know \"I'm using Stripe, not PayPal.\" The constitution doesn't require you to understand distributed systems theory. 
It requires you to know \"user data never leaves our server.\" The WHY blocks don't require architectural vocabulary. They require you to know \"I tried X, it didn't work, I switched to Y.\"\n\nYour domain knowledge is your contribution. The AI structures it.\n\nHere's what to do. You can tell your AI assistant all of this. In fact, the easiest approach is to paste the Forge methodology docs into your session and say \"help me apply this to my project, starting with a conventions doc.\" The AI knows what these things are. You just need to answer its questions.\n\n## Step 1: Tell Your AI What You Chose (and What Not to Use)\n\nThis is the single most impactful thing you can do. Create a document (call it `conventions.md` or whatever you want) that lists the choices you've already made. You don't need to know the technical terminology. Just tell the AI:\n\n\"I'm using Stripe for payments, not PayPal or Square. I'm using Supabase for the database. The frontend is React. Never switch these without asking me.\"\n\nThe AI turns that into a proper conventions doc with forbidden alternatives. Now every future session reads this first. No more random library switches. No more \"I upgraded you to a better database.\" Your choices are locked.\n\nIf you don't know what choices you've made, that's fine. Tell the AI to look at your codebase and list them. It'll find your dependencies, your patterns, your frameworks. You review the list and confirm: \"yes, keep all of these, don't change them.\"\n\n## Step 2: Write Down the Things That Can't Change\n\nIf your app has rules where violating them would break everything, write them down. You probably already know what they are even if you've never formalized them:\n\n- \"User data never leaves our server\"\n- \"All payments go through Stripe, nothing else\"\n- \"The app works offline, that's the whole point\"\n- \"We never store passwords in plain text\"\n\nThese are your constitutional articles. Three to five is plenty. 
Tell your AI about them and it creates the document. Now every session knows the hard boundaries.\n\n## Step 3: Explain Why You Made the Hard Decisions\n\nThink about the decisions that were hard to make or that keep coming up. Maybe you chose to store files locally instead of in the cloud. Maybe you decided on a specific way to handle user accounts. Maybe you tried something, it didn't work, and you switched to something else.\n\nWrite that down. Or better, tell your AI: \"I chose X because of Y, and I tried Z but it didn't work because of W.\" The AI creates WHY blocks. Now the next session (or the next person) doesn't undo your work because they don't know the history.\n\n## Step 4: Try a Cold Read\n\nThis is optional but eye-opening. Start a completely fresh AI conversation. Give it your conventions doc, your constitution, your WHY blocks, and your codebase. Ask it to produce a plan for adding the next feature. Watch what questions it asks.\n\nThose questions are gaps in your documentation. Every question it asks is something a future session would also get confused about. Fix the docs, and the next session doesn't ask.\n\n## What You're Not Doing\n\nYou're not writing a 300-page engineering specification. You're not learning graph databases. You're not running multi-model validation. You're capturing enough context that the AI stops breaking your stuff.\n\nThat's it. Start there. If the project grows and you need more structure, the rest of the methodology is waiting. But most vibe-coded projects just need the first three steps to stop feeling fragile.\n\n## The Bigger Picture\n\nVibe coding has its place. For personal tools, prototypes, small apps that stay small, it's great. But if your project has users, or handles money, or stores sensitive data, or is going to be maintained by someone other than you, the quality of the specification matters. Not because you need to be an engineer, but because the AI needs clear instructions to produce reliable output. 
Forge gives it those instructions."
},
"teams": {
"title": "Forge for Teams",
"subtitle": "The artifacts that seem like overhead for one person become coordination mechanisms for a group.",
"body": "The methodology reads as a solo workflow. It was developed by one person working with AI sessions. But it scales naturally to teams, and in some ways the artifacts pay off more with multiple people than solo.\n\n## Domain Ownership\n\nDomain docs partition along ownership lines. The person who owns payments owns the Order Service domain doc. The person who owns identity owns the User Service doc. Nobody reads 300 pages. Each person reads their 30 pages deeply and skims the rest for cascade impacts.\n\nThe cold session reads all 300 pages with equal attention. No human team member does that. The model compensates for the team's selective attention. The team compensates for the model's lack of judgment about what actually matters.\n\nIf you own the code, you own the doc. If the doc is stale, the next cold session will find it.\n\n## Decision Tracking as Coordination\n\nOn a solo project, the Decided/Open Questions tracking is a maturity signal. On a team, it's a coordination mechanism.\n\nAn Open Question in Domain 02 blocked on a decision in Domain 01 is a dependency between two people:\n\n```\nOQ-3: Authentication token format.\nBlocked on: Domain 01 auth decision (Sarah).\n```\n\nWhen Sarah resolves the auth decision, she updates Domain 01 and tells the Domain 02 owner. The status annotations are a lightweight coordination layer without a separate tracking tool.\n\nCascade tags work the same way. \"Cascade: Domains 03, 08, 11\" means three specific people need to review a change, not just three documents.\n\n## The Cold Session as the Honest Reviewer\n\nOn any team, there's social pressure around reviews. Junior members defer to senior ones. Friends give each other easy reviews. Nobody wants to block the sprint.\n\nThe cold session has no politics. It doesn't defer to seniority. It doesn't care about the sprint deadline. It reads every word and asks about every gap. 
When it says \"this spec is ambiguous about the retry strategy,\" it doesn't matter who wrote the spec.\n\n## Onboarding\n\nNew team members read the spec for their domain, read the constitution and conventions (short, applies everywhere), and start contributing. The spec contains enough context that a new person understands not just what the system does but why:\n\n- WHY blocks explain non-obvious decisions\n- TONIC entries prevent the new person from reaching for their preferred library\n- Terminology enforcement prevents vocabulary drift\n- Open Questions with status annotations tell them what's settled and what isn't\n\nWithout the spec, onboarding means reading code, asking questions, and slowly building mental models that may not match the original intent. With the spec, the full picture is available on day one.\n\n## Preventing Style Wars\n\nThe conventions doc is the team's truce document. Every contested idiom has a position with a WHY block. When a team member pushes back (\"but gin is more popular\"), the WHY block already contains the reasoning. They can disagree, but they can't claim it wasn't considered.\n\nTONIC entries aren't just for AI agents. They're for any team member whose instinct is to reach for the ecosystem default.\n\n## Integrating with Your Review Process\n\n- **Before architecture PRs:** Run a cold session against the updated spec. If it asks questions about the changed area, the spec isn't precise enough.\n- **Sprint boundaries:** Cold validation against the current spec. Route questions to domain owners. This replaces architecture review meetings rather than adding to them.\n- **After onboarding:** New person runs their own cold validation. Their questions are the onboarding gaps.\n- **Quarterly:** Full cold validation + different model. How much has the spec drifted?\n\n## Scaling\n\n| Team Size | What Changes |\n|---|---|\n| 2-3 | Manual cascade checking works. One person runs cold sessions. Conventions doc prevents inconsistency. 
|\n| 4-8 | Domain ownership matters. Cascade tags become notifications. Cold sessions at sprint boundaries. |\n| 8-15 | Graph tooling pays off. Multiple people run cold sessions for their domains. |\n| 15+ | Graph tooling is essential. The spec is the coordination layer, not meetings. |\n\n## The Bottom Line for Team Leads\n\n- The spec is the coordination artifact. Decisions live in domain docs, not ticket comments.\n- Domain ownership means spec ownership.\n- The cold session is the most honest reviewer on your team.\n- Start with a conventions doc. Consistency improves immediately. Everything else follows.\n- The methodology doesn't add meetings. It replaces them with a process that's more thorough and less political.\n- New people ramp faster. The spec captures context that would otherwise take weeks of questions."
}
},
"templates": [
{
"id": "constitution",
"title": "Constitution",
"description": "Immutable architectural laws. Start with 3-5 articles for a small project, up to 10-15 for large systems.",
"filename": "constitution.md",
"content": "# Project Constitution\n\nImmutable architectural laws. No implementation decision can violate these.\n\n## Article 1: [Name Your Principle]\n\n[State the constraint clearly. What the system CANNOT do.]\n\n> Rationale: [Why this is load-bearing. What breaks if violated.]\n\n## Article 2: [Name Your Principle]\n\n[State the constraint clearly.]\n\n> Rationale: [Why this matters.]\n\n## Article 3: [Name Your Principle]\n\n[State the constraint clearly.]\n\n> Rationale: [Why this matters.]"
},
{
"id": "domain",
"title": "Domain Architecture Doc",
"description": "One per major concern. Includes Decided/Open tracking, WHY blocks, cascade tags, and inline ambiguity markers.",
"filename": "domain_nn_name.md",
"content": "# Domain NN: [Domain Name]\n\n*[One-line description of what this domain covers.]*\n\n---\n\n## Status\n\n| Category | Count |\n|----------|-------|\n| Decided | 0 |\n| Open | 0 |\n\n---\n\n## Architecture\n\n### Overview\n\n[What this domain does. Its responsibility boundary.]\n\n**Cascade: Domains [list affected domains]**\n\n### [Section Name]\n\n[Architectural specification. Use precise, mechanism-focused language.]\n\n**WHY [decision]:** [Rationale explaining the choice.]\n\n**WHY NOT [alternative]:** [Why the alternative was rejected.]\n\n### [Section Name]\n\n[More specification. Mark unresolved points with inline markers.]\n\nThis component uses [approach] `[OQ-1]` for [purpose].\n\n---\n\n## Decided\n\n1. **[Decision title]:** [Description]. **WHY:** [rationale]. **WHY NOT [alternative]:** [reason]. (Session NNN)\n\n2. **[Decision title]:** [Description]. (Session NNN)\n\n---\n\n## Open Questions\n\n1. **[Question title]:** *Discovery: Phase N*\n - [Context and sub-questions]\n\n2. **[Question title]:** *Post-V1*\n - [Why deferred and what triggers revisiting]"
},
{
"id": "conventions",
"title": "Conventions",
"description": "Implementation choices locked for consistency. Includes canonical dependencies, forbidden alternatives, and TONIC prevention.",
"filename": "conventions.md",
"content": "# Conventions\n\nImplementation choices locked for consistency across AI coding sessions.\n\n## Language\n\n- **Backend:** [language]\n- **Frontend:** [language/framework]\n- **Database:** [choice]\n\n## Canonical Dependencies (with Forbidden Alternatives)\n\n| Use | Package | Do NOT Use | WHY NOT |\n|-----|---------|-----------|---------|\n| [purpose] | [package] | [forbidden] | [reason] |\n| [purpose] | [package] | [forbidden] | [reason] |\n\n## Error Handling\n\n- [Rule 1]\n- [Rule 2]\n- [Rule 3]\n\n## Naming Conventions\n\n- [Pattern for files]\n- [Pattern for functions/methods]\n- [Pattern for types/classes]\n\n## Timestamp Convention\n\n- Internal: [representation]\n- Wire/storage: [format]\n- Field naming: `{event}_at` (e.g., `created_at`, `updated_at`)\n- Never bare `timestamp`. Always specify what happened.\n\n## Terminology Enforcement\n\n| Correct | Do NOT Use |\n|---------|-----------|\n| [project term] | [forbidden synonyms] |\n| [project term] | [forbidden synonyms] |"
},
{
"id": "edge-cases",
"title": "Edge Cases",
"description": "Cross-domain interaction analysis classified by response level. CODE-level cases are implementation exit criteria.",
"filename": "edge_cases.md",
"content": "# Edge Cases\n\n*Classified by response level. CODE-level cases are implementation exit criteria.*\n\n## Response Levels\n\n| Level | When | Runtime Cost |\n|-------|------|-------------|\n| **CODE** | Common (>1%), predictable, auto-recoverable | On the critical path |\n| **DEGRADE** | Uncommon (<1%), detectable, partially recoverable | Triggered by threshold |\n| **ALERT** | Rare, detectable, needs human intervention | Zero until triggered |\n| **ACCEPT** | Theoretical, impractical to prevent without disproportionate cost | Zero |\n\n---\n\n## EC-1.1: [Edge case title]\n**Cascade: Domain [NN]**\n**Response:** CODE. [Description of the scenario. What goes wrong. How to handle it.]\n\n## EC-1.2: [Edge case title]\n**Cascade: Domains [NN, NN]**\n**Response:** DEGRADE. [Description. Degraded behavior. When to alert.]\n\n---\n\n## Index\n\n| Domain | Edge Cases |\n|--------|-----------|\n| [NN] ([Name]) | EC-1.1, EC-1.2 |"
},
{
"id": "glossary",
"title": "Glossary",
"description": "Canonical definitions for project-specific terms. What each term means and, where ambiguous, what it does NOT mean.",
"filename": "glossary.md",
"content": "# Glossary\n\n*Canonical definitions for terms used across the architecture docs.*\n\n**[Term]:** [Definition in project context. Be specific about what this means HERE, not what it means generally.] Not the same as [common confusion].\n\n**[Term]:** [Definition. Include the mechanism, not just the outcome.]\n\n**[Term]:** [Definition. If the term is overloaded in the industry, clarify which meaning applies.]"
},
{
"id": "forge-rules",
"title": "Machine Reference (forge_rules.md)",
"description": "Compact reference designed to drop into your project for AI coding agents. Include in CLAUDE.md or equivalent.",
"filename": "forge_rules.md",
"content": "[This is the full forge_rules.md. Download it from the repository for the complete, current version.]\n\nThe machine reference is a compact version of the methodology designed for AI agent consumption. Drop it into your project root or include it in your CLAUDE.md / system prompt.\n\nIt covers:\n- Document hierarchy\n- Vocabulary precision\n- Decision tracking\n- WHY blocks\n- TONIC errors\n- Constitution\n- Conventions\n- Cascading consistency\n- Multi-model validation\n- Edge case analysis\n- Spec drift and reconciliation\n- Implementation discipline"
}
]
},
"methodology": {
"overview": "# Forge -- Specification-Driven Development\n\n*by Larry Diffey*\n\n## What Forge Is\n\nForge is a methodology for producing specifications precise enough that a cold AI session -- one with no prior context -- can act on them without guessing. The specification is the primary artifact. Everything else (code, policy documents, legal frameworks, whatever the end goal is) is generated from it.\n\n**The core doctrine:** Any specification gap that forces an AI agent to guess becomes a defect baked into the output. The agent fills ambiguity with its training data defaults, which may be wrong for this project.\n\nForge was not designed theoretically. Every technique exists because a specific failure mode occurred without it. It was developed through practice on large-scale systems projects (17-20 architecture domains, 300+ page specs) and refined through multiple rounds of multi-model validation.\n\n## What Forge Is Not\n\n**Not just for code.** The methodology applies to any spec-to-output pipeline. Software architecture, policy manuals, legal documents, compliance frameworks -- if the goal is a precise specification that a cold reader (human or AI) can execute without guessing, Forge applies.\n\n**Not slow.** The AI writes the docs from your conversations. Ideation and formalization happen at the speed of conversation. How long the full process takes depends on the project's complexity and the practitioner's familiarity with the domain. The rigor comes from the process catching what speed misses, not from going slow.\n\n**Not waterfall.** Forge is iterative. You ideate, formalize at natural breakpoints (\"Forge it\"), validate, fix, and repeat. The spec evolves through rapid cycles, not a long planning phase followed by implementation. Forge does not assume you can spec everything before building. Code teaches you things the spec can't anticipate. Open Questions mark what you know you don't know. FINDINGS.md captures what you discover during implementation. 
The spec is a living artifact that tightens through contact with reality, not a frozen document that must be complete before the first line of code.\n\n**Not a substitute for engineering.** Forge makes the spec precise. It does not make the spec complete or correct. If you forget to specify that your Rust/Go backend also needs a Python HTTP client or a Docker connector, Forge won't invent that for you. It catches ambiguity in what you specified. It can't catch what you never thought to specify.\n\nYou can write a perfectly Forged specification for the wrong product, the wrong market, or the wrong architecture. The methodology ensures the AI builds exactly what you specified. Whether what you specified is worth building, and whether you specified everything that matters, is your problem.\n\nThe cold code run will eventually surface the gap (\"wait, how does this connect to the external service?\"). But that's discovery through implementation, the normal engineering process. Forge captures the discovery in FINDINGS.md so it doesn't get lost. That's not a Forge failure. That's Forge working as designed.\n\nOver time, Forge does make you a better engineer. After you've watched an agent \"optimize\" away your crash isolation because you forgot a WHY block, you start writing WHY blocks proactively. After a cold session asks \"what happens when the database is unavailable?\" for the third project in a row, you start specifying error handling before being asked. The methodology trains the practitioner through repeated exposure to their own gaps. But it doesn't give you magic powers. It helps you be the best engineer you are.\n\n**Designed for AI coding sessions, not copy-paste.** Forge assumes you're working in an environment where the AI has direct access to your project files: Claude Code, Cursor, Copilot Workspace, Codex, or similar tools. The methodology relies on the AI reading your docs, modifying them, and maintaining consistency across hundreds of pages in context. 
Copying and pasting between a website chat and your editor would work technically but would quadruple the time and introduce copy errors. Use a tool that gives the AI direct file access.\n\n**Best with frontier models and long context.** Forge was developed and validated with large-context frontier models (Claude Opus, GPT-4, Gemini Pro, Codex) that can hold an entire spec in context simultaneously. Smaller or local models may require adaptation: chunking docs into smaller passes, running validation domain-by-domain instead of corpus-wide, or accepting lower precision on semantic review. The core methodology doesn't change, but the workflow may need to accommodate context limits.\n\n**Not about big words.** Vocabulary precision means choosing words that carry the most semantic information per token for the concept being constrained. \"Fault containment via process isolation\" isn't fancy language -- it's five words doing the work of two paragraphs. The skill is precision, not verbosity.\n\n## Not Sure If Forge Is For You?\n\nDrop `forge.json` (or the methodology docs) into your project. Ask your AI assistant: \"Read this methodology, then look at my project. What level of Forge should I use, and what are the biggest ambiguities in my current spec?\" You'll get a calibrated recommendation and useful observations in five minutes. If the feedback is useful, the methodology just demonstrated its own value without you committing to anything.\n\n---\n\n## The Working Assumption\n\nForge operates on a premise: **code is downstream of spec.** The specification is the primary artifact. The code is a stochastic compilation of the spec into a particular stack at a particular time.\n\nAn LLM is a stochastic parrot. It produces the most probable next token based on its training data. It's not reasoning about your architecture. It's not trying to write good code. It's generating the most likely output given the input. That's fine. Forge doesn't try to make the parrot smarter. 
It constrains the input so that the most probable output is the correct output for your project. Vocabulary precision reshapes the probability distribution. TONIC tables override training data defaults. WHY blocks close the degrees of freedom where the parrot would guess. The methodology doesn't improve the model. It improves what the model has to work with.\n\nAn LLM generating code is a noisy compiler. Same spec, different runs, different code. But the right question isn't \"is the code identical?\" It's \"does the behavior converge?\" For the vast majority of software, the answer is yes, as long as the spec is tight enough. Two independent runs produce different variable names, different stdlib choices, different orderings. Both pass the same tests, satisfy the same contracts, produce the same outputs. The wiring diverges. The behavior converges. That's the only determinism that matters.\n\nEvery distinctive piece of Forge follows from this premise. Cold code runs work as validation because you throw the code away, and you can only afford to throw code away if code isn't the artifact you care about. Multi-model validation works because divergence signals spec ambiguity, not model error. WHY blocks prevent optimization drift in the wiring layer. TONIC tables preempt default-collapse in the wiring layer. All of these target the spec, not the code. If code were the primary artifact, you'd be reviewing code. You're reviewing the spec.\n\nThe cost model inverts too. Traditional development puts the cost in maintaining code forever. Forge puts the cost in tightening the spec once and regenerating code as needed. For anything maintained over time, the Forge model wins, and it wins more the longer the software lives.\n\nThis assumption holds for the vast majority of production software: services, APIs, pipelines, data flow, business logic, UI. 
For research software where the spec and implementation are co-developed, or performance-critical kernels where microarchitecture tuning matters, different tradeoffs apply, but even there, the spec is what survives the next hardware generation or the next rewrite. Code is the tax you pay to run the spec on a particular machine at a particular time.\n\n---\n\n## Three Pillars\n\n### 1. Precision\n\nEvery word in a specification constrains the model's output space. Vague words produce wide probability distributions. Precise words produce narrow ones. The savings compound across every session that reads the spec.\n\n| Vague | Precise |\n|-------|---------|\n| \"separate things so they don't affect each other\" | \"fault containment via process isolation\" |\n| \"it should be fast\" | \"hot-path latency budget: sub-millisecond\" |\n| \"the system handles different sources\" | \"source-agnostic pipeline\" |\n\nThis isn't a style preference. It's an engineering practice. The glossary extends this principle to terminology: every project-specific term gets a definition that narrows the probability distribution from \"whatever the training data says\" to \"exactly what this project means.\" The glossary is a TONIC-prevention mechanism for vocabulary.\n\n### 2. Validation\n\nSpecs are validated by cold reading -- fresh AI sessions that encounter the docs with no prior context. Different models find different gaps. Where models converge, the spec is clear. Where they diverge, the spec is ambiguous. The divergence is the signal.\n\nThe outputs of validation (plans, code) are disposable. The value is in what the output reveals about the spec. Every deviation from intent is a spec defect to be fixed.\n\n### 3. Feedback\n\nEvery validation round tightens the spec. Cold session questions feed back as doc fixes. Implementation guesses feed back as WHY blocks and TONIC entries. 
The spec improves monotonically -- it never gets worse, only more precise.\n\nThe feedback loop is: **agent guesses wrong -> FINDINGS.md captures the guess -> postmortem reviews -> wrong guess becomes a WHY block or TONIC entry -> next agent doesn't guess on that one.**\n\n## Division of Labor\n\n**The human does judgment.** Direction, decisions, steering, challenging choices, knowing what to build and why.\n\n**The model does production and verification.** Writing the docs (the AI writes the entire spec from conversation), detecting ambiguity at a granularity no human reviewer can match across 300 pages, maintaining internal consistency across the full corpus.\n\n**Automated tooling does mechanical checking.** Graph analysis, dictionary linting, exclusion index validation -- tedious, high-precision, zero-creativity work that should never be done manually.\n\n## The Pipeline (Summary)\n\nThe full pipeline is detailed in `forge_process.md`. At a high level:\n\n1. **Ideate** -- discuss, explore, capture verbatim\n2. **Formalize** -- \"Forge it\" at natural breakpoints; model produces the spec\n3. **Validate structurally** -- graph analysis, ambiguity lint (optional tooling)\n4. **Validate by cold reading** -- fresh sessions probe the spec, questions feed back as doc fixes\n5. **Validate by cold code run** -- fresh sessions build from the spec, deviations feed back as doc fixes\n6. **Signal** -- models converge, no more spec-driven deviations\n7. **Implement** -- code generation is translation, not design\n\nEach step is documented in its own protocol file in this directory.\n\n## The Spec is a Thinking Tool\n\nThe spec isn't just a communication artifact for agents to read. It's how you think through your own project. The model produces spec content that sometimes exceeds your domain knowledge in a specific area. 
Reading it forces you to evaluate whether it's right, which forces you to think through implications you wouldn't have reached alone.\n\nYou don't have to understand every technical detail the spec contains. Your read-through might be \"does this feel right and is it consistent with my intent\" rather than \"I can verify every claim.\" That's fine. The cold validation session stress-tests the parts you couldn't fully evaluate yourself. Between your judgment and the cold session's literal reading, coverage is better than either alone.\n\nThe upfront investment isn't just about preventing bugs later. It's about making better decisions now, because the process of formalizing forces decisions that would otherwise be deferred until they're expensive to change.\n\n## Where to Start\n\n| If you want to... | Read... |\n|-------------------|---------|\n| Understand the full process | `forge_process.md` |\n| Drop Forge into a project immediately | `forge_rules.md` (machine reference) |\n| Know what artifacts to produce | `artifacts.md` |\n| Run a cold validation session | `cold_validation_protocol.md` |\n| Run a cold code validation | `cold_code_run_protocol.md` |\n| Run a semantic ambiguity review | `semantic_review.md` |\n| BackForge an existing project | `backforge_protocol.md` |\n| Use Forge with a team | `teams.md` |\n| Design project-specific graph signals | `custom_graph_signals.md` |\n| Connect graphs across multiple repos | `multi_repo.md` |\n| Scale Forge across an organization | `forge_at_scale.md` |\n| Train your team on precise spec writing | `training.md` |\n| Adoption order, maturity tiers, first 48 hours | `adoption.md` |\n| Day-to-day practice on a Forged project | `operational_guide.md` |\n| See example artifacts | `../examples/` |\n| Set up graph tooling | `../tooling/graph/` |\n",
"process": "# The Forge Process\n\nThe complete pipeline as practiced. Each phase feeds the next. The ordering is prescribed; the frequency is a judgment call calibrated to the project's stakes, team size, and domain count.\n\n---\n\n## Phase 1: Ideation\n\nDiscuss. Explore. Capture verbatim.\n\nThe project starts as conversations -- with AI models, with collaborators, with yourself. These conversations are the raw material. Save them verbatim, not summarized, because you will lose the nuance of your thinking. It doesn't matter how you capture them (markdown files, a conversation capture tool, voice transcripts), but the full exchange must be preserved.\n\nIdeation is not linear. You will revisit topics, change your mind, discover that an earlier decision was wrong. That's the process. The conversations capture the evolution of thinking, including dead ends and abandoned approaches. These have value -- they prevent re-exploring paths that were already tried and rejected.\n\nAt natural breakpoints -- when a topic has reached a conclusion, when a feature is well enough understood to specify, when you're moving on to the next thing -- tell the model: **\"Forge it.\"** That two-word command triggers formalization of the current ideation state into the specification. You don't need to explain how. The model has the methodology loaded and knows the target structure.\n\nIdeation and formalization interleave. You don't finish all ideation and then formalize. You ideate on Topic A, Forge it, ideate on Topic B, Forge it, circle back to Topic A with new understanding, Forge the updates. The spec evolves at the speed of conversation.\n\n**Key practice:** Start the conversation that produces the spec with the Forge methodology docs loaded, so the model knows the rules it's writing to.\n\n---\n\n## Phase 2: Formalization\n\n**The AI writes the docs, not you.** The model produces the formal specification from your conversations. 
You do not sit down and write a constitution, a conventions doc, or domain docs by hand. You have design conversations. You make decisions. You say \"Forge it.\" The AI produces the artifacts. You review, challenge, and steer. The human time is the thinking, not the typing. A constitution doesn't take four weeks to write. It takes the AI minutes to extract from conversations that already happened. The human reviews it in an hour.\n\nThis is interactive, not one-shot. The model writes, the human reads every word, challenges what's wrong, and has real-time conversations to clarify. \"In this section we're talking about X but won't that cause Y?\" might change the architecture. The model pushes back too. It tells you when you're wrong. You tell it when it's wrong. The quality comes from that exchange.\n\n### What Gets Produced\n\nSee `artifacts.md` for the full list. The core artifacts are:\n\n- **Domain docs** -- one per architectural domain, with Decided/Open Questions/WHY blocks\n- **Conventions** -- implementation choices locked for consistency\n- **Glossary** -- canonical terms with what they mean and what they don't mean\n- **Constitution** -- immutable architectural laws (3-5 articles for a small project, up to 10-15 for large systems)\n- **Engineering plan** -- phased build sequence with exit criteria\n\nNote the order. Domain docs and conventions emerge first from the design discussions. The glossary crystallizes as terms settle. The constitution is derived last of the architecture docs, when you notice certain principles are load-bearing and multiple decisions depend on them. You don't define your terms then ideate, and you don't write a constitution then design. You ideate, and the artifacts trail the thinking.\n\n### Decision Tracking\n\nEvery architectural decision is logged as a **Decided** entry (numbered, with rationale) or an **Open Question** (numbered, with status annotation). The Decided/Open ratio is a maturity signal. A domain with 45 decided and 2 open is ready. 
A domain with 15 decided and 22 open is not.\n\nOpen questions are not defects. They are explicitly marked unknowns. Some decisions genuinely can't be made until implementation (\"which library is faster for our workload\" is a benchmark question, not a spec question). Forcing a decision there produces a guess dressed up as a Decided entry, which is worse than an honest OQ.\n\n**Forge does not assume you can spec everything before building.** Code teaches you things the spec can't anticipate. The methodology accounts for this at every level: Open Questions mark what you know you don't know. FINDINGS.md captures what you discover during implementation. Spec drift reconciliation catches what changed after shipping. Cold code runs surface gaps the spec missed. The spec is not a waterfall document that must be complete before code begins. It's a living artifact that tightens through contact with reality. The difference from no-spec development is that discoveries get captured and fed back rather than lost in git history.\n\n### WHY Blocks\n\nEvery decision where a reasonable agent might choose differently gets a WHY block explaining the rationale and why alternatives were rejected. Decisions where no realistic alternative exists don't need WHY blocks. 100% WHY coverage is the wrong goal.\n\n### TONIC Entries\n\nWhen the project makes a non-default choice (a specific library, a specific pattern, a specific approach), the conventions doc must state what to use AND what NOT to use, with why. Otherwise, an agent trained on the full ecosystem will gravitate toward the default. This is the Technically Obvious, Not Intended Choice class of error.\n\n### Inline Ambiguity Markers\n\n`[OQ-N]` markers in the spec body link to Open Questions. 
The marker interrupts assumption formation at the exact point where ambiguity exists, so an agent doesn't read a paragraph that sounds definitive but actually contains an unresolved assumption.\n\n---\n\n## Phase 3: Structural Validation\n\n*This phase uses optional tooling. The methodology prescribes the what (validate structurally before cold sessions), not the how (graph, script, prompt, printed paper). See `../tooling/` for reference implementations.*\n\nBefore spending tokens on cold validation sessions, catch everything a parser can catch mechanically. Two sub-steps:\n\n### 3a: Graph Analysis (if using graph tooling)\n\nLoad the architecture docs into a dependency graph. Run validation:\n- Decision count verification (claimed vs parsed)\n- WHY block coverage\n- Orphaned domains (no references to or from)\n- Phase conflicts (same capability assigned to different phases in different docs)\n- Cascade tag coverage (cross-domain references without cascade tags)\n- Command coverage (commands mentioned but not assigned to phases)\n- Constraint coverage (prohibitions without TONIC/Decided entries)\n- Error condition coverage (errors without documented handling)\n- Article reference checks (constitutional citations that don't exist)\n- Phase dependency chains (early-phase items depending on late-phase items)\n\nFix everything the graph surfaces. Reload and re-validate until clean.\n\n### 3b: Semantic Review (Dictionary Lint)\n\nScan all binding contexts (Decided sections, conventions, spec body -- not Open Questions, not WHY rationale text, not code blocks) against the ambiguous language dictionary. Flag occurrences of probabilistically wide language: \"should,\" \"appropriate,\" \"as needed,\" \"handle,\" \"etc.\" and the hundreds of other terms cataloged in `../reference/ambiguous_language_dictionary.md`.\n\nThis is flag-and-report, not auto-reject. Context determines whether usage is genuinely ambiguous. 
\"Should\" is precisely defined in RFC 2119 but vague everywhere else. \"Approximately\" is fine when paired with tolerance (\"approximately 100ms +/-10ms\").\n\nFix what's genuinely ambiguous. Document approved false positives in the exclusion index if using one.\n\nSee `semantic_review.md` for the full protocol.\n\n---\n\n## Phase 4: Cold Validation\n\nThe core validation mechanism. Fully specified in `cold_validation_protocol.md`.\n\n### What Happens\n\n1. A fresh AI session (no prior conversation history) receives the Forge methodology docs and the project spec.\n2. A specific prompt directs it to read everything, produce a coding plan (or whatever the project's output type is), and ask questions inline rather than guessing.\n3. The cold session produces a **disposable plan** and surfaces questions it can't resolve.\n4. The questions go to the **hot session** (the one with all the conversation history and ideation context).\n5. The hot session answers the questions AND updates the docs to cure the ambiguity.\n6. The answers are pasted back to the cold session, which continues.\n7. Repeat until the cold session stops asking questions and just produces output.\n8. Run the entire process again with a **different model** (Claude then Codex, or vice versa). Different models find different gaps.\n\n### The Disposable Plan\n\nThe plan is not the goal. It's a forcing function. Producing a plan forces the model to confront every ambiguity in the spec because it has to make concrete decisions. The questions are the actual output. The plan gets evaluated for adherence to the architecture docs and then thrown away.\n\n### Key Rules\n\n**A cold question is proof of a doc defect.** For coding projects aiming for determinism, this is absolute. If the cold session asked, the docs failed to communicate clearly enough. 
The hot session's job is not just to answer the question but to fix the doc so the question wouldn't be asked again.\n\n**Hot session resistance.** The hot session will frequently claim a cold question \"doesn't need a doc fix.\" It's wrong. The hot session has full context and can't see the ambiguity because it has information the docs don't carry. If the cold session asked, the docs are ambiguous. Every time.\n\n**For non-coding projects,** there's a diminishing returns threshold where remaining ambiguity has no material impact on the output. But declaring \"good enough\" prematurely is the most common failure mode, especially for someone working alone without accountability.\n\n---\n\n## Phase 5: Plan Adherence Check\n\nEvaluate the disposable plan from Phase 4 against the architecture docs:\n- Does it follow the document hierarchy (constitution > conventions > domain docs > engineering plan)?\n- Does it respect scope boundaries (V1 vs post-V1)?\n- Does it use the correct libraries and patterns from conventions?\n- Does it honor the constitutional articles?\n- Where it deviates, is the deviation a spec gap or a model error?\n\nSpec gaps feed back as doc fixes. The plan is thrown away.\n\n---\n\n## Phase 6: Cold Code Run\n\nFully specified in `cold_code_run_protocol.md`.\n\nA fresh session receives the Forge methodology docs, the architecture spec, and the coding plan, and builds. The code is disposable -- it's a validation probe, not a deliverable.\n\n### Evaluation\n\n- Did it follow the conventions (right libraries, right patterns)?\n- Did it respect TONIC entries (avoided forbidden alternatives)?\n- Did it honor WHY blocks (didn't \"optimize\" away constraints)?\n- Did it guess at implementation details the spec should have specified?\n- Did it adhere to the constitution?\n- Did it use something it was directed not to (an npm/pip package excluded in a WHY block)?\n\nEvery deviation is either a spec defect or a FINDINGS.md entry. The code gets thrown away. 
The spec improvements persist.\n\n### After Fixes\n\nDepending on how much the docs changed:\n- Minor fixes: proceed to alternate model code run\n- Significant restructuring: re-run graph analysis (Phase 3a), dictionary lint (Phase 3b), possibly another cold validation (Phase 4)\n\nThen run the cold code run again with the alternate model.\n\n### Platform Caveats\n\nIf the build environment can't compile for all targets (building on Linux, targeting Windows), those steps get skipped and noted. Hardware-dependent tests get marked with `[SKIP if <condition>: <alternative verification>]`. These are project-specific blockers, not methodology failures.\n\n---\n\n## Phase 7: Signal\n\nThe process is converging when:\n- Cold sessions stop asking architectural questions and only surface edge cases or detail questions\n- Cross-model implementations converge on the same architecture\n- Divergence points to genuinely open decisions, not spec gaps\n- Code artifacts generate without drift from the spec\n- Cascading consistency is verified\n- TONIC errors are eliminated\n- Graph validation passes clean (if applicable)\n- Semantic grooming is current (if applicable)\n- Edge cases cataloged and classified\n\nThis is the diminishing returns signal. Another pass might catch something, but the cost-benefit has shifted.\n\n---\n\n## Phase 8: Implement\n\nDocumentation is ready. Code generation is a translation task, not a design task.\n\n### During Implementation\n\n- Follow the phased build sequence from the engineering plan\n- Complete Phase N (meet all exit criteria) before starting Phase N+1\n- Document every spec ambiguity, contradiction, and guess in **FINDINGS.md**\n- FINDINGS.md is the primary feedback mechanism -- the postmortem reviews it and feeds fixes back into the spec\n\n### Phase Transition Protocol\n\nAfter completing each phase:\n1. Run all tests\n2. Write phase report\n3. Append ambiguities to FINDINGS.md\n4. 
Begin next phase\n\n### Spec Drift Reconciliation\n\nOnce code is in production, changes come from both directions: spec-driven (features, refactors) and code-driven (bug fixes, performance patches). Code-driven changes don't go through the Forge pipeline. The spec drifts. This is normal.\n\nPeriodically batch-assess accumulated code-driven changes against the spec. An AI agent reads the spec and the changes since last reconciliation, identifies where the code now contradicts the spec, and produces a list of spec updates needed. This is a session, not a project -- the spec was written for machines to read, so machines can check it.\n\n---\n\n## Flexibility\n\nThe full pipeline is not for every project. Most projects don't need most of it. The amount of Forge you apply depends on two axes: **size** (how many domains, how much cross-domain coupling) and **stakes** (what happens if the architecture is wrong).\n\n| | Low Stakes | High Stakes |\n|---|---|---|\n| **Small** | Conventions + WHY blocks on the tricky decisions. That's it. | Full cold validation, constitution, TONIC table, decision tracking with audit trail. |\n| **Large** | Full pipeline but lighter validation cycles. Cold validate the critical domains. | Everything. Graph, multi-model, dictionary lint, cold code runs, edge case classification. |\n\nA 5,000-line medical device controller needs more rigor than a 500,000-line social media app. Size tells you how much structure you need to manage complexity. Stakes tell you how much validation you need to ensure correctness.\n\nThe ordering is prescribed: structural validation before cold validation before cold code runs. But:\n\n- **How often** you run each phase is a judgment call\n- **Which tooling** you use is up to you (or none at all)\n- **The graph** is optional tooling, not required methodology\n- **The dictionary lint** can be a script, a prompt, an agent, or manual review\n\nThe core loop -- formalize, validate cold, fix, repeat -- is the methodology. 
Everything else accelerates it.\n\n**The spec bureaucracy trap:** A Forge corpus that nobody maintains is worse than no spec at all. It creates false confidence: the team believes the spec is authoritative while the code has drifted past it. If you adopt the full pipeline, you're committing to maintaining it. If your team won't maintain it, adopt less. A lightweight conventions doc that stays current is worth more than a 300-page spec that's six months stale. The test after one week: are your AI sessions producing less drift, fewer silent assumption changes, and fewer repeated arguments? If yes, keep expanding. If no, stop before you sink more effort.\n\n## Team Use\n\nThe methodology was developed solo but scales naturally to teams. Domain docs partition along ownership lines. Decision tracking becomes coordination. Cascade tags become notification lists. The cold session becomes the honest reviewer that doesn't play politics.\n\nSee `teams.md` for the full team guide including domain ownership, onboarding, review process integration, and scaling by team size.\n\n**If you're not sure how much to adopt:** Drop the Forge methodology docs into your project and ask your AI assistant \"what level of Forge should I use for this project?\" It has enough context to give you a reasonable recommendation. You don't have to figure it out on your own.\n\n---\n\n## Order of Operations (Quick Reference)\n\n```\n1. Ideate (conversations, captured verbatim)\n2. Forge it (formalize at natural breakpoints)\n3. Graph + lint (structural validation, optional tooling)\n4. Cold validation (fresh session, disposable plan, questions)\n5. Hot/cold feedback loop (questions -> answers -> doc fixes -> continue)\n6. Plan adherence check (evaluate plan against docs, throw plan away)\n7. Repeat 3-6 with alt model (different blind spots surface different gaps)\n8. Cold code run (fresh session builds, code is disposable)\n9. Evaluate code adherence (what did it get wrong, what did docs miss)\n10. 
Fix docs, re-run as needed (graph, lint, cold validation if significant)\n11. Cold code run, alt model (repeat with different model)\n12. Signal (convergence across models)\n13. Implement (translation, not design)\n```\n",
"rules": "# Forge Rules (Machine Reference)\n\n*Compact reference for AI coding agents. Every rule includes the problem it solves. For full methodology, see `forge_process.md`. For the narrative origin story, see `../docs/forge_coding.md`.*\n\n---\n\n## What This Is\n\nForge is specification-driven development for AI-assisted projects. Documentation is the primary artifact. The spec must be precise enough that a cold AI session (no prior context) produces a correct implementation without guessing.\n\n**Core doctrine:** Any specification gap that forces an AI agent to guess becomes a defect baked into the output.\n\n**Generality:** Forge applies to any spec-to-output pipeline -- software, policy, legal, compliance. The output type doesn't change the methodology.\n\n---\n\n## Document Hierarchy\n\nWhen documents conflict, follow this priority:\n\n```\nConstitution (immutable architectural laws)\n > Conventions (implementation rules)\n > Architecture Domain Docs (domain specifications)\n > Engineering Plan (build sequence and phases)\n```\n\nHigher-priority documents override lower.\n\n---\n\n## Vocabulary Precision\n\n**Problem:** Vague language produces wide probability distributions in LLM output. 
The agent fills ambiguity with training data defaults.\n\n**Fix:** Use mechanism-focused terms that name the property and the mechanism, not just the outcome.\n\n| Vague (wide distribution) | Precise (narrow distribution) |\n|--------------------------|------------------------------|\n| \"separate things so they don't affect each other\" | \"fault containment via process isolation\" |\n| \"it should be fast\" | \"hot-path latency budget: sub-millisecond\" |\n| \"keep it safe\" | \"capability-based security at the runtime boundary\" |\n| \"the system handles different sources\" | \"source-agnostic pipeline\" |\n\n**Reference:** See `../reference/ambiguous_language_dictionary.md` for a comprehensive catalog of probabilistically wide language patterns to avoid in binding contexts.\n\n---\n\n## Decision Tracking\n\nEvery architecture domain doc maintains two sections:\n\n**Decided** -- Numbered entries. Each has: title, description, rationale (WHY block where alternatives exist), session reference.\n\n**Open Questions** -- Numbered entries with status annotations:\n- `Discovery: Phase N` -- will be answered during implementation\n- `Post-V1` -- explicitly deferred\n- `RESOLVED: Decision N` -- answered, cross-referenced to the decision\n- `Blocked on: [dependency]` -- waiting for another decision\n\nOpen questions are not defects. Unmarked ambiguity is.\n\n---\n\n## Inline Ambiguity Markers\n\n**Problem:** An agent reads a paragraph that sounds definitive but contains an unresolved assumption.\n\n**Fix:** `[OQ-N]` markers in the spec body link to Open Questions. The marker interrupts assumption formation at the exact point of ambiguity.\n\n---\n\n## WHY Blocks\n\n**Problem:** Agents optimize. Without rationale, they \"improve\" the design by removing constraints they don't understand.\n\n**Fix:** WHY blocks explain why a decision was made and why alternatives were rejected.\n\n**Coverage rule:** Required where a reasonable agent might choose differently. 
Self-evident decisions don't need WHY blocks.\n\n**Feedback loop:** Agent guesses wrong -> FINDINGS.md captures the guess -> postmortem reviews -> wrong guess becomes a WHY block or TONIC entry -> next agent doesn't guess on that one.\n\n---\n\n## TONIC Errors\n\n**Problem:** Technically Obvious, Not Intended Choice. The agent picks the ecosystem default instead of the project's intentional non-default choice.\n\n**Fix:** State what to use AND what NOT to use, with why. The conventions doc carries a TONIC prevention table:\n\n| Use | Do NOT Use | Why |\n|-----|-----------|-----|\n| `prost` (raw protobuf) | `tonic` (gRPC framework) | No gRPC on this side; UDS framing only |\n| `chi` router | `gin` | chi is stdlib-compatible; gin uses custom context |\n| Direct API calls | LangChain | Scoped role doesn't need orchestration framework |\n| CatBoost | XGBoost | Native categorical feature handling without encoding |\n\nEach entry predicts an exact mistake and pre-corrects it. The TONIC table grows organically: you start with the choices you know will trip agents, cold validation runs surface more, cold code runs surface even more. In practice it ends up with 15-25 entries.\n\nThe pattern: whenever (1) a reasonable ecosystem default exists, (2) the project made a specific non-default choice, and (3) the docs don't explicitly exclude the default, an agent will gravitate toward the default. Explicitly forbid it.\n\n---\n\n## The Project Constitution\n\nShort document (~10-15 articles) of immutable architectural laws. Different from conventions (which tell HOW to write code) and WHY blocks (which explain individual decisions). The constitution tells the agent what the system CANNOT do.\n\n**Test:** If violating this principle would require redesigning multiple subsystems, it's constitutional.\n\nThe constitution is derived, not constructed upfront. It crystallizes when you notice certain principles are load-bearing across multiple domains. 
You discover your immutable laws; you don't invent them on day one.\n\nWhen an immutable law has a legitimate exception, document it inline with the article. Don't bury exceptions in separate files.\n\n---\n\n## The Conventions Document\n\nLocks implementation choices to prevent inconsistency across models and sessions:\n\n- **Canonical dependencies** with explicit prohibition of alternatives\n- **Language-specific conventions** (naming, error handling, module structure)\n- **Contested language idioms** with WHY blocks for the position taken\n- **Cross-language conventions** (IPC format, timestamps, error taxonomy)\n- **Terminology enforcement** (correct terms mapped to forbidden synonyms)\n\n---\n\n## Artifact Ordering\n\nArtifacts trail the thinking. Domain docs emerge first from design discussions. Conventions solidify as patterns emerge. The glossary locks down as terms settle. The constitution crystallizes last, when load-bearing principles become apparent. You don't define your terms then ideate, and you don't write a constitution then design. You ideate, and the artifacts follow.\n\n---\n\n## Cascading Consistency\n\n**Problem:** A decision in one domain has implications across others. One stale cross-reference can cause a module to be built against wrong assumptions.\n\n**Fix:** After every change, propagate across all cross-references. Three levels:\n\n- **Level 1: Manual** (5 domains) -- identify and update affected domains\n- **Level 2: Dependency Matrix** (10 domains) -- inline cascade tags, structured tracking\n- **Level 3: Graph** (15+ domains) -- Neo4j-backed deterministic dependency tracking (see `../tooling/graph/`)\n\n---\n\n## The Validation Pipeline\n\n```\nPHASE 1: IDEATE Discuss, explore, capture verbatim. User says \"Forge it\" to\n trigger formalization of the current ideation state.\nPHASE 2: FORMALIZE Domain docs, conventions, glossary, constitution, engineering plan.\nPHASE 3: STRUCTURAL Graph analysis + dictionary lint. 
Fix mechanically detectable issues.\n (Optional tooling -- see ../tooling/ and ../reference/)\nPHASE 4: COLD VALIDATE Fresh session reads docs. Produces disposable plan. Surfaces questions.\n Hot session answers + fixes docs. Repeat until questions dry up.\n Run again with alternate model.\n (See cold_validation_protocol.md)\nPHASE 5: PLAN CHECK Evaluate disposable plan against architecture docs. Throw plan away.\nPHASE 6: COLD CODE Fresh session builds from spec. Code is disposable validation probe.\n Evaluate adherence. Fix docs. Run with alternate model.\n (See cold_code_run_protocol.md)\nPHASE 7: SIGNAL Convergence across models. No more spec-driven deviations.\nPHASE 8: IMPLEMENT Code generation is translation, not design. FINDINGS.md captures\n discoveries. Spec drift reconciliation over time.\n```\n\n---\n\n## Cold Validation Rules\n\n**A cold question is proof of a doc defect.** If the cold session asked, the docs failed to communicate. The hot session must answer AND fix the doc.\n\n**Hot session resistance.** The hot session will claim questions \"don't need a doc fix.\" It's wrong. The hot session has context the docs don't carry. If the cold session asked, the docs are ambiguous.\n\n**The disposable plan is a forcing function.** Producing a plan forces the model to confront every ambiguity. The questions are the actual output. The plan is a probe.\n\n**Open questions don't affect review quality.** A tracked OQ with a status is the opposite of ambiguity -- it's the spec being honest about what it doesn't know yet. Some decisions can't be made until implementation.\n\nSee `cold_validation_protocol.md` for the full protocol.\n\n---\n\n## Semantic Review\n\n**Problem:** Structural validation catches relationship problems. It doesn't catch linguistic ambiguity -- \"should\" in a Decided entry, \"handle gracefully\" without defining what graceful means.\n\n**Fix:** A semantic review pass scans binding contexts for probabilistically wide language. 
Flag and report, don't auto-fix -- context determines whether usage is genuinely ambiguous.\n\nRun before cold validation. Different models flag different patterns.\n\nSee `semantic_review.md` for the protocol. See `../reference/ambiguous_language_dictionary.md` for the detection vocabulary.\n\n---\n\n## Implementation Discipline\n\n### Phase Sequencing\nComplete Phase N before starting Phase N+1.\n\n### Exit Criteria\nEvery phase has verifiable exit criteria: specific files, passing tests, integration points, scope boundaries.\n\n### FINDINGS.md\nEvery spec ambiguity, contradiction, and guess documented during implementation. The postmortem reviews FINDINGS.md and feeds fixes back into the spec.\n\n### Spec Drift Reconciliation\nWhen code-driven changes accumulate to a threshold (a release, a milestone, or when the spec and code no longer agree), batch-assess against the spec. The spec was written for machines to read; machines can check it.\n\n---\n\n## Edge Case Analysis\n\nAfter the architecture is substantively complete and cross-domain dependencies are mapped:\n\n1. Query interaction surfaces between domains\n2. For each surface: what goes wrong when timing, resources, state, or data are unexpected?\n3. Classify each edge case:\n\n| Level | When | Runtime Cost |\n|-------|------|-------------|\n| **CODE** | Common (>1%), predictable, auto-recoverable | On the critical path |\n| **DEGRADE** | Uncommon (<1%), detectable, partially recoverable | Triggered by threshold |\n| **ALERT** | Rare, detectable, needs human intervention | Zero until triggered |\n| **ACCEPT** | Theoretical, impractical to prevent | Zero |\n\nCODE-level cases become exit criteria. 
Don't add handling for ACCEPT-level cases when it would add measurable hot-path latency for scenarios occurring less than once per million operations.\n\n---\n\n## Anti-Patterns\n\n### Methodology-Level\n- **Silent guessing:** Document every assumption in FINDINGS.md\n- **Building all phases at once:** Complete Phase N before starting Phase N+1\n- **Skipping cascade updates:** One stale cross-reference can misalign an entire module\n- **Declaring \"good enough\" prematurely:** The most common failure mode, especially solo\n\n### Common Project-Level (put these in your conventions doc)\n- **Blanket lint suppression:** Override specific lints with reasons, never suppress all\n- **Hardcoded strings:** Use localization keys\n- **Delta metrics:** Use absolute values\n- **Bare timestamps:** Name the event: `created_at`, not `timestamp`\n\n---\n\n## Readiness Criteria\n\nDocumentation is ready when:\n\n1. Cold sessions stop asking basic questions and start asking about edge cases\n2. Cross-model implementations converge on the same architecture\n3. Divergence points to genuinely open decisions, not spec gaps\n4. Code artifacts generate without drift from the spec\n5. Cascading consistency is verified\n6. TONIC errors are eliminated\n7. Graph validation passes clean (if applicable)\n8. Semantic grooming is current (if applicable)\n9. Edge cases are cataloged and classified\n",
"artifacts": "# Forge Artifacts\n\nWhat a Forged project produces and why each artifact exists.\n\n---\n\n## Core Artifacts\n\nThese emerge during Phase 2 (Formalization) and get refined throughout validation. Important: these artifacts trail the thinking. You don't define terms then ideate, and you don't write a constitution then design. You ideate, and the artifacts crystallize from the decisions you've already made. Domain docs come first. Conventions solidify as patterns emerge. The glossary locks down as terms settle. The constitution is last, extracted when you notice certain principles are load-bearing across multiple domains.\n\n### Constitution\n\n**What:** A short document (~10-15 articles) of immutable architectural laws.\n\n**Why:** Conventions change. Libraries get swapped. Patterns evolve. But certain principles are load-bearing -- violating them undermines structural integrity across the entire system. The constitution separates these from conventions so an agent knows the difference between \"preferred approach\" and \"hard boundary.\"\n\n**When it emerges:** The constitution is the last core artifact to crystallize. You don't write a constitution then design around it. You ideate, discover which principles are load-bearing across multiple domains, and extract them here. It's a trailing signal, not a starting point.\n\n**Test for inclusion:** If violating this principle would require redesigning multiple subsystems, it's constitutional. If it would just produce a local bug, it belongs in conventions or as a WHY block.\n\n**Inline exceptions:** When an immutable law has a legitimate exception (a specific deployment profile that's stateless by design, a CLI mode that's ephemeral), document the exception inline with the article and its rationale. Don't bury it in a separate file. 
An agent checking compliance sees the exception immediately.\n\n**Example:** `../examples/constitution.md`\n\n---\n\n### Conventions\n\n**What:** Implementation choices locked for consistency across AI coding sessions. Covers canonical dependencies (with forbidden alternatives), language-specific patterns, contested idioms, cross-language conventions, terminology enforcement, error handling, and linting.\n\n**Why:** Different models default to different ecosystem choices. Without explicit conventions, Claude reaches for one library and ChatGPT reaches for another. The result is syntactically incompatible code across sessions.\n\n**Key feature:** The TONIC entry pattern. Every non-default choice states what to use AND what NOT to use, with why. Without the \"not,\" an agent trained on the full ecosystem will gravitate toward the default.\n\n**Example:** `../examples/conventions.md`\n\n---\n\n### Domain Documents\n\n**What:** One document per architectural domain. Each contains the specification for that domain, plus two formal sections: **Decided** (numbered finalized decisions) and **Open Questions** (numbered unresolved questions with status annotations).\n\n**Why:** Separating concerns into domains keeps each document focused and manageable. The Decided/Open tracking makes the state of every decision explicit -- an agent never has to infer whether something is settled.\n\n**Key features:**\n- WHY blocks on decisions where alternatives exist\n- `[OQ-N]` inline markers in the spec body pointing to Open Questions\n- Cascade tags marking cross-domain impact points\n- Session references tracing decisions back to the conversations that produced them\n\n**Example:** `../examples/domain_01_user_service.md`, `../examples/domain_02_order_service.md`\n\n---\n\n### Glossary\n\n**What:** Canonical definitions for project-specific terms, including what each term does NOT mean.\n\n**Why:** The glossary narrows the probability distribution for every term an agent encounters. 
Without it, \"service\" could mean a microservice, a system service, a background process, or a class. With the glossary entry, the agent's interpretation space collapses to exactly one meaning. The \"does NOT mean\" column does the same work as a TONIC entry: explicitly excluding the default interpretation. This works both for redefining common words (using \"Bob\" to mean a deployment node) and for defining terms that don't exist outside your project.\n\n**When it emerges:** The glossary is not written upfront. Terms get coined during ideation, used loosely at first, refined through conversation, and settled definitions get extracted here. The glossary trails the conversations.\n\n**Key feature:** The terminology enforcement table maps correct terms to forbidden synonyms. Without this, one model calls it a \"Widget\" and another calls it a \"Component\" and the identifiers don't match across the codebase.\n\n**Example:** `../examples/glossary.md`\n\n---\n\n### Engineering Plan\n\n**What:** Phased build sequence with explicit exit criteria per phase. Each phase specifies: what to build, what tests must pass, what must NOT exist yet (scope boundaries), and what artifacts must be produced.\n\n**Why:** Building all phases simultaneously produces breadth without depth. Phased execution with exit criteria ensures each layer is solid before the next one goes on top.\n\n**Key feature:** Exit criteria are verifiable, not subjective. Not \"the API should work\" but \"the `/api/v1/orders` endpoint returns 200 with a valid `CreateOrderResponse` body and 400 with `ErrorResponse` for invalid input.\"\n\n---\n\n## Feedback Artifacts\n\nThese are produced during validation and implementation.\n\n### FINDINGS.md\n\n**What:** A log of every spec ambiguity, contradiction, and guess encountered during implementation.\n\n**Why:** The primary feedback mechanism from code back to spec. 
The postmortem reviews FINDINGS.md and feeds fixes into the spec as WHY blocks, TONIC entries, or updated Decided entries. Each implementation run tightens the spec for the next one.\n\n**Contents:** For each finding:\n- What the agent guessed or got confused about\n- What the spec said (or failed to say)\n- The agent's proposed fix or the correct answer\n- **Target artifact:** Where the fix belongs (constitution, conventions, domain doc, glossary, TONIC table, or new OQ). This prevents findings from sitting in a log without re-entering the spec. A finding without a target artifact is an observation, not a fix.\n\n---\n\n### Edge Case Catalog\n\n**What:** A systematic catalog of edge cases from cross-domain interaction surfaces. Each classified by response level (CODE/DEGRADE/ALERT/ACCEPT) and indexed to affected domains.\n\n**Why:** Edge cases emerge from interactions between domains, not individual components. A lifecycle state machine works fine in isolation. Combined with presence state transitions, IPC timing, and resource boundaries, edge cases appear at the interaction surfaces.\n\n**Key feature:** CODE-level cases become implementation exit criteria. ACCEPT-level cases are documented but not coded for -- don't add hot-path latency for scenarios occurring less than once per million operations.\n\n**Example:** `../examples/edge_cases.md`\n\n---\n\n### Exclusion Index (if using automated validation)\n\n**What:** A single file documenting every approved false positive from automated validation, with enough context to audit and self-invalidate.\n\n**Why:** Automated tools produce false positives. Suppressing them by weakening detection rules causes real violations to be missed. Ignoring them makes validation output noisy and unusable. The exclusion index solves both problems.\n\n**Key feature:** Self-invalidation. On every validation run, the parser checks each exclusion entry -- does the file still exist? Does the line still contain the term? 
Does the rule still flag this location? Stale exclusions are reported automatically.\n\n**Reference:** `../reference/false_positive_exclusion_index.md`\n\n---\n\n## The Artifact Hierarchy\n\n```\nConstitution Immutable laws. Rarely changes.\n |\nConventions Implementation choices. Changes when experience reveals better patterns.\n |\nDomain Docs Domain specifications. Changes when design discussions reach conclusions.\n |\nEngineering Plan Build sequence. Changes when scope or priorities shift.\n |\nFINDINGS.md Feedback from implementation. Continuously appended.\n |\nEdge Case Catalog Cross-domain interaction analysis. Built after architecture is substantively complete.\n```\n\nEach layer serves a different purpose and changes at a different rate. When documents conflict, higher layers override lower ones.\n",
"teams": "# Forge for Teams\n\nThe methodology was developed by a solo practitioner, but it scales naturally to teams. In some ways it works better with multiple people than solo, because the artifacts that seem like overhead for one person become coordination mechanisms for a group.\n\n---\n\n## Domain Ownership\n\nDomain docs partition along ownership lines. The person who owns payment integration owns the Order Service domain doc. The person who owns identity owns the User Service doc. Each person needs domain expertise only in their area.\n\nThis changes the \"read everything\" requirement from daunting to practical. Nobody reads 300 pages with equal attention. The payments person reads their 30 pages deeply and skims the rest for cascade impacts on their domain. The cold session reads all 300 pages with equal attention, which is something no human team member will ever do. The model compensates for the team's selective attention. The team compensates for the model's lack of judgment about what actually matters.\n\nIf you own the code, you own the doc. If the doc is stale, the next cold session will find it. Domain ownership is spec ownership.\n\n---\n\n## Decision Tracking as Coordination\n\nThe Decided/Open Questions tracking gains a second purpose on a team. On a solo project it's a maturity signal. On a team it's a coordination mechanism.\n\nAn Open Question in Domain 02 blocked on a decision in Domain 01 is a dependency between two people, not just two documents:\n\n```\nOQ-3: Authentication token format.\nBlocked on: Domain 01 auth decision (Sarah).\n```\n\nThe status annotations become a lightweight coordination layer without needing a separate tracking tool. When Sarah resolves the auth decision, she updates Domain 01 and tells the Domain 02 owner to update the reference.\n\nThis isn't replacing your ticket system. 
It's capturing a category of dependency that ticket systems are bad at: cross-domain architectural dependencies where the blocker is a design decision, not a task.\n\n---\n\n## Cascade Tags as Team Communication\n\nCascade tags serve the same function at a structural level. When someone changes something in Domain 05 and the cascade tag says \"Cascade: Domains 03, 08, 11,\" those are three specific people who need to review the change, not just three documents.\n\nOn a solo project, cascade tags remind you what to update. On a team, they tell you who to notify. The tag is a notification list.\n\nAt 15+ domains with 10+ team members, manual cascade notification breaks down the same way manual cascade checking does. That's where graph tooling earns its keep. \"I changed the contract schema in Domain 02. What breaks?\" The graph answers with a list of affected domains, which maps to a list of people.\n\n---\n\n## The Hot/Cold Loop on a Team\n\nThe cold session doesn't care who answers. It asks a question about the payment flow, the payment owner answers and fixes the doc. It asks about the auth model, the auth owner answers and fixes that doc. The hot session becomes a team conversation rather than a solo one.\n\nIn practice, one person can run the cold session and route questions to the appropriate domain owners. The cold session's questions are the agenda for the team's spec review. No separate meeting needed to figure out what to discuss; the cold session already surfaced the gaps.\n\nThis can be asynchronous. Run the cold session, collect the questions, post them in Slack or wherever the team communicates, tag the relevant domain owners. Each person fixes their doc. Run the cold session again to verify.\n\n---\n\n## Onboarding\n\nNew team members read the spec for their domain (not the whole corpus), read the constitution and conventions (short documents that apply everywhere), and start contributing. 
The spec contains enough context that a new person understands not just what the system does but why it does it that way:\n\n- **WHY blocks** explain the reasoning behind non-obvious decisions so the new person doesn't \"improve\" something by removing a constraint they don't understand\n- **Decided entries with session references** trace decisions back to the discussions that produced them, giving the new person the full reasoning chain\n- **TONIC entries** prevent the new person from reaching for their preferred library instead of the project's chosen one\n- **Terminology enforcement** prevents the new person from introducing synonyms that fragment the codebase vocabulary\n- **Open Questions with status annotations** tell the new person what's settled and what's still being figured out, so they don't build on assumptions that haven't been finalized\n\nWithout the spec, onboarding means reading code, asking questions, getting incomplete answers from people who are busy, and slowly building mental models that may or may not match the original architects' intent. With the spec, the new person has the full picture on day one. Not because they read everything, but because the spec was written precisely enough that what they read is accurate and complete for their domain.\n\nAfter onboarding, the new team member can run their own cold validation against the domains they'll own. Their questions surface the gaps in their understanding, which are also gaps in the doc.\n\n---\n\n## Preventing Style Wars\n\nThe conventions doc is the team's truce document. Every contested idiom has a position with a WHY block. \"We use chi, not gin. Here's why.\" When a team member pushes back (\"but gin is more popular\"), the WHY block already contains the reasoning. They can disagree with the reasoning, but they can't claim the decision wasn't considered.\n\nTONIC entries serve the same purpose. 
They're not just for AI agents; they're for any team member whose instinct is to reach for the ecosystem default. \"I know you want to use Kafka. We use NATS JetStream. Here's why.\"\n\nThis also prevents the revolving-door problem where each new team member introduces their preferred libraries and patterns. The conventions doc is the answer to \"why do we do it this way?\" and the TONIC table is the answer to \"why don't we use X instead?\"\n\n---\n\n## Review Process\n\nThe cold validation loop integrates into a team's review process:\n\n### Before a PR that Changes Architecture\nRun a cold session against the updated spec. If the cold session asks questions about the changed area, the spec change isn't precise enough. Fix the spec before merging the code.\n\n### Sprint/Iteration Boundaries\nRun a cold validation against the current spec. Route questions to domain owners. Fix docs. This replaces or supplements architecture review meetings. The cold session is more thorough than any human reviewer and doesn't skip the boring parts.\n\n### After Onboarding\nNew team member runs their own cold validation against the domains they'll own. Their questions are the onboarding gaps. Fix the docs, and the next person's onboarding is smoother.\n\n### Quarterly Spec Health Check\nFull cold validation + different model. How much has the spec drifted from reality? Which domains need reconciliation? This is the batch reconciliation from Phase 8 (see `forge_process.md`) run as a team activity.\n\n### Code Review Supplement\nWhen reviewing a PR, check it against the relevant domain doc. Does the code match the spec? If not, either the code is wrong or the spec needs updating. This is lighter than a full cold validation but catches drift in real time.\n\n---\n\n## The Cold Session as the Honest Reviewer\n\nOn any team, there's social pressure around code reviews. Junior members defer to senior ones. Friends give each other easy reviews. 
Nobody wants to be the person who blocks the sprint. The review becomes a rubber stamp.\n\nThe cold session has no politics. It doesn't defer to seniority. It doesn't care about the sprint deadline. It reads every word and asks about every gap. When it says \"this spec is ambiguous about the retry strategy,\" it doesn't matter who wrote the spec or how senior they are. The ambiguity exists.\n\nThis isn't a replacement for human code review. It's a complement. The cold session catches spec-level issues (ambiguity, contradictions, missing decisions). Human reviewers catch judgment issues (is this the right approach, does this match our product goals, is this over-engineered).\n\n---\n\n## Scaling\n\nThe methodology scales with the team, but the tooling needs change:\n\n| Team Size | What Changes |\n|-----------|-------------|\n| 2-3 people | Manual cascade checking works. One person can run cold sessions. Conventions doc prevents inconsistency. |\n| 4-8 people | Domain ownership matters. Cascade tags become notifications. Cold sessions run at sprint boundaries. Glossary prevents vocabulary drift. |\n| 8-15 people | Graph tooling starts paying off. Cross-domain dependencies need tracking. Multiple people run cold sessions for their domains. Constitution prevents architectural drift. |\n| 15+ people | Graph tooling is essential. Cold sessions are automated or scheduled. Spec drift reconciliation is a regular process. The spec is the coordination layer, not meetings. |\n\n---\n\n## What the Team Lead Needs to Know\n\n- **The spec is the coordination artifact.** Decisions live in the domain docs, not in ticket comments or Confluence pages that nobody reads.\n- **Domain ownership means spec ownership.** If you own the code, you own the doc. If the doc is wrong, the next cold session (or the next developer) pays the price.\n- **The cold session is the honest reviewer.** Use it. 
It doesn't play politics, it doesn't defer, it doesn't skip the boring parts.\n- **Start with conventions.** The single highest-impact artifact for a team is a conventions doc with TONIC entries. Consistency improves on the first session. Everything else can follow.\n- **The methodology doesn't add meetings.** Cold validation replaces architecture review meetings rather than adding to them. The cold session surfaces the agenda. Domain owners fix the gaps asynchronously.\n- **New people ramp faster.** The spec captures context that would otherwise take weeks of questions and code reading. Onboarding cost drops significantly.\n",
"training": "# Training Your Team on Forge\n\nThe methodology docs tell you what Forge is. This document is about getting people to actually do it well. The hardest part isn't the process. It's the vocabulary shift.\n\n---\n\n## The Vocabulary Problem\n\nEvery cold read of the Forge docs focuses on the artifacts: conventions docs, WHY blocks, TONIC tables, cold validation. Those are the mechanics. What none of them mention is the prerequisite that makes all of it work: the ability to communicate with precision.\n\nMost developers (and most non-developers) write specifications the way they talk. \"The system should handle errors gracefully.\" \"The API needs to be fast.\" \"Make sure the data is secure.\" These sentences feel like they say something. They don't. They're empty calories. An AI agent reading them will fill \"gracefully\" and \"fast\" and \"secure\" with whatever its training data suggests, which may be completely wrong for your project.\n\nForge works when people write with precision. \"The API returns a 429 with a Retry-After header when the rate limit of 100 requests per minute per API key is exceeded.\" That's a sentence an agent can implement without guessing. Getting people to write like that is the training challenge.\n\n---\n\n## Why It Matters More Than the Process\n\nYou can follow every step of the Forge pipeline perfectly and still produce a bad spec if the language is imprecise. The cold validation session will catch it, but you'll burn cycles fixing language problems that shouldn't have been there in the first place.\n\nConversely, a team that writes with precision but follows a lightweight process (just a conventions doc and some WHY blocks) will produce better outcomes than a team that runs the full pipeline with vague language. The vocabulary is the foundation. The process amplifies it.\n\nThis is why the ambiguous language dictionary exists. Not as a style guide, but as a training tool. 
Run it against a new team member's first spec contribution. The flags aren't criticism; they're calibration. \"You wrote 'handle gracefully.' What does graceful mean here? What does the user see? What gets logged? What state does the system end up in?\" After a few rounds of that, the habit forms.\n\n---\n\n## How to Train\n\n### Start with the Why\n\nDon't open with \"here's a 31-category dictionary of words you can't use.\" Open with the problem. Show them what happens when an AI agent encounters vague language.\n\nGive them a vague spec and a precise spec for the same feature. Ask them to give both to their AI assistant and compare the output. The difference is immediate and visceral. The vague spec produces code that sort of works but makes wrong assumptions. The precise spec produces code that does what you actually wanted.\n\nThat demonstration does more than any training doc. They see the cost of imprecision in their own code, not in a theoretical example.\n\n### The Vocabulary Ladder\n\nDon't ask people to go from casual writing to mechanism-focused precision overnight. There's a ladder:\n\n**Level 1: Remove the obvious offenders.** \"Should\" becomes \"must\" or \"may.\" \"Handle gracefully\" becomes a specific behavior. \"As needed\" becomes a specific condition. These are the words from the flagged vocabulary that cause the most damage with the least effort to fix. 
A week of practice at this level changes the baseline.\n\n**Level 2: Add specificity to nouns.** \"The user\" becomes \"the authenticated user\" or \"the anonymous visitor.\" \"The system\" becomes \"the OrderService\" or \"the payment webhook handler.\" \"The data\" becomes \"the JSON payload conforming to CreateOrderSchema.\" Nouns are where actors get confused and responsibilities get blurred.\n\n**Level 3: Quantify everything quantifiable.** \"Fast\" becomes \"under 200ms at p99.\" \"Many\" becomes \"up to 10,000.\" \"Recently\" becomes \"within the last 24 hours.\" If a number exists, use it. If a number doesn't exist, decide on one and document the WHY.\n\n**Level 4: Name mechanisms, not outcomes.** \"Separate things so they don't affect each other\" becomes \"fault containment via process isolation.\" \"Check if things are working\" becomes \"heartbeat-based liveness detection.\" This is the precision that compounds. A team writing at this level produces specs that cold sessions rarely question.\n\nMost people plateau at Level 2-3 and that's fine. Level 4 comes from domain expertise and practice. Don't force it. The jump from Level 0 (casual writing) to Level 2 (specific nouns and conditions) is where 80% of the value is.\n\n### Code Review as Vocabulary Training\n\nThe fastest way to train vocabulary precision is to add it to code review. Not as a separate process. As part of reviewing PRs that touch specs.\n\nWhen someone writes \"the service handles the error,\" the review comment isn't \"be more specific.\" It's: \"Which service? What error? What does 'handle' mean here? Does it retry, log, alert, return a default, or crash?\" Ask the questions the cold session would ask. After a few PRs, people start anticipating the questions and writing precisely the first time.\n\n### Run the Dictionary Against Their First Spec\n\nWhen a new team member writes their first domain doc or conventions entry, run the ambiguous language dictionary against it. 
Not as a test. As a learning exercise. Walk through the flags together: \"This word was flagged. Is it genuinely ambiguous here, or is it fine in context?\" The discussion is the training. After one pass, they understand what precise spec language looks like far better than any doc could explain.\n\n### Cold Validation as Training\n\nHave new team members run their own cold validation (see `cold_validation_protocol.md`) against the domains they'll own. Their questions are their learning gaps. The cold session surfaces what they don't know or what the docs don't communicate clearly enough. Both are valuable: the first is onboarding, the second is doc improvement.\n\n---\n\n## Common Resistance and How to Handle It\n\n### \"This slows me down.\"\n\nIt does, at first. Like any skill, precise writing takes more effort before it becomes habit. The payoff is that every session after the first reads a spec that doesn't need clarification. The time saved on the back end exceeds the time spent on the front end after about two cycles.\n\nShow them: run a cold session against their vague spec. Count the questions. Estimate the time to answer them and fix the docs. Now run a cold session against the precise version. Count the questions. That delta is the cost of imprecision, and it multiplies across every session.\n\n### \"I know what I mean.\"\n\nYou do. The AI doesn't. Neither does the new hire who joins in six months. Neither does the you from six months ago who's forgotten the context. The spec isn't for you right now. It's for every future reader who doesn't have your current context.\n\n### \"This is over-engineering.\"\n\nFor a personal project, maybe. For a team project with multiple AI sessions, multiple contributors, and production stakes, imprecise specs are under-engineering. The spec is the cheapest place to fix a problem. Code is the most expensive. The further downstream ambiguity travels, the more it costs.\n\n### \"I'm not a writer.\"\n\nYou don't need to be. 
You need to be specific. \"The API returns 200 with a JSON body containing user_id and created_at\" isn't beautiful prose. It's a clear instruction. Nobody is asking for elegance. They're asking for precision.\n\n---\n\n## For Non-Technical Team Members\n\nProduct managers, business analysts, domain experts, and other non-technical contributors produce some of the most impactful spec content because they carry the domain knowledge that developers don't have. They also tend to write the most imprecise specs because they're used to communicating intent, not implementation.\n\nThe training for non-technical contributors is the same ladder, just with different examples:\n\n- \"The onboarding flow should be intuitive\" becomes \"new users complete account setup in under 3 minutes without contacting support, measured by time from first page load to profile completion\"\n- \"We need good reporting\" becomes \"the monthly report includes: total orders, revenue by category, refund rate, and top 10 customers by spend. Format: PDF, emailed to finance@company.com on the 1st of each month\"\n- \"The search should be fast\" becomes \"search results appear within 500ms of the user stopping typing, displaying up to 10 results ranked by relevance\"\n\nNon-technical contributors don't need to know what p99 latency means. They need to specify what \"fast\" means to their users. They need to enumerate what \"good reporting\" includes. They need to define what \"intuitive\" looks like in measurable terms.\n\n---\n\n## Measuring Progress\n\nYou can't measure vocabulary precision directly, but you can measure its effects:\n\n- **Cold session question count per domain.** Track this over time. As the team's writing improves, cold sessions ask fewer questions per spec contribution. If the count isn't declining, the training isn't landing.\n- **TONIC error rate in cold code runs.** How often does the AI reach for the wrong library or pattern? 
A declining rate means the conventions doc is getting more precise.\n- **Time-to-first-cold-validation.** How long does it take from \"Forge it\" to a cold session that produces a plan without architectural questions? Shorter is better. It means the initial formalization is higher quality.\n- **WHY block coverage initiated by contributors (not requested by reviewers).** When team members start adding WHY blocks proactively rather than being asked in review, the training has taken hold.\n\n---\n\n## The Principle\n\nThe Forge process is a machine. Vocabulary precision is the fuel. You can build the most sophisticated validation pipeline in the world, but if the specs going into it say \"the system should handle errors appropriately,\" the pipeline will spend all its time catching language problems instead of finding architectural ones. Train the vocabulary first. The process follows.\n",
"adoption": "# Adoption Guide\n\nHow to adopt Forge, how much to adopt, and in what order. Includes maturity tiers for the methodology itself so you know what's battle-tested and what's reasoned extrapolation.\n\n---\n\n## Maturity Tiers\n\nNot everything in Forge has the same validation depth. The methodology is honest about this. Here's what's proven, what's scaled, and what's projected.\n\n### Tier 1: Proven Core\n\nThese practices have been through hundreds of hours of real project work across multiple large-scale systems (17-20 domains, 300+ page specs) with multi-model validation. They work.\n\n- **Conventions doc** with TONIC entries\n- **WHY blocks** on non-obvious decisions\n- **Decision tracking** (Decided/Open with status annotations)\n- **Cold validation** with the hot/cold feedback loop\n- **Inline ambiguity markers** ([OQ-N])\n- **Vocabulary precision** as an engineering practice\n- **The core loop:** formalize, validate cold, fix, repeat\n\n**Confidence:** High. These exist because specific failure modes occurred without them, and the failure modes stopped occurring with them.\n\n### Tier 2: Validated at Scale\n\nThese practices have been used on large projects and produce measurable results, but they require more infrastructure and discipline to maintain.\n\n- **Constitution** (immutable architectural laws)\n- **Glossary** with terminology enforcement\n- **Cold code runs** as validation probes\n- **Multi-model validation** (cross-model convergence)\n- **Cascading consistency** (manual and matrix levels)\n- **Semantic review** (dictionary lint)\n- **Edge case classification** (CODE/DEGRADE/ALERT/ACCEPT)\n- **FINDINGS.md** feedback loop\n- **Spec drift reconciliation**\n\n**Confidence:** High. 
All were developed through practice, though some (like multi-model validation and semantic review) were added later in the methodology's evolution and have been through fewer iteration cycles than Tier 1.\n\n### Tier 3: Scaled by Extrapolation\n\nThese practices are reasoned extensions of Tier 1 and Tier 2 principles. The logic is sound: they combine proven mechanisms in ways that follow naturally. But they haven't been through the same grind of repeated practice and failure-driven refinement.\n\n- **BackForging** (reverse-engineering specs from code)\n- **Graph tooling** (Neo4j/networkx dependency tracking)\n- **Multi-repo Forge** (cross-project graph)\n- **Forge at organizational scale** (Forge of Forges)\n- **Custom graph signals** (project-specific validation)\n- **Semantic search layer** (Weaviate/vector embeddings)\n\n**Confidence:** These extrapolations are backed by 30+ years of systems and software engineering experience. The underlying patterns (reverse-engineering specs from running systems, cross-project dependency tracking, organizational consistency enforcement) are decades old and battle-tested across engineering disciplines. What's new is their application through Forge's specific artifact templates. BackForging in particular combines two proven mechanisms (spec derivation from code + cold validation of specs) in a way that's almost tautological: if you can derive a spec and you can validate a spec, combining them produces validated specs from code. The approach is sound. The templates will sharpen through community practice.\n\n**What this means for you:** Start with Tier 1. Add Tier 2 when Tier 1 is habitual. Explore Tier 3 when the project justifies it. Don't adopt Tier 3 practices and expect Tier 1 reliability.\n\n### The Space Shuttle Analogy\n\nA Forge spec isn't complete when it contains everything. 
It's complete when every relevant decision is in one of three states: included inline, referenced at a canonical source, or explicitly excluded with a reason.\n\nConsider the Space Shuttle. Every fastener has a part number, a material specification, a torque tolerance, an inspection procedure, and a rationale. The spec doesn't contain the metallurgy of every bolt; it references the material standard and constrains which standards are acceptable. The reference graph is the artifact. Nobody would accept \"we just sourced whatever bolts were handy.\" But do they specify the level of polish on a plastic knob? Probably not, unless that knob needs to be visible under specific cabin lighting conditions, in which case, yes, they absolutely specify it.\n\nThat's stakes-driven specification depth. The methodology doesn't change between the shuttle and a weekend project. The dial changes. A weekend project explicitly excludes most of what the shuttle includes, and that's correct. The explicit exclusion is itself a decision, made with knowledge of what was being left out, rather than an oversight. The difference between \"we didn't specify torque tolerance\" and \"we chose not to specify torque tolerance because nothing load-bearing depends on it\" is the difference between debt and deliberate scope.\n\nThis scales all the way down. A vibe-coded personal tool might explicitly exclude 90% of what a production system would include. Fine. The point is that the omission is deliberate, not accidental. Forge makes the distinction visible: what was decided, what was deferred, and what was consciously left to the implementer's judgment. Nothing is left to chance except what you deliberately chose not to specify, and sometimes that's exactly right.\n\n---\n\n## Order of Operations for Adoption\n\nThe order matters. Each step builds on the previous one. 
But the exact sequence also depends on your team, your codebase, and your situation.\n\n### For a New Project (Greenfield)\n\n```\nDay 1: Conventions doc (lock your stack, forbidden alternatives, TONIC table)\nDay 1-2: Constitution (3-5 immutable laws, as they become apparent)\nWeek 1: Domain docs for your first 2-3 domains (Decided/Open, WHY blocks)\nWeek 1: Glossary (terms will emerge naturally from domain docs)\nWeek 2: First cold validation session (the questions are your roadmap)\nWeek 2+: Hot/cold feedback loop until questions narrow\nWhen ready: Cold code run (disposable probe)\nOngoing: FINDINGS.md captures implementation discoveries\n```\n\n### For an Existing Codebase (BackForge)\n\n```\nDay 1: Conventions doc (extract from code what you're already using)\nDay 1: Have the AI read the codebase and produce a draft spec\nDay 2-3: Review the draft, add present-tense WHY blocks\nDay 3: Constitution (extract the load-bearing rules from the draft)\nDay 3: Glossary (extract terms the codebase uses)\nWeek 1: Cold validation against the BackForged spec\nWeek 1+: Hot/cold loop to close gaps between spec and reality\nWeek 2: Evaluate: rebuild, refactor, or maintain as-is?\n```\n\n### For a Vibe-Coded Project (MiniForge to Forge)\n\n```\nMinute 1-5: MiniForge assessment (five questions, one file)\n If GREEN: keep miniforge.md, you're done unless it grows\n If YELLOW/RED: continue below\nDay 1: Conventions doc (the AI extracts your current choices)\nDay 1: Constitution (3-5 hard rules you already know)\nWeek 1: WHY blocks on the decisions that keep coming up\nWhen ready: Try a cold validation on the trickiest part\n```\n\n### Adaptation Factors\n\nThe order above is a starting point, not a prescription. Adjust based on:\n\n- **Team dynamics:** If your team resists process, start with only the conventions doc. Let the consistency improvement sell the rest. 
Don't mandate WHY blocks on day one if people will see them as bureaucracy.\n- **Codebase maturity:** A mature, stable codebase needs BackForge focused on documentation, not restructuring. A messy codebase needs BackForge focused on finding decisions that can't be justified.\n- **Future roadmap:** If a major refactor is planned, invest more in the spec upfront. If the project is in maintenance mode, a lighter touch (conventions + constitution) is enough.\n- **Stakes:** Medical, financial, or compliance-driven projects should reach Tier 2 practices faster. Personal tools can stay at Tier 1 indefinitely.\n- **Team size:** Solo developers can skip the team coordination artifacts (cascade tags as notification lists, domain ownership). Teams of 4+ need them.\n\n---\n\n## The First 48 Hours (Quick Start)\n\nThis is what immediate adoption looks like. Two days, tangible results, minimal commitment.\n\n**Critical framing:** The AI creates and maintains the Forge artifacts, not your team. Your team's job is to have design conversations, make decisions, review what the AI produces, and challenge what's wrong. Nobody is spending days hand-writing a constitution or a conventions doc. You tell the AI what you decided and it produces the document. You review it. That's the time investment. If your team is spending weeks writing Forge docs by hand, they're doing it wrong.\n\nHere is what you'll have at the end of each of those two days, and the tangible difference you'll see.\n\n### After Day 1\n\n**You'll have:**\n- A conventions doc with your stack choices and forbidden alternatives\n- A short constitution (3-5 articles)\n- A start on your first domain doc\n\n**The tangible difference:**\n- Your next AI session reads the conventions doc first. It doesn't switch your libraries. It doesn't introduce a new ORM. It uses the naming patterns you specified. 
The output is consistent with yesterday's output for the first time.\n\n### After Day 2\n\n**You'll have:**\n- 2-3 domain docs with Decided entries and WHY blocks\n- A glossary with your project's key terms\n- (Optionally) results from your first cold validation session\n\n**The tangible difference:**\n- A new AI session reads your domain docs and produces a plan. It doesn't re-open decisions you already made. It doesn't \"optimize\" away a constraint you put there for a reason. When it encounters something uncertain, it asks instead of guessing, because the methodology tells it to. If you ran a cold validation, you have a list of specific gaps in your thinking that you didn't know existed.\n\n### The Test\n\nAfter one week: are your AI sessions producing less stack drift, fewer silent assumption changes, and fewer repeated architectural arguments?\n\nIf yes, keep expanding.\nIf no, you've adopted the wrong parts or the project doesn't need Forge. Either way, you invested two days, not two months.\n\n---\n\n## When Discipline Degrades\n\nIt will. Deadlines override process. People skip steps. Docs drift. Ownership blurs. This is normal, and Forge has a self-correcting mechanism.\n\n**The cold session catches degradation.** When the spec gets stale, a cold validation session produces more questions. When conventions drift, cold code runs produce TONIC errors. When cascade tags aren't updated, the graph reports orphaned references. The validation tools don't care about your deadline. They read the docs literally and report what's wrong.\n\nThe severity of degradation maps to the volume of cold session questions:\n\n- **Mild drift:** Cold session asks 2-3 detail questions. Fix them, move on.\n- **Moderate drift:** Cold session asks 10+ questions, some architectural. The spec needs a reconciliation pass.\n- **Severe drift:** Cold session can't produce a coherent plan. The spec no longer describes the system. 
Stop and reconcile before building anything new.\n\n**The recovery protocol:**\n1. Run a cold validation against the current spec\n2. Sort questions by severity (architectural vs detail)\n3. Fix the spec, starting with the architectural issues\n4. Re-run cold validation\n5. When questions narrow to details, you're recovered\n\nThis is the same process as initial validation, just applied to a spec that's drifted. The methodology doesn't break when discipline lapses. It just costs more to recover the longer you wait.\n\n---\n\n## Depth Escalation\n\nForge does not require pseudocode-level specification. The methodology is selectively precise: precise on decisions, precise on constraints, precise where ambiguity caused real errors. Intentionally not precise everywhere else. That selectivity is the economic advantage.\n\nBut some components resist specification at the normal level. If repeated cold validation or cold code runs produce divergent implementations in a specific area, that's a signal to increase precision there, possibly to the level of interfaces, state machines, or pseudocode.\n\n**Where deeper specification helps:**\n- Safety-critical or compliance-heavy components\n- Complex, high-coupling logic (parsers, schedulers, financial calculations)\n- Performance-critical hot paths where behavior must be exact\n- Areas where multiple cold runs consistently guess differently\n\n**Where it breaks things:**\n- Applied globally, it explodes upfront cost and makes the spec harder to maintain than code\n- Engineers disengage because it feels like writing code twice\n- It shifts from preventing wrong guesses to eliminating all guessing, which isn't the goal\n- It creates false confidence (\"we specified everything\") while reducing adaptability\n\n**The rule:** Only increase precision where the system proves you need it. Precision grows from observed ambiguity, not from upfront over-specification. 
If you find yourself specifying every function signature before writing code, you've crossed from Forge into formal specification, which is a different discipline with different costs.\n\n---\n\n**Preventing degradation:** The cheapest way to keep discipline from degrading is a regular cold validation cadence: monthly for active projects, quarterly for maintenance-mode projects. The cold session is the canary: if it's asking more questions than last time, discipline is degrading.\n",
"operational_guide": "# Operating a Forged Project\n\nYou have a Forged spec. The cold sessions are clean. Implementation is underway or already shipped. Now what? This document covers the day-to-day practice of living with Forge: how to handle feature work, design changes, library swaps, bug fixes, and the judgment calls about what needs to go through the pipeline and what doesn't.\n\n---\n\n## The Decision: Does This Change Need Forging?\n\nNot every change touches the spec. The question is whether the change could cause a cold session to produce different output than it would have before the change.\n\n### Forge it (update the spec first, then build)\n\n- **New feature.** A new capability that doesn't exist in the spec. Even if it's \"small,\" it introduces decisions the spec doesn't cover. What's the error handling? Where does it sit in the domain hierarchy? What cascade impacts does it have?\n- **Significantly altered feature.** Changing the behavior of something the spec describes. The spec says \"retry 3 times with exponential backoff.\" You want to change it to \"retry once then fail.\" That's a spec change.\n- **Library or dependency swap.** Replacing one library with another. The conventions doc and TONIC table need updating. If you swap from chi to gin, that's not just a code change; it changes middleware compatibility across the entire project.\n- **New integration point.** Connecting to a new external service, database, or API. This creates cascade dependencies, possibly new domain docs, certainly new Decided entries.\n- **Schema change.** Altering a database schema, API contract, or data format. Anything downstream that reads the old format will break. The cascade tags exist for exactly this.\n- **Security model change.** Changing auth, permissions, encryption, or trust boundaries. 
Constitution-level implications.\n- **Architecture refactor.** Moving responsibilities between domains, splitting or merging services, changing the process model.\n\n### Don't Forge it (just build, reconcile later)\n\n- **Bug fixes** that don't change specified behavior. The spec says \"return 404 for missing users.\" The code was returning 500. Fixing it to 404 matches the spec. No spec change needed.\n- **Cosmetic changes.** Color, font, spacing, copy tweaks that don't affect behavior.\n- **Performance optimization** that doesn't change behavior. The spec says \"return results.\" You added an index to make it faster. Same behavior, different speed.\n- **Dependency patches.** Updating a library to a new patch version with no API changes.\n- **Internal refactors** that don't change module boundaries or public interfaces. Renaming a private function, restructuring internal logic, cleaning up dead code.\n\n### Gray zone (use judgment)\n\n- **Adding a configuration option.** If it changes behavior based on user input, probably Forge it. If it's a performance tuning knob with a sensible default, probably not.\n- **UI behavior changes.** A new loading state, a changed error message, a different sort order. If it affects what the user sees and does, consider Forging. If it's polish, don't.\n- **Test changes.** Adding or changing tests usually doesn't need spec changes. Unless the test reveals that the spec is wrong.\n\n**The litmus test:** If a cold session reading the current spec would produce code that conflicts with your change, the spec needs updating. If the cold session would produce the same code regardless, the spec is fine.\n\n---\n\n## Feature Work\n\nWhen adding a new feature to a Forged project:\n\n### 1. Check the spec first\n\nBefore writing any code, read the relevant domain docs. Does the spec already address this feature? Is it an Open Question marked for this phase? Is it deferred to post-V1? Does it conflict with a constitutional article?\n\n### 2. 
Update the spec\n\nIf the feature isn't in the spec:\n- Add it to the appropriate domain doc\n- Create Decided entries for the key choices\n- Add WHY blocks where alternatives exist\n- Check cascade tags: does this feature affect other domains?\n- Update the conventions doc if new libraries or patterns are introduced\n- Add glossary entries for new terms\n\n### 3. Run a lightweight cold check\n\nYou don't need a full cold validation for every feature. But consider asking a fresh session: \"Read this domain doc. I'm adding [feature]. Does anything in the spec conflict with this or leave you unsure how to implement it?\"\n\nIf it asks questions, the spec has gaps. Fix them.\n\n### 4. Build\n\nWith the spec updated, build the feature. Any ambiguities encountered go into FINDINGS.md. After implementation, any findings feed back as spec fixes.\n\n### 5. Update cascade impacts\n\nIf the feature touched concepts referenced by other domains, update those domains. If you have cascade tags, they tell you exactly which ones. If you have graph tooling, run a cascade analysis.\n\n---\n\n## Interface Design Changes\n\nWhen a designer hands you new wireframes, a redesigned flow, or updated components:\n\n### What to Forge\n\n- **New user flows** (onboarding, checkout, settings). These introduce decisions about error states, loading states, empty states, permission states, keyboard navigation, and accessibility that the spec should capture.\n- **Changed interaction models** (drag-and-drop where there was click, swipe where there was scroll). The behavioral specification needs updating.\n- **New components** with state. A modal, a multi-step form, a data table with sort/filter. Each has behavioral decisions the spec should cover.\n- **Responsive behavior changes.** New breakpoints, changed layout at specific widths, different mobile behavior.\n\n### What not to Forge\n\n- **Visual-only changes** within existing components. New colors, fonts, spacing, illustrations. 
These are design system changes, not spec changes.\n- **Copy changes** that don't affect behavior. New button labels, updated error messages, reworded headings.\n\n### How to spec interface changes\n\nThe spec doesn't describe what the interface looks like. It describes what it does. Point at the design artifacts (Figma, wireframes, CSS) for the visual. Spec the behavior:\n\n- What happens on error?\n- What does the loading state look like and how long before it appears?\n- What does the empty state show?\n- How does keyboard navigation work?\n- What happens at each breakpoint?\n- What are the accessibility requirements?\n\nSee the UI/UX ambiguous language dictionary in `../reference/` for the categories of interface behavior that need specifying.\n\n---\n\n## Library and Dependency Changes\n\n### Swapping a library\n\nThis is a conventions doc change. Update the canonical dependency entry, update the TONIC table (the old library becomes a forbidden alternative), and add a WHY block explaining why you switched.\n\nIf the new library changes the API surface or behavioral contracts, that's also a domain doc change. A new ORM doesn't just change the conventions; it might change how queries are structured, how migrations work, and how transactions are handled.\n\n### Upgrading a major version\n\nIf the upgrade introduces breaking API changes, treat it like a library swap. If it's a non-breaking upgrade, just do it.\n\n### Adding a new dependency\n\nAdd it to the conventions doc with the forbidden alternatives (if any). If it introduces a new capability the spec doesn't describe, add the capability to the relevant domain doc.\n\n---\n\n## Reconciliation Cadence\n\nCode-driven changes (bug fixes, performance patches, dependency updates) don't go through the Forge pipeline. They accumulate. 
Periodically, reconcile them against the spec.\n\n### When to reconcile\n\n- **At release boundaries.** Before a release, check: does the spec still describe what we're shipping?\n- **At sprint/iteration boundaries.** A lightweight check: did any code changes this sprint alter behavior the spec describes?\n- **When someone notices.** If a cold session starts asking questions about things that used to be clean, the spec has drifted.\n- **After a production incident.** If the incident revealed a behavior the spec didn't anticipate, add it. If the fix changed behavior the spec describes, update it.\n\n### How to reconcile\n\nGive an AI session the spec and a summary of changes since the last reconciliation (git log, PR list, changelog). Ask: \"Which of these changes conflict with or extend the spec?\" The output is a list of spec updates needed. Apply them.\n\n---\n\n## Working with Designers\n\nDesigners produce visual artifacts. Forge produces behavioral specifications. The two complement each other:\n\n- **The designer decides** what the interface looks like: layout, color, typography, spacing, visual hierarchy.\n- **The spec decides** what the interface does: error states, loading behavior, keyboard navigation, responsive breakpoints, accessibility, state transitions, empty states, permission states.\n\nWhen a design change arrives:\n\n1. Review the visual for any behavioral implications\n2. Ask: \"What happens when this fails? When it's loading? When there's no data? When the user has no permission? When they're on mobile?\"\n3. If the answers aren't in the spec, add them\n4. If the answers conflict with existing spec, resolve the conflict explicitly\n\n---\n\n## Working with Product Managers\n\nProduct managers define what to build. Forge defines how to build it precisely. When a PM requests a feature:\n\n1. Have the design/architecture discussion (ideation)\n2. Forge it into the spec (formalization)\n3. 
Validate if the feature is complex enough to warrant it (cold check)\n4. Build from the spec\n\nThe PM doesn't need to learn Forge. They need to answer questions: \"What should happen when the user does X? What's the priority of Y vs Z? Is this required for launch or post-V1?\" The AI structures their answers into Forge artifacts. The PM reviews and approves.\n\nThe Decided/Open Questions tracking is particularly useful for PM communication. \"We have 45 decisions made and 3 open questions. The open questions are: [list]. We need your input on these before we can build.\"\n\n---\n\n## The Ongoing Practice\n\nForge is not a phase you complete. It's a practice you maintain. The spec is a living artifact that evolves with the project. The habits that keep it healthy:\n\n- **Read the spec before writing code.** Every time. Not because you've forgotten, but because the spec might have been updated since you last read it.\n- **Update the spec when you make decisions.** If you decided something during implementation that the spec didn't cover, add a Decided entry. Don't wait for reconciliation.\n- **Run cold checks when uncertain.** Quick and cheap. \"Read this domain doc, I'm about to do X, does anything conflict?\"\n- **Fix the spec when the code proves it wrong.** Sometimes implementation reveals that a Decided entry was wrong. Change the spec. Add a new WHY block explaining what you learned. Don't leave a spec that contradicts working code.\n- **Use FINDINGS.md.** Every surprise, every ambiguity, every \"I had to guess here\" goes in. The postmortem reviews it and tightens the spec.\n\nThe spec is not overhead. It's the cheapest place to fix a problem. Code is the most expensive. The further downstream ambiguity travels, the more it costs to resolve. Maintaining the spec is maintaining the cheapest insurance you have.\n",
"custom_graph_signals": "# Designing Custom Graph Signals\n\nThe graph parser ships with 18 generic signals that work for any Forge project: domains, decisions, open questions, concepts, cascade tags, phases, commands, config keys, dependencies, constraints, error conditions, resource limits, article references, terminology violations, phase dependency chains, blocking OQs, decision count verification, and WHY coverage.\n\nThose are the baseline. Your project has domain-specific relationships that the generic signals don't cover. This document teaches you (and your AI assistant) how to design custom signals that extend the graph for your project's specific needs.\n\n---\n\n## When to Add Custom Signals\n\nAdd a custom signal when:\n\n- You have a category of relationship in your docs that the generic parser doesn't track\n- A cold validation session keeps surfacing the same class of gap that manual checking could prevent\n- Your project has compliance, regulatory, or academic requirements that demand traceability beyond what decisions and cascade tags provide\n- You find yourself repeatedly asking \"does X in document A match Y in document B?\" across many instances\n\nDon't add custom signals speculatively. Each signal adds parsing complexity and validation output. Add them when a real gap surfaces, the same way WHY blocks are added when an agent guesses wrong.\n\n---\n\n## The Framework\n\nEvery graph signal has four components:\n\n### 1. Node Type\n\nWhat entity are you tracking? 
Define it precisely.\n\n| Project Type | Example Node Types |\n|---|---|\n| Software | `APIEndpoint`, `DatabaseTable`, `FeatureFlag`, `SecurityBoundary` |\n| Regulatory/Compliance | `Regulation`, `ComplianceRequirement`, `AuditControl`, `CertificationStandard` |\n| Academic/Thesis | `Claim`, `EvidenceSource`, `Hypothesis`, `Methodology` |\n| Policy | `PolicyArticle`, `Stakeholder`, `EnforcementMechanism`, `Exception` |\n| Legal | `Clause`, `Definition`, `Obligation`, `Remedy`, `Jurisdiction` |\n\nEach node type needs:\n- A name that's unambiguous in your project context\n- Properties that identify it (ID, title, source file, line number)\n- Properties that carry domain-specific metadata\n\n### 2. Edge Type\n\nHow do these nodes relate to each other and to the existing graph?\n\n| Edge | Meaning | Example |\n|---|---|---|\n| `SATISFIES` | This decision satisfies this requirement | Decision D-05-12 satisfies HIPAA 164.312(a)(1) |\n| `EVIDENCES` | This source supports this claim | Study X evidences Claim 3.2 |\n| `CONSTRAINS` | This regulation constrains this domain | GDPR Article 17 constrains Domain 08 (User Data) |\n| `EXPOSES` | This endpoint exposes this data type | `/api/users` exposes PII |\n| `MITIGATES` | This decision mitigates this risk | Decision D-09-3 mitigates injection risk |\n\n### 3. Extraction Method\n\nHow does the parser find these in your docs? There are three approaches:\n\n**Pattern-based:** Regex on the markdown. Works for structured content that follows a consistent format.\n```\n# If your docs cite regulations as [REG-XXX]:\nr\"\\[REG-(\\w+(?:\\.\\w+)*)\\]\"\n# Finds: [REG-HIPAA.164.312], [REG-GDPR.17], etc.\n```\n\n**Section-based:** Parse a dedicated section in your docs. 
Works when you have structured tables or lists.\n```\n# If your conventions doc has a \"Regulatory Mapping\" table:\n| Requirement | Satisfied By | Evidence |\n|---|---|---|\n| HIPAA 164.312(a)(1) | Decision D-05-12 | Access control implementation |\n```\n\n**Glossary-derived:** Extract from your glossary or a dedicated reference document. Works for domain vocabulary that the parser needs to recognize.\n\n### 4. Validation Query\n\nWhat question does this signal answer? The validation query is the reason the signal exists. If you can't state the query, you don't need the signal.\n\n| Signal | Validation Query |\n|---|---|\n| Regulatory mapping | \"Which regulatory requirements have no satisfying decision?\" |\n| Claim-evidence links | \"Which claims have no supporting evidence?\" |\n| API-schema consistency | \"Which endpoints reference schemas that don't exist?\" |\n| Security boundary coverage | \"Which security boundaries have no corresponding test?\" |\n| Feature flag lifecycle | \"Which feature flags have no removal plan?\" |\n\n---\n\n## Examples by Project Type\n\n### Software with Regulatory Requirements (HIPAA, SOC2, PCI-DSS, etc.)\n\n**The problem:** Your spec makes decisions that satisfy regulatory requirements, but the mapping is implicit. 
A cold session can't verify that every requirement is covered because the connections aren't tracked.\n\n**Node types:**\n- `Regulation` (requirement ID, title, source standard, full text or summary)\n- `ComplianceMapping` (links a decision to a requirement with evidence)\n\n**Edge types:**\n- `SATISFIES`: Decision -> Regulation (this decision satisfies this requirement)\n- `EVIDENCES`: Domain -> Regulation (this domain contains evidence of compliance)\n\n**Extraction:** Add a compliance mapping table to your conventions doc or a dedicated compliance doc:\n\n```markdown\n## Compliance Mapping\n\n| Requirement | Standard | Satisfied By | Evidence |\n|---|---|---|---|\n| Access control | HIPAA 164.312(a)(1) | Domain 05, Decision 12 | Role-based access with audit log |\n| Encryption at rest | HIPAA 164.312(a)(2)(iv) | Domain 09, Decision 3 | AES-256 via database-level encryption |\n| Audit trail | SOC2 CC8.1 | Domain 10, Decision 7 | Immutable audit log with tamper detection |\n```\n\n**Validation queries:**\n- \"Which regulatory requirements have no satisfying decision?\" (compliance gap)\n- \"Which decisions reference a regulation that isn't in the compliance mapping?\" (phantom reference)\n- \"Which domains are affected by HIPAA requirements?\" (scope analysis for compliance audits)\n\n**What this gives you:** An auditor asks \"show me how you satisfy HIPAA 164.312(a)(1).\" Instead of searching through 300 pages, you query the graph. The answer is a traced path: Requirement -> Decision -> Domain -> Implementation.\n\n### Academic Thesis or Research Paper\n\n**The problem:** A thesis makes claims across chapters that must be internally consistent, supported by evidence, and traceable to methodology. 
Without tracking, Chapter 7 might contradict Chapter 3 and nobody notices until the defense.\n\n**Node types:**\n- `Claim` (claim ID, text, chapter, strength: \"asserts\" | \"suggests\" | \"hypothesizes\")\n- `EvidenceSource` (source ID, citation, type: \"empirical\" | \"theoretical\" | \"case study\")\n- `Methodology` (method ID, description, limitations)\n\n**Edge types:**\n- `EVIDENCES`: EvidenceSource -> Claim\n- `USES_METHOD`: Claim -> Methodology\n- `DEPENDS_ON`: Claim -> Claim (this claim builds on that claim)\n- `CONTRADICTS`: Claim -> Claim (flagged for resolution)\n\n**Extraction:** Add structured claim tracking to each chapter:\n\n```markdown\n## Claims in This Chapter\n\nC-3.1: Training-free GRPO produces equivalent quality to full GRPO\n for domain-specific tasks under 1000 examples.\n Evidence: [S-14], [S-22], [S-31]\n Method: [M-02]\n Depends on: C-2.4\n\nC-3.2: The quality gap narrows with domain specificity.\n Evidence: [S-15], [S-33]\n Strength: suggests (correlation, not causation established)\n```\n\n**Validation queries:**\n- \"Which claims have no supporting evidence?\" (unsupported assertion)\n- \"Which claims depend on claims in later chapters?\" (forward dependency, possible circular reasoning)\n- \"Which evidence sources support contradictory claims?\" (conflict)\n- \"Which claims are 'asserts' strength but have only one evidence source?\" (weak assertion)\n\n### Policy or Standard Operating Procedure\n\n**The problem:** A policy manual defines rules, responsibilities, exceptions, and enforcement mechanisms across many sections. Changes to one section cascade through others. 
Without tracking, a policy change in Section 4 invalidates an exception in Section 9 that nobody updates.\n\n**Node types:**\n- `PolicyRule` (rule ID, text, section, mandatory: true/false)\n- `Stakeholder` (role, responsibilities, authority level)\n- `Exception` (exception ID, conditions, expiration, approver)\n- `EnforcementMechanism` (mechanism ID, triggered by, consequence)\n\n**Edge types:**\n- `ENFORCES`: EnforcementMechanism -> PolicyRule\n- `EXEMPTS`: Exception -> PolicyRule\n- `RESPONSIBLE_FOR`: Stakeholder -> PolicyRule\n- `SUPERSEDES`: PolicyRule -> PolicyRule (when rules are updated)\n\n**Validation queries:**\n- \"Which policy rules have no enforcement mechanism?\" (unenforceable rule)\n- \"Which exceptions reference rules that have been superseded?\" (stale exception)\n- \"Which stakeholders are responsible for more than N rules?\" (overloaded role)\n- \"Which rules have no responsible stakeholder?\" (orphaned rule)\n\n### Legal Contract or Framework\n\n**The problem:** A legal framework defines clauses, obligations, definitions, and remedies that reference each other. A change to a definition in Section 1 affects clauses in Sections 4, 7, and 12. 
Inconsistent definitions are the most common source of legal disputes.\n\n**Node types:**\n- `Clause` (clause number, text, binding: true/false)\n- `DefinedTerm` (term, definition, section where defined)\n- `Obligation` (party, action, condition, timeline)\n- `Remedy` (trigger, consequence, limitation)\n\n**Edge types:**\n- `DEFINES`: Clause -> DefinedTerm\n- `CREATES_OBLIGATION`: Clause -> Obligation\n- `PROVIDES_REMEDY`: Clause -> Remedy\n- `REFERENCES_TERM`: Clause -> DefinedTerm (uses but doesn't define)\n\n**Validation queries:**\n- \"Which terms are used but never defined?\" (undefined term)\n- \"Which clauses use a term in a sense that differs from its definition?\" (inconsistent definition)\n- \"Which obligations have no corresponding remedy?\" (unenforceable obligation)\n- \"Which clauses create obligations for a party not defined in the agreement?\" (phantom party)\n\n---\n\n## How to Implement\n\n### Option 1: Extend the Config\n\nAdd your custom signals to `forge_graph.toml`:\n\n```toml\n[custom_signals]\n# Define patterns the parser should look for\n# Format: signal_name = \"regex_pattern\"\nregulation_refs = \"\\\\[REG-([\\\\w.]+)\\\\]\"\nclaim_refs = \"^C-\\\\d+\\\\.\\\\d+:\"\n```\n\nThis works for simple pattern-based extraction. 
The parser finds matches and creates nodes.\n\n### Option 2: Add a Custom Section to the Parser\n\nFor structured extraction (parsing tables, sections, cross-references), add a function to `forge_graph.py`:\n\n```python\ndef extract_compliance_mapping(content: str) -> list:\n \"\"\"Extract regulatory compliance mappings from a structured table.\"\"\"\n # Parse the table, return list of {requirement, standard, decision, evidence}\n ...\n```\n\nRegister it in `parse_domain_doc()` alongside the existing extractors.\n\n### Option 3: Separate Script\n\nFor complex domain-specific signals, write a separate script that reads the same docs, builds its own nodes and edges, and either merges into the same graph (Neo4j backend) or produces a separate validation report. This keeps the generic parser clean and your domain logic isolated.\n\n### Option 4: Prompt-Based\n\nFor signals that are hard to extract mechanically (semantic consistency, argument quality, rhetorical strength), use a prompt. Give the AI the docs and a specific question: \"Find every claim in the thesis and check whether it has supporting evidence cited.\" This doesn't integrate with the graph but produces a validation report that serves the same purpose.\n\n---\n\n## Guidance for AI Agents\n\nIf you're an AI agent helping a user design custom graph signals for their project:\n\n1. **Ask what category of gap they keep finding.** The signal should address a recurring problem, not a theoretical one.\n\n2. **Identify the node types from their domain.** What entities in their docs form relationships that need tracking? Regulations, claims, endpoints, policy rules, contract clauses?\n\n3. **Identify the edges from their questions.** What questions do they keep asking manually? \"Does every requirement have a satisfying decision?\" becomes a `SATISFIES` edge. \"Does every claim have evidence?\" becomes an `EVIDENCES` edge.\n\n4. 
**Write the validation query before writing the parser.** If you can't state what the signal checks for, you don't need it.\n\n5. **Start with the simplest extraction method.** Pattern-based regex before section parsing. Section parsing before semantic analysis. Add complexity only when simpler methods miss things.\n\n6. **Add to the project's conventions doc.** Document the custom signal: what it tracks, why, the extraction pattern, and the validation query. This is a project-level convention, not a Forge methodology change.\n\n7. **Consider whether the signal belongs in the graph or in a prompt-based review.** If the check requires understanding meaning (not just structure), a prompt-based review might be more effective than a graph signal. The graph is deterministic; meaning-level checks are probabilistic. Use the right tool for the check.\n\n---\n\n## The Principle\n\nThe generic signals catch problems that every Forge project has. Custom signals catch problems that YOUR project has. The framework for designing them is the same: define what you're tracking (node), how it relates to other things (edge), how to find it (extraction), and what question it answers (validation query). If you can fill in those four, you have a signal worth building.\n",
"multi_repo": "# Multi-Repo Forge\n\nWhen multiple projects are Forged independently, each has its own graph: domains, decisions, concepts, cascade tags. But projects don't exist in isolation. Services call each other. Libraries are shared. Conventions should align. A constitutional decision in one project has implications for projects that depend on it.\n\nMulti-repo Forge connects the graphs.\n\n---\n\n## The Problem\n\nProject A's conventions say \"exposes gRPC API on port 50051 with protobuf contracts.\" Project B's conventions say \"calls Project A via REST.\" That's a contradiction, but neither project's graph knows about it because each graph only sees its own docs.\n\nProject C defines \"Contract\" as an immutable resource envelope. Project D uses \"Contract\" to mean a client agreement. They share a message queue. When a message contains a \"contract_id,\" which definition applies? Neither project's glossary helps the other.\n\nProject E changes its authentication model from JWT to session tokens. Projects F, G, and H integrate with E's auth. Their specs still say JWT. Nobody knows until something breaks in staging.\n\nThese are cross-repo cascade failures. Same problem as cross-domain cascade failures within a project, but across repository boundaries.\n\n---\n\n## How It Works\n\n### Shared Graph Instance\n\nLoad multiple projects into the same graph (Neo4j or networkx). Each project uses its own label prefix, so nodes don't collide:\n\n```\nProject A: ATDomain, ATDecision, ATConcept, ...\nProject B: BTDomain, BTDecision, BTConcept, ...\nProject C: CTDomain, CTDecision, CTConcept, ...\n```\n\nEach project loads independently with its own `forge_graph.toml`. The graphs coexist in the same database without interference.\n\n### Cross-Repo Edges\n\nAfter loading individual projects, add edges between them. 
These edges represent the integration points:\n\n```\nNodes:\n RepoProject (name, label_prefix, repo_url)\n\nEdges:\n DEPENDS_ON: RepoProject -> RepoProject\n SHARES_CONCEPT: Concept (Project A) -> Concept (Project B)\n EXPOSES_API: Domain (Project A) -> RepoProject (Project B consumes this)\n CONSUMES_API: Domain (Project B) -> RepoProject (Project A exposes this)\n SHARES_CONVENTION: Convention -> RepoProject (multiple projects follow this)\n```\n\n### Cross-Repo Config\n\nA separate config file defines the relationships between projects:\n\n```toml\n[repos]\n\n[repos.projectA]\nconfig = \"/path/to/projectA/tools/forge_graph/forge_graph.toml\"\nprefix = \"AT\"\n\n[repos.projectB]\nconfig = \"/path/to/projectB/tools/forge_graph/forge_graph.toml\"\nprefix = \"BT\"\n\n[repos.projectC]\nconfig = \"/path/to/projectC/tools/forge_graph/forge_graph.toml\"\nprefix = \"CT\"\n\n[integrations]\n# Project B depends on Project A's API\n[[integrations.dependency]]\nfrom = \"projectB\"\nto = \"projectA\"\ntype = \"api\"\ncontract = \"gRPC on port 50051, protobuf contracts defined in projectA/proto/\"\n\n# Projects A and C share the concept \"Contract\" but define it differently\n[[integrations.shared_concept]]\nconcept = \"Contract\"\nrepos = [\"projectA\", \"projectC\"]\nnote = \"Verify definitions align or document the distinction\"\n\n# All three projects should follow shared auth conventions\n[[integrations.shared_convention]]\nconvention = \"Authentication model\"\nrepos = [\"projectA\", \"projectB\", \"projectC\"]\nauthority = \"projectA\" # Project A's definition wins\n```\n\n---\n\n## Validation Queries\n\n### Cross-Repo Cascade\n\n\"If Project A changes its API, which other projects break?\"\n\nThis is the same question as \"if I change Domain 5, what breaks?\" but across repo boundaries. 
The graph traces: Project A's API domain -> EXPOSES_API -> Projects that consume it -> their domains that reference the API.\n\n### Concept Consistency\n\n\"Do all projects that use the term 'Contract' mean the same thing?\"\n\nPull the glossary entry for \"Contract\" from each project. Compare definitions. If they differ, either the distinction is intentional (document it in the integration config) or it's a bug (one project is using the term wrong).\n\n### Convention Alignment\n\n\"Do all projects use the same timestamp format? The same UUID version? The same error code taxonomy?\"\n\nCross-cutting conventions (wire formats, auth models, error contracts) need to be consistent across projects that communicate. Pull conventions from each project, compare the overlapping concerns.\n\n### API Contract Consistency\n\n\"Project A says it exposes gRPC. Project B says it calls REST. Who's right?\"\n\nPull the API surface from Project A's docs. Pull the integration assumptions from Project B's docs. Compare. Contradictions are integration bugs that will surface in staging if not caught here.\n\n### Shared Dependency Drift\n\n\"Project A uses `prost` 0.12. Project B uses `prost` 0.11. Are their protobuf contracts compatible?\"\n\nPull dependency versions from each project's conventions doc. Flag version mismatches on shared dependencies.\n\n### Constitutional Alignment\n\n\"Project A's constitution says all data goes through the pipeline. Project B writes directly to Project A's secondary store. Is that a violation?\"\n\nOne project's constitution constrains other projects that integrate with it. The constitutional boundary doesn't stop at the repo boundary.\n\n---\n\n## Practical Applications\n\n### Microservices Architecture\n\nEach service is a repo. Each repo is Forged. The cross-repo graph maps the service mesh at the architectural level, not the infrastructure level. \"Service A talks to Service B\" isn't a Kubernetes config. 
It's an architectural decision with a WHY block, a cascade tag, and cross-repo validation.\n\nWhen you refactor a service's API, the cross-repo cascade analysis tells you which other services' specs need updating. Not which code breaks (that's integration tests). Which specs become wrong (that's architecture validation).\n\n### Monorepo with Multiple Packages\n\nSame concept, different layout. Each package has its own Forge docs. The cross-repo config maps dependencies between packages. Validation catches inconsistencies between packages that share types, conventions, or integration contracts.\n\n### Platform with Plugins/Extensions\n\nThe platform is one repo. Each plugin or extension is another. The platform's constitution constrains what plugins can do. The cross-repo graph validates that no plugin violates the platform's constitutional boundaries.\n\n### Organization-Wide Shared Conventions\n\nSome conventions apply across all projects: auth model, logging format, error taxonomy, API versioning strategy. These live in a shared conventions repo. Each project's config references it:\n\n```toml\n[shared_conventions]\nrepo = \"/path/to/org-conventions\"\n```\n\nThe cross-repo graph validates that every project's conventions are compatible with the shared conventions. Where they diverge, the divergence is either intentional (documented with a WHY block) or a drift that needs fixing.\n\n---\n\n## Team Coordination\n\nOn a team running multiple Forged projects:\n\n- **Cross-repo OQ blocking:** An Open Question in Project B is blocked by a decision in Project A. The cross-repo graph makes this dependency explicit: \"OQ-B-7 blocked on Project A, Domain 03, Decision pending.\"\n\n- **Change notification:** \"I'm changing the auth model in Project A\" becomes a cross-repo cascade query. 
The result is a list of projects and specific domains that need updating, which maps to specific people.\n\n- **Integration review:** Before merging a cross-cutting change, run the cross-repo validation. It catches spec-level contradictions before they become code-level bugs.\n\n- **New project onboarding:** A new project joining the ecosystem loads the cross-repo graph and immediately sees: what it depends on, what depends on it, which shared conventions it must follow, which constitutional boundaries it cannot cross.\n\n---\n\n## Cross-Project Discovery\n\nThe multi-repo graph and semantic search are the automated version of something that happens naturally when multiple projects share the Forge structure. Because every project has the same doc shape (constitution, conventions, domain docs with Decided/Open, glossary, cascade tags), patterns that recur across projects become visible. The same architectural shape appearing in Domain 5 of Project A, Domain 12 of Project B, and Domain 3 of Project C isn't hidden in three different doc formats. It's in the same slot, using the same vocabulary, with the same decision structure. A practitioner working across multiple Forged projects in long-context sessions will notice these patterns. That's not a theoretical claim. It's an observed outcome. Common infrastructure, shared primitives, and reusable architectural patterns have been extracted from cross-project pattern recognition that was only possible because the docs were structured identically.\n\nThis is the payoff of consistent framing that isn't obvious until you're deep enough in multiple projects to see it. The Forge structure doesn't just help each project individually. It makes the spaces between projects legible.\n\n---\n\n## The BackForging Connection\n\nThe cross-repo graph is where the BackForging vision becomes practical at scale. BackForge your existing services into Forge specs. Load them all into the same graph. 
Suddenly you have:\n\n- A map of every architectural assumption every service makes about every other service\n- Every concept that means different things in different services\n- Every convention that should be shared but isn't\n- Every integration point with its assumptions on both sides\n\nThat's not a documentation project. That's an x-ray of your entire system's architectural health.\n\n---\n\n## Implementation Notes\n\n### Start Small\n\nDon't try to load 50 repos into a graph on day one. Start with two projects that integrate with each other. Add the cross-repo edges. Run the validation queries. See what it finds. Add more projects when the value is proven.\n\n### The Authority Question\n\nWhen two projects' specs contradict each other about an integration point, which one is right? The cross-repo config needs an authority declaration:\n\n```toml\n[[integrations.dependency]]\nfrom = \"projectB\"\nto = \"projectA\"\ntype = \"api\"\nauthority = \"projectA\" # Project A's API spec is authoritative\n```\n\nThe API provider's spec is the source of truth. The consumer's spec must conform to it. This is the same hierarchy as constitution > conventions > domain docs, but across repos.\n\n### Maturity Note\n\nLike the BackForge protocol, multi-repo Forge is reasoned extrapolation from single-repo practice. The principles are sound: if cross-domain cascade failures happen within a project, cross-repo cascade failures happen between projects. The graph catches the former; extending it catches the latter. But the tooling for cross-repo loading and validation doesn't exist yet. It will follow the same pattern as the single-repo parser: config-driven, dual backend, validation queries.\n",
"forge_at_scale": "# Forge at Organizational Scale\n\nHow Forge applies across teams, departments, and entire organizations. From two repos by a solo developer to hundreds of projects across dozens of teams. The principles don't change. The tooling scales.\n\nAt organizational scale, Forge functions as an **AI governance framework**: it governs how architectural intent gets communicated to AI agents, how decisions are tracked and enforced, and how consistency is maintained across team boundaries. The word \"governance\" is used deliberately here and not elsewhere in the methodology docs, because it's the right framing for enterprise architects and the wrong framing for solo developers and small teams. Same system, different lens.\n\n---\n\n## The Problem Nobody Solves\n\nIn any large organization, information fragmentation is the silent killer. Team A makes a database schema change. Team Z, three floors away or three time zones away, has a marketing analytics pipeline that reads from that schema. Nobody on Team A knows Team Z exists. Nobody on Team Z knows a change is coming. The schema changes. The pipeline breaks. A multi-million dollar campaign runs on stale data for two weeks before anyone notices.\n\nThis isn't a communication failure. You can't communicate dependencies you don't know exist. The dependency is implicit: buried in a SQL query in Team Z's codebase that references a table maintained by Team A. No documentation captures it. No architecture diagram shows it. No Jira ticket tracks it. 
It's invisible until it breaks.\n\nTraditional documentation doesn't solve this because:\n\n- Every team has their own Confluence space, wiki, or docs folder\n- Nobody reads other teams' documentation (10,000 pages of unstructured prose across 47 spaces)\n- The same word means different things in different teams\n- Dependencies are in code, not in docs\n- There's no way to query \"who depends on this table\" across organizational boundaries\n\n---\n\n## Why Forge Changes the Economics\n\nIf every team Forges their project with the same methodology, the docs share a common structure. They're not free-form prose in a wiki. They're parseable, comparable, and queryable across team boundaries.\n\nEvery Forged project has:\n- A **glossary** with precise term definitions (so \"Contract\" means the same thing everywhere, or the difference is explicit)\n- A **conventions doc** with canonical dependencies and forbidden alternatives\n- **Domain docs** with Decided entries, WHY blocks, and cascade tags\n- A **constitution** with immutable laws\n\nThese are machine-readable. A graph parser can load all of them. A semantic search can compare them. An AI agent can read them all simultaneously with no loss of attention (something no human committee can do).\n\n---\n\n## The Forge of Forges\n\nEach team Forges their own project independently. They own their spec. They run their own cold validation. They manage their own domain docs. Nothing changes about how they work day to day.\n\nAbove the individual Forges, a Master Forge connects them:\n\n### Layer 1: Individual Project Graphs\n\nEach project has its own graph with its own label prefix. Team A loads `ATDomain`, `ATDecision`, etc. Team Z loads `ZTDomain`, `ZTDecision`, etc. 
These are independent and each team manages their own.\n\n### Layer 2: Cross-Project Integration Graph\n\nA second graph layer maps the integration points between projects:\n\n- Which projects share data stores (same database, same Kafka topics, same S3 buckets)\n- Which projects call each other's APIs\n- Which projects share conventions (auth model, wire format, error taxonomy)\n- Which projects share concepts (and whether the definitions align)\n- Which projects' constitutions constrain other projects\n\nThis layer is maintained by whoever owns the integration boundaries. In some organizations that's a platform team. In others it's the architects. In small shops it might be one person. The tooling doesn't care about the org structure; it cares about the edges.\n\n### Layer 3: Semantic Cross-Reference\n\nWith embeddings (Weaviate or similar), the entire corpus of Forged docs across all projects becomes semantically searchable. This enables:\n\n- \"Which other projects have solved a similar problem to what Team A is designing?\" (wheel reinvention detection)\n- \"Are there contradictory claims about the same concept across projects?\" (consistency checking)\n- \"Which projects discuss this data structure?\" (implicit dependency discovery)\n\n---\n\n## What It Catches\n\n### Breaking Changes Across Boundaries\n\nTeam A proposes a schema change to the `orders` table. The cross-project graph returns:\n\n```\nProjects referencing orders schema:\n - Project A (owner): Domain 03, Decisions 4, 7, 12\n - Project Z (consumer): Domain 07, Decisions 4, 9\n - Project M (consumer): Domain 02, Decision 15\n - Project R (consumer): Domain 11, Decision 3\n\nAffected teams: Z (Marketing Analytics), M (Mobile), R (Reporting)\nCascade impact: 4 projects, 7 decisions, 3 teams\n```\n\nTeam A now knows who to notify before they change the schema. 
Not because someone maintained a dependency spreadsheet (nobody does), but because the graph traced the relationships from structured docs.\n\n### Concept Drift\n\nTeam A defines \"Customer\" as an entity with an account. Team B defines \"Customer\" as any visitor, including anonymous. Both use the term in their API contracts. When Team A's service sends a \"customer_id\" to Team B's service, what happens depends on which definition is in play. The glossary comparison catches this before integration.\n\n### Convention Divergence\n\nThe platform team decided on JWT with RS256 for authentication. Team A followed the decision. Team B uses HMAC-SHA256 because their lead prefers it. Team C uses opaque tokens because they joined after the decision was made and nobody told them. The cross-project convention check surfaces these divergences.\n\n### Reinvented Wheels\n\nTeam A built a rate limiter. Team D is about to build one. Semantic search across all project specs surfaces: \"Team A, Domain 09, Decision 3 implements token bucket rate limiting with Redis backend.\" Team D can evaluate whether to reuse it, adapt it, or build their own with an explicit WHY NOT.\n\n### Regulatory Cascade\n\nThe legal team updates the data retention policy. Which projects store personal data? Which decisions reference retention periods? Which implementations need to change? Without the graph, this is a manual audit across every team. With the graph, it's a query:\n\n```\nDecisions referencing \"retention\" or \"personal data\" or \"PII\":\n - Project A: Domain 05, Decision 8 (90-day retention)\n - Project C: Domain 12, Decision 3 (indefinite retention) <-- VIOLATION\n - Project F: Domain 02, Decision 11 (30-day retention)\n```\n\nProject C's indefinite retention is a compliance violation. Found in seconds, not weeks.\n\n---\n\n## How Teams Adopt This\n\n### Phase 1: One Team Forges\n\nOne team adopts Forge for their project. 
They get internal benefits: better specs, fewer agent guessing errors, documented decisions. No organizational buy-in needed. No tooling infrastructure required.\n\n### Phase 2: Adjacent Teams Notice\n\nThe first team's specs are visibly better. Their cold validation process catches issues before implementation. Adjacent teams ask how they did it. They share the methodology. Two or three teams are now Forged independently.\n\n### Phase 3: Integration Points Surface\n\nTeams that integrate with each other notice that their Forged specs can be compared. \"Your spec says gRPC, ours says REST. Which is it?\" The cross-project value emerges organically from teams that already work together.\n\n### Phase 4: The Graph Connects\n\nSomeone (platform team, architect, or an ambitious engineer) loads multiple project graphs into the same Neo4j instance. Adds cross-project edges for known integration points. Runs the first cross-project validation. Finds things nobody knew about. The value is immediately visible.\n\n### Phase 5: Organizational Adoption\n\nThe graph becomes part of the architecture review process. Before any cross-cutting change (shared schema, auth model, wire format), the cross-project cascade analysis runs. New projects start with Forge docs because the onboarding is faster and the integration validation is free.\n\nThis is a bottom-up adoption path, not top-down. No executive mandate needed. No organization-wide rollout. Teams adopt Forge because it helps them. The cross-project value emerges from the individual project value.\n\n---\n\n## Practical Guidance for Large Organizations\n\n### Start with Integration Boundaries\n\nDon't try to Forge everything. Start with the projects that sit at integration boundaries: the services that other services depend on, the databases that other teams read from, the APIs that third parties consume. 
These are where breaking changes cause the most damage and where the graph provides the most value.\n\n### Shared Conventions Repo\n\nCreate a single repo with organization-wide conventions: auth model, wire format, error taxonomy, timestamp representation, logging format. Each project's `forge_graph.toml` references it. The cross-project validation checks that every project's conventions are compatible with the shared set.\n\n### Glossary Alignment\n\nThe most common cross-project bug is two teams using the same word to mean different things. An organizational glossary (or a comparison tool that checks glossaries across projects) catches this. It doesn't require every team to use the same terms. It requires that differences are explicit: \"In Project A, 'Customer' means account holder. In Project B, 'Customer' means any visitor. When communicating between A and B, map A's customer_id to B's account_id.\"\n\n### Compliance as a Graph Signal\n\nIf your organization has regulatory requirements (HIPAA, SOC2, PCI-DSS, GDPR, etc.), the compliance mapping from `custom_graph_signals.md` applies at the organizational level. Which projects handle PII? Which decisions reference retention policies? Which projects have compliance mappings and which don't? The graph answers all of these across the entire organization.\n\n### Cross-Team Knowledge Sharing\n\nOne of the less obvious benefits: Forge creates common ground between teams that otherwise speak different languages. When every project has the same document structure (constitution, conventions, domain docs, glossary, decisions with WHY blocks), people from different teams can read each other's specs and actually understand them.\n\nJohn from ecommerce can go help the fulfillment team figure out an integration problem because both teams' specs use the same structure. He doesn't need to read their code or sit through a week of onboarding. 
He reads their domain docs for the relevant domains, understands the decisions and the reasoning, and contributes from day one. The WHY blocks tell him what not to touch. The TONIC entries tell him what not to suggest. The cascade tags tell him what his changes will affect.\n\nThis kind of cross-team mobility is nearly impossible in organizations with unstructured documentation. You can't drop into another team's Confluence space and understand their architecture in an afternoon. You can read a Forged spec in an afternoon because the structure is predictable: you know where to find decisions, where to find the reasoning, where to find the open questions, and where to find the terminology. The methodology itself is the shared language.\n\n### Ask Your AI\n\nAn organization with 20 Forged projects has a corpus that no human can hold in context simultaneously. An AI agent can. Load the org's Forge docs (or the `forge.json` from each project) into a session and ask:\n\n- \"Which projects would be affected if we changed the auth model?\"\n- \"Are there any inconsistent definitions of 'Transaction' across our projects?\"\n- \"Which projects don't have a compliance mapping for GDPR?\"\n- \"Team Q wants to build a rate limiter. Has any other team already built one?\"\n\nThe agent reads every project's spec with equal attention. It finds the connections that no human committee would catch because no human committee reads every team's documentation.\n\n---\n\n## Scaling Back Down\n\nEverything in this document works at two repos. A solo developer with two projects that share a database gets the same benefits: cross-project cascade analysis, concept consistency, convention alignment. The principles are identical. The difference is that a solo developer can hold two projects in their head. A 200-person organization cannot hold 50.\n\nThe methodology scales from solo to enterprise without changing. The tooling scales from networkx on a laptop to Neo4j + Weaviate on a server. 
The process scales from \"I'll check both specs\" to \"the graph checks all specs.\" Same principle at every level.\n\n---\n\n## The Vision\n\nEvery Forged project is a node in a larger graph. Within each node, the project's internal graph tracks domains, decisions, concepts, and cascades. Between nodes, the cross-project graph tracks integration points, shared conventions, concept alignment, and dependency chains.\n\nAdd semantic search and the entire corpus becomes queryable by meaning, not just structure. \"Find every project that deals with payment processing\" works even if each project uses different terminology, because the embeddings capture semantic similarity.\n\nAdd BackForging and existing projects that were never Forged can join the graph. Their specs are derived from their code, loaded into the graph, and immediately connected to every other project's spec. The organization gains visibility into architectural dependencies that have existed for years but were never documented.\n\n**Important: Forge does not provide organizational tooling.** The methodology provides the structure and the guidance. How you implement the cross-project graph, semantic search, and integration validation depends on your organization's stack, security requirements, and existing infrastructure. What we provide is the framework for thinking about it and enough guidance that your team (or your AI assistant, with knowledge of your specific environment) can build what you need.\n\nThis isn't a product pitch. It's the logical endpoint of what Forge enables. The methodology produces structured, machine-readable architectural decisions. Aggregate enough of them and you have something that has never existed in software: a queryable, validated, cross-referenced map of an entire organization's engineering decisions.\n\nThat's the Forge of Forges.\n",
"protocols": {
"cold_validation": "# Cold Validation Protocol\n\nThe cold validation session is the core validation mechanism in Forge. This document specifies the full protocol, including the hot/cold feedback loop that's central to how it works in practice.\n\n---\n\n## What a Cold Session Is\n\nA cold session is a fresh AI conversation with no prior context. It receives only the written documents -- the Forge methodology docs and the project specification. It has never seen your ideation conversations, your design discussions, or your reasoning. It knows only what the docs say.\n\nThis is the point. If the docs are precise enough, the cold session can act on them without guessing. If it can't, the docs have a defect.\n\n---\n\n## Why It Works\n\nThe cold session simulates the actual use case: a coding agent (or any agent) encountering your spec for the first time and trying to produce output from it. Every question it asks is a question a future implementer would also have. Every guess it makes is a guess a future implementer would also make.\n\nThe cold session IS the use case. A coding agent encounters your spec for the first time and tries to produce output from it.\n\nThere's also a safety net effect. The spec sometimes contains content that exceeds your own domain knowledge in specific areas. Your review might be \"does this feel right and is it consistent with my intent\" rather than \"I can verify every technical claim.\" The cold session stress-tests the parts you couldn't fully evaluate yourself. Between your judgment and the cold session's literal reading, coverage is better than either alone.\n\n---\n\n## Prerequisites\n\nBefore running a cold validation:\n\n1. The spec must be in at least rough draft state (all major domains outlined, key decisions captured)\n2. Run structural validation first if using tooling (graph analysis, dictionary lint) -- don't waste cold session tokens on problems a parser can catch mechanically\n3. 
Have the Forge methodology docs ready to load into the session\n\n---\n\n## The Prompt\n\nThe prompt is a methodology artifact. It has been refined across multiple projects and models. The key elements:\n\n1. **Load the methodology first** -- the cold session needs to understand how the docs are structured before reading them\n2. **Read everything before acting** -- prevents skimming and shallow responses\n3. **Ask questions inline, not batched at the end** -- the critical discovery that makes the sessions actionable\n4. **Don't guess** -- explicit instruction that questions are preferred over assumptions\n5. **The output is a plan** -- the plan is a forcing function, not a deliverable\n\n**Why \"build a plan\" and not \"review these docs\":** The cold validation prompt instructs the agent to produce a concrete plan, not evaluate the documentation. This distinction is critical for the diagnostic quality of the session.\n\n- **A review prompt** produces polite summaries and generic suggestions. The agent says \"this area could use more detail\" without specifying what's missing. Low diagnostic value.\n- **A plan prompt** forces concrete decisions. The agent has to choose a buffer size, pick a retry strategy, specify an error code. At every decision point, it either asks a specific question (diagnostic) or guesses (finding). Both reveal doc defects that a review-style prompt would gloss over.\n\nThe agent thinks it's doing real work because it IS doing real work. The plan is disposable, but the questions it generates during plan creation are the actual validation output. This is why the cold validation readiness signal is measured by the character of the questions, not the quality of the plan.\n\nSee `prompts/cold_validation_prompt.md` for the template.\n\n---\n\n## The Hot/Cold Feedback Loop\n\nThis is the core mechanism. It is not a single cold session run -- it's an iterative loop between two sessions.\n\n### Setup\n\n- **Cold session:** Fresh context. 
Has the methodology docs + project spec. Produces a plan and surfaces questions.\n- **Hot session:** The ongoing conversation where ideation and formalization happened. Has full context -- all the reasoning, all the discussions, all the \"why\" behind every decision.\n- **You (the human):** The bridge between sessions. Pastes questions from cold to hot, answers from hot to cold.\n\n### The Loop\n\n```\nCold session reads docs\n |\n v\nCold session asks a question\n |\n v\nYou paste the question into the hot session\n |\n v\nHot session answers the question\nHot session updates the docs to cure the ambiguity\n |\n v\nYou paste the answer back to the cold session\n |\n v\nCold session continues, asks the next question\n |\n v\n(repeat until questions dry up)\n```\n\n### Why the Hot Session Answers\n\nThe hot session has far better total understanding and recall than you do. It was present for every discussion. It knows why every decision was made. It can answer cold session questions faster and more accurately than you can in most cases.\n\nBut this is a conversation, not dictation. Challenge the hot session's answers. And expect it to push back on you. Sometimes the cold session's question reveals something you hadn't considered. The hot session might say \"actually, the cold session is right, there's an architectural problem here we missed.\" You and the hot session work through it together. The doc fix that comes out of that exchange is better than either of you would have written alone.\n\n### What This Looks Like in Practice\n\nHere's a concrete exchange from a real project:\n\n**Cold session asks:** \"The pipeline spec says content is 'buffered before dispatch' but doesn't specify the buffer size, flush trigger, or what happens when the buffer is full. What's the overflow behavior?\"\n\n**You paste this into the hot session.**\n\n**Hot session says:** \"That's obvious, it's a ring buffer with backpressure. 
The calling tier just waits.\"\n\n**You push back:** \"It's not in the doc. If the cold session asked, the doc is ambiguous. Fix it.\"\n\n**Hot session updates the doc:** Adds buffer size (1024 items), flush trigger (buffer full OR 100ms elapsed, whichever comes first), overflow behavior (backpressure to the calling tier, log a warning at 80% capacity).\n\n**You paste the answer back to the cold session, which continues.**\n\nThat cycle is the methodology working. The cold session found a gap. The hot session resisted (\"that's obvious\"). You enforced the rule. The doc got more precise. The next agent won't need to guess about buffer overflow behavior.\n\n### The Doc Fix Requirement\n\n**Every cold session question must result in a doc fix.** The hot session's job is not just to answer -- it's to update the docs so the question wouldn't arise again.\n\nThis is where the hot session resists. It has full context and thinks the answer is obvious. \"The docs already say that\" or \"any reasonable reader would understand.\" But the cold session is a reasonable reader, and it didn't understand. That's the proof.\n\n**Rule:** If the cold session asked, the docs are ambiguous. The hot session must fix the language until the ambiguity is cured. No exceptions for coding projects. For other project types, there's a diminishing returns threshold where remaining ambiguity has no material impact -- but declaring \"good enough\" prematurely is the most common failure mode.\n\n---\n\n## The Assumptions Question\n\nAfter the cold session has produced its plan and asked its inline questions, ask one more thing:\n\n**\"Is there anything genuinely missing that should be there? Things the author might have assumed but never specified?\"**\n\nThis is a different question than \"are there ambiguities.\" Ambiguity is when the spec says something imprecisely. 
This question targets gaps: things the spec doesn't mention at all because the author assumed they were obvious or never considered them. The cold session has now processed the entire spec and built a complete mental model of the system. It's in the best position to notice what's absent.\n\nThis consistently surfaces a different category of finding than inline questions do. Inline questions catch imprecise language. The assumptions question catches missing decisions, unspecified error paths, unstated dependencies, and implicit requirements that the author internalized but never wrote down. Every one of these is a spec defect the same way an ambiguity is, but they're invisible to the author because the author's mental model fills them in automatically.\n\n---\n\n## Cross-Model Rotation\n\nAfter the cold session stops producing questions with one model:\n\n1. Copy the current (updated) docs to a fresh location\n2. Start a new cold session with a **different model** (e.g., if you ran Claude first, run Codex/GPT next)\n3. Use the same prompt protocol\n4. Different models find different gaps -- they have different blind spots, different reasoning patterns, different failure modes\n\nThe cross-model validation catches the last category of ambiguity: things that one model's architecture resolves without questions but another surfaces as problems.\n\n---\n\n## The Convergence Signal (Readiness Criteria)\n\nThe convergence signal tells you when the spec is ready for implementation. It's measured by the character of the questions, not their absence.\n\nAs you iterate through cold validation rounds, the questions change character. Early rounds hit you with architectural-level problems: \"your auth model contradicts your data flow,\" \"these two domains describe the same concept differently,\" \"this constitutional article conflicts with this domain decision.\" Those are the big ones. 
They hurt and they take real work to fix.\n\nAfter a few rounds of fixes, the questions narrow. \"Should this timeout be 30 or 60 seconds?\" \"Is this field nullable or required?\" \"Which specific HTTP status code for this error case?\" These are detail questions, not structural ones. The architecture is holding. The model is picking at the edges, not questioning the foundation.\n\nThat narrowing is the convergence signal. You're over the target. Each round of questions is less impactful than the last. The spec is stabilizing.\n\nThe cold validation loop is done when:\n\n- The cold session stops asking questions entirely and just produces output\n- Questions, when they do occur, target details and edge cases rather than architecture\n- Multiple models converge on the same interpretation of the spec\n- The disposable plan adheres to the architecture docs without deviation\n\nWhen you go from \"your pipeline has a fundamental consistency problem\" to \"what's the exact retry backoff interval,\" that's a huge relief. It means the hard problems are solved and what's left is precision work. That's also the point where the process stops feeling heavy and starts feeling like polishing.\n\n---\n\n## Common Pitfalls\n\n### Hot Session Resistance\nThe hot session will claim questions \"don't need a doc fix.\" It's wrong. If the cold session asked, the docs are ambiguous. Push back every time.\n\n### Premature \"Good Enough\"\nEspecially dangerous for solo developers without accountability. The temptation is to stop when the cold session's questions get tedious. But tedious questions about \"should\" vs \"must\" are exactly the kind of ambiguity that causes an agent to guess during implementation.\n\n### Batched Questions\nEarly prompt versions let the cold session dump all questions at the end. This produced a wall of decontextualized questions. 
The fix: \"Ask questions along the way to reduce ambiguities for the next section.\" This makes the session ask inline, in context, where the question has maximum diagnostic value.\n\n### Confusing OQs with Defects\nAn Open Question with a status annotation (\"Discovery: Phase 4\") is not a defect. It's the spec being honest about what it doesn't know yet. The cold session should see the OQ, check the status, and move on. If it asks about an OQ that's clearly marked, that's a model error, not a spec defect.\n\n---\n\n## Adapting for Non-Code Projects\n\nThe protocol is the same regardless of output type. For policy documents, the cold session produces a policy implementation plan. For legal frameworks, it produces a compliance checklist. The forcing function works the same way: the act of producing concrete output forces the model to confront every ambiguity in the spec.\n\nThe \"cold question = doc defect\" rule may have a softer threshold for non-code projects where perfect determinism isn't the goal. But the feedback loop is the same: questions reveal defects, defects get fixed, the spec tightens.\n",
"cold_code_run": "# Cold Code Run Protocol\n\nA cold code run is a validation probe. The code is disposable. The value is in what the output reveals about the spec.\n\n---\n\n## What a Cold Code Run Is\n\nA fresh AI session receives the Forge methodology docs, the architecture spec, and the coding plan, and builds the project. The code it produces is evaluated not for correctness (does it compile?) but for **faithfulness** (does it reflect the spec's intent?).\n\nThis is the same principle as cold validation: a fresh context simulates the actual use case. If the spec is precise enough, the agent produces code faithful to the architecture. If it deviates, the spec has a defect.\n\n---\n\n## Prerequisites\n\nBefore running a cold code run:\n\n1. Cold validation (Phase 4) must be complete -- the spec has been validated by cold reading and questions have been resolved\n2. A disposable coding plan exists (produced during cold validation or plan adherence check)\n3. The spec is stable enough that code-level evaluation is meaningful\n\n---\n\n## The Prompt\n\nThe prompt is straightforward: build from the spec. Key elements:\n\n1. Load the Forge methodology docs so the agent understands the doc structure\n2. Load the architecture spec and coding plan\n3. Direct it to build according to the plan\n4. 
Note any platform-specific caveats (e.g., \"skip Windows-specific compilation on Linux, note it in FINDINGS.md\")\n\nSee `prompts/cold_code_run_prompt.md` for the template.\n\n---\n\n## Evaluation Checklist\n\nAfter the code run, evaluate against these criteria:\n\n### Convention Adherence\n- Did it use the canonical libraries from the conventions doc?\n- Did it follow the naming patterns, error handling rules, and module structure?\n- Did it use the correct terminology from the glossary in identifiers and comments?\n\n### TONIC Compliance\n- Did it avoid forbidden alternatives?\n- Did it reach for an ecosystem default that the conventions explicitly prohibit?\n- Example: using `tonic` when the spec says `prost` only\n\n### WHY Block Respect\n- Did it honor the constraints that WHY blocks protect?\n- Did it \"optimize\" away a design decision because the optimization looks correct without the rationale?\n- Example: co-locating capture with the data plane when the WHY block says separate processes for crash isolation\n\n### Constitution Compliance\n- Did it violate any constitutional article?\n- Did it write directly to a secondary store when the constitution says single write path?\n- Did it create agent-to-agent communication when the constitution forbids it?\n\n### Guessing Detection\n- Did it make implementation decisions the spec should have specified?\n- Did it choose a data structure, an algorithm, a retry strategy, or an error handling approach where the spec was silent?\n- Every guess is a spec defect -- the spec should have been explicit enough that no guess was needed\n\n### Scope Compliance\n- Did it build things marked as post-V1?\n- Did it skip things required for the current phase?\n- Did it respect the phase boundaries and exit criteria?\n\n---\n\n## What to Do with Deviations\n\nEvery deviation falls into one of three categories:\n\n### Spec Defect\nThe agent guessed because the spec didn't specify. Fix the spec. 
Add a Decided entry, a WHY block, a TONIC entry, or more precise language. This is the most common category and the most valuable finding.\n\n### Model Error\nThe agent ignored a clear spec instruction. This is less common and less actionable -- you can't fix the model. But if a model consistently misreads a particular phrasing, consider rewording for clarity. The spec serves the reader; if the reader misreads, the writing can improve even if the reader is \"wrong.\"\n\n### Platform Blocker\nThe build environment can't compile for a target platform, a dependency isn't available, hardware isn't present. Note it and move on. These aren't methodology failures.\n\n---\n\n## After Fixes\n\nThe scope of doc changes determines what to re-run:\n\n### Minor Fixes (a few WHY blocks, tighter language)\n- Proceed directly to alternate model code run\n\n### Moderate Changes (new Decided entries, convention updates)\n- Re-run dictionary lint (Phase 3b)\n- Proceed to alternate model code run\n\n### Significant Restructuring (new domains, constitutional changes, major scope shifts)\n- Re-run graph analysis (Phase 3a) if using graph tooling\n- Re-run dictionary lint (Phase 3b)\n- Consider another cold validation round (Phase 4) before the next code run\n\n---\n\n## Cross-Model Rotation\n\nAfter evaluating the first code run:\n\n1. Fix the spec based on findings\n2. Re-run structural validation as needed\n3. Start a new cold code run with a **different model**\n4. Evaluate with the same checklist\n\nDifferent models make different mistakes. Claude might honor WHY blocks perfectly but reach for the wrong library. Codex might nail the library choices but restructure the module layout. The cross-model comparison is what surfaces the remaining spec ambiguity.\n\n---\n\n## The Code Gets Thrown Away\n\nThis bears repeating: the code is disposable. The value is in the evaluation. Every deviation that reveals a spec defect is worth more than the code itself. 
The spec improvements persist across every future implementation. The code was a single-use diagnostic tool.\n\nWhen the spec is ready (Phase 7: Signal), the actual implementation in Phase 8 produces the code you keep. That code benefits from every spec fix that every disposable code run surfaced.\n\n---\n\n## Platform-Specific Caveats\n\nReal builds hit real constraints:\n\n- **Cross-compilation:** If building on Linux for Windows, skip Windows-specific steps. Note them in FINDINGS.md with the marker `[SKIP: cross-platform -- <alternative verification>]`\n- **Hardware dependencies:** If the spec requires hardware not available in the build environment (GPU, specific sensors, etc.), skip those tests and note them\n- **External services:** If the spec requires a running database, a message queue, or a third-party API, either provide them (Docker is useful here) or mock them and note the gap\n- **Licensing:** Some dependencies have licensing constraints that may not be apparent until the code run. Note any licensing discoveries in FINDINGS.md\n\nThese are project-specific blockers. They don't reflect on doc quality. Focus the evaluation on what the agent did within the scope it could actually build.\n",
"semantic_review": "# Semantic Review\n\nLinguistic ambiguity detection as a complement to structural validation. The graph catches structural problems (phase conflicts, orphaned domains, missing WHY blocks). The semantic review catches language problems -- words and phrases that widen the probability distribution for an AI agent reading the spec.\n\n---\n\n## What It Is\n\nA pass over the spec documents looking for probabilistically wide language in binding contexts. Words like \"should,\" \"appropriate,\" \"handle gracefully,\" \"as needed\" -- anything where two reasonable agents would interpret the same sentence differently.\n\nThis is a linguistic review, not a technical or architectural review. It doesn't evaluate whether decisions are correct. It evaluates whether decisions are stated precisely enough for a cold reader to act on without guessing.\n\n---\n\n## Where It Sits in the Pipeline\n\nPhase 3 (Structural Validation), sub-step 3b. After graph analysis (if using it), before cold validation sessions. The reasoning: don't waste cold session tokens on linguistic ambiguity that a text scan can catch mechanically.\n\n---\n\n## What to Scan\n\n### Binding Contexts (scan these)\n- Decided sections\n- Convention entries\n- Spec body text (the actual specification)\n- Constitutional articles\n- Exit criteria\n- TONIC entries\n\n### Excluded Contexts (skip these)\n- Open Questions sections (ambiguity is expected -- they're explicitly unresolved)\n- WHY/WHY NOT block rationale text (explaining reasoning is fine; prescribing behavior with vague words is not)\n- Code blocks\n- The glossary itself (it defines terms)\n- The dictionary and flagged vocabulary files themselves\n\n---\n\n## How to Do It\n\nThe semantic review can be executed multiple ways. The methodology prescribes the what, not the how.\n\n### Option 1: Prompt-Based\nGive a model the spec and the review prompt. It reads the docs and flags ambiguous language. 
This is the simplest approach and how it was originally done.\n\nSee `prompts/semantic_review_prompt.md` for the template.\n\n### Option 2: Script-Based\nConvert the ambiguous language dictionary to a grep/regex pattern. Scan docs for matches in binding contexts. This is faster and more consistent but misses context-dependent ambiguity.\n\n### Option 3: Agent-Based\nAn AI agent with the dictionary loaded performs a structured scan, evaluating each match in context. This combines the thoroughness of prompt-based review with the coverage of script-based scanning.\n\n### Option 4: Combined\nRun the script first for mechanical coverage, then the prompt-based review for contextual catches. This is the most thorough approach.\n\n---\n\n## The Reference Materials\n\n### Ambiguous Language Dictionary\n`../reference/ambiguous_language_dictionary.md`\n\n31 categories, hundreds of terms. Comprehensive catalog of language patterns that widen probability distributions. Categories include: hedge words, weak modals, vague quantifiers, subjective adjectives, ambiguous temporal words, assumption words, scope-escape phrases, passive voice indicators, and more.\n\n### Flagged Vocabulary\n`../reference/flagged_vocabulary.md`\n\nThe curated, battle-tested subset. 
Organized by category (subjective quality bars, undefined thresholds, temporal ambiguity, false decision signals, mechanism words without mechanism, placeholder language, scope-widening qualifiers) with three-column tables: flagged word, why it's wide, replace with.\n\n### Replacement Strategy\n\nReplace every flagged word with one of:\n- **A specific number** (\"retry 3 times\" not \"retry as needed\")\n- **A named condition** (\"when HTTP 503 is returned\" not \"when errors occur\")\n- **A defined behavior** (\"return null and log to stderr\" not \"handle gracefully\")\n- **A measurable threshold** (\"respond within 200ms at p99\" not \"respond quickly\")\n- **An explicit actor** (\"the AuthService validates the JWT\" not \"the token is validated\")\n- **A cited standard** (\"per RFC 7519 Section 4.1.4\" not \"per industry standard\")\n\n---\n\n## The \"Unit Test\" Test\n\nIf a requirement cannot be written as a failing test case before the code is written, the language is almost certainly ambiguous. You cannot write a test for \"the UI should feel snappy,\" but you can write one for \"Interaction to Next Paint (INP) must be under 200ms.\"\n\nIf you can't express it as a test assertion, rewrite the requirement until you can.\n\n---\n\n## Context Matters\n\nSome words in the dictionary are acceptable in specific contexts:\n- \"should\" is precisely defined in RFC 2119 but ambiguous everywhere else\n- \"may\" has a specific meaning in RFC 2119 but is vague in general specs\n- \"approximately\" is acceptable when paired with tolerance (\"approximately 100ms +/-10ms\")\n- Comparative words are fine when a baseline is stated (\"50% faster than v2.1 benchmark\")\n\nThe review is flag-and-report, not auto-reject. A human (or AI agent with context) evaluates whether each flag is a genuine problem or acceptable usage.\n\n---\n\n## False Positive Management\n\nAutomated scanning will produce false positives. 
See `../reference/false_positive_exclusion_index.md` for the exclusion index methodology.\n\nKey principle: exclude when the usage is genuinely correct and the scanner can't distinguish it from a violation. Fix the doc when the usage is actually ambiguous. Fix the scanner when 50+ exclusions exist for the same rule.\n\n---\n\n## Multi-Model Coverage\n\nDifferent models flag different patterns. Running the semantic review prompt through multiple models increases coverage. One model may catch temporal ambiguity that another misses. Another may flag passive voice constructions the first accepted.\n\nThis is the same principle as cross-model validation in cold sessions: different blind spots surface different gaps.\n",
"backforge": "# BackForging Protocol\n\nHow to reverse-engineer Forge artifacts from an existing codebase. The goal is not to document what was built -- it's to produce a specification precise enough to rebuild the project from scratch, potentially in a different language or stack, and arrive at something architecturally faithful to the intent but likely better than the original.\n\n**Maturity note:** BackForging applies a well-established reverse-engineering pattern (recovering a specification from a running system) using Forge's specific artifact templates. The underlying approach is the same as reverse-engineering a network topology from packet captures, recovering a protocol spec from observed traffic, or extracting an interface contract from a legacy codebase during a rewrite. The pattern is decades old and battle-tested across engineering disciplines. The specific Forge artifact templates (constitution, conventions, domain docs with Decided/Open, WHY blocks, TONIC tables) are newer and will refine through practice. The approach is sound. The templates will sharpen. If you BackForge a project and discover failure modes in the artifact structure, feed your findings back.\n\n---\n\n## What BackForging Is\n\nBackForging is the process of producing the Forge artifacts (constitution, conventions, domain docs, glossary, engineering plan) from a codebase that wasn't built with Forge. Once the spec exists, the code becomes disposable. The spec is the portable artifact. A PHP project BackForged into a proper spec could be rebuilt in Rust, Node, Go, or anything else without losing the architectural decisions.\n\nThis is not documentation. Documentation describes what was built. A Forge spec prescribes what must be built. The difference matters -- a BackForged spec is forward-looking, not backward-looking.\n\n---\n\n## The WHY Block Reframe\n\nIn a forward-Forged project, WHY blocks capture the reasoning at the moment of the decision. 
In a BackForged project, WHY blocks answer a different question:\n\n**Not:** \"Why did someone build it this way?\"\n**Instead:** \"Why should it continue to be this way?\"\n\nThis is present justification, not historical reconstruction. Even if it's your own project from six months ago, you've forgotten most of the \"why.\" The BackForging agent doesn't know either. Neither of you is doing archaeology. Both of you are evaluating whether each decision has a good reason to persist.\n\nIf a decision can be justified now -- it's a WHY block.\nIf it can't be justified -- it's an Open Question for the rebuild. Maybe it was right at the time and circumstances changed. Maybe it was never right. Either way, the BackForging process surfaced it.\n\nThis is actually more valuable than forward WHY blocks in some cases. Forward WHY blocks can go stale. BackForged WHY blocks are written against current reality.\n\n---\n\n## The Process\n\n### Step 1: Agent Reads the Codebase\n\nA session with large context reads the full codebase and produces draft Forge artifacts. If the codebase exceeds context limits, chunk it by domain boundary: prioritize entry points and public interfaces, summarize peripheral code. Each chunk produces draft artifacts for its domain.\n\n- **Constitution:** What are the system's actual invariants? What architectural laws does the code enforce implicitly? (e.g., \"all writes go through the API layer\" -- is that true in practice or are there backdoors?)\n- **Conventions:** What libraries are used? What patterns are followed? What naming conventions exist? What's the error handling strategy? These are extracted from the code itself, not imagined.\n- **Domain docs:** What are the natural domain boundaries? What does each module/service/package do? What decisions are embedded in the code?\n- **Glossary:** What terms does the codebase use? 
What do they mean in this project's context?\n- **Decision tracking:** For each architectural decision the agent can identify, create a Decided entry with present-tense justification.\n\n### Step 2: Human Reviews and Fills Gaps\n\nThe agent's draft is a starting point. The human adds:\n\n- **WHY blocks** the agent couldn't infer. The agent sees that Service A uses a message queue to talk to Service B but REST to talk to Service C. It can note the inconsistency but can't know if it's intentional. The human evaluates: is there a good reason? If yes, WHY block. If no, Open Question.\n- **Challenges to the agent's assumptions.** The agent may identify patterns that aren't intentional -- they're just how it happened to evolve. The human decides which patterns are load-bearing architecture and which are accidental.\n- **Scope decisions.** Is the BackForge for a full rebuild or a targeted refactor? What's in scope for the spec and what's out?\n\n### Step 3: Cold Validation Against the Existing Code\n\nRun a cold validation session against the BackForged spec. But instead of asking \"can you build this?\", ask: \"does this spec accurately describe the existing system?\"\n\nThe cold session reads the spec and asks questions. 
But the evaluation is different:\n\n- If the cold session's understanding of the spec matches the existing code's behavior -- the spec is accurate.\n- If the cold session's understanding diverges from the code -- either the spec is wrong (fix it) or the code has a problem the spec should address (Open Question for the rebuild).\n\n### Step 4: The Valuable Finding\n\nThe most valuable output of BackForging is **decisions that can't be justified now.**\n\nEvery unjustifiable decision is one of:\n- **Accidental complexity** -- it grew this way, nobody designed it, and it should be simplified in the rebuild\n- **Stale decision** -- it was right once but circumstances changed (dependency deprecated, requirements shifted, better options exist now)\n- **Cargo cult** -- copied from somewhere without understanding why, and nobody questioned it\n- **Genuine open question** -- there might be a good reason but nobody currently in the room knows it\n\nAll four become Open Questions in the BackForged spec. The rebuild can address them consciously instead of inheriting them blindly.\n\n### Step 5: Iterate Until Faithful\n\nThe spec is ready when a cold session can read it and produce an architecture that matches the existing system's intent (not necessarily its implementation). The implementation details may change entirely in a rebuild -- different language, different libraries, different patterns. But the architectural decisions, the domain boundaries, the invariants, and the WHY blocks must be consistent.\n\n---\n\n## BackForging for Language/Stack Migration\n\nOnce the spec exists, rebuilding in a different stack is a matter of updating the conventions doc:\n\n1. BackForge the spec from the original codebase (Steps 1-5 above)\n2. Write a new conventions doc for the target stack (new language, new libraries, new patterns)\n3. Update TONIC entries -- the forbidden alternatives change when the ecosystem changes\n4. 
Run the normal Forge pipeline from Phase 4 onward (see `forge_process.md`, `cold_validation_protocol.md`, `cold_code_run_protocol.md`)\n\nThe constitution and domain docs should need minimal changes -- they describe *what* the system does and *why*, not *how*. The conventions doc is where the stack-specific decisions live. Swap it out, and the cold code run produces the same architecture in a different language.\n\nThis is the portability argument for Forge: the spec is stack-independent. The code is generated from the spec. Change the conventions, regenerate the code.\n\n---\n\n## Frontend BackForging\n\nFrontend code is harder to BackForge because the visual layer can't be fully captured in text. The spec doesn't try to describe what the interface looks like -- it specifies behavior and points at design artifacts.\n\nFor frontend BackForging:\n\n- **Reference the existing design artifacts** -- CSS files, design tokens, component libraries, Figma docs, wireframes, screenshots. The spec says \"use these\" rather than trying to describe the visual in words.\n- **Focus on behavioral specification** -- what happens on error, how keyboard navigation works, loading states, empty states, permission states, responsive breakpoints with specific behaviors. This is the territory covered by the UI/UX ambiguous language dictionary (see `../reference/`).\n- **Extract the interaction model** -- not \"the button is blue and 48px tall\" but \"on submit, validate all fields per Section 22 spec, show inline errors per Section 9 spec, disable the button and show a spinner during the request, re-enable on completion or error.\"\n- **Component inventory** -- what components exist, what props they accept, what states they have. This is the domain doc equivalent for frontend.\n\nThe visual is the designer's job. 
The spec's job is everything the visual can't communicate.\n\n---\n\n## Common Pitfalls\n\n### Documenting What Instead of Why\nThe strongest temptation in BackForging is to describe the code: \"UserService has methods createUser, getUser, updateUser, deleteUser.\" That's not a spec -- it's a summary. The spec says *why* the user service exists as a separate domain, *why* user creation goes through a specific validation pipeline, *why* soft-delete was chosen over hard-delete. Present justification, not code description.\n\n### Preserving Bad Decisions\nNot every decision in the existing code deserves to survive. The BackForging process questions everything. If a decision can't be justified now, don't carry it forward as a Decided entry. Make it an Open Question that the rebuild addresses.\n\n### Skipping the Cold Validation\nIt's tempting to go straight from BackForged spec to rebuild. Don't. The cold validation step is what catches gaps between what you think the system does and what the spec actually says. Without it, the rebuild will surprise you.\n\n### Treating It as Documentation\nBackForging is not a documentation project. The output is not a description of the existing system. It's a specification for rebuilding it -- one that's precise enough for a cold agent to execute. If the spec says \"the auth service handles authentication,\" it's failed. That's documentation. 
A spec says \"the AuthService validates JWT tokens using RS256, checks against a JWKS endpoint at `{issuer}/.well-known/jwks.json`, rejects tokens with `exp` in the past or `iss` not matching the configured issuer, and returns a `401` with body `{ error: 'INVALID_TOKEN', reason: '{specific_reason}' }` on any failure.\"\n\n---\n\n## BackForging as Business Logic Refactoring\n\nThis applies to any project with non-trivial business logic -- not just APIs and services, but anything where domain rules are encoded in code.\n\nA codebase carries business logic decisions that nobody questions because they're buried in `if` statements, service methods, and workflow orchestration. The approval chain that requires three levels of sign-off. The pricing calculation that applies a discount only on Tuesdays. The rule that expires a session after 15 minutes but only for users in a specific tier. These are business decisions, not technical ones, but they live in code where only developers see them.\n\nBackForging externalizes them. Every business rule becomes a Decided entry with a present-tense WHY block. Once they're visible and readable, the people who should be evaluating them -- product owners, business stakeholders, domain experts -- actually can. They can't read code. They can read \"Decision 23: Session timeout is 15 minutes for free-tier users, 60 minutes for paid users. WHY: [can you justify this?]\"\n\nThis is where BackForging becomes more than a technical exercise. It's a structured audit of every business rule in the system. The ones that can't be justified get flagged. The ones that are wrong get corrected before the rebuild bakes them in again. The ones that are right get documented so the next developer doesn't accidentally remove them.\n\nIf you're BackForging and you find yourself writing \"WHY: unknown -- this rule exists in the code but nobody knows why\" for a business logic decision, that's not a failure of the process. That's the process working. 
You found a rule that's been running in production with no justification. Now someone can decide whether it stays or goes.\n\n---\n\n## Business Logic WHY Blocks\n\nWhen BackForging a business application, many WHY blocks aren't technical. They're business logic: \"because the compliance department requires it,\" \"because the vendor contract specifies 90-day retention,\" \"because that's how this industry processes returns.\" This context doesn't exist in the code. It exists in meetings, contracts, regulatory requirements, and the client's institutional knowledge.\n\nExtracting the technical \"what\" from code is straightforward. An AI agent can do it. Extracting the business \"why\" requires conversation with the people who made the business decisions. This is communication and domain expertise work, not engineering work. The person doing the BackForge needs access to the stakeholders who know why the business rules exist, not just the developers who implemented them.\n\nThis is where BackForging as a service becomes valuable. A freelancer or consultant who can sit with the client, ask the right business questions, and produce WHY blocks that capture business rationale is doing work that can't be offshored to someone who never met the client. The technical extraction is commoditized. The business context is not.\n\n---\n\n## When to BackForge\n\n- **Before a major refactor** -- BackForge the affected domains so the refactor has a spec to work from\n- **Before a language/stack migration** -- BackForge the whole project, swap conventions, rebuild\n- **When inheriting a codebase** -- BackForge to understand what you've inherited and where the risks are\n- **When scaling a solo project to a team** -- BackForge so new team members have a spec to read instead of reverse-engineering the code themselves\n- **When the original developers are gone** -- the code is all you have; BackForging is how you extract the knowledge before it's needed under pressure\n"
},
"prompts": {
"cold_validation": "# Cold Validation Prompt\n\n*Template for cold validation sessions. Adapt the bracketed sections to your project.*\n\n---\n\n## The Prompt\n\n```\nThis is a greenfield project. Read the forge methodology docs first, then the\narchitecture documents. Create a phased coding plan from them. If anything is\nambiguous, ask, don't guess. There are open questions in the docs; if you\nthink you can answer one from context, ask about it. We are not executing\nyet, just planning. Adhere to the Forge methodology. Go. Ask questions along\nthe way to reduce ambiguities for the next section.\n```\n\n*For non-code projects, replace \"coding plan\" with the appropriate output type (policy implementation plan, compliance checklist, etc.).*\n\n---\n\n## Key Elements\n\n**\"Read the forge methodology docs first, then the architecture documents\"** -- establishes the reading order. Forge methodology first (so the model understands the doc structure), then the project spec.\n\n**\"If anything is ambiguous, ask, don't guess\"** -- explicitly overrides the model's default behavior of filling gaps with assumptions.\n\n**\"There are open questions in the docs; if you think you can answer one from context, ask about it\"** -- acknowledges OQs are expected and invites the model to engage with them rather than skip them.\n\n**\"We are not executing yet, just planning\"** -- separates the planning/validation phase from implementation. The plan is a forcing function, not an instruction to build.\n\n**\"Ask questions along the way to reduce ambiguities for the next section\"** -- the critical addition. Without this, models batch all questions at the end, producing a decontextualized wall of questions. 
With it, they ask inline, in context, where each question has maximum diagnostic value.\n\n---\n\n## Adaptation Notes\n\n- For **non-code projects**, replace \"coding plan\" with the appropriate output type\n- For **cross-model runs**, use the identical prompt for both models so differences in output reflect model behavior, not prompt variation\n- For **subsequent rounds** (after doc fixes), use the same prompt with the updated docs -- the cold session doesn't know or need to know that fixes were made\n- The prompt can be refined per project, but preserve the core elements (read everything, ask don't guess, ask inline)\n",
"cold_code_run": "# Cold Code Run Prompt\n\n*Template for cold code run sessions. Adapt the bracketed sections to your project.*\n\n---\n\n## The Prompt\n\n```\nThis is a greenfield project. You have the Forge methodology documents, the\nfull architecture specification, and the coding plan. Build according to the\nplan, following the phased sequence and exit criteria.\n\nAdhere to:\n- The project constitution (immutable architectural laws)\n- The conventions document (canonical dependencies, naming, patterns)\n- The domain documents (specifications, Decided entries, WHY blocks)\n- The glossary (terminology)\n\nIf you encounter an ambiguity in the spec, document it in FINDINGS.md with\nwhat you chose and why, then continue. Do not stop to ask. If you encounter\nsomething you cannot build due to [platform/environment constraints -- e.g.,\n\"no Windows compilation environment available\"], skip it with a note in\nFINDINGS.md and continue.\n\n[Any project-specific caveats: e.g., \"Skip Phase 3's Windows service\ninstallation tests -- we're building on Linux. Note the skip in FINDINGS.md\nwith alternative verification steps.\"]\n```\n\n---\n\n## Key Elements\n\n**\"Build according to the plan\"** -- directs the agent to follow the plan, not redesign the architecture.\n\n**\"Adhere to\" list** -- explicit reminder of the document hierarchy.\n\n**\"If you encounter an ambiguity, document it in FINDINGS.md\"** -- captures every guess for postmortem review. This is how the code run feeds back into the spec.\n\n**\"Do not stop to ask\"** -- in a cold code run, the agent should keep building. Questions were resolved in cold validation. What remains are the ambiguities the validation missed -- those are the valuable findings.\n\n**Platform caveats** -- explicit carve-outs for things the build environment can't do. 
These prevent the agent from stalling on unbuildable targets.\n\n---\n\n## Adaptation Notes\n\n- The prompt is simpler than the cold validation prompt because the code run is more directed -- build this thing, note what's unclear\n- For **non-Rust/Go projects**, adjust platform caveats accordingly\n- FINDINGS.md is the critical output, not the code -- make sure the agent knows to document every guess and uncertainty\n- For **alternate model runs**, use the identical prompt with updated docs so differences reflect model behavior\n",
"semantic_review": "# Semantic Review Prompt\n\n*Template for semantic ambiguity review sessions. Adapt the bracketed sections to your project.*\n\n---\n\n## The Prompt\n\n```\nThe docs in [docs/architecture/ or wherever your spec lives] were written\nunder a methodology described in [path to forge methodology docs]. Read that\nfirst so you understand why the documents are structured the way they are --\nbut your job is NOT to validate the methodology's application.\n\nYour job is purely linguistic: find ambiguous, vague, or probabilistically\nwide language that would cause an AI coding agent to guess rather than follow\na specific instruction. Words like 'should', 'may', 'appropriate', 'as\nneeded', 'etc.' -- anything where two reasonable agents would interpret the\nsame sentence differently.\n\nThis is a semantic review, not a technical or architectural review.\n\nReference the ambiguous language dictionary [path to dictionary] for the\nfull catalog of flagged patterns. Focus on binding contexts: Decided\nsections, conventions, spec body text, constitutional articles, exit\ncriteria. Skip Open Questions, WHY block rationale, code blocks, and the\nglossary.\n\nWrite findings to ambiguity_review.md organized by document and severity.\nFor each finding, include:\n- The file and approximate location\n- The flagged word or phrase\n- Why it's ambiguous (what would two agents interpret differently?)\n- A suggested replacement or the information needed to write one\n```\n\n---\n\n## Key Elements\n\n**\"Your job is NOT to validate the methodology's application\"** -- prevents the model from second-guessing the Forge structure instead of doing the linguistic review.\n\n**\"Purely linguistic\"** -- keeps the review focused. 
Technical correctness of decisions is not the concern; precision of language is.\n\n**\"Two reasonable agents would interpret the same sentence differently\"** -- the operational definition of ambiguity in the Forge methodology.\n\n**\"Reference the ambiguous language dictionary\"** -- points the model at the comprehensive word list rather than relying on its own judgment about what's vague.\n\n**Context-aware exclusions** -- Open Questions are expected to be ambiguous (they're unresolved). WHY rationale text explains reasoning, not prescribes behavior. Code blocks are code, not spec.\n\n---\n\n## Adaptation Notes\n\n- For **first-time reviews** (no dictionary loaded yet), the inline examples (\"should,\" \"may,\" \"appropriate\") are sufficient to calibrate the model\n- For **subsequent reviews** (after doc fixes), use the same prompt to verify fixes and catch anything new\n- For **multi-model coverage**, run through different models -- they flag different patterns\n- Treat the output file (ambiguity_review.md) as a working document, not a permanent artifact -- its findings flow into doc fixes, then it's obsolete\n",
"miniforge": "# MiniForge Prompt\n\n*Drop this into a session with your vibe-coded project. It does a quick assessment, asks a few questions, and produces a lightweight Forge document tailored to what your project actually needs.*\n\n---\n\n## The Prompt\n\n```\nI have an existing project that was built with AI assistance (vibe coded). I want\nyou to do a MiniForge assessment. Here's what that means:\n\n1. READ THE CODEBASE. Look at the structure, the dependencies, the patterns, the\n error handling, the data storage, the API surface, everything you can see.\n\n2. ASK ME THESE QUESTIONS (and wait for answers before proceeding):\n - Who uses this? Just me, a small group, or the public?\n - Does it handle money, personal data, health data, or anything sensitive?\n - Does it need to stay running reliably, or is downtime acceptable?\n - Am I the only one who will ever work on this, or might others contribute?\n - Is this a prototype/experiment, or is it heading toward production?\n\n3. Based on my answers and what you see in the code, produce a MiniForge document\n that includes:\n\n a. CONVENTIONS (what I'm already using):\n List every library, framework, and tool choice you can detect in the code.\n Format as \"Use X, do not switch to Y\" so future sessions don't change my\n stack without asking.\n\n b. HARD RULES (things that must not change):\n Based on my answers about audience and sensitivity, list the 3-5 rules\n this project must follow. These are constitutional articles. Examples:\n \"User data stays on the server\", \"All payments go through [whatever\n payment provider I'm using]\", \"The app works without an internet\n connection.\"\n\n c. 
QUICK WINS (things to fix right now):\n Scan for common issues that vibe-coded projects have:\n - API keys, secrets, or credentials in the code (not in env vars)\n - No error handling (bare try/catch with no useful response)\n - No input validation on user-facing forms or API endpoints\n - SQL injection or XSS vulnerabilities\n - Hardcoded values that should be configurable\n - No rate limiting on public endpoints\n - Missing authentication on endpoints that need it\n - Console.log / print statements used as the logging strategy\n - No .gitignore or secrets committed to git history\n - Dependencies with known security vulnerabilities\n Don't just list them. For each one, show me exactly what to fix and where.\n\n d. RISK ASSESSMENT:\n Based on my answers and the code, give me an honest assessment:\n - GREEN: This project is fine as-is for its intended use. The quick wins\n would help but aren't critical.\n - YELLOW: This project has issues that will cause problems if it grows or\n gets real users. Fix the quick wins and consider a conventions doc.\n - RED: This project has issues that could cause data loss, security\n breaches, or reliability failures. Fix the quick wins immediately and\n consider a fuller Forge process.\n\n e. NEXT STEPS (only if Yellow or Red):\n What specific Forge practices would benefit this project, ordered by\n impact. Don't recommend the full methodology if it doesn't need it.\n Be specific: \"Add WHY blocks to these 3 decisions\" not \"consider adding\n WHY blocks.\"\n\n4. Output everything as a single markdown file called miniforge.md that I can\n keep in my project root. 
Future AI sessions read this file first.\n```\n\n---\n\n## What It Produces\n\nA single `miniforge.md` file in the project root containing:\n\n- **Conventions**: locked stack choices extracted from the actual codebase\n- **Hard rules**: 3-5 constitutional articles based on the project's audience and sensitivity\n- **Quick wins**: specific, actionable fixes for common vibe-coding issues with file paths and code changes\n- **Risk assessment**: GREEN/YELLOW/RED honest evaluation\n- **Next steps**: only if needed, specific recommendations ordered by impact\n\nThe file is small, practical, and immediately useful. Future AI sessions read it first and respect the locked choices. The quick wins improve code quality without changing architecture. The risk assessment tells the developer honestly whether they need to go deeper.\n\n---\n\n## When to Use This\n\n- You vibe-coded something and want to know if it's okay\n- You inherited a project and want a quick health check\n- You're thinking about putting something into production and want to know what needs fixing first\n- You're not sure whether you need Forge, and this tells you\n\n---\n\n## What It's Not\n\nMiniForge is not the full methodology. It doesn't produce domain docs, engineering plans, or edge case catalogs. It doesn't run cold validation or cross-model testing. It's a quick assessment that captures your existing choices, flags the obvious issues, and tells you whether you need more structure.\n\nIf MiniForge says GREEN, you're probably fine. Keep the miniforge.md file for consistency and move on.\n\nIf it says YELLOW, adopt the Vibe to Forge recommendations and add Forge practices as the project grows.\n\nIf it says RED, seriously consider a fuller Forge process. The quick wins aren't enough if the project has fundamental issues.\n\n---\n\n## The Five-Minute Version\n\nIf you don't want to paste the full prompt, this works too:\n\n```\nLook at my codebase. 
Tell me what libraries I'm using (so I can lock them),\nwhat security issues you see (so I can fix them), and whether this project\nis okay for production use or needs more engineering work. Be honest.\n```\n\nThat's not MiniForge; it's just a health check. But it's better than nothing, and it often leads to \"okay, maybe I should do the full MiniForge.\"\n"
}
},
"reference": {
"ambiguous_language_dictionary": "*Forge Reference Material -- used by the semantic review process (see `../methodology/semantic_review.md`). This is a detection vocabulary, not the methodology itself.*\n\n# Ambiguous Language Dictionary\n\n**Purpose:** Reference for identifying and eliminating ambiguous language in specification documents used to seed AI coding agents. Every word/phrase listed here should be flagged for replacement with precise, measurable, or deterministic language.\n\n---\n\n## 1. Hedge Words\n\nWords that signal uncertainty or lack of commitment to a statement.\n\nmaybe, perhaps, possibly, probably, likely, unlikely, conceivably, presumably, supposedly, allegedly, apparently, seemingly, ostensibly, arguably, plausibly, potentially, tentatively, hypothetically, speculatively, in theory, in principle, it seems, it appears, it looks like, more or less\n\n---\n\n## 2. Weak / Ambiguous Modals\n\nModal verbs that leave behavioral requirements undefined.\n\nshould, could, would, might, may, can, ought to, shall (when used inconsistently), need to (without \"must\" force), want to, prefer to, tend to, be able to\n\n---\n\n## 3. Non-Committal Action Phrases\n\nPhrases that express intent without guaranteeing execution.\n\ntry to, attempt to, aim to, strive to, endeavor to, seek to, hope to, plan to, intend to, expect to, aspire to, look to, work toward, make an effort to, do our best to, as much as possible, to the extent possible, where feasible, if feasible, to the degree practical\n\n---\n\n## 4. 
Vague Quantifiers\n\nWords that describe amounts without specifying them.\n\nsome, many, few, several, various, numerous, a number of, a lot of, lots of, plenty of, a handful of, a bunch of, multiple, a couple of, a bit of, a little, a great deal of, most, much, enough, sufficient, insufficient, adequate, inadequate, excessive, minimal, ample, substantial, considerable, negligible, marginal, moderate, a majority of, a minority of, a fraction of, a portion of, a subset of, a percentage of\n\n---\n\n## 5. Vague Qualifiers / Degree Words\n\nWords that modify intensity without defining thresholds.\n\nfairly, rather, quite, somewhat, relatively, comparatively, reasonably, moderately, slightly, a little, a bit, kind of, sort of, more or less, to some extent, to a degree, to a certain extent, in part, partially, largely, mostly, primarily, predominantly, substantially, significantly, considerably, remarkably, noticeably, appreciably, marginally, mildly, extremely, very, really, highly, greatly, deeply, tremendously, enormously, immensely, incredibly, exceedingly, particularly, especially, notably, markedly, decidedly, distinctly, profoundly, vastly\n\n---\n\n## 6. 
Subjective / Relative Adjectives\n\nAdjectives whose meaning depends on unstated baselines or observer perspective.\n\n### Performance\nfast, slow, quick, rapid, speedy, sluggish, responsive, unresponsive, performant, efficient, inefficient, optimal, suboptimal, scalable, lightweight, heavyweight, snappy, laggy, real-time (without definition)\n\n### Size / Scale\nlarge, small, big, tiny, huge, massive, enormous, compact, short, long, wide, narrow, thick, thin, deep, shallow, extensive, limited, broad, brief\n\n### Quality\ngood, bad, poor, nice, fine, great, excellent, terrible, awful, decent, acceptable, unacceptable, adequate, inadequate, sufficient, insufficient, satisfactory, unsatisfactory, reasonable, unreasonable, appropriate, inappropriate, suitable, unsuitable, proper, improper, correct, incorrect, right, wrong, clean, dirty, messy, elegant, ugly, beautiful, graceful, clumsy, robust, fragile, solid, weak, strong, stable, unstable, reliable, unreliable, secure, insecure, safe, unsafe, healthy, unhealthy\n\n### Complexity\nsimple, complex, complicated, straightforward, trivial, nontrivial, easy, hard, difficult, challenging, basic, advanced, sophisticated, intuitive, unintuitive, obvious, obscure, clear, unclear, confusing, convoluted, readable, unreadable, maintainable, unmaintainable\n\n### Recency / Modernity\nmodern, old, new, legacy, outdated, up-to-date, current, latest, recent, cutting-edge, state-of-the-art, next-generation, bleeding-edge, mature, established, traditional, conventional\n\n---\n\n## 7. 
Ambiguous Temporal / Frequency Words\n\nWords describing when or how often without specificity.\n\nsoon, later, eventually, shortly, presently, momentarily, immediately (without SLA), promptly, quickly, in a timely manner, in due course, in due time, at some point, down the road, in the future, in the near future, in the long run, in the short term, in the medium term, going forward, moving forward, over time, from time to time, now and then, once in a while, on occasion, periodically, intermittently, sporadically, regularly, frequently, often, sometimes, occasionally, rarely, seldom, infrequently, hardly ever, almost never, always (when not literally always), never (when not literally never), constantly, continuously, continually, perpetually, indefinitely, temporarily, briefly, for a while, for the time being, until further notice, as needed, when needed, when ready, when available, when convenient, at your earliest convenience, as soon as possible, ASAP\n\n---\n\n## 8. Assumption / False-Certainty Words\n\nWords that assert something is obvious, discouraging the reader from questioning it.\n\nobviously, clearly, evidently, apparently, naturally, certainly, surely, undoubtedly, unquestionably, undeniably, of course, needless to say, it goes without saying, as everyone knows, as you know, as we all know, as is well known, it is well established, it is common knowledge, it stands to reason, it follows that, by definition, inherently, fundamentally, essentially, basically, simply put, in essence, trivially\n\n---\n\n## 9. Approximation / Imprecision Words\n\nWords that signal inexact values.\n\nabout, approximately, around, roughly, nearly, almost, close to, in the neighborhood of, in the ballpark of, on the order of, give or take, plus or minus, more or less, upwards of, up to, at least, at most, no more than (when vague), no fewer than, somewhere between, somewhere around, ish (as suffix), order of magnitude\n\n---\n\n## 10. 
Vague Conditional / Situational Phrases\n\nPhrases that defer decisions to undefined future conditions.\n\nif needed, if necessary, if required, if appropriate, if applicable, if relevant, if desired, if possible, if convenient, if practical, if warranted, if justified, as needed, as necessary, as required, as appropriate, as applicable, as desired, when possible, when necessary, when appropriate, when applicable, when relevant, where possible, where necessary, where appropriate, where applicable, where relevant, where practical, where feasible, depending on, based on context, based on circumstances, on a case-by-case basis, at the discretion of, at the developer's discretion, at the user's discretion, subject to, contingent on, provided that (without specifying what), assuming that (without verifying), unless otherwise specified, unless otherwise noted, unless otherwise stated, unless there's a reason not to, except in special cases, in certain cases, in some cases, in most cases, under certain conditions, under normal conditions, under typical conditions, under normal circumstances, under ideal circumstances\n\n---\n\n## 11. 
Vague Verbs / Action Words\n\nVerbs that describe activity without specifying behavior.\n\nhandle, manage, process, deal with, take care of, address, support, facilitate, leverage, utilize, implement (without specifics), integrate (without specifics), maintain, oversee, coordinate, orchestrate, streamline, optimize (without metric), enhance (without metric), improve (without metric), ensure (without verification method), guarantee (without mechanism), validate (without criteria), verify (without criteria), check (without criteria), review (without criteria), evaluate (without criteria), assess (without criteria), analyze (without output), monitor (without thresholds), track (without metrics), log (without specifying what), report (without format), notify (without channel/format), alert (without conditions), flag (without criteria), escalate (without path), resolve (without definition of resolved), mitigate (without strategy), remediate (without process), troubleshoot, debug, fix (without acceptance criteria), patch, update (without scope), refactor (without goals), clean up, tidy up, polish, finalize, wrap up, flesh out, iron out, sort out, figure out, work out, look into, dig into, get to the bottom of, circle back on, follow up on, loop in, sync up, align on, touch base on, reach out, interface with, interact with, communicate with, liaise with, collaborate on, partner on, spearhead, champion, drive, own, be responsible for (without specific deliverables)\n\n---\n\n## 12. Scope-Escape / Open-Ended Words\n\nWords and phrases that create unbounded scope or undefined completeness.\n\netc, et cetera, and so on, and so forth, and the like, and such, and more, and others, among others, among other things, including but not limited to, such as (when list is not exhaustive), for example (when used as specification), e.g. 
(when used as specification), for instance, like (as in \"things like X\"), similar, similar to, related, along those lines, in that vein, of that nature, and everything else, anything else, whatever else, wherever else, whenever, whatever, however (as \"in whatever way\"), plus more, plus others, anything relevant, anything applicable, all relevant, all applicable, all necessary, everything needed, everything required, the usual, the standard stuff, the rest, other stuff, miscellaneous, assorted, sundry, various and sundry\n\n---\n\n## 13. Vague Error / Exception Handling Language\n\nWords used to hand-wave failure modes.\n\ngracefully, properly, correctly, appropriately, as expected, in a reasonable manner, in an orderly fashion, cleanly, safely, without issues, without problems, without errors, without breaking, without crashing, without side effects, without negative impact, with minimal disruption, with minimal downtime, seamlessly, transparently, silently, quietly, unobtrusively, intelligently, smartly, sensibly, logically, sanely, defensively, fail-safe (without definition), fault-tolerant (without definition), self-healing (without mechanism), auto-recover, degrade gracefully (without specifying degraded behavior), fall back (without specifying fallback), retry (without specifying count/backoff/conditions), best-effort (without defining what that means)\n\n---\n\n## 14. 
Undefined Standards / Appeals to Authority\n\nReferences to standards or practices without citing them.\n\nbest practice, best practices, industry standard, industry best practice, common practice, standard practice, conventional wisdom, accepted practice, established practice, gold standard, state of the art, world-class, enterprise-grade, production-ready, production-quality, battle-tested, proven, well-known, well-understood, well-documented, well-tested, well-architected, well-designed, well-engineered, well-structured, idiomatic, canonical, textbook, by the book, per the standard, per convention, per the norm, as recommended, as suggested, as advised, as prescribed, as dictated by, community consensus, the community recommends, experts suggest, research shows (without citation), studies show (without citation), data suggests (without citation)\n\n---\n\n## 15. Vague Comparison / Relative Terms\n\nComparative language without baselines or measurements.\n\nbetter, worse, faster, slower, cheaper, more expensive, simpler, more complex, easier, harder, cleaner, more elegant, more efficient, more performant, more scalable, more robust, more reliable, more secure, more maintainable, more readable, more flexible, more extensible, more modular, more testable, more portable, more compatible, improved, enhanced, upgraded, optimized, streamlined, refined, superior, inferior, preferable, comparable, competitive, on par with, as good as, not as good as, outperforms, underperforms, exceeds, falls short of, beats, loses to, ahead of, behind, above average, below average, higher, lower, greater, lesser, more, less, increased, decreased, reduced, expanded, maximum (without value), minimum (without value), peak, baseline (without value)\n\n---\n\n## 16. 
Weasel Words / Generalization Words\n\nWords that generalize without committing to universality.\n\ngenerally, typically, normally, usually, commonly, traditionally, conventionally, customarily, ordinarily, routinely, habitually, as a rule, by and large, for the most part, in general, in most cases, in many cases, in practice, in reality, in effect, in principle, on the whole, overall, broadly, broadly speaking, loosely, loosely speaking, more often than not, nine times out of ten, the vast majority, almost always, almost never, all but, virtually, practically, effectively, essentially, fundamentally, at its core, at the end of the day, when all is said and done, all things considered, on balance, net-net\n\n---\n\n## 17. Ambiguous Pronouns / References\n\nPronouns and references that can create ambiguity when antecedent is unclear.\n\nit, this, that, these, those, they, them, their, its, the former, the latter, the above, the below, the aforementioned, the following, said (as adjective), such (as pronoun), same, the other, another, both, either, neither, each, all, any, every, everything, anything, something, nothing, whatever, whichever, whoever, one (as pronoun), ones\n\n**Note:** These are not inherently ambiguous but become so when their referent is unclear in context. Flag when the antecedent is more than one sentence away or when multiple referents are possible.\n\n---\n\n## 18. 
Passive Voice Indicators\n\nPassive constructions that hide the actor/responsible party.\n\nis done, was done, will be done, should be done, must be done, needs to be done, has been done, had been done, is being done, was being done, is handled, is managed, is processed, is performed, is executed, is triggered, is called, is invoked, is created, is generated, is returned, is sent, is received, is stored, is loaded, is initialized, is configured, is set up, is deployed, is installed, is updated, is validated, is verified, is checked, is logged, is monitored, is reported, is displayed, is rendered, is shown, is hidden, is enabled, is disabled, is allowed, is denied, is granted, is revoked, is assigned, is delegated, is escalated, is resolved, is completed, is finalized, is approved, is rejected, is accepted, is declined\n\n**Note:** Passive voice is ambiguous in specs because it omits WHO or WHAT performs the action. Flag and rewrite as \"Component X does Y.\"\n\n---\n\n## 19. Filler / Padding Phrases (Zero Information Content)\n\nPhrases that add words without adding meaning.\n\nit should be noted that, it is worth noting that, it is important to note that, it is worth mentioning that, it bears mentioning that, it is interesting to note that, it goes without saying that, needless to say, as previously mentioned, as stated above, as noted earlier, as discussed, as we know, as you can see, at this point in time, at the present time, in today's world, in the current landscape, in this day and age, the fact of the matter is, the thing is, the point is, what this means is, what we're saying is, in other words, put another way, that is to say, to put it simply, to be clear, to be fair, to be honest, frankly, honestly, truthfully, in all honesty, at the end of the day, when push comes to shove, the bottom line is, long story short, in a nutshell, to make a long story short, having said that, that being said, with that being said, be that as it may, that notwithstanding, 
regardless, irregardless, in any case, in any event, either way, one way or another\n\n---\n\n## 20. Deferred-Decision Language\n\nPhrases that push decisions to an undefined future.\n\nTBD, to be determined, to be decided, to be defined, to be confirmed, to be finalized, to be discussed, to be agreed upon, to be researched, to be investigated, to be explored, TBA, to be announced, to be specified, to be documented, pending, pending review, pending approval, pending discussion, pending further analysis, under consideration, under review, under discussion, under investigation, open question, open issue, open item, placeholder, stub, TODO, FIXME, HACK, TEMP, TEMPORARY, PLACEHOLDER, WIP, work in progress, draft, provisional, preliminary, initial, first pass, rough draft, v0, prototype, proof of concept, spike, experimental, exploratory, investigative, research phase, discovery phase, needs further thought, needs more analysis, needs discussion, let's revisit, we'll come back to this, parking lot, deferred, postponed, backlogged, icebox, on hold, blocked by (without specifying what)\n\n---\n\n## 21. Rhetorical / Persuasion Words (Opinion Masquerading as Fact)\n\nWords that frame subjective preferences as objective truths.\n\nideal, ideally, perfect, perfectly, optimal, optimally, the right way, the correct approach, the proper method, the best way, the only way, the smart choice, the obvious choice, the natural choice, the logical choice, a no-brainer, a slam dunk, elegant, beautifully, brilliantly, cleverly, masterfully, artfully, thoughtfully, carefully designed, well-crafted, nicely done, the way it should be, how it's meant to work, the intended behavior, by design, as designed, as intended, as envisioned, as architected\n\n---\n\n## 22. 
False Precision / Pseudo-Quantitative Language\n\nLanguage that sounds precise but isn't.\n\norder of magnitude, ballpark, in the range of, on the order of, a significant number, a nontrivial amount, a meaningful percentage, a measurable impact, a noticeable difference, a material change, a tangible improvement, a marked increase, a sharp decline, an exponential growth (used loosely), a quantum leap (used loosely), a sea change, a paradigm shift, a step function, a game changer, a force multiplier, an inflection point, a tipping point, a critical mass, a long tail, at scale, at volume, high-volume, low-latency (without number), high-throughput (without number), high-availability (without number), near-zero (without threshold), near-instant, near-real-time, sub-second (sometimes acceptable, often not specific enough)\n\n---\n\n## 23. Contradictory / Self-Canceling Qualifiers\n\nPhrases where the qualifier undermines the assertion.\n\nbut not always, but not necessarily, but not limited to, except when it isn't, unless it doesn't, or not, or maybe not, to some degree, in a sense, in a way, sort of, kind of, more or less, if you will, as it were, so to speak, in a manner of speaking, for lack of a better word, for want of a better term, roughly speaking, loosely defined, broadly defined, in the broadest sense, in the narrowest sense, depending on how you look at it, from a certain point of view, in certain respects, in some respects\n\n---\n\n## 24. 
Implicit Requirement Words\n\nWords that sneak in requirements without explicit declaration.\n\nalso, additionally, plus, furthermore, moreover, in addition, on top of that, not to mention, along with, as well as, coupled with, combined with, together with, in conjunction with, in parallel, at the same time, simultaneously, meanwhile, incidentally, by the way, oh and, and of course, and naturally, and obviously, while we're at it, while you're at it, since we're here, as a bonus, as an aside\n\n**Note:** These are flags because they often introduce scope creep or undocumented requirements. Each \"also\" deserves its own explicit requirement statement.\n\n---\n\n## 25. Anthropomorphization / Agent-Confusion Words\n\nWords that attribute intent or intelligence to code without specifying mechanism.\n\nknows, understands, realizes, recognizes, figures out, learns, remembers, forgets, decides, chooses, prefers, wants, needs, tries, attempts, expects, assumes, believes, thinks, considers, determines, discovers, detects (without specifying how), senses, feels, sees, looks at, watches, listens, notices, ignores, cares about, is smart enough to, is aware of, is capable of, adapts to, adjusts to, responds to (without event specification), reacts to, anticipates, predicts, infers, deduces, concludes, judges, evaluates (without criteria)\n\n**Note:** Code doesn't \"know\" or \"want.\" Replace with specific mechanism: \"reads from config,\" \"compares against threshold,\" \"queries the database for.\"\n\n---\n\n## 26. 
Absolutist Words (Often Inaccurate)\n\nWords claiming totality that are rarely literally true.\n\nalways, never, all, none, every, no, any (in absolute sense), everything, nothing, everyone, no one, nobody, anywhere, nowhere, everywhere, completely, totally, entirely, absolutely, utterly, wholly, fully, perfectly, 100%, zero, infinite, unlimited, universal, global (when meaning \"everywhere\"), permanent, forever, eternal, immutable (when not literally true), guaranteed (without SLA), impossible, mandatory (without enforcement), required (without consequence), critical (without impact definition), essential (without justification), vital, crucial, imperative, paramount, non-negotiable (without stating what it means)\n\n---\n\n## 27. Data Type & Schema Ambiguity\n\nWords that describe data without defining its structure, type, or constraints.\n\ndata, value, information, content, record, entry, field, parameter, input, output, payload, metadata, blob, string (when it should be an Enum), number (when it should be an Int or Float), date (without format), time (without timezone), ID (without format like UUID or Snowflake), object, item, element, entity, resource, asset, artifact, token (without specifying type), key (without specifying type/format), result, response, request, message, event, signal, flag (without type), status (without enum), type (without enum), code (without enum), level (without enum), mode (without enum), state (without enum), config, options, props, args, params, context, scope, reference, handle, pointer, descriptor, tag, label, name (without constraints), title (without constraints), description (without constraints)\n\n**Flag for:** \"Pass the data to the service.\" **Replace with:** \"POST a JSON object conforming to `CreateOrderSchema` to the `/api/v1/orders` endpoint.\"\n\n---\n\n## 28. 
Persona & Actor Ambiguity\n\nGeneric nouns for entities that have distinct permissions or roles.\n\nthe user, the admin, the client, the system, the platform, the back-end, the front-end, the server, the service, the application, the app, the API, the database, the cache, the queue, the worker, the consumer, the producer, the publisher, the subscriber, the caller, the callee, the sender, the receiver, the requester, the responder, the provider, third-party, someone, anyone, everyone, the owner, the actor, the operator, the manager, the handler, the controller, the agent, the bot, the scheduler, the orchestrator, the coordinator, we, our system, our service, our API\n\n**Flag for:** \"The user should see an error.\" **Replace with:** \"The `UnauthenticatedUser` persona receives a 401 response with body `{ error: 'INVALID_TOKEN' }` and the client redirects to `/login`.\"\n\n---\n\n## 29. Logical & Boolean Trap Words\n\nConnectives that create forks in logic without specifying the truth table.\n\nand/or, either... or (without specifying exclusivity), unless (without the else case), if and only if (when used colloquially), vice versa, respectively, as follows (when followed by a non-exhaustive list), otherwise (without specifying the alternative), alternatively, conversely, in contrast, on the other hand, except (without exhaustive exception list), other than, apart from, aside from, barring, save for, but (as a logical qualifier without specifying the excluded case), yet (as a logical qualifier), however (as a logical qualifier), still (as a logical qualifier), although, even though, despite, regardless of, irrespective of, notwithstanding, whether or not (without specifying both branches)\n\n**Flag for:** \"The system returns a success message and/or redirects.\" **Replace with:** \"The system returns a 200 with body `{ status: 'ok' }` AND issues a 302 redirect to `/dashboard`. Both always occur; neither is optional.\"\n\n---\n\n## 30. 
Unit-less Values\n\nAny standalone number that lacks a dimension. Not specific words but a critical pattern to flag.\n\n10, 100, 500, 1000, 0.5, half, double, triple, twice, manifold, tenfold, hundredfold, a factor of, an order of magnitude, by X (without unit), up to X (without unit), at least X (without unit), no more than X (without unit), between X and Y (without unit), every X (without unit), after X (without unit), within X (without unit), timeout X (without unit), delay X (without unit), interval X (without unit), limit X (without unit), max X (without unit), min X (without unit), threshold X (without unit), buffer X (without unit), size X (without unit), length X (without unit), width X (without unit), depth X (without unit), count X (without unit or type), capacity X (without unit)\n\n**Flag for:** \"Timeout after 30.\" **Replace with:** \"Timeout after 30 seconds.\" / \"Set `MAX_RETRIES` to 30 (unit: retry attempts).\"\n\n---\n\n## 31. State / Lifecycle Ambiguity\n\nWords describing the phase of a process without defining the state machine or transitions.\n\nstarted, starting, running, finished, completed, stopped, stopping, paused, pausing, resumed, resuming, reset, restarted, restarting, refreshed, refreshing, reloaded, reloading, updated, updating, stale, fresh, current, old, new, active, inactive, idle, busy, blocked, unblocked, pending, queued, dequeued, processing, processed, failed, errored, succeeded, timed out, expired, cancelled, cancelling, retrying, skipped, ignored, acknowledged, unacknowledged, initialized, uninitialized, loaded, unloaded, mounted, unmounted, connected, disconnected, open, closed, opening, closing, locked, unlocked, enabled, disabled, on, off, ready, not ready, available, unavailable, online, offline, up, down, alive, dead, healthy, unhealthy, degraded, recovering, recovered, synced, unsynced, dirty, clean, committed, uncommitted, published, unpublished, deployed, undeployed, provisioned, deprovisioned, registered, 
unregistered, subscribed, unsubscribed, authenticated, unauthenticated, authorized, unauthorized\n\n**Flag for:** \"After the process finishes, restart the service.\" **Replace with:** \"When `ProcessRunner` emits `State.COMPLETED` (exit code 0), the `ServiceManager` transitions `PaymentService` from `State.IDLE` to `State.STARTING` via `ServiceManager.restart(serviceId)`.\"\n\n---\n\n## Usage Notes\n\n### Replacement Strategy\nEvery flagged word should be replaced with one of:\n- **A specific number** (\"retry 3 times\" not \"retry as needed\")\n- **A named condition** (\"when HTTP 503 is returned\" not \"when errors occur\")\n- **A defined behavior** (\"return null and log to stderr\" not \"handle gracefully\")\n- **A measurable threshold** (\"respond within 200ms at p99\" not \"respond quickly\")\n- **An explicit actor** (\"the AuthService validates the JWT\" not \"the token is validated\")\n- **A cited standard** (\"per RFC 7519 Section 4.1.4\" not \"per industry standard\")\n\n### Context Matters\nSome words on this list are acceptable in specific contexts:\n- \"should\" is precisely defined in RFC 2119 but ambiguous everywhere else\n- \"may\" has a specific meaning in RFC 2119 but is vague in general specs\n- \"approximately\" is acceptable when paired with tolerance (\"approximately 100ms ±10ms\")\n- Comparative words are fine when a baseline is stated (\"50% faster than v2.1 benchmark\")\n\n### Automated Scanning\nThis list can be converted to a grep/regex pattern, a linting rule, or an LLM pre-processing filter. Recommended approach: flag occurrences, don't auto-reject — context determines whether the usage is genuinely ambiguous.\n\n### The \"Unit Test\" Test\nIf a requirement cannot be written as a failing test case before the code is written, the language is almost certainly ambiguous. 
You cannot write a test for \"the UI should feel snappy,\" but you can write one for \"Interaction to Next Paint (INP) must be under 200ms.\" If you can't express it as a test assertion, rewrite the requirement until you can.\n",
"flagged_vocabulary": "*Forge Reference Material -- curated hit-list from real semantic reviews. See `../methodology/semantic_review.md` for the review protocol.*\n\n# Flagged Vocabulary -- Probabilistically Wide Language\n\n*Words and phrases identified by semantic review (Claude + Codex, bobby7) that widen the probability distribution for AI coding agents. These MUST NOT appear in Decided sections, spec body text, or binding conventions without being qualified by a specific threshold, metric, or concrete definition.*\n\n*This list feeds the graph parser's ambiguity detection. Any new occurrence of these words in a binding context should be flagged for review.*\n\n---\n\n## Subjective Quality Bars\n| Flagged word | Why it's wide | Replace with |\n|-------------|---------------|-------------|\n| adequate | No measurement criteria | Specific threshold or benchmark |\n| appropriate | Who decides? By what criteria? | Named decision-maker or measurable condition |\n| comprehensive | Subjective completeness | \"Covering all N categories\" or enumerate |\n| excellent | No measurable quality gate | Phase number or specific criteria |\n| good | No acceptance standard | Test suite pass rate or specific metric |\n| reasonable | Different agents have different \"reasonable\" | Concrete constraint or enumerated options |\n| sensible | Subjective default quality | Explicit default value with rationale |\n| solid | Qualitative and unbounded | Capability checklist or exit criteria |\n| sufficient | Sufficient for what? 
| Named requirement it satisfies |\n\n## Undefined Thresholds\n| Flagged word | Why it's wide | Replace with |\n|-------------|---------------|-------------|\n| extremely | No numeric boundary | Specific number (e.g., \"> 32 levels\") |\n| negligible | No measurement | Quantified target (e.g., \"< 1ms\", \"< 1% battery\") |\n| significant | No metric or threshold | Specific comparison metric + threshold |\n| suspicious | No detection criteria | Enumerated conditions or pattern list |\n| anomalous | Anomalous compared to what? | Deviation metric + threshold from baseline |\n| excessive | No upper bound defined | Specific limit |\n\n## Temporal Ambiguity\n| Flagged word | Why it's wide | Replace with |\n|-------------|---------------|-------------|\n| later | When? Which phase? | \"Phase N\" or specific milestone |\n| eventually | Unbounded timeline | Phase number or \"post-V1\" |\n| soon | Relative to what? | Specific trigger or phase |\n| periodically | No cadence defined | Specific interval or trigger condition |\n| rarely | No frequency metric | Specific rate or \"< N per M\" |\n| latest | Time-relative, not pinned | Specific version number or MSRV |\n\n## False Decision Signals\n| Flagged phrase | Why it's wide | Replace with |\n|---------------|---------------|-------------|\n| possible resolution | Reads as decided but isn't | Mark clearly as OQ or move to Decided |\n| Option B or C | Two options, no selection | Pick one and WHY NOT the other |\n| probably | Biases without committing | Commit or defer explicitly |\n| could be | Creates optionality in spec body | \"Is\" or \"will be\" or mark as OQ |\n| may | Optional or planned? | \"Must\" (required) or \"can\" (permitted) |\n| should (in Decided) | Required or recommended? 
| \"Must\" (required) or \"recommended for users\" |\n\n## Mechanism Words Without Mechanism\n| Flagged word | Why it's wide | Replace with |\n|-------------|---------------|-------------|\n| negotiates | Implies protocol, none specified | Specify the algorithm or selection method |\n| handles | How? | Specific behavior description |\n| manages | What operations? | Enumerate the operations |\n| processes | Through what path? | Name the pipeline stages or function |\n| empirically determined | What evaluation protocol? | Specific benchmark or test criteria |\n\n## Placeholder Language\n| Flagged phrase | Why it's wide | Replace with |\n|---------------|---------------|-------------|\n| N hours | Literal placeholder in binding text | Specific number |\n| N jobs | Literal placeholder | Specific number or configurable with default |\n| etc. | Open-ended list | Enumerate completely or \"including but not limited to [full list]\" |\n| and so forth | Same as etc. | Enumerate |\n| various | Unquantified set | Enumerate or reference a defined list |\n\n## Scope-Widening Qualifiers\n| Flagged word | Why it's wide | Replace with |\n|-------------|---------------|-------------|\n| aggressive | Undefined behavioral mode | Concrete behavior description |\n| conservative | Undefined behavioral mode | Concrete behavior description |\n| heavier scrutiny | Undefined mechanism | Specific actions taken |\n| common | Undefined inclusion criteria | Enumerate or define threshold |\n| optional | Optional for whom? Always or conditionally? | \"Configurable, default X\" or \"available when Y\" |\n\n---\n\n## Usage in Graph Parser\n\nThese words can be detected by the graph parser in binding contexts (Decided sections, conventions, spec body — NOT in Open Questions, WHY blocks explaining rationale, or marketing/vision text). 
Each hit should be reported as a terminology violation with the category and suggested replacement.\n\nPattern matching should be case-insensitive and exclude:\n- Text inside code blocks\n- Open Questions sections\n- WHY/WHY NOT block rationale text (explaining why a choice was made is fine; prescribing behavior with vague words is not)\n- The glossary and this file itself\n",
"false_positive_exclusion_index": "*Forge Reference Material -- governance methodology for managing false positives from automated validation. See `../methodology/semantic_review.md` and `../tooling/graph/` for the validation tools this supports.*\n\n# False Positive Exclusion Index\n\n*A methodology for managing false positives from automated architecture validation without losing detection capability.*\n\n## The Problem\n\nAutomated validation tools (graph parsers, terminology checkers, ambiguity scanners) produce false positives — flagging correct usage as a violation because the tool lacks contextual understanding. Examples:\n\n- \"Coding agent\" flagged as unqualified \"agent\" when the terminology enforcement table says \"agent\" should be \"calling agent.\" But \"coding agent\" refers to the AI implementing the code, not the downstream AI being protected. Correct usage, wrong flag.\n- \"Systemd service\" flagged as \"service\" should be replaced. But \"service\" here is an OS concept, not the project's product.\n- \"Phase 9\" detected near \"init\" when init was moved to Phase 1 but the Phase 9 text still exists as an annotation.\n\nSuppressing these by weakening the detection rules causes real violations to be missed. Ignoring them makes validation output noisy and unusable. 
The solution is an exclusion index.\n\n## The Exclusion Index\n\nA single file per project — `exclusion_index.md` — that documents every approved false positive with enough context to audit and self-invalidate.\n\n### Format\n\n```markdown\n# Validation Exclusion Index\n\n| ID | File | Line | Term | Rule | Why Correct | Approved By | Date |\n|----|------|------|------|------|-------------|-------------|------|\n| FP-001 | 01_conventions.md | 40 | agent | terminology:agent | Refers to coding agent implementing code, not downstream agent | LD | 2026-04-03 |\n| FP-002 | 01_conventions.md | 44 | agent | terminology:agent | Same — TONIC entry describing what a coding agent will do wrong | LD | 2026-04-03 |\n| FP-003 | 04_engineering_plan.md | 217 | daemon | terminology:daemon | Refers to systemd daemon concept, not a Bob | LD | 2026-04-03 |\n| FP-004 | domain_07_configuration_system.md | 89 | config file | terminology:config_file | Refers to the TOML operational config, not a policy file | LD | 2026-04-03 |\n```\n\n### Fields\n\n| Field | Purpose |\n|-------|---------|\n| **ID** | Unique identifier. Format: `FP-NNN`. Sequential. Never reused. |\n| **File** | The file containing the flagged term. Relative to the architecture docs directory. |\n| **Line** | The line number at time of exclusion. Used for self-invalidation (see below). |\n| **Term** | The exact word or phrase that was flagged. |\n| **Rule** | Which validation rule produced the flag. Format: `category:specific_rule` (e.g., `terminology:agent`, `phase:conflict`, `ambiguity:should`). |\n| **Why Correct** | Brief explanation of why this usage is correct despite the flag. This is the audit trail — a reviewer can evaluate whether the exclusion is justified. |\n| **Approved By** | Who approved the exclusion. Initials or username. |\n| **Date** | When the exclusion was approved. ISO 8601. 
|\n\n### Inline References (Optional)\n\nFor traceability, the source document can include an inline reference to the exclusion:\n\n```markdown\nAn agent [FP-001] will reach for tonic because...\n```\n\nThis is optional. The exclusion index works without inline references because the parser matches by file + line + term + rule. Inline references add human-readable traceability at the cost of visual noise in the docs.\n\nFor markdown documents, an HTML comment keeps the reference in the source while hiding it from the rendered output:\n\n```markdown\nAn agent <!-- FP-001 --> will reach for tonic because...\n```\n\nChoose the style that fits your project's documentation standards. The parser should recognize both forms.\n\n## Self-Invalidation\n\nThe exclusion index self-validates. On every validation run, the parser checks each exclusion entry:\n\n1. Does the file still exist?\n2. Does the line still contain the flagged term?\n3. Does the validation rule still flag this location?\n\nIf any check fails, the exclusion is marked as **stale**:\n\n- File deleted → exclusion is orphaned. Remove it.\n- Line no longer contains the term → the doc was edited and the term moved or was removed. The exclusion no longer applies at this location. Either update the line number or remove the exclusion.\n- Rule no longer flags this location → the parser was improved or the term was replaced. The exclusion is unnecessary. 
Remove it.\n\nStale exclusions are reported in validation output:\n\n```\n--- Stale Exclusions ---\nFP-003: 04_engineering_plan.md line 217 no longer contains 'daemon' — STALE\nFP-012: domain_05_crawl_scheduler.md — file was renamed — ORPHANED\n```\n\nThis prevents exclusion rot — the index stays accurate as docs evolve.\n\n## Parser Integration\n\nThe parser reads the exclusion index at the start of each validation run and builds a lookup set:\n\n```python\n# Key each approved false positive by (file, line, term, rule) for constant-time lookup\nexclusions = set()\nfor entry in parse_exclusion_index():\n    exclusions.add((entry.file, entry.line, entry.term, entry.rule))\n```\n\nBefore reporting a violation, the parser checks:\n\n```python\n# Runs inside the loop over candidate violations\nif (file, line, term, rule) in exclusions:\n    # Skip — approved false positive\n    continue\n```\n\nThe validation summary reports:\n\n```\n--- Exclusion Summary ---\nTotal exclusions: 47\nActive (matched and suppressed): 42\nStale (need review): 5\n```\n\n## When to Exclude vs When to Fix\n\n**Exclude** when the usage is genuinely correct and the parser can't distinguish it from a violation:\n- \"Coding agent\" in a TONIC entry (correct term, wrong context detection)\n- \"Systemd service\" (OS concept, not project terminology)\n- \"Phase 9\" annotated with \"Moved to Phase 1\" (the parser sees both phase numbers)\n\n**Fix the doc** when the usage is actually ambiguous or incorrect:\n- \"The agent processes content\" — which agent? Fix to \"the calling agent\" or \"the Tier 1 rewriter.\"\n- \"This should be done\" — is it required or recommended? Fix to \"must\" or \"recommended.\"\n- \"Later\" without a phase number — fix to \"Phase N\" or \"post-V1.\"\n\n**Fix the parser** when the rule is too broad:\n- If 50+ exclusions exist for the same rule, the rule needs context awareness, not 50 exclusions.\n- Example: if every TONIC entry triggers an \"agent\" violation, add \"skip lines containing 'coding agent'\" to the parser rather than excluding each line individually.\n\n## Governance\n\n- Anyone can propose an exclusion. 
The exclusion is not active until approved.\n- Approval requires a brief justification (the \"Why Correct\" field). \"Because I said so\" is not sufficient.\n- Exclusions are reviewed during semantic grooming passes. Stale exclusions are removed. Questionable exclusions are re-evaluated.\n- The exclusion index is committed to version control alongside the architecture docs. Changes to the index are visible in git history.\n\n## Metrics\n\nThe exclusion count is a health metric for the validation system:\n\n- **Growing exclusion count with stable doc size** → the parser rules are too broad. Improve the parser.\n- **Shrinking exclusion count** → parser improvements are eliminating false positives. Good.\n- **High stale count** → docs are changing faster than exclusions are maintained. Run a grooming pass.\n- **Zero exclusions** → either the parser is perfect (unlikely) or nobody is running validation (check this).\n"
},
"examples": {
"constitution": "# Project Constitution\n\n> **Forge Document Hierarchy:** **Constitution** > Conventions > Architecture Domain Docs > Engineering Plan\n>\n> This is the highest-authority document. These are immutable architectural laws. If a conflict is discovered between this document and any other, the Constitution wins and the conflicting document must be amended.\n\n*Note: The constitution is derived, not constructed upfront. These articles emerged from design discussions as principles that multiple decisions depend on. They were extracted here when it became clear that violating them would require redesigning multiple subsystems.*\n\n---\n\n### Article 1: Single Source of Truth\n\nThe primary datastore is the sole source of truth for all persistent data. All writes enter through the ingestion pipeline. No component writes directly to secondary stores (search indices, caches, analytics). Secondary stores are derived views, rebuildable from the primary at any time.\n\n**WHY:** Multiple write paths to the same data create consistency nightmares. If Service A writes to the cache directly and Service B writes through the pipeline, the cache and the primary store will diverge. With one write path, every read from any store reflects the same sequence of events.\n\n**WHY NOT:** Allowing direct secondary writes (faster for certain use cases, eliminates pipeline latency for hot data). The latency savings don't justify the consistency risk for this system.\n\n---\n\n### Article 2: Idempotent by Default\n\nAll write operations are idempotent. Every command carries a deduplication key. The receiver checks before applying. Retries are always safe. This applies to APIs, message handlers, and event processors without exception.\n\n**WHY:** Networks are unreliable. Clients retry. Message queues redeliver. Without idempotency, every retry is a potential duplicate that corrupts state.\n\n**WHY NOT:** Skip deduplication for \"obviously safe\" operations. 
There are no obviously safe operations in a distributed system. Every operation that isn't explicitly idempotent is a ticking time bomb during a network partition.\n\n---\n\n### Article 3: All State Is Persistent\n\nNo in-memory-only state that would be lost on crash. Every job, every status transition, every audit event is written to the storage backend before it is considered to have happened. A crash at any point is recoverable.\n\n**Exception (CLI scan mode):** The CLI scan command in zero-config mode uses ephemeral in-memory storage for the duration of the scan. Findings are output to stdout/file and the database is discarded on exit. This is acceptable because scan mode is a one-shot operation, not a persistent server. If the scan crashes, rerun it.\n\n**Exception (browser/WASM profile):** The WASM build operates stateless by design. It has no filesystem access. The host runtime is responsible for persistence if needed. This is an accepted tradeoff: the WASM profile provides portable processing without persistence guarantees.\n\n**WHY:** Crash recovery without persistent state requires replaying everything from the beginning. With persistent state, recovery resumes from the last committed point.\n\n**WHY NOT:** Keep state in-memory with periodic snapshots (lower latency, simpler code). A crash between snapshots loses work. For this system, the durability guarantee matters more than write latency.\n\n---\n\n### Article 4: Content Sovereignty\n\nUser content never leaves the user's machine unless the user explicitly configures external shipping. The system collects no telemetry, phones home to no server, and transmits no data to its developers.\n\n**WHY:** Trust. Users process sensitive content. If they discover the tool transmits anything, trust is permanently destroyed.\n\n**WHY NOT:** Anonymous usage telemetry (helps prioritize features, costs nothing to the user). \"Anonymous\" telemetry has been de-anonymized too many times. 
The reputational risk outweighs the product insight.\n\n---\n\n### Article 5: Enforcement Separation\n\nThe monitoring watchdog is non-agentic. No LLM calls, no reasoning, no negotiation. It reads metrics and applies thresholds. A timer and a kill switch. This separation from the intelligent components is the architectural guarantee that a malfunctioning agent cannot cause unbounded damage.\n\n**WHY:** If the safety mechanism shares complexity with the system it monitors, a bug in the system can compromise the safety mechanism. The watchdog's simplicity IS the safety guarantee.\n\n**WHY NOT:** Smart monitoring (the watchdog uses an LLM to evaluate whether a situation is truly dangerous). A watchdog that reasons can be wrong. A watchdog that compares a number to a threshold cannot.\n",
"conventions": "# Conventions\n\n> **Forge Document Hierarchy:** Constitution > **Conventions** > Architecture Domain Docs > Engineering Plan\n\nImplementation choices locked for consistency across AI coding sessions. Conventions change as implementation experience reveals better patterns. The constitution does not.\n\n---\n\n## Language & Stack\n\n- **Backend:** Go 1.22+\n- **Frontend:** TypeScript + React 19\n- **Primary Database:** PostgreSQL 16\n- **Cache:** Redis 7\n- **Message Queue:** NATS JetStream\n\n---\n\n## Canonical Dependencies (with Forbidden Alternatives)\n\n| Purpose | Use | Do NOT Use | WHY NOT |\n|---------|-----|-----------|---------|\n| HTTP router | chi v5 | gin, echo, fiber, gorilla/mux | chi is stdlib-compatible (net/http handlers). Gin uses custom context that doesn't compose with standard middleware. |\n| Database | pgx v5 | database/sql, gorm, sqlx | pgx has native PostgreSQL type support (arrays, JSONB, UUIDs). database/sql requires string scanning. GORM hides the SQL. sqlx is good but pgx is faster for this workload. |\n| UUID | google/uuid (v7) | satori/uuid, rs/xid | google/uuid supports UUIDv7 natively. Satori is unmaintained. xid is not UUID-compatible. |\n| Config | caarlos0/env | viper | Viper pulls 20+ transitive dependencies for format support we don't use. We only need env vars. |\n| Logging | slog (stdlib) | logrus, zap | slog is stdlib since Go 1.21. No external dependency. Structured by default. |\n| Testing | testify | gomega, gocheck | Testify is the ecosystem standard. Consistent assertions across all tests. |\n| Migrations | golang-migrate | goose, atlas | golang-migrate has the simplest CLI and embeds cleanly. |\n\n---\n\n## TONIC Error Prevention\n\n*Technically Obvious, Not Intended Choices. 
These are the wrong-but-reasonable decisions an AI coding agent will make based on ecosystem defaults.*\n\n| Correct Choice | TONIC Risk | Why We Chose Differently |\n|----------------|------------|--------------------------|\n| chi router with stdlib middleware | gin with custom middleware | chi handlers are `http.Handler`; any stdlib middleware works. Gin middleware requires gin.Context, creating vendor lock-in across the entire middleware stack. |\n| pgx with raw SQL | GORM or ent ORM | We need full control over query plans. ORMs generate SQL that's opaque to EXPLAIN analysis. Every query in this system is hand-tuned. |\n| NATS JetStream | Kafka, RabbitMQ | NATS embeds in a single binary. No ZooKeeper, no broker cluster for a system that runs on a single node in most deployments. |\n| slog with JSON handler | zap with custom encoder | slog is stdlib and good enough. zap is faster but the microsecond difference doesn't matter when database queries take milliseconds. |\n| Server-sent events for real-time | WebSockets | SSE is unidirectional (server to client), which is all we need. WebSockets add bidirectional complexity and connection state management for a capability we don't use. |\n| Vanilla CSS modules | Tailwind, styled-components | CSS modules scope styles without runtime cost. Tailwind's utility classes create merge conflicts in JSX. styled-components add bundle size for styling we can do at build time. 
|\n\n---\n\n## Error Handling\n\n- Always return errors; never panic in library code\n- Wrap errors with context: `fmt.Errorf(\"creating user: %w\", err)`\n- Use sentinel errors for expected failure modes: `var ErrUserNotFound = errors.New(\"user not found\")`\n- Every error enum has at least 3 domain-specific variants, not a single catch-all\n- At service boundaries: convert internal errors to appropriate HTTP status codes with structured error bodies\n- Never expose stack traces or internal error messages to end users\n\n---\n\n## Naming Conventions\n\n- **Files:** snake_case (`user_service.go`, `order_handler.go`)\n- **Functions/methods:** MixedCaps per Go convention: PascalCase when exported, camelCase when unexported\n- **Types/structs:** PascalCase per Go convention\n- **Database tables:** snake_case plural (`users`, `order_items`)\n- **Database columns:** snake_case (`created_at`, `user_id`)\n- **API endpoints:** kebab-case (`/api/v1/order-items`)\n- **Environment variables:** SCREAMING_SNAKE (`DATABASE_URL`, `REDIS_HOST`)\n- **Event names:** dot-separated (`order.created`, `user.verified`)\n\n---\n\n## Timestamp Convention\n\n- Internal: `time.Time` in Go, `Date` in TypeScript\n- Wire/storage: RFC 3339 with timezone (`2026-04-03T14:30:00Z`)\n- Field naming: `{event}_at` (e.g., `created_at`, `updated_at`, `verified_at`)\n- Never bare `timestamp`. Always specify what happened.\n- Database: `TIMESTAMPTZ` in PostgreSQL, never `TIMESTAMP`\n\n---\n\n## Contested Idioms\n\n### Go: Error wrapping depth\n**Position:** Wrap at every call site with context. Don't skip wrapping because \"the caller already knows.\"\n\n**WHY:** When an error surfaces in logs, the full wrapping chain tells you the exact call path without needing a stack trace. `creating order: validating items: checking inventory: item not found` is more useful than `item not found` at any depth.\n\n**WHY NOT:** Some teams wrap only at service boundaries to reduce noise. 
For this project, the diagnostic value of full wrapping outweighs the verbosity.\n\n### Go: Context propagation\n**Position:** Pass `context.Context` as the first parameter to every function that does I/O or could be cancelled. No exceptions.\n\n**WHY:** Cancellation propagation prevents leaked goroutines and abandoned database connections. Without it, a cancelled HTTP request continues executing its entire downstream chain.\n\n**WHY NOT:** It's verbose. Some teams use package-level contexts or skip context for \"fast\" operations. For this project, consistent context propagation is worth the parameter noise.\n\n---\n\n## Terminology Enforcement\n\n| Correct Term | Do NOT Use | WHY |\n|-------------|-----------|-----|\n| Order | Purchase, Transaction, Sale | \"Order\" is the domain concept. \"Transaction\" implies payment only. An order exists before payment. |\n| Fulfillment | Shipping, Delivery, Dispatch | \"Fulfillment\" covers the full lifecycle including digital delivery. \"Shipping\" implies physical goods only. |\n| Credential | Password, Secret, Token (generic) | \"Credential\" is the umbrella term. Use specific subtypes (`api_key`, `jwt`, `session_token`) when referring to a specific type. |\n| Event | Message, Notification, Signal | \"Event\" is the domain concept for async communication. \"Message\" is ambiguous (could mean user-facing). \"Notification\" is reserved for user-facing alerts. |\n\n---\n\n## Community Quality Signals\n\n*These conventions ensure the project presents itself professionally to users and contributors. 
They're not architectural decisions but they affect how people perceive the system's quality.*\n\n### Log Event Vocabulary\nStable, documented event names that monitoring tools and users can rely on:\n\n| Event | When |\n|-------|------|\n| `startup.complete` | Server is ready to accept requests |\n| `startup.failed` | Server could not start; includes reason |\n| `shutdown.graceful` | Clean shutdown initiated |\n| `shutdown.forced` | Graceful shutdown timed out; forcing exit |\n| `migration.applied` | Database migration completed |\n| `migration.failed` | Migration failed; includes version and error |\n| `health.degraded` | A non-critical dependency is unavailable |\n| `health.recovered` | Previously degraded dependency is back |\n\n### Error Message Quality\n- User-facing errors are human-readable sentences, not codes\n- Include what happened, not just that something failed\n- Include what the user can do about it when possible\n- Never expose internal service names, table names, or query details\n",
"glossary": "# Glossary\n\n> **Forge Document Hierarchy:** Constitution > Conventions > Architecture Domain Docs > Engineering Plan\n>\n> This glossary defines terms as they are used in THIS project. Many terms have different meanings in general usage. The definitions here override general usage for any agent reading these docs.\n\n*Note: The glossary is derived from design discussions, not written upfront. Terms get coined during ideation, used loosely at first, refined through subsequent conversations, and then settled definitions get extracted here. Every entry has a provenance in the conversation history.*\n\n*The glossary narrows probability distributions. When an agent encounters \"Bob\" in the spec, its training data says \"a person's name.\" The glossary collapses that distribution to \"a single running deployment of the system.\" This narrowing compounds across every term, every paragraph, every session.*\n\n---\n\n## Term Definitions\n\n| Term | Definition in This Project | Not to Be Confused With |\n|------|---------------------------|------------------------|\n| **Bob** | A single running deployment of the system. One Bob per machine or container. Named for ease of reference in architecture discussions. \"Spin up a Bob\" = deploy an instance. | A person's name. In this project, \"Bob\" is always a system instance. |\n| **Verdict** | One of four output categories from the content analyzer: `CLEAN`, `SUSPICIOUS`, `REWRITE`, `REJECT`. A verdict is the analyzer's classification of a single input item. | A legal judgment. Here it's strictly a classification enum. |\n| **Tier** | A numbered processing stage in the pipeline. Tier 0 is rule-based pre-filtering. Tier 1 is the primary analysis stage. Tiers execute in sequence; each tier's output feeds the next. | A pricing tier or service level. Tiers here are pipeline stages, not product offerings. 
|\n| **Contract** | An immutable agreement formed at initialization defining resource limits, permissions, and operating parameters. Once formed, never renegotiated. If circumstances change, terminate and respawn with a new contract. | Design-by-contract (DbC) or legal contracts. Here it's a runtime resource envelope. |\n| **Hot Path** | The request-processing path that runs on every input. Sub-millisecond latency budget. Nothing on the hot path blocks, allocates unboundedly, or calls external services synchronously. | A frequently accessed code path. Here it's a specific architectural boundary with a quantified latency constraint. |\n| **Canary** | A known-bad input injected during startup to verify the detection pipeline is functioning. If the canary passes through undetected, the system refuses to start. | Canary deployments (gradual rollout). Here it's a self-test mechanism. |\n| **Schema Gate** | A hard boundary that rejects output not conforming to the user-defined schema. Enforced by code (deserialization), not by prompting. Passes or rejects, nothing in between. | A database schema migration or validation middleware. The gate is a runtime boundary. |\n| **Experience Library** | A curated collection of input/output examples with scoring rubrics used for training. Each entry has a principle ID, category, input, expected output, and evaluation criteria. Loaded at startup, merged by principle ID when multiple libraries are configured. | A documentation library or knowledge base. This is structured training data, not reference material. |\n| **Outbox Pattern** | When event emission fails (broker unavailable), the event is stored in a local database outbox table. A background process retries until the broker accepts it. Guarantees eventual delivery without blocking the primary operation. | An email outbox. Here it's a durability mechanism for async event publishing. 
|\n\n---\n\n## Terminology Enforcement\n\n*These mappings prevent vocabulary drift across models and sessions. If a coding agent uses a forbidden synonym, the code will be syntactically incompatible with the rest of the codebase.*\n\n| Correct Term | Forbidden Synonyms | WHY |\n|-------------|-------------------|-----|\n| Bob | instance, node, daemon, service, unit, worker | \"Bob\" is the project's identity for a deployment. Generic terms create confusion when discussing actual OS services, worker threads, or cluster nodes. |\n| Verdict | result, outcome, classification, judgment, response | \"Verdict\" is a 4-value enum. \"Result\" is ambiguous. \"Classification\" implies only the ML part of the process. |\n| Tier | stage, step, phase, level, layer | \"Tier\" = pipeline processing stage. \"Phase\" = implementation phase (Phase 1, Phase 2). \"Layer\" = architectural layer. Mixing them creates confusion. |\n| Schema Gate | validator, checker, filter, sanitizer | \"Schema Gate\" = hard boundary. \"Validator\" implies it might accept with warnings. \"Filter\" implies it removes bad parts. The gate passes or rejects. |\n| Hot Path | fast path, critical path, main path | \"Hot Path\" has a specific latency constraint (sub-ms). \"Critical path\" implies scheduling dependency. \"Fast path\" is relative. |\n",
"domain_doc_01": "# Domain 01: User Service\n\n*Authentication, registration, and profile management.*\n\n---\n\n## Status\n\n| Category | Count |\n|----------|-------|\n| Decided | 8 |\n| Open | 2 |\n\n---\n\n## Architecture\n\n### Overview\n\nThe User Service handles registration, authentication, and profile management. It is the identity authority for the system.\n\n**Cascade: Domains 02, 03**\n\n### Registration Flow\n\n```\nClient → POST /v1/users (email, password)\n → Validate email format and uniqueness\n → Hash password (argon2id)\n → Generate user_id (UUIDv7)\n → Store in PostgreSQL\n → Emit UserCreated event to NATS\n → Return user_id + JWT\n```\n\n### Authentication\n\nJWT tokens with short expiry (15 minutes). Refresh tokens with longer expiry (7 days). Refresh tokens are stored in Redis and revocable.\n\n**WHY short JWT expiry:** A stolen JWT is valid until expiry. 15 minutes limits the damage window. The refresh token is stored server-side and can be revoked immediately.\n\n**WHY NOT session cookies:** The system has multiple clients (web, mobile, API consumers). JWT works across all without server-side session state per request. The refresh token handles the \"stay logged in\" case.\n\n### Password Hashing\n\nArgon2id with: memory 64MB, iterations 3, parallelism 4. `[OQ-1]`\n\n**WHY argon2id, NOT bcrypt:** Argon2id is the current OWASP recommendation. It's resistant to both GPU and ASIC attacks (memory-hard). Bcrypt is memory-cheap and increasingly vulnerable to specialized hardware.\n\n### Event Emission\n\nOn user lifecycle changes (created, updated, deleted, suspended), the service emits events to NATS `users.*` subjects. Other services subscribe to build their own projections.\n\n**WHY events, NOT direct API calls from other services:** Decoupling. The User Service doesn't know or care who consumes user events. Adding a new consumer doesn't require changing the User Service. 
Constitution Article 1 (API-First) is preserved; events ARE the API.\n\n---\n\n## Decided\n\n1. **Argon2id for password hashing** — Current OWASP recommendation. Memory-hard (resistant to GPU/ASIC). **WHY NOT bcrypt:** memory-cheap, increasingly vulnerable. **WHY NOT scrypt:** less well-analyzed than Argon2id, more complex to tune. (Session 003)\n\n2. **JWT + refresh token model** — Short-lived JWTs (15min) for stateless auth. Server-side refresh tokens (7 days) in Redis for revocation. **WHY NOT session cookies:** multi-client support (web, mobile, API) without per-request server state. (Session 003)\n\n3. **UUIDv7 for user IDs** — Timestamp-sortable, no sequential guessing. **WHY NOT auto-increment:** exposes user count, sequential enumeration attack. **WHY NOT UUIDv4:** not sortable, worse index performance. (Session 001)\n\n4. **Email as unique identifier** — One account per email. No username-based auth. **WHY:** simplifies the identity model. Users forget usernames but not emails. (Session 003)\n\n5. **NATS for event emission** — Lightweight, built-in subject-based routing. **WHY NOT Kafka:** overkill for event notification (we're not doing event sourcing). **WHY NOT Redis Pub/Sub:** no persistence, no replay. NATS JetStream gives us persistence when needed. (Session 004)\n\n6. **Soft delete with `deleted_at`** — Users are never hard-deleted in V1. Soft delete preserves referential integrity. Hard delete is a future admin operation with cascade handling. **WHY NOT hard delete V1:** Foreign key references from other services. Deletion cascade is complex and error-prone. Soft delete is safe; hard delete is a future feature. (Session 005)\n\n7. **Profile updates are PATCH, not PUT** — Partial updates only. Client sends only changed fields. **WHY:** prevents accidental field erasure when client has stale data. (Session 005)\n\n8. **Rate limiting on auth endpoints** — 10 attempts per minute per IP on login. 3 attempts per hour on registration per IP. 
**WHY these numbers:** balance between security (prevent brute force) and usability (don't lock out legitimate users on shared IPs). (Session 006)\n\n---\n\n## Open Questions\n\n1. **Argon2id parameter tuning** — *Discovery: Phase 1*\n - Current params (64MB memory, 3 iterations, 4 parallelism) are reasonable defaults\n - Need benchmarking on production hardware to verify <500ms hash time\n - May need adjustment based on actual server specs\n\n2. **Multi-factor authentication** — *Post-V1*\n - TOTP? WebAuthn? SMS (please no)?\n - Adds complexity to auth flow\n - Required for enterprise tier\n",
"domain_doc_02": "# Domain 02: Order Service\n\n*Order lifecycle, payment integration, and fulfillment coordination.*\n\n---\n\n## Status\n\n| Category | Count |\n|----------|-------|\n| Decided | 6 |\n| Open | 3 |\n\n---\n\n## Architecture\n\n### Overview\n\nThe Order Service manages the order lifecycle from creation through fulfillment. It subscribes to user events from Domain 01 and coordinates with external payment providers.\n\n**Cascade: Domain 01 (user identity), Domain 03 (inventory)**\n\n### Order State Machine\n\n```\nDRAFT → SUBMITTED → PAYMENT_PENDING → PAID → FULFILLING → COMPLETED\n → PAYMENT_FAILED → DRAFT (retry)\n → CANCELLED\n PAID → REFUND_REQUESTED → REFUNDED\n```\n\nAll state transitions are idempotent (Constitution Article 2). Transitioning to the current state is a no-op.\n\n### Payment Integration\n\nThe Order Service calls the payment provider `[OQ-1]` via their API. Payment is a two-phase operation:\n\n1. **Authorize:** Reserve funds (on SUBMITTED → PAYMENT_PENDING)\n2. **Capture:** Charge funds (on fulfillment confirmation → PAID)\n\n**WHY two-phase, NOT direct charge:** Authorization reserves funds without charging. If fulfillment fails, we release the authorization. The customer is never charged for something they don't receive.\n\n### User Identity Dependency\n\n**Cascade: Domain 01**\n\nThe Order Service subscribes to `users.deleted` events from NATS. When a user is soft-deleted (Domain 01, Decision 6), their pending orders are cancelled. Completed orders are retained for accounting.\n\n**WHY subscribe to events, NOT query the User Service on each order:** Performance and decoupling. The Order Service maintains a local projection of user status. A user deletion doesn't require a synchronous call to the User Service during order processing.\n\n### Idempotency\n\nEvery order mutation carries a `request_id` (UUIDv7). The Order Service tracks completed request_ids for 24 hours (conventions § Idempotency). 
Duplicate requests return the original result.\n\n**WHY 24 hours, NOT forever:** Bounded storage. A request older than 24 hours is either completed or abandoned. Tracking indefinitely grows unbounded.\n\n**Edge case:** Payment callback arrives twice (network retry from payment provider). The second callback matches the existing `payment_reference_id` and is a no-op. See edge cases EC-2.1.\n\n---\n\n## Decided\n\n1. **Two-phase payment (authorize + capture)** — Reserve on submit, charge on fulfillment. **WHY:** customer never charged for undelivered items. **WHY NOT direct charge:** refund is slower and worse UX than releasing an authorization. (Session 007)\n\n2. **Order state machine with explicit transitions** — No implicit state changes. Every transition is a named function with precondition checks. **WHY:** auditability. Every state change is logged with the transition name, not just the before/after state. (Session 007)\n\n3. **Local user projection via NATS subscription** — Order Service maintains a local cache of user status. **WHY NOT synchronous User Service calls:** eliminates runtime dependency on User Service availability during order processing. (Session 008)\n\n4. **Soft-cancelled orders retained for 90 days** — Cancelled orders are queryable for customer support. Hard-deleted after 90 days. **WHY 90 days:** typical chargeback window is 60-120 days depending on payment provider. 90 days covers most cases. (Session 008)\n\n5. **Order total calculated server-side, never trusted from client** — Client sends item IDs and quantities. Server looks up prices and computes total. **WHY:** price manipulation attack. A client that sends `total: $0.01` must not be trusted. (Session 009)\n\n6. **Monetary values stored as integer cents, not floating point** — `amount_cents: 1999` not `amount: 19.99`. **WHY:** floating point arithmetic produces rounding errors. `0.1 + 0.2 != 0.3` is not acceptable for financial calculations. Integer cents are exact. 
(Session 009)\n\n---\n\n## Open Questions\n\n1. **Payment provider selection** — *Discovery: Phase 2*\n - Stripe vs Adyen vs Braintree\n - Need to evaluate: transaction fees, international coverage, subscription support\n - Architecture is provider-agnostic (adapter pattern behind PaymentProvider interface)\n\n2. **Partial fulfillment** — *Post-V1*\n - Can an order be partially fulfilled (3 of 5 items shipped)?\n - Requires splitting payment capture per shipment\n - Adds significant state machine complexity\n\n3. **Order history pagination strategy** — *Discovery: Phase 1*\n - Cursor-based vs offset pagination for order listing\n - Cursor-based is better for real-time data but more complex to implement\n - Depends on expected order volume per user\n",
"edge_cases": "# Edge Cases\n\n*Classified by response level. CODE-level cases are implementation exit criteria.*\n\n## Response Levels\n\n| Level | When | Runtime Cost |\n|-------|------|-------------|\n| **CODE** | Common (>1%), predictable, auto-recoverable | On the critical path |\n| **DEGRADE** | Uncommon (<1%), detectable, partially recoverable | Triggered by threshold |\n| **ALERT** | Rare, detectable, needs human intervention | Zero until triggered |\n| **ACCEPT** | Theoretical, impractical to prevent without disproportionate cost | Zero |\n\n---\n\n## EC-1.1: Payment callback arrives after order cancelled\n**Cascade: Domain 02**\n**Response:** CODE. Payment provider sends \"payment successful\" callback, but the order was cancelled 2 seconds before. The Order Service checks current order state before processing the callback. If CANCELLED: log the late callback, release the payment authorization, do NOT reactivate the order. The customer's cancellation is sovereign.\n\n## EC-1.2: User deleted while order is in PAYMENT_PENDING\n**Cascade: Domains 01, 02**\n**Response:** CODE. User soft-delete event arrives from NATS. Order Service receives `users.deleted`. Pending orders for that user transition to CANCELLED. Payment authorization released. If payment callback arrives after cancellation: EC-1.1 applies.\n\n## EC-2.1: Duplicate payment callback (provider retry)\n**Cascade: Domain 02**\n**Response:** CODE. Payment provider retries the callback (network timeout on their end). Second callback carries the same `payment_reference_id`. Order Service checks: is this `payment_reference_id` already processed? If yes: return 200 OK (idempotent). If no: process normally. Constitution Article 2 (Idempotent by Default).\n\n## EC-2.2: Database connection pool exhausted\n**Cascade: Domains 01, 02**\n**Response:** DEGRADE. Connection pool full (all connections in use). New requests get queued with a 5-second timeout. 
If timeout: return 503 Service Unavailable with Retry-After header. Do NOT crash. Existing requests continue. Log WARNING with pool stats. Alert if sustained >30 seconds. Constitution Article 3 (Fail Open, Log Loud).\n\n## EC-3.1: NATS unavailable during event emission\n**Cascade: Domains 01, 02**\n**Response:** DEGRADE. The operation (user create, order submit) succeeds in the database. Event emission to NATS fails. Store the event in a local outbox table (PostgreSQL). Retry emission on a background timer. Downstream consumers may see a delay but will eventually receive the event. Constitution Article 3: the primary operation succeeds; the secondary notification degrades.\n\n---\n\n## Index\n\n| Domain | Edge Cases |\n|--------|-----------|\n| 01 (Users) | EC-1.2, EC-2.2, EC-3.1 |\n| 02 (Orders) | EC-1.1, EC-1.2, EC-2.1, EC-2.2, EC-3.1 |\n"
}
}