Compact reference for AI coding agents. Every rule includes the problem it solves. For full methodology, see forge_process.md. For the narrative origin story, see ../docs/forge_coding.md.
Forge is specification-driven development for AI-assisted projects. Documentation is the primary artifact. The spec must be precise enough that a cold AI session (no prior context) produces a correct implementation without guessing.
Core doctrine: Any specification gap that forces an AI agent to guess becomes a defect baked into the output.
Generality: Forge applies to any spec-to-output pipeline -- software, policy, legal, compliance. The output type doesn't change the methodology.
When documents conflict, follow this priority:
Constitution (immutable architectural laws)
> Conventions (implementation rules)
> Architecture Domain Docs (domain specifications)
> Engineering Plan (build sequence and phases)
Higher-priority documents override lower.
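A minimal sketch of that priority rule, assuming each conflicting statement is tagged with the type of document it came from. The function name `resolve_conflict` and the sample statements are hypothetical; only the ordering comes from the list above.

```python
# Document types in priority order, highest first (from the list above).
PRIORITY = [
    "Constitution",
    "Conventions",
    "Architecture Domain Docs",
    "Engineering Plan",
]


def resolve_conflict(claims: dict[str, str]) -> tuple[str, str]:
    """Return (document, statement) from the highest-priority document present.

    `claims` maps a document type to what that document says about one topic.
    """
    for doc in PRIORITY:
        if doc in claims:
            return doc, claims[doc]
    raise ValueError("no recognized document type in claims")


# Hypothetical conflict: two documents disagree on the IPC transport.
winner = resolve_conflict({
    "Engineering Plan": "use gRPC between services",
    "Conventions": "UDS framing only",
})
print(winner)
```

Here the Conventions statement wins because Conventions sits above the Engineering Plan in the ordering.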
Problem: Vague language produces wide probability distributions in LLM output. The agent fills ambiguity with training data defaults.
Fix: Use mechanism-focused terms that name the property and the mechanism, not just the outcome.
| Vague (wide distribution) | Precise (narrow distribution) |
|---|---|
| "separate things so they don't affect each other" | "fault containment via process isolation" |
| "it should be fast" | "hot-path latency budget: sub-millisecond" |
| "keep it safe" | "capability-based security at the runtime boundary" |
| "the system handles different sources" | "source-agnostic pipeline" |
Reference: See ../reference/ambiguous_language_dictionary.md for a comprehensive catalog of probabilistically wide language patterns to avoid in binding contexts.
Every architecture domain doc maintains two sections:
Decided -- Numbered entries. Each has: title, description, rationale (WHY block where alternatives exist), session reference.
Open Questions -- Numbered entries with status annotations:
- Discovery: Phase N -- will be answered during implementation
- Post-V1 -- explicitly deferred
- RESOLVED: Decision N -- answered, cross-referenced to the decision
- Blocked on: [dependency] -- waiting for another decision
Open questions are not defects. Unmarked ambiguity is.
Problem: An agent reads a paragraph that sounds definitive but contains an unresolved assumption.
Fix: [OQ-N] markers in the spec body link to Open Questions. The marker interrupts assumption formation at the exact point of ambiguity.
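One way to check that every marker actually links somewhere is sketched below. It assumes markers appear literally as `[OQ-3]` in the spec body and that the numbers of the Open Questions entries are already known; the function name `unlinked_markers` and the sample text are hypothetical.

```python
import re

# Matches inline markers such as [OQ-3] and captures the number.
OQ_MARKER = re.compile(r"\[OQ-(\d+)\]")


def unlinked_markers(spec_body: str, open_question_ids: set[int]) -> list[int]:
    """Return marker numbers used in the body with no matching Open Questions entry."""
    found = {int(n) for n in OQ_MARKER.findall(spec_body)}
    return sorted(found - open_question_ids)


# Hypothetical spec text: OQ-3 exists in the Open Questions section, OQ-7 does not.
body = "Retries use exponential backoff [OQ-3]. The cap is per-tenant [OQ-7]."
missing = unlinked_markers(body, {1, 3})
print(missing)  # [7]
```

A non-empty result means a marker points at nothing, which is unmarked ambiguity in practice.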
Problem: Agents optimize. Without rationale, they "improve" the design by removing constraints they don't understand.
Fix: WHY blocks explain why a decision was made and why alternatives were rejected.
Coverage rule: Required where a reasonable agent might choose differently. Self-evident decisions don't need WHY blocks.
Feedback loop: Agent guesses wrong -> FINDINGS.md captures the guess -> postmortem reviews -> wrong guess becomes a WHY block or TONIC entry -> next agent doesn't guess on that one.
Problem: TONIC stands for Technically Obvious, Not Intended Choice. The agent picks the ecosystem default instead of the project's intentional non-default choice.
Fix: State what to use AND what NOT to use, with why. The conventions doc carries a TONIC prevention table:
| Use | Do NOT Use | Why |
|---|---|---|
| prost (raw protobuf) | tonic (gRPC framework) | No gRPC on this side; UDS framing only |
| chi router | gin | chi is stdlib-compatible; gin uses custom context |
| Direct API calls | LangChain | Scoped role doesn't need orchestration framework |
| CatBoost | XGBoost | Native categorical feature handling without encoding |
Each entry predicts an exact mistake and pre-corrects it. The TONIC table grows organically: you start with the choices you know will trip agents, cold validation runs surface more, cold code runs surface even more. In practice it ends up with 15-25 entries.
The pattern: whenever (1) a reasonable ecosystem default exists, (2) the project made a specific non-default choice, and (3) the docs don't explicitly exclude the default, an agent will gravitate toward the default. Explicitly forbid it.
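The table can also be checked mechanically. The sketch below assumes the TONIC table has been transcribed into a dictionary keyed by the forbidden package name and that a flat list of declared dependency names is available; the function name `tonic_violations` and the normalized keys are hypothetical.

```python
# Forbidden default -> (intended choice, why). Entries mirror the table above;
# the lowercase package-name keys are an assumption for illustration.
TONIC = {
    "tonic": ("prost", "No gRPC on this side; UDS framing only"),
    "gin": ("chi", "chi is stdlib-compatible; gin uses custom context"),
    "langchain": ("direct API calls", "Scoped role doesn't need orchestration framework"),
    "xgboost": ("CatBoost", "Native categorical feature handling without encoding"),
}


def tonic_violations(dependencies: list[str]) -> list[str]:
    """Report every declared dependency that the TONIC table forbids."""
    violations = []
    for dep in dependencies:
        key = dep.strip().lower()
        if key in TONIC:
            use, why = TONIC[key]
            violations.append(f"{dep}: use {use} instead ({why})")
    return violations


# Hypothetical dependency list produced by a cold code run.
report = tonic_violations(["serde", "tonic"])
print(report)
```

A check like this only catches defaults already listed, so it complements the cold runs that surface new entries rather than replacing them.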
The constitution is a short document (~10-15 articles) of immutable architectural laws. It differs from conventions (which tell the agent HOW to write code) and from WHY blocks (which explain individual decisions): the constitution tells the agent what the system CANNOT do.
Test: If violating this principle would require redesigning multiple subsystems, it's constitutional.
The constitution is derived, not constructed upfront. It crystallizes when you notice certain principles are load-bearing across multiple domains. You discover your immutable laws; you don't invent them on day one.
When an immutable law has a legitimate exception, document it inline with the article. Don't bury exceptions in separate files.
The conventions doc locks implementation choices to prevent inconsistency across models and sessions:
- Canonical dependencies with explicit prohibition of alternatives
- Language-specific conventions (naming, error handling, module structure)
- Contested language idioms with WHY blocks for the position taken
- Cross-language conventions (IPC format, timestamps, error taxonomy)
- Terminology enforcement (correct terms mapped to forbidden synonyms)
Artifacts trail the thinking. Domain docs emerge first from design discussions. Conventions solidify as patterns emerge. The glossary locks down as terms settle. The constitution crystallizes last, when load-bearing principles become apparent. You don't define your terms then ideate, and you don't write a constitution then design. You ideate, and the artifacts follow.
Problem: A decision in one domain has implications across others. One stale cross-reference can cause a module to be built against wrong assumptions.
Fix: After every change, propagate across all cross-references. Three levels:
- Level 1: Manual (5 domains) -- identify and update affected domains
- Level 2: Dependency Matrix (10 domains) -- inline cascade tags, structured tracking
- Level 3: Graph (15+ domains) -- Neo4j-backed deterministic dependency tracking (see ../tooling/graph/)
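Whatever the level, the underlying operation is the same: given a changed domain, find every domain that transitively depends on it. A minimal in-memory sketch follows, assuming the cross-references are available as a mapping from a domain to the domains that reference it. The domain names and the function name `affected_domains` are hypothetical, and this is not the Neo4j-backed tooling.

```python
from collections import deque


def affected_domains(changed: str, dependents: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk: every domain that directly or transitively references `changed`."""
    seen = {changed}
    queue = deque([changed])
    order = []
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Hypothetical cross-reference map: key is referenced by each domain in its list.
deps = {
    "ipc": ["ingest", "scheduler"],
    "ingest": ["storage"],
    "scheduler": ["storage"],
}
cascade = affected_domains("ipc", deps)
print(cascade)  # ['ingest', 'scheduler', 'storage']
```

Each domain in the result needs its cross-references reviewed after the change.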
- PHASE 1: IDEATE -- Discuss, explore, capture verbatim. User says "Forge it" to trigger formalization of the current ideation state.
- PHASE 2: FORMALIZE -- Domain docs, conventions, glossary, constitution, engineering plan.
- PHASE 3: STRUCTURAL -- Graph analysis + dictionary lint. Fix mechanically detectable issues. (Optional tooling -- see ../tooling/ and ../reference/)
- PHASE 4: COLD VALIDATE -- Fresh session reads docs. Produces disposable plan. Surfaces questions. Hot session answers + fixes docs. Repeat until questions dry up. Run again with alternate model. (See cold_validation_protocol.md)
- PHASE 5: PLAN CHECK -- Evaluate disposable plan against architecture docs. Throw plan away.
- PHASE 6: COLD CODE -- Fresh session builds from spec. Code is disposable validation probe. Evaluate adherence. Fix docs. Run with alternate model. (See cold_code_run_protocol.md)
- PHASE 7: SIGNAL -- Convergence across models. No more spec-driven deviations.
- PHASE 8: IMPLEMENT -- Code generation is translation, not design. FINDINGS.md captures discoveries. Spec drift reconciliation over time.
A cold question is proof of a doc defect. If the cold session asked, the docs failed to communicate. The hot session must answer AND fix the doc.
Hot session resistance. The hot session will claim questions "don't need a doc fix." It's wrong. The hot session has context the docs don't carry. If the cold session asked, the docs are ambiguous.
The disposable plan is a forcing function. Producing a plan forces the model to confront every ambiguity. The questions are the actual output. The plan is a probe.
Open questions don't affect review quality. A tracked OQ with a status is the opposite of ambiguity -- it's the spec being honest about what it doesn't know yet. Some decisions can't be made until implementation.
See cold_validation_protocol.md for the full protocol.
Problem: Structural validation catches relationship problems. It doesn't catch linguistic ambiguity -- "should" in a Decided entry, "handle gracefully" without defining what graceful means.
Fix: A semantic review pass scans binding contexts for probabilistically wide language. Flag and report, don't auto-fix -- context determines whether usage is genuinely ambiguous.
Run before cold validation. Different models flag different patterns.
See semantic_review.md for the protocol. See ../reference/ambiguous_language_dictionary.md for the detection vocabulary.
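A flag-and-report pass can be sketched as a plain pattern scan. The two patterns below are taken from the examples in this section and stand in for the real detection vocabulary, which lives in the dictionary file. The function name `flag_wide_language` and the sample entries are hypothetical.

```python
import re

# Illustrative stand-ins for the detection vocabulary; not the actual dictionary.
WIDE_PATTERNS = [
    r"\bshould\b",
    r"\bhandle(?:s|d)? gracefully\b",
]


def flag_wide_language(lines: list[str]) -> list[tuple[int, str]]:
    """Return (line number, matched phrase) pairs. Reports only; never rewrites."""
    findings = []
    for number, text in enumerate(lines, start=1):
        for pattern in WIDE_PATTERNS:
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                findings.append((number, match.group(0)))
    return findings


# Hypothetical Decided entries.
decided = [
    "The gateway should retry failed requests.",
    "Fault containment via process isolation.",
    "Workers handle gracefully any malformed frame.",
]
findings = flag_wide_language(decided)
print(findings)  # [(1, 'should'), (3, 'handle gracefully')]
```

The output is a list for a reviewer to triage, since context decides whether a flagged phrase is genuinely ambiguous.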
Complete Phase N before starting Phase N+1.
Every phase has verifiable exit criteria: specific files, passing tests, integration points, scope boundaries.
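Phase gating with verifiable exit criteria might look like the sketch below. It assumes each criterion can be expressed as a named check returning true or false. The `Phase` class, the `can_start` function, and the criteria names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Phase:
    number: int
    # Criterion name -> check that returns True once the criterion is met.
    criteria: dict[str, Callable[[], bool]] = field(default_factory=dict)

    def unmet(self) -> list[str]:
        """Names of exit criteria that do not pass yet."""
        return [name for name, check in self.criteria.items() if not check()]


def can_start(phase_number: int, phases: list[Phase]) -> bool:
    """A phase may start only when every earlier phase has no unmet criteria."""
    return all(not p.unmet() for p in phases if p.number < phase_number)


# Hypothetical plan: Phase 1 is complete, Phase 2 still has a failing criterion.
plan = [
    Phase(1, {"unit tests pass": lambda: True}),
    Phase(2, {"ipc integration verified": lambda: False}),
    Phase(3),
]
print(can_start(2, plan), can_start(3, plan))  # True False
```

In a real project the lambdas would be replaced by checks for specific files, test results, or integration points.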
FINDINGS.md records every spec ambiguity, contradiction, and guess encountered during implementation. The postmortem reviews FINDINGS.md and feeds fixes back into the spec.
When code-driven changes accumulate to a threshold (a release, a milestone, or when the spec and code no longer agree), batch-assess against the spec. The spec was written for machines to read; machines can check it.
After the architecture is substantively complete and cross-domain dependencies are mapped:
- Query interaction surfaces between domains
- For each surface: what goes wrong when timing, resources, state, or data are unexpected?
- Classify each edge case:
| Level | When | Runtime Cost |
|---|---|---|
| CODE | Common (>1%), predictable, auto-recoverable | On the critical path |
| DEGRADE | Uncommon (<1%), detectable, partially recoverable | Triggered by threshold |
| ALERT | Rare, detectable, needs human intervention | Zero until triggered |
| ACCEPT | Theoretical, impractical to prevent | Zero |
CODE-level cases become exit criteria. Don't add handling for ACCEPT-level cases when that handling adds measurable hot-path latency for scenarios that occur less than once per million operations.
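The catalog and its link to exit criteria can be sketched as data. The levels come from the table above; the `EdgeCase` record, the `exit_criteria` function, and the sample entries are hypothetical. The level is assigned by a reviewer here, not computed.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    CODE = "CODE"
    DEGRADE = "DEGRADE"
    ALERT = "ALERT"
    ACCEPT = "ACCEPT"


@dataclass(frozen=True)
class EdgeCase:
    surface: str       # interaction surface between two domains
    description: str   # what goes wrong
    level: Level       # assigned during review, per the table above


def exit_criteria(catalog: list[EdgeCase]) -> list[str]:
    """Only CODE-level cases turn into exit criteria."""
    return [
        f"{case.surface}: handles {case.description}"
        for case in catalog
        if case.level is Level.CODE
    ]


# Hypothetical catalog entries.
catalog = [
    EdgeCase("ingest->storage", "write timeout", Level.CODE),
    EdgeCase("scheduler->ipc", "queue saturation", Level.DEGRADE),
    EdgeCase("ipc->ingest", "clock rollback", Level.ACCEPT),
]
criteria = exit_criteria(catalog)
print(criteria)  # ['ingest->storage: handles write timeout']
```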
- Silent guessing: Document every assumption in FINDINGS.md
- Building all phases at once: Complete Phase N before starting Phase N+1
- Skipping cascade updates: One stale cross-reference can misalign an entire module
- Declaring "good enough" prematurely: The most common failure mode, especially solo
- Blanket lint suppression: Override specific lints with reasons, never suppress all
- Hardcoded strings: Use localization keys
- Delta metrics: Use absolute values
- Bare timestamps: Name the event: created_at, not timestamp
Documentation is ready when:
- Cold sessions stop asking basic questions and start asking edge cases
- Cross-model implementations converge on the same architecture
- Divergence points to genuinely open decisions, not spec gaps
- Code artifacts generate without drift from the spec
- Cascading consistency is verified
- TONIC errors are eliminated
- Graph validation passes clean (if applicable)
- Semantic grooming is current (if applicable)
- Edge cases cataloged and classified