diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md new file mode 100644 index 0000000..8b1d492 --- /dev/null +++ b/.github/copilot-instructions.md @@ -0,0 +1,121 @@ +# Agent Kit — LLM Context + +This repository **is** the Agent Kit framework. It is not a project built with the framework — it is the framework itself. + +## What this repository contains + +Agent Kit is a process framework for LLM-assisted software development. It provides a structured process (lenses, gates, artifacts) and a learning mechanism (domain profiles) that accumulates knowledge across projects. + +When someone copies `framework/` into their own repository and points `AGENTS.md` at `BUILDER.md`, any LLM can follow the process to build software with verification and domain-specific knowledge. + +## Repository structure + +``` +. +├── AGENTS.md # Example entry point (5 lines, points to BUILDER.md) +├── README.md # Explanation: what the framework is and why it exists +├── GUIDE.md # Tutorial: step-by-step first project walkthrough +├── LICENSE # MIT +│ +├── framework/ # THE FRAMEWORK (this is what gets copied to target repos) +│ ├── BUILDER.md # Process contract — the LLM reads and follows this +│ ├── GATEKEEPER.md # Verification agent — executes commands, enforces gates +│ ├── README.md # Technical reference — gates, artifacts, contracts +│ ├── VERSION # Framework version for tracking updates +│ ├── domains/ +│ │ ├── _template.md # Template for creating profile links that extend catalog profiles +│ │ └── README.md # How domain profiles work +│ └── templates/ +│ ├── INTENT.md # Template: what and why +│ ├── DESIGN.md # Template: how (architecture, decisions, risks) +│ ├── VERIFICATION_LOG-template.md # Template: proof (gate output, progress) +│ └── DOMAIN_PROFILE-template.md # Template: standalone full domain profile +│ +├── catalog/ # Community-contributed base profiles +│ ├── README.md # How to use and contribute profiles +│ ├── apps-sdk-mcp-lit-vite.md # Real 
profile (11 pitfalls, 7 adversary Qs) +│ └── web-kinu-preact-vite.md # Real profile (4 pitfalls, 4 adversary Qs) +│ +├── examples/ # Real project artifacts showing the framework in action +│ ├── mcp-task-widget/ # Complete example with intent, design, verification +│ ├── logistics-control-tower/ # Full-sized project prompt +│ └── test-habit-tracker/ # Spanish-language prompt example +│ +└── docs/ # Would hold generated artifacts in a real project +``` + +## Framework concepts + +- **Domain profiles** accumulate stack knowledge (pitfalls, adversary questions, checks, decisions) across projects +- **Gates** (0-4) are verification checkpoints with real command output — "assumed to pass" is never valid +- **Four lenses**: User, Architecture, Adversary, Domain — thinking modes, not sequential phases +- **Anti-Loop Rule**: produce the Intent document before continuing to investigate +- **Pre-Implementation Checkpoint**: 4 questions before writing any code +- **Resume**: verification log Progress section enables continuing interrupted work + +### The process (BUILDER.md) + +The LLM determines project size (Quick / Standard / Full), then follows a structured process: + +1. **Intent** — Capture what and why before doing anything (`docs/[project]-intent.md`) +2. **Domain profile** — Load accumulated stack knowledge, read every pitfall and adversary question +3. **Skills** — Load relevant skills from `.github/skills/` and `.agents/skills/` as design guidance +4. **Design** — Architecture, decisions, risks, pitfalls applied, adversary questions answered (`docs/[project]-design.md`) +5. **Pre-Implementation Checkpoint** — 4 mental questions before writing code +6. **Gated build** — Gates 0-4 with real command output recorded +7. **Self-review** — Adversary lens + domain checklist +8. **Domain learning** — Update the profile with new discoveries + +### Domain profiles (the differentiator) + +Domain profiles use an **inheritance model**. 
Base profiles in `catalog/` contain stack-wide knowledge (pitfalls, adversary questions, automated checks, decision history). Profile links in `framework/domains/` extend a base profile with project-specific additions. Every gate failure becomes a new pitfall. Every project makes the next one better. + +Community-contributed base profiles live in `catalog/`. Create a profile link in `framework/domains/` that extends the relevant catalog profile — see `framework/domains/_template.md` for the format. + +A base profile contains: Selection Metadata, Terminology Mapping, Verification Commands, Common Pitfalls, Adversary Questions, Integration Rules, Automated Checks, Decision History, Review Checklist. A profile link contains: `extends` reference, Local Pitfalls, Local Overrides, Local Decision History. + +### Verification gates + +Gates are mandatory checkpoints with real command output. "Assumed to pass" is never valid. Gates 0 (deps) → 1 (scaffold) → 2 (features) → 3 (tests) → 4 (clean build). + +### Artifacts + +- **Intent** — Scope anchor. Given/when/then behaviors, MUST/MUST NOT constraints, IN/OUT scope. +- **Design** — Single document replacing PRD + tech spec + implementation plan. Includes Adversary Questions Applied and Domain Pitfalls Applied as separate mandatory sections. +- **Verification Log** — Gate evidence + Progress section for resuming interrupted work. + +### Anti-Loop Rule + +The LLM produces the Intent before continuing to investigate. Unclear decisions become open questions asked to the human — not reasons to keep researching. + +### Resume mechanism + +Each verification log has a Progress table at the top. When a session is interrupted, the next session reads Progress and continues from the last completed step. 
+ +## Documentation follows Diátaxis + +| Document | Type | Serves | +|----------|------|--------| +| `README.md` | Explanation | Understanding — what and why | +| `GUIDE.md` | Tutorial | Learning — step-by-step first project | +| `framework/README.md` | Reference | Information — specs, contracts, definitions | +| `BUILDER.md` | Reference | Information — the process contract (for LLMs) | +| `GATEKEEPER.md` | Reference | Information — the verification contract (for LLMs) | + +## When modifying the framework + +- `BUILDER.md` is the source of truth for the process. Changes here affect how every LLM behaves. +- Profile link template (`framework/domains/_template.md`) defines what new profile links look like. Full profile template (`framework/templates/DOMAIN_PROFILE-template.md`) defines standalone profiles and catalog contributions. Changes propagate to all future profiles. +- Template changes (`templates/*.md`) affect artifact structure for all future projects. +- `README.md`, `GUIDE.md`, and `framework/README.md` must stay aligned with `BUILDER.md`. If the process changes, the docs must reflect it. +- Examples in `examples/` are historical artifacts — do not modify them to match framework changes. +- Catalog profiles in `catalog/` are contributed by the community — review for quality but preserve the contributor's learnings. + +## Conventions + +- All framework documentation is in English. +- Standalone/base profiles use the structure in `framework/templates/DOMAIN_PROFILE-template.md`. Profile links use `framework/domains/_template.md`. Do not deviate. +- Verification logs are per-project: `docs/[project]-verification.md`, not a shared file. +- Project code goes in its own directory, never at the repo root. +- The AGENTS.md entry point is intentionally minimal (5 lines). Process logic lives in BUILDER.md. +- Skills are guidance, not process. They live in `.github/skills/` (repo-level) or `.agents/skills/` (agent-level/external). 
Skills cannot override gates, skip artifacts, or replace domain profile correctness. Technical learnings go in domain profiles; process learnings go in skills; project-specific learnings go in `docs/`. diff --git a/AGENTS.md b/AGENTS.md index cd10175..e52b89d 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,5 +1,7 @@ # Agent Instructions -Read and follow `agent-kit/BUILDER.md` for all tasks. +The framework operates on a strict **Adversarial Verification Loop**: +- For design, planning, and code implementation tasks: Read and follow `framework/BUILDER.md`. +- For execution, testing, and mechanical verification tasks: Read and follow `framework/GATEKEEPER.md`. Project artifacts (intent, design, verification) go in `docs/`. diff --git a/GUIDE.md b/GUIDE.md index c7add1a..f90728b 100644 --- a/GUIDE.md +++ b/GUIDE.md @@ -10,12 +10,16 @@ In this tutorial, we will set up Agent Kit in a new repository and use it to bui ## Step 1: Copy the framework into your repo -Copy three things into your repository's root: +Copy the framework into your repository's root: ```bash -cp -R agent-kit/ your-repo/agent-kit/ +cp -R framework/ your-repo/framework/ cp AGENTS.md your-repo/ mkdir -p your-repo/docs + +# Optional: if a catalog profile exists for your stack, copy it too +mkdir -p your-repo/catalog +cp catalog/[your-profile].md your-repo/catalog/ ``` Your repo should now look like this: @@ -23,14 +27,20 @@ Your repo should now look like this: ``` your-repo/ ├── AGENTS.md -├── agent-kit/ +├── framework/ │ ├── BUILDER.md +│ ├── GATEKEEPER.md +│ ├── README.md +│ ├── VERSION │ ├── domains/ │ │ └── _template.md │ └── templates/ │ ├── INTENT.md │ ├── DESIGN.md -│ └── VERIFICATION_LOG-template.md +│ ├── VERIFICATION_LOG-template.md +│ └── DOMAIN_PROFILE-template.md +├── catalog/ ← only if you copied a profile +│ └── [your-profile].md └── docs/ ``` @@ -39,12 +49,14 @@ Open `AGENTS.md` and verify it contains: ```markdown # Agent Instructions -Read and follow `agent-kit/BUILDER.md` for all tasks. 
+The framework operates on a strict **Adversarial Verification Loop**:
+- For design, planning, and code implementation tasks: Read and follow `framework/BUILDER.md`.
+- For execution, testing, and mechanical verification tasks: Read and follow `framework/GATEKEEPER.md`.

Project artifacts (intent, design, verification) go in `docs/`.
```

-That's all `AGENTS.md` needs. The process logic lives in `BUILDER.md`.
+That's all `AGENTS.md` needs. The Builder handles design and implementation; the GateKeeper handles verification. The process logic lives in their respective files.

## Step 2: Write your prompt

@@ -96,15 +108,24 @@ This is your chance to course-correct. If a decision is wrong, say so now. If a

## Step 5: Watch the domain profile load (or get created)

-If a domain profile exists for your stack in `agent-kit/domains/`, the LLM loads it and reads every pitfall and adversary question before continuing.
+If a matching base profile exists in `catalog/`, the LLM creates a **profile link** in `framework/domains/` with `extends: [profile-id]` and reads every pitfall and adversary question from the base before continuing. The link template is `framework/domains/_template.md`.
+
+If no base profile exists, the LLM creates a **standalone full profile** directly in `framework/domains/` using `framework/templates/DOMAIN_PROFILE-template.md`. The first version will be minimal — terminology mapping, verification commands, a couple of pitfalls. It will grow as the project discovers new pitfalls. When you want to reuse it across projects, move it to `catalog/` and replace it with a link.
+
+## Step 6: Skills get loaded (if they exist)

-If no profile exists, the LLM creates one from `agent-kit/domains/_template.md`. The first version will be minimal — terminology mapping, verification commands, a couple of pitfalls. That's fine. It will grow.
+If your repo has skills installed in `.github/skills/` or `.agents/skills/`, the LLM scans their descriptions and loads any that match the task. For example, a `frontend-design` skill would be loaded for a UI task but ignored for a backend API. -## Step 6: Review the Design +Skills provide design guidance — aesthetic direction, API conventions, documentation style — but they don't replace the domain profile or the verification gates. You might not have any skills yet, and that's fine. The framework works without them. + +> **Tip:** You can install community skills from [skills.sh](https://skills.sh) (`npx skills add owner/skill-name`) or create your own in `.github/skills/your-skill/SKILL.md`. + +## Step 7: Review the Design The LLM creates `docs/[project]-design.md`. This is one document that replaces a separate PRD, tech spec, and implementation plan. You will see: - **Domain Profile Selection** — which profile was chosen and why (with scores) +- **Skills Loaded** — which skills were loaded (or "none") - **Stack** — technologies with verified versions - **Architecture** — structure, data flow, initialization chain - **Decisions** — every architectural choice with rationale @@ -114,7 +135,7 @@ The LLM creates `docs/[project]-design.md`. This is one document that replaces a The Adversary Questions and Domain Pitfalls sections are where the domain profile earns its value. They force the LLM to confront known failure modes *before* writing a single line of code. -## Step 7: Watch the gated build +## Step 8: Watch the gated build Now the LLM starts building. It proceeds through gates: @@ -135,7 +156,7 @@ If a gate fails, the LLM: This last step is the learning cycle in action. A failure today becomes a prevention for tomorrow. -## Step 8: Check the verification log +## Step 9: Check the verification log When the build is complete, open `docs/[project]-verification.md`. 
At the top you will see the Progress table: @@ -143,6 +164,7 @@ When the build is complete, open `docs/[project]-verification.md`. At the top yo | Step | Status | |------|--------| | Intent | PASS | +| Skills loaded | PASS | | Design | PASS | | Gate 0: Dependencies | PASS | | Gate 1: Scaffold | PASS | @@ -157,18 +179,23 @@ Below that: real output for every gate, a failure history (if anything failed al If Gate 4 passed, the project builds and tests from a clean state. That's the proof. -## Step 9: Check the domain profile +## Step 10: Check what the LLM learned + +After the project, check two places: -Open `agent-kit/domains/[your-profile].md`. Compare it to how it looked before the project. You may see: +**Your profile link** (`framework/domains/[your-profile].md`) — project-specific discoveries: +- New **Local Pitfalls** — things unique to this project's context +- New **Local Decision History** — constraints specific to this project -- New entries in **Common Pitfalls** — things the LLM discovered during implementation +**The base profile** (`catalog/[profile-id].md`) — stack-wide discoveries: +- New entries in **Common Pitfalls** — things any project on this stack should know - New **Adversary Questions** — traps specific to this stack - New **Decision History** entries — constraints learned the hard way - Updated **Automated Checks** — new detection patterns -This is the flywheel. The next project on this stack will load this profile and avoid the problems this project discovered. +This is the flywheel. The next project on this stack inherits the updated base profile automatically through `extends`. Local pitfalls that prove useful across projects should be contributed back to the catalog profile. -## Step 10: Resume interrupted work (when it happens) +## Step 11: Resume interrupted work (when it happens) Sessions get interrupted — context limits, network issues, or just closing the chat. 
When you come back, start a new session and prompt: @@ -184,8 +211,8 @@ The LLM reads the verification log's Progress section, finds the last completed **Add constraints as you discover preferences.** Every time you say "always do X" or "never do Y", the framework captures it — in the Intent for this project, in the domain profile for all future projects on this stack. -**Bring profiles to new repos.** When you start a new repository with the same stack, copy the domain profile along with `agent-kit/`. All accumulated knowledge travels with it. +**Bring profiles to new repos.** When you start a new repository with the same stack, copy `framework/` and include the relevant base profiles from `catalog/`. Create a new profile link in `framework/domains/` that extends the base. All accumulated stack knowledge travels with it; project-specific knowledge stays behind. -For the full technical reference — file descriptions, gate definitions, domain profile contract, and artifact specs — see [`agent-kit/README.md`](agent-kit/README.md). +For the full technical reference — file descriptions, gate definitions, domain profile contract, and artifact specs — see [`framework/README.md`](framework/README.md). For the concepts behind the framework — why it works, how the learning cycle operates, what makes domain profiles different — see the [project README](README.md). diff --git a/README.md b/README.md index 98e5815..55ad3f9 100644 --- a/README.md +++ b/README.md @@ -6,9 +6,9 @@ A process framework for LLM-assisted software development. Works with any LLM an LLMs make the same mistakes across projects. They skip verification, forget past lessons, lose context between sessions, and build confidently on wrong assumptions. Each new conversation starts from zero — even when the same stack was used yesterday. -Agent Kit solves this with two mechanisms: +Agent Kit solves this with an **Adversarial Verification Loop**: -1. 
**A structured process** that forces the LLM to pause, verify, and prove its work before moving forward.
+1. **Dual-Agent Architecture**: A strictly separated `Builder` (writes code) and `GateKeeper` (executes tests). The Builder must prove its work mechanically to an independent verifier, which prevents the model from approving its own unverified claims.
2. **Domain profiles** — living documents that accumulate stack-specific knowledge across projects. Every bug fix, every gate failure, every discovery enriches the profile. The next project on the same stack starts knowing what the last one learned.

## How the process works

@@ -30,20 +30,28 @@ graph TB
 subgraph STANDARD ["Standard / Full Process"]
 direction TB
 INTENT["1 Capture Intent
docs/[project]-intent.md"] - LOAD_PROFILE["2 Load domain profile
Read every pitfall &
adversary question
"] - DESIGN["3 Design — all lenses
docs/[project]-design.md"] - CHECKPOINT["Pre-Implementation
Checkpoint
4 questions before coding"] - GATES["4 Gated build
Gate 0 → 1 → 2"] - TESTS["5 Tests + verification
Gate 3 → Gate 4"] - REVIEW["6 Self-review
Adversary Lens +
domain checklist
"] - + LOAD_PROFILE["2 Load domain profile"] + LOAD_SKILLS["3 Load relevant skills
.github/skills/ · .agents/skills/"] + DESIGN["4 Design — all lenses
docs/[project]-design.md"] + WRITE_CODE["5 Implement code
Builder strictly writes"] + HANDOFF["6 Handoff to GateKeeper"] + INTENT --> LOAD_PROFILE - LOAD_PROFILE --> DESIGN - DESIGN --> CHECKPOINT - CHECKPOINT --> GATES + LOAD_PROFILE --> LOAD_SKILLS + LOAD_SKILLS --> DESIGN + DESIGN --> WRITE_CODE + WRITE_CODE --> HANDOFF + end + + %% ── Verification Loop (GateKeeper) ── + subgraph VERIFICATION ["GateKeeper Authority"] + direction TB + GATES["7 Gated build
Gate 0 → 1 → 2"] + TESTS["8 Tests + verification
Gate 3 → Gate 4"] GATES --> TESTS - TESTS --> REVIEW end + + HANDOFF --> VERIFICATION %% ── Domain profile (central) ── DP[("Domain Profile
pitfalls · adversary questions
integration rules · checks
decision history
")] @@ -54,14 +62,15 @@ graph TB %% ── Gate failure loop ── GATES --> FAIL{"Gate
fails?"} - FAIL -->|"Yes"| FIX["Root cause → Fix
→ Re-run gate"] - FIX --> UPDATE_PROFILE["Update domain profile
new pitfall / rule / check"] + FAIL -->|"Yes"| FIX["Raw log to Builder
→ Fix → Re-run"]
+  FIX --> UPDATE_PROFILE["GateKeeper updates profile
new pitfall / rule"] UPDATE_PROFILE --> DP FIX --> GATES FAIL -->|"No"| TESTS %% ── Learning cycle ── - REVIEW --> LEARNING["7 Domain learning
verify profile was updated"] + TESTS --> REVIEW["9 Self-review
Adversary Lens + domain checklist"] + REVIEW --> LEARNING["10 Domain learning
verify profile was updated"] LEARNING --> DP %% ── Verification log ── @@ -88,10 +97,11 @@ graph TB %% ── Styles ── style DP fill:#4CAF50,color:#fff,stroke:#2E7D32,stroke-width:2px style VLOG fill:#2196F3,color:#fff,stroke:#1565C0,stroke-width:2px - style CHECKPOINT fill:#FF9800,color:#fff,stroke:#E65100,stroke-width:2px + style REVIEW fill:#FF9800,color:#fff,stroke:#E65100,stroke-width:2px style FAIL fill:#f44336,color:#fff,stroke:#c62828 style FIX fill:#f44336,color:#fff,stroke:#c62828 style UPDATE_PROFILE fill:#4CAF50,color:#fff,stroke:#2E7D32 + style LOAD_SKILLS fill:#CE93D8,color:#000,stroke:#8E24AA,stroke-width:1px style RESUME fill:#9C27B0,color:#fff,stroke:#6A1B9A style QUICK fill:#78909C,color:#fff,stroke:#37474F style QUICK_DONE fill:#78909C,color:#fff,stroke:#37474F @@ -103,6 +113,7 @@ graph TB - **Blue** = Verification Log — the proof mechanism. Real command output, not assumptions. - **Orange** = Checkpoints — points where the LLM must pause and think before acting. - **Red** = Failure path — failures are captured, root-caused, and fed back into the profile. +- **Light purple** = Skills — optional guidance loaded before design (aesthetic, conventions, workflow). - **Purple** = Resume — interrupted sessions recover from the verification log's Progress section. - The dashed line from Domain Profile back to "Load domain profile" is the **learning cycle**: every project starts with the accumulated knowledge of all previous projects on that stack. @@ -116,7 +127,7 @@ The LLM doesn't follow a linear sequence. It applies four thinking modes wheneve **Adversary Lens** — What could go wrong? Applied *during* design, not only after. "What input breaks this?", "What happens when X is unavailable?", "What would a careless developer get wrong?" If the domain profile has adversary questions, they must be answered against the specific design before any code is written. -**Domain Lens** — What does the domain profile say? 
Checks every pitfall, follows integration rules, runs automated checks. The accumulated knowledge of previous projects on this stack. +**Domain Lens** — What does the domain profile say? Incorporates integration rules and semantic terminology. The accumulated knowledge of previous projects on this stack. ## Domain profiles: the learning mechanism @@ -136,11 +147,11 @@ From real usage: a domain profile started with 3 pitfalls. After two projects, i Documentation describes how things work. Domain profiles describe how things *fail* — and what to do about it. They are written by the LLM during implementation, not by a human before it. They grow from experience, not from planning. -They also travel. Copy a profile to a new repository and every project in that repo inherits the knowledge. Different LLMs can use the same profile. The learning persists regardless of which model or session created it. +They also travel. Copy `framework/` along with the relevant base profiles from `catalog/` to a new repository and every project in that repo inherits the knowledge. Different LLMs can use the same profile. The learning persists regardless of which model or session created it. ## Verification: proof over claims -LLMs are confident. They will tell you "everything works" when it doesn't. The framework addresses this with gates — mandatory checkpoints where the LLM runs a real command, pastes the real output, and records it in the verification log. +LLMs are confident. They will tell you "everything works" when it doesn't. The framework addresses this with an adversarial loop — mandatory checkpoints where the GateKeeper runs a real command against the Builder's code, pastes the real output, and records it in the verification log. "Assumed to pass" is never valid evidence. If the output isn't in the log, it didn't happen. @@ -164,11 +175,22 @@ The framework is LLM-agnostic. 
Any model that can read markdown and follow instr See [**GUIDE.md**](GUIDE.md) for a step-by-step tutorial on setting up and using Agent Kit. -See [**agent-kit/README.md**](agent-kit/README.md) for the technical reference — file descriptions, gate definitions, artifact specs, and the domain profile contract. +See [**framework/README.md**](framework/README.md) for the technical reference — file descriptions, gate definitions, artifact specs, and the domain profile contract. ## Included examples -**Domain profile:** [`agent-kit/domains/apps-sdk-mcp-lit-vite.md`](agent-kit/domains/apps-sdk-mcp-lit-vite.md) — A real domain profile built across multiple projects with [Apps SDK](https://developers.openai.com/apps-sdk/quickstart) + Lit + Vite. 11 pitfalls, 7 adversary questions, automated checks, and decision history — all learned from real bugs. Shows what a mature profile looks like after the flywheel has turned a few times. +**Domain profiles:** The [`catalog/`](catalog/) directory contains community-contributed domain profiles built from real projects: + +- [`apps-sdk-mcp-lit-vite.md`](catalog/apps-sdk-mcp-lit-vite.md) — MCP Apps + Lit + Vite. 11 pitfalls, 7 adversary questions. Shows what a mature profile looks like after the flywheel has turned. +- [`web-kinu-preact-vite.md`](catalog/web-kinu-preact-vite.md) — Kinu + Preact + Vite. 4 pitfalls, 4 adversary questions. + +If a relevant catalog profile exists for your stack, create a profile link in `framework/domains/` that extends it (see `framework/domains/_template.md`). If not, the Builder will create a standalone profile during the first project using `framework/templates/DOMAIN_PROFILE-template.md`. + +**Project artifacts:** The [`examples/`](examples/) directory contains real project artifacts showing the framework in action: + +- [`mcp-task-widget/`](examples/mcp-task-widget/) — Complete example with intent, design, and verification artifacts. 
+- [`logistics-control-tower/`](examples/logistics-control-tower/) — Full-sized project prompt. +- [`test-habit-tracker/`](examples/test-habit-tracker/) — Spanish-language prompt example. ## License diff --git a/agent-kit/domains/README.md b/agent-kit/domains/README.md deleted file mode 100644 index a57ad1d..0000000 --- a/agent-kit/domains/README.md +++ /dev/null @@ -1,67 +0,0 @@ -# Domain Profiles - -Domain profiles provide stack-specific knowledge that the Builder loads based on the project's technology stack. They are the framework's learning mechanism — each project can enrich them with new pitfalls, decisions, and automated checks. - -## How Profiles Are Used - -1. The Builder reads the user's prompt and identifies the technology stack -2. The Builder applies the operational matching contract in this directory -3. If found, the profile overrides generic assumptions: - - **Verification commands** replace generic "run build" instructions - - **Common pitfalls** are checked against the design before coding - - **Adversary questions** must be answered against the specific design (documented in the Design doc) - - **Automated checks** are executed mechanically during self-review - - **Decision history** applies permanent constraints from past projects - - **Integration rules** define data flow between technologies - -## Operational Matching Contract - -Selection must be deterministic and auditable: - -1. Candidate set: all `.md` files in this directory except `_template.md` and `README.md` -2. Metadata required per profile: - - `Profile ID` - - `Match Keywords` - - `Use When` - - `Do Not Use When` -3. Remove any profile whose `Do Not Use When` matches explicit user constraints -4. Score remaining profiles by keyword overlap with prompt/stack (`+1` per keyword hit) -5. Select only if highest score is unique and `>= 2` -6. 
If tied or below threshold: ask the human to choose; if no clarification is available, create a new profile from `_template.md` instead of forcing a weak match -7. Record the selected profile and reason in the project design doc - -## Naming Convention - -`[domain]-[stack].md` - -The filenames below are naming examples, not bundled profiles in this starter pack. - -Examples: -- `web-angular-lit.md` -- `web-react-nextjs.md` -- `plc-siemens-scl.md` -- `embedded-stm32-freertos.md` -- `backend-python-fastapi.md` - -## Who Creates Profiles - -1. **The human (proactively)** — Create profiles for stacks you use frequently. Use `_template.md`. -2. **The Builder (during a project)** — When no profile exists for the stack, the Builder creates one during the Design phase. Pitfalls discovered during implementation enrich it. - -## Profile Sections - -| Section | Purpose | -|---------|---------| -| Selection Metadata | Deterministic profile routing (`Profile ID`, keywords, use/do-not-use rules) | -| Terminology Mapping | Translates generic "build/test/deploy" to stack language | -| Verification Commands | Exact commands for each gate (0-4) | -| Common Pitfalls | Frequent errors with detection patterns | -| Adversary Questions | Domain-specific questions to answer BEFORE coding — born from real bugs. Must be answered in the Design doc's "Adversary Questions Applied" section | -| Integration Rules | Data flow between technologies, build scoping, init order | -| Automated Checks | Executable commands for mechanical verification | -| Decision History | Permanent constraints from real project experience | -| Review Checklist | Items to verify during self-review | - -## When No Profile Exists - -The Builder creates one using `_template.md` during the Design phase. Minimum viable profile: Terminology Mapping + Verification Commands + 2 Common Pitfalls. 
diff --git a/catalog/README.md b/catalog/README.md new file mode 100644 index 0000000..66e887f --- /dev/null +++ b/catalog/README.md @@ -0,0 +1,62 @@ +# Domain Profile Catalog + +Community-contributed domain profiles for Agent Kit. Each profile captures stack-specific knowledge — pitfalls, adversary questions, verification commands, and decision history — learned from real projects. + +## Available Profiles + +| Profile | Stack | Pitfalls | Adversary Qs | +|---------|-------|----------|--------------| +| [apps-sdk-mcp-lit-vite](apps-sdk-mcp-lit-vite.md) | MCP Apps + Lit + Vite + TypeScript | 11 | 7 | +| [web-kinu-preact-vite](web-kinu-preact-vite.md) | Kinu + Preact + Vite + TypeScript | 4 | 4 | + +## Using a profile + +Create a profile link in your project's `framework/domains/` directory that extends the catalog profile: + +```markdown + + +## Profile Link + +extends: web-kinu-preact-vite +catalog_version: 1.0.0 + +## Local Pitfalls + + +## Local Overrides + +``` + +The Builder loads the base from `catalog/` and merges your local additions on top. See `framework/domains/_template.md` for the full template and `framework/domains/README.md` for merge rules. + +## Contributing a profile + +Domain profiles grow from real project experience. If you've built projects with a stack that isn't represented here, consider contributing your profile. + +### Requirements + +1. Use the template at `framework/templates/DOMAIN_PROFILE-template.md` +2. Follow the naming convention: `[domain]-[stack].md` (e.g., `web-react-nextjs.md`, `backend-python-fastapi.md`) +3. Include at minimum: + - **Selection Metadata** — Profile ID, Match Keywords, Use When, Do Not Use When + - **Terminology Mapping** — Stack-specific command translations + - **Verification Commands** — Exact commands for Gates 0-4 + - **2+ Common Pitfalls** — With What/Correct/Detection pattern +4. 
Every pitfall and adversary question should come from a real bug or failure — not hypotheticals + +### What makes a good profile + +- **Specific:** "Vite uses `--k-` CSS variable prefix, not `--p-`" beats "check your CSS variables" +- **Actionable:** Each pitfall has a Detection command that can be run mechanically +- **Honest:** If a pitfall was discovered by making the mistake, say so +- **Growing:** A profile with 3 real pitfalls is more valuable than one with 10 speculative ones + +### Submitting + +1. Fork the repository +2. Add your profile to `catalog/` +3. Open a pull request with: + - The stack and domain your profile covers + - How many projects informed the profile + - At least one example of a pitfall that prevented a real bug diff --git a/agent-kit/domains/apps-sdk-mcp-lit-vite.md b/catalog/apps-sdk-mcp-lit-vite.md similarity index 100% rename from agent-kit/domains/apps-sdk-mcp-lit-vite.md rename to catalog/apps-sdk-mcp-lit-vite.md diff --git a/examples/PASO-A-PASO.md b/examples/PASO-A-PASO.md new file mode 100644 index 0000000..e2ddfa5 --- /dev/null +++ b/examples/PASO-A-PASO.md @@ -0,0 +1,412 @@ +# Paso a paso: qué ocurre cuando pegas un prompt + +Este documento describe la secuencia completa desde que escribes un prompt hasta que el proyecto está terminado y verificado. Cubre el flujo **Standard/Full** (el más completo). El flujo Quick es una versión reducida que salta directamente a código + Gate 2. + +--- + +## Diagrama general + +```mermaid +graph TB + %% ── Arranque ── + PROMPT["👤 Usuario pega el prompt
'Read AGENTS.md. Build...'"]
+    READ_AGENTS["🤖 Reads AGENTS.md"]
+    READ_BUILDER["🤖 Reads BUILDER.md + GATEKEEPER.md"]
+    PROMPT --> READ_AGENTS --> READ_BUILDER
+
+    %% ── Classification ──
+    READ_BUILDER --> SIZE{"What size
+is the task?"}
+    SIZE -->|"< 3 files,
+clear scope"| QUICK["⚡ Quick
+Code → Gate 2 → Done"]
+    SIZE -->|"feature"| STD["📋 Standard"]
+    SIZE -->|"new project /
+major refactor"| FULL["📋 Full
+(Standard + ADRs +
+Devil's Advocate)"]
+
+    %% ── Quick escalation ──
+    QUICK --> Q_ESC{"Touched > 3 files
+or new bugs?"}
+    Q_ESC -->|"Yes"| STD
+    Q_ESC -->|"No"| DONE["✅ Done"]
+
+    %% ── Standard/Full flow ──
+    subgraph PROCESS ["Standard / Full process"]
+        direction TB
+
+        subgraph PHASE_1 ["PHASE 1 · Understand + Prepare"]
+            direction TB
+            INTENT["1️⃣ Capture Intent
+docs/[project]-intent.md
+What · Why · Behaviors
+Decisions · Constraints"]
+            LOAD["2️⃣ Load Domain Profile
+Resolve extends → catalog/
+Merge: base + local"]
+            SKILLS["3️⃣ Load relevant Skills
+.github/skills/ · .agents/skills/"]
+            READ_PROFILE["4️⃣ Read EVERY pitfall
+and adversary question
+in the merged profile"]
+            INTENT --> LOAD --> SKILLS --> READ_PROFILE
+        end
+
+        subgraph PHASE_2 ["PHASE 2 · Design"]
+            direction TB
+            DESIGN["5️⃣ Design — 4 lenses
+docs/[project]-design.md"]
+            LENSES["User · Architecture
+Adversary · Domain"]
+            AQ["6️⃣ Adversary Questions Applied
+Answer every question
+in the profile against THIS design
+"]
+            DP["7️⃣ Domain Pitfalls Applied
+How each known pitfall
+is addressed
+"]
+            CHECKPOINT["8️⃣ Pre-Implementation Checkpoint
+4 questions before coding"]
+            DESIGN --- LENSES
+            DESIGN --> AQ --> DP --> CHECKPOINT
+        end
+
+        subgraph PHASE_3 ["PHASE 3 · Build + Verify"]
+            direction TB
+            CODE["9️⃣ Builder implements code"]
+            HANDOFF["🔟 Handoff to the GateKeeper
+'Gate X ready for verification'"]
+            CODE --> HANDOFF
+
+            subgraph GATES ["GateKeeper runs the gates"]
+                direction TB
+                G0["Gate 0 · Dependencies
+npm install → exit 0"]
+                G1["Gate 1 · Scaffold
+build → artifacts exist"]
+                G2["Gate 2 · Feature
+build + tests → no regressions"]
+                G3["Gate 3 · Tests
+full suite → everything passes"]
+                G4["Gate 4 · Clean Build
+from scratch → everything passes"]
+                G0 --> G1 --> G2 --> G3 --> G4
+            end
+
+            HANDOFF --> GATES
+        end
+
+        subgraph PHASE_4 ["PHASE 4 · Close"]
+            direction TB
+            REVIEW["1️⃣1️⃣ Self-Review
+Adversary Lens over
+the finished code
+"]
+            LEARNING["1️⃣2️⃣ Domain Learning
+Update the profile with
+discoveries
+"]
+            REVIEW --> LEARNING
+        end
+
+        PHASE_1 --> PHASE_2 --> PHASE_3 --> PHASE_4
+    end
+
+    STD --> PROCESS
+    FULL --> PROCESS
+
+    %% ── Failure loop ──
+    FAIL{"❌ Gate fails"}
+    G0 --> FAIL
+    G1 --> FAIL
+    G2 --> FAIL
+    G3 --> FAIL
+    FAIL -->|"Log to the Builder"| FIX["Root cause → Fix"]
+    FIX -->|"Resubmits to the
+GateKeeper"| GATES
+    FIX -->|"New pitfall"| UPDATE_DP["Update the
+Domain Profile"]
+
+    %% ── Artifacts ──
+    VLOG["📄 Verification Log
+docs/[project]-verification.md
+Progress · Real output · Failures"]
+    GATES -. "real output" .-> VLOG
+    FAIL -. "failure history" .-> VLOG
+
+    %% ── Flywheel ──
+    DP_NODE[("🧠 Domain Profile
+catalog/ (base)
+domains/ (local)
+")]
+    LEARNING --> DP_NODE
+    UPDATE_DP --> DP_NODE
+    DP_NODE -. "the next project
+starts knowing
+all of this" .-> LOAD
+
+    %% ── Styles ──
+    style DP_NODE fill:#4CAF50,color:#fff,stroke:#2E7D32,stroke-width:2px
+    style VLOG fill:#2196F3,color:#fff,stroke:#1565C0,stroke-width:2px
+    style FAIL fill:#f44336,color:#fff,stroke:#c62828
+    style FIX fill:#f44336,color:#fff,stroke:#c62828
+    style UPDATE_DP fill:#4CAF50,color:#fff,stroke:#2E7D32
+    style SKILLS fill:#CE93D8,color:#000,stroke:#8E24AA,stroke-width:1px
+    style DONE fill:#4CAF50,color:#fff,stroke:#2E7D32
+    style CHECKPOINT fill:#FF9800,color:#fff,stroke:#E65100
+```
+
+---
+
+## Detailed step by step
+
+### START — The prompt triggers the chain
+
+**Step 0: The user pastes the prompt**
+
+```
+Read AGENTS.md. Build a task management widget.
+Stack: Lit 3, Vite, TypeScript, MCP SDK.
+```
+
+The `Read AGENTS.md` line is the trigger. Without it, the LLM doesn't know the framework exists.
+
+**Step 0.1: The LLM reads AGENTS.md**
+
+AGENTS.md contains only 5 lines. It tells the LLM:
+- To design and implement → read `framework/BUILDER.md`
+- To verify → read `framework/GATEKEEPER.md`
+- Artifacts go in `docs/`
+
+**Step 0.2: The LLM reads BUILDER.md**
+
+BUILDER.md is the full process contract. The LLM absorbs:
+- The 4 lenses (User, Architecture, Adversary, Domain)
+- The 3 scales (Quick, Standard, Full)
+- The gate protocol
+- The domain profile rules (inheritance model)
+- The Anti-Loop Rule
+
+**Step 0.3: Size classification**
+
+The LLM evaluates the prompt:
+
+| Size | Criterion | What it produces |
+|--------|----------|-------------|
+| **Quick** | < 3 files, obvious scope | Code + Gate 2 |
+| **Standard** | Feature. There are design decisions | Intent + Design + Code + Verification Log |
+| **Full** | New project or major refactor | Everything in Standard + ADRs + Devil's Advocate |
+
+> Rule: when in doubt, go one level up. Over-documenting is cheaper than discovering late that a decision was missing.
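+As a purely illustrative sketch (this classification is a judgment call the LLM makes, not actual code in the framework — all names here are invented), the sizing rule might read:
+
+```typescript
+// Hypothetical sketch of the Quick / Standard / Full classification rule.
+type Size = "Quick" | "Standard" | "Full";
+
+interface TaskSignals {
+  estimatedFiles: number;             // files the change is expected to touch
+  scopeIsObvious: boolean;            // no open design decisions
+  isNewProjectOrMajorRefactor: boolean;
+}
+
+function classify(t: TaskSignals): Size {
+  if (t.isNewProjectOrMajorRefactor) return "Full";
+  if (t.estimatedFiles < 3 && t.scopeIsObvious) return "Quick";
+  // When in doubt, go one level up: Standard is the default.
+  return "Standard";
+}
+
+console.log(classify({ estimatedFiles: 2, scopeIsObvious: true, isNewProjectOrMajorRefactor: false })); // Quick
+```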
+
+---
+
+### PHASE 1 — Understand
+
+**Step 1: Capture the Intent**
+
+The LLM creates `docs/[project]-intent.md` BEFORE investigating any further (Anti-Loop Rule). From the prompt it extracts:
+
+- **Goal** — What and why (1-2 sentences)
+- **Behavior** — Observable behaviors in given/when/then format
+- **Decisions** — Decisions made, with rejected alternatives
+- **Constraints** — MUST / MUST NOT / SHOULD
+- **Scope** — What's in (IN) and what's out (OUT)
+- **Acceptance** — Verifiable conditions for "done"
+
+If something is unclear, it writes it down as an open question and **asks the human**. It does not assume.
+
+> This is your moment to correct course. If a decision is wrong, say so now.
+
+**Step 2: Load the Domain Profile (inheritance model)**
+
+The LLM looks in `framework/domains/`:
+
+1. It finds a file (e.g. `mi-proyecto.md`) with an `extends: apps-sdk-mcp-lit-vite` field
+2. It loads the base profile from `catalog/apps-sdk-mcp-lit-vite.md`
+3. It applies the merge rules:
+   - **Local Pitfalls** and **Local Adversary Questions** → are **appended** to the base's
+   - **Local Overrides** → **replace** the corresponding base section
+   - **Local Decision History** → is kept separate
+   - Everything else → is inherited from the base unchanged
+
+If no profile exists → the LLM creates a complete new one in `catalog/` and then a link in `domains/`.
+
+**Step 3: Load relevant Skills**
+
+The LLM looks for `SKILL.md` files in `.github/skills/` and `.agents/skills/`. It reads each one's `description` and loads those that fit the task. For example, a `frontend-design` skill is loaded for UI tasks but ignored for a backend API.
+
+Skills are design and quality guidance — they replace neither the framework's process nor the domain profiles. If no skills are installed, this step is skipped.
+
+> In **Full** projects this step is mandatory. If there are no skills, "none found" is documented.
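+A minimal sketch of the merge rules from Step 2 (the type and function names are illustrative, not the framework's actual implementation):
+
+```typescript
+// Illustrative sketch of the Step 2 merge rules: local pitfalls and
+// adversary questions are appended; local overrides replace base sections.
+interface Profile {
+  pitfalls: string[];
+  adversaryQuestions: string[];
+  sections: Record<string, string>;   // e.g. { "Verification Commands": "..." }
+  decisionHistory: string[];
+}
+
+interface LocalLink {
+  extends: string;                    // base profile id resolved from catalog/
+  localPitfalls: string[];
+  localAdversaryQuestions: string[];
+  localOverrides: Record<string, string>;
+  localDecisionHistory: string[];
+}
+
+function mergeProfile(base: Profile, local: LocalLink): Profile {
+  return {
+    // Local pitfalls and adversary questions are APPENDED to the base's
+    pitfalls: [...base.pitfalls, ...local.localPitfalls],
+    adversaryQuestions: [...base.adversaryQuestions, ...local.localAdversaryQuestions],
+    // Local overrides REPLACE the corresponding base section
+    sections: { ...base.sections, ...local.localOverrides },
+    // Local decision history is kept separate from the base's
+    decisionHistory: local.localDecisionHistory,
+  };
+}
+```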
+
+**Step 4: Read every pitfall and adversary question**
+
+"Loading" the profile is not enough. The LLM actively reads every pitfall and every adversary question in the merged profile. These inform the design — loading without reading is useless.
+
+---
+
+### PHASE 2 — Design
+
+**Step 5: Design Document — the 4 lenses in a single pass**
+
+The LLM creates `docs/[project]-design.md` applying the 4 lenses simultaneously:
+
+| Lens | Core question | What it produces |
+|-------|-----------------|-------------|
+| **User** | What must the software do? | Goals, behaviors, implicit needs |
+| **Architecture** | How is it built? | Stack, structure, data flow, init chain, dependencies |
+| **Adversary** | What can go wrong? | Risks, edge cases, questionable assumptions |
+| **Domain** | What does the profile say? | Pitfalls applied, integration rules, terminology |
+
+The Design includes:
+- **Domain Profile Selection Rationale** — Why this profile was chosen (with scores)
+- **Skills Loaded** — Which skills were loaded (or "none")
+- **Stack** — Technologies with verified versions (`npm view`)
+- **Architecture** — Structure, data flow, init chain
+- **Decisions** — Every architectural decision with rejected alternatives
+- **Risks** — Identified BEFORE implementing
+
+**Step 6: Adversary Questions Applied** (mandatory section)
+
+Every adversary question in the profile is answered against THIS specific design:
+
+| Profile question | Answer for this design |
+|---------------------|---------------------------|
+| "What happens if `document.referrer` is empty?" | "Does not apply — we use the SDK `App` class, which uses `postMessage`" |
+
+> Checking pitfalls ≠ answering adversary questions. They are separate sections with different purposes.
+
+**Step 7: Domain Pitfalls Applied** (mandatory section)
+
+For every pitfall in the profile: does it apply? how is it addressed?
+
+| Pitfall | Applies? | How addressed |
+|---------|----------|----------------|
+| Attribute binding for non-strings | Yes | Property binding in all cases |
+| CORS on widget assets | Yes | CSP configured in `_meta.ui` |
+
+**Step 8: Pre-Implementation Checkpoint**
+
+4 questions before writing a single line of code:
+
+1. **Do my dependencies already solve this?** — Read the public API. If the library offers it, use it.
+2. **Which environment assumption could be wrong?** — Identify at least one.
+3. **Have I reviewed the profile's pitfalls?** — Every pitfall and adversary question against the plan.
+4. **Is this still the right size?** — Re-evaluate Quick vs Standard vs Full.
+
+> It is not a document. It is a mental pause — it exists because LLMs under pressure skip it.
+
+---
+
+### PHASE 3 — Build and verify
+
+**Step 9: The Builder implements code**
+
+The Builder writes code following the Design. The code goes in its own directory (e.g. `mcp-task-widget/`), never in the repo root.
+
+**Step 10: Adversarial handoff to the GateKeeper**
+
+The Builder **never verifies its own work**. When a phase is ready:
+
+1. It declares: *"Gate X ready for verification"*
+2. **It stops.** It runs no verification commands.
+3. The GateKeeper takes over.
+
+> This separation eliminates hallucinated self-approvals.
+
+**Step 10.1: The GateKeeper runs the gates**
+
+The GateKeeper is a strictly mechanical agent:
+
+| Gate | What it verifies | Pass criterion |
+|------|-------------|-----------------|
+| **0** | Dependencies | `npm install` → exit 0, no dependency warnings |
+| **1** | Scaffold | Build → exit 0, artifacts exist on disk |
+| **2** | Feature | Build + tests → exit 0, no regressions |
+| **3** | Tests | Full suite → everything passes, coverage meets target |
+| **4** | Clean Build | From scratch (delete everything → install → build → test) → everything passes |
+
+For each gate, the GateKeeper:
+1. Reads the commands from the Domain Profile
+2. Runs them in a terminal
+3. Captures STDOUT/STDERR + exit code
+4. Pastes the real output into `docs/[project]-verification.md`
+
+> "Assumed to pass" is never valid evidence. Real output or it didn't happen.
+
+**Step 10.2: What happens when a gate fails**
+
+```
+[GATE REJECTED]
+Gate: 2
+Exit Code: 1
+Raw Output:
+
+Action Required: Builder must analyze the root cause and resubmit.
+```
+
+1. The GateKeeper reports the failure with the raw log
+2. The Builder analyzes the **root cause** (not the symptom)
+3. The Builder fixes the code
+4. The Builder resubmits to the GateKeeper
+5. The GateKeeper re-runs the gate from scratch
+6. If the failure revealed a gap → **the GateKeeper updates the Domain Profile** with a new pitfall
+
+> Every failure is a learning opportunity. A fix without a profile update means the same mistake can repeat in the next project.
+
+**Step 10.3: What happens when a gate passes**
+
+```
+[GATE PASSED]
+The verification log has been updated with the mechanical proof.
+```
+
+The GateKeeper updates the Progress table and moves on to the next gate.
+
+---
+
+### PHASE 4 — Close
+
+**Step 11: Self-Review**
+
+The Builder switches to the Adversary lens over the finished code:
+
+1. Re-reads the Intent. Does the code fulfill every Behavior? Does it respect every Constraint?
+2. Runs every Automated Check in the profile
+3. Reviews every Common Pitfall against the code
+4. Checks every item in the Review Checklist
+
+For **Full** projects, it adds:
+- **Devil's Advocate** — 3 uncovered scenarios, weakest link, attack vector
+- **Findings** — Minimum 3 genuine findings
+
+**Step 12: Domain Learning**
+
+The Builder checks that the Domain Profile was updated during implementation. If anything was missed, it updates it now:
+
+| Where it goes | What kind of discovery |
+|----------|---------------------------|
+| **catalog/** (base) | Stack pitfalls, adversary questions, integration rules, decision history |
+| **domains/** (local) | Pitfalls unique to this project, gate overrides, local decisions |
+
+---
+
+### The final result
+
+When you finish, you have:
+
+```
+docs/
+├── [project]-intent.md          ← What and why
+├── [project]-design.md          ← How (architecture, decisions, risks)
+└── [project]-verification.md    ← Proof (real output from every gate)
+
+catalog/
+└── [profile].md                 ← Updated with what was learned (stack-wide)
+
+framework/domains/
+└── [project].md                 ← Updated with what was learned (project-specific)
+```
+
+And the verification log's Progress table shows:
+
+```
+| Step                   | Status |
+|------------------------|--------|
+| Intent                 | PASS   |
+| Skills loaded          | PASS   |
+| Design                 | PASS   |
+| Gate 0: Dependencies   | PASS   |
+| Gate 1: Scaffold       | PASS   |
+| Gate 2: Feature        | PASS   |
+| Gate 3: Tests          | PASS   |
+| Gate 4: Clean build    | PASS   |
+| Self-Review            | PASS   |
+| Domain update          | PASS   |
+```
+
+---
+
+### If the session is interrupted
+
+The next session's LLM:
+
+1. Reads `AGENTS.md` → `BUILDER.md`
+2. Looks for `docs/[project]-verification.md`
+3. Reads the **Progress** section — finds the last completed step
+4. Loads the domain profile from the design doc
+5. Continues from the next incomplete step
+
+It repeats no work. It guesses nothing.
+
+---
+
+### The flywheel
+
+The Domain Profile is the most valuable artifact. After the first project:
+
+```
+Project 1: 3 pitfalls → the profile grows to 11 pitfalls, 7 adversary questions
+Project 2: loads that profile → significantly fewer gate failures
+Project 3: loads the enriched profile → smoother still
+```
+
+Every project feeds the next. Knowledge is not lost between sessions, between projects, or between different LLMs.
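+As a closing illustration, the GateKeeper's mechanical loop — run the gate command, capture the real output and exit code, report the verdict — could be sketched like this (a hypothetical helper, not part of the framework's code):
+
+```typescript
+// Hypothetical sketch of one GateKeeper step: run a gate command and
+// capture real output + exit code, which is the only valid evidence.
+import { spawnSync } from "node:child_process";
+
+interface GateResult {
+  gate: number;
+  command: string;
+  exitCode: number;
+  output: string;     // raw STDOUT + STDERR, pasted into the verification log
+  passed: boolean;
+}
+
+function runGate(gate: number, command: string, args: string[]): GateResult {
+  const res = spawnSync(command, args, { encoding: "utf8" });
+  const exitCode = res.status ?? 1;   // treat a missing status as failure
+  return {
+    gate,
+    command: [command, ...args].join(" "),
+    exitCode,
+    output: (res.stdout ?? "") + (res.stderr ?? ""),
+    passed: exitCode === 0,
+  };
+}
+
+// Example: a trivial command standing in for `npm install` at Gate 0.
+const result = runGate(0, "node", ["--version"]);
+console.log(result.passed ? "[GATE PASSED]" : "[GATE REJECTED]");
+```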
diff --git a/examples/logistics-control-tower/PROMPT.md b/examples/logistics-control-tower/PROMPT.md new file mode 100644 index 0000000..a31ddd8 --- /dev/null +++ b/examples/logistics-control-tower/PROMPT.md @@ -0,0 +1,105 @@ +# Example: Global Logistics Control Tower + +## Context + +This example reverse-engineers an existing project built with an earlier version of Agent Kit. The original was a multi-tenant cold-chain monitoring system with a Lit dashboard, risk engine, incident management, and audit trail. + +The prompt below is designed for the current version of the framework. It describes the same functional requirements but pushes for a bolder visual direction — moving from the original's corporate blue/teal palette to something more distinctive. + +No domain profile exists for this stack yet. The LLM will create one from `_template.md` during the Design phase. + +## The prompt + +``` +Read AGENTS.md. Build a Global Logistics Control Tower — a multi-tenant, +event-driven backend for real-time monitoring of temperature-sensitive +shipments (cold-chain logistics). + +Stack: Node.js (>=22), TypeScript, Lit 3 (CDN via esm.sh), node:http, +node:test. No external frameworks (no Express, no Vitest, no React). 
+ +Core domain: +- Shipments with lifecycle: planned → in_transit → at_risk → incident → delivered +- IoT telemetry ingest (temperature, humidity, door open, battery, location) +- Risk engine: score 0-100 based on sensor thresholds + delivery delay +- Automatic incident opening when risk score crosses threshold (default: 70) +- Incident resolution with notes and audit trail +- Multi-tenant isolation on every read/write path +- Idempotent ingest (dedup by event ID and idempotency key) +- Out-of-order safe (timeline sorted by sensor timestamp, not arrival) +- Immutable audit trail for every lifecycle action +- Structured error envelope on all API responses (code, message, traceId) + +REST API: +- POST /api/shipments (create, idempotent) +- POST /api/telemetry (ingest, triggers risk recalc + status transition) +- POST /api/incidents/resolve +- GET /api/dashboard/{tenantId} (shipments, open incidents, KPIs) +- GET /api/audit/{tenantId} +- GET /api/alerts/{tenantId} (in-app + email outbox) +- GET /api/metrics (ingestCount, duplicateCount, p95 latency) + +Dashboard KPIs: on-time rate %, cold-chain breaches count, MTTR minutes. + +Frontend: Lit custom element served as static HTML. +Responsive 12-column CSS grid. Forms for creating shipments and ingesting +telemetry. Live lists for shipments, incidents, audit trail, and alerts. +Auto-refresh after every action. + +Visual direction: I want something visually bold and modern — not the +typical corporate dashboard with safe blues and grays. Think dark mode +with high-contrast accent colors, glassmorphism cards, gradient mesh +backgrounds, and strong typographic hierarchy. The UI should feel like +a mission control center, not a spreadsheet. Surprise me with the +palette but keep it readable and professional. + +Storage: in-memory maps (tenant-scoped). Include a PostgreSQL schema +artifact in db/schema.sql for future persistence. Include a +docker-compose.yml with PostgreSQL 16 and RabbitMQ 3.13 for +future event streaming. 
+ +Testing: node:test runner. Cover risk scoring, status transitions, +full ingest flow with incident opening, error envelope contracts, +duplicate/out-of-order resilience, and a frontend smoke test. +Target: minimum 9 tests, all passing. + +Constraints: +- MUST: tenant isolation on every path +- MUST: idempotent ingest (no state mutation on duplicates) +- MUST: immutable audit trail +- MUST NOT: use Express, Fastify, or any HTTP framework +- MUST NOT: use jsdom or external test runners +- SHOULD: keep ingest p95 < 500ms +- SHOULD: keep architecture ready for RabbitMQ/PostgreSQL adapters +``` + +## What the framework should produce + +- **Size:** Full (new project, major architecture) +- **Domain profile:** New — no existing profile matches this backend-only Node.js stack. The LLM creates one from `_template.md` +- **Artifacts:** Intent, Design, Verification Log (with all gates), new domain profile + +## What to look for when reviewing the output + +1. **Domain profile creation** — The LLM should create a new profile (e.g., `backend-node-ts-event-driven.md`) with pitfalls specific to this stack: in-memory map tenant isolation, node:test quirks, ESM/CJS boundaries, idempotency edge cases. + +2. **Risk engine design** — The Design document should show scoring rules, threshold logic, and status state machine before any code is written. + +3. **Adversary Questions** — Even without an existing profile, the LLM should generate adversary questions in the Design: "What happens if two telemetry events arrive with the same timestamp?", "What happens if a tenant ID is missing from a request?", etc. + +4. **Gate failures becoming pitfalls** — If any gate fails (it likely will on first pass), watch for the LLM adding the root cause to the new domain profile. This is the flywheel starting. + +5. **Visual execution** — The prompt asks for a bold departure from typical dashboards. The Design should document the visual direction as an architectural decision with rationale. 
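+To make item 2 concrete: a risk engine along the lines the prompt asks for (score 0-100 from sensor thresholds plus delay, incident at 70) might be sketched as follows. All weights and threshold values below are invented for illustration — the Design document would pin down the real ones.
+
+```typescript
+// Illustrative risk-engine sketch: score 0-100 from sensor thresholds
+// plus delivery delay, opening an incident when the score crosses 70.
+// Weights and the 8 °C safe-temperature cutoff are invented here.
+interface Telemetry {
+  temperatureC: number;
+  doorOpen: boolean;
+  delayMinutes: number;
+}
+
+const INCIDENT_THRESHOLD = 70; // prompt default
+
+function riskScore(t: Telemetry, maxSafeTempC = 8): number {
+  let score = 0;
+  if (t.temperatureC > maxSafeTempC) {
+    // Breach severity scales with how far past the safe limit we are, capped.
+    score += Math.min(60, (t.temperatureC - maxSafeTempC) * 15);
+  }
+  if (t.doorOpen) score += 20;
+  // Delivery delay contributes up to 20 points.
+  score += Math.min(20, t.delayMinutes / 30);
+  return Math.min(100, Math.round(score));
+}
+
+function shouldOpenIncident(score: number): boolean {
+  return score >= INCIDENT_THRESHOLD;
+}
+```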
+ +## Original project reference + +The original project (built with an earlier framework version) used: +- 438-line ControlTowerService with in-memory tenant-scoped maps +- 68-line risk engine with threshold-based scoring +- 206-line HTTP server with structured error handling +- 364-line Lit component with 12-column grid layout +- 9 tests covering unit, integration, contract, resilience, and smoke +- Corporate blue/teal palette (professional but conventional) +- All 4 verification gates passing + +The prompt above preserves all functional requirements while pushing for a more distinctive visual identity and letting the current framework version guide the process. diff --git a/examples/mcp-task-widget/PROMPT.md b/examples/mcp-task-widget/PROMPT.md new file mode 100644 index 0000000..1b013f5 --- /dev/null +++ b/examples/mcp-task-widget/PROMPT.md @@ -0,0 +1,42 @@ +# Example: MCP Task Widget + +## Context + +This example was generated by GPT 5.4 (ChatGPT) using Agent Kit. The goal was to build an MCP App for ChatGPT — a task management widget embedded in the conversation canvas. + +The domain profile `apps-sdk-mcp-lit-vite.md` already existed from a previous project. GPT 5.4 loaded it, followed the full process (Intent → Design → Gated Build → Self-Review), discovered a new pitfall during implementation (CSS asset reference), and updated the domain profile — demonstrating the learning cycle in action. + +## The prompt + +``` +Read AGENTS.md. Build an MCP App for ChatGPT that turns this existing +Lit todo application into an embedded widget with MCP tools for task +management. Use the components and patterns from this repo as the UI +base: https://github.com/oscarmarina/lit-signals-material — it uses +Lit 3 + signals + Material Web. The MCP server should expose tools for +CRUD operations on tasks, and the widget should communicate through the +MCP Apps bridge. Stack: MCP Apps, Lit, Vite, TypeScript. 
+``` + +## What Agent Kit determined + +- **Size:** Full (new project) +- **Domain profile:** `apps-sdk-mcp-lit-vite` (keyword score 6, unique match) +- **Artifacts produced:** Intent, Design, Verification Log + +## What happened during the build + +1. Gate 0 failed — package version was wrong (`@modelcontextprotocol/ext-apps@1.27.1` doesn't exist). GPT root-caused it (stale terminal cache), fixed the version, and re-ran. +2. All subsequent gates passed (Gate 1 through Gate 4). +3. During self-review, GPT discovered that the resource HTML referenced `assets/index.css` but Vite might not emit a CSS file when all styles live in Lit's `static styles`. It added **Pitfall 11** to the domain profile. +4. Final result: a working MCP App with 5 tests (3 server, 2 widget in Playwright), clean build from scratch passing. + +## Generated artifacts + +The files in this directory are the actual artifacts GPT 5.4 produced: + +- [`mcp-task-widget-intent.md`](mcp-task-widget-intent.md) — What and why +- [`mcp-task-widget-design.md`](mcp-task-widget-design.md) — How (architecture, decisions, risks, pitfalls applied) +- [`mcp-task-widget-verification.md`](mcp-task-widget-verification.md) — Proof (real gate output, failure history, domain profile updates) + +These show the complete output of the framework for a Full-sized project. No code is included — the artifacts are the interesting part. diff --git a/examples/mcp-task-widget/mcp-task-widget-design.md b/examples/mcp-task-widget/mcp-task-widget-design.md new file mode 100644 index 0000000..a2ac94f --- /dev/null +++ b/examples/mcp-task-widget/mcp-task-widget-design.md @@ -0,0 +1,124 @@ +# Design: MCP Task Widget + +**Intent:** docs/mcp-task-widget-intent.md +**Domain Profile:** agent-kit/domains/mcp-apps-lit-vite.md +**Date:** 2026-03-09 + +## Domain Profile Selection Rationale + +| Candidate Profile | Keyword Score | Excluded? 
| Reason | +|-------------------|---------------|-----------|--------| +| `mcp-apps-lit-vite` | 6 | No | Matches `mcp apps`, `lit`, `vite`, `widget`, `iframe`, and the ChatGPT embedded UI requirement | + +**Selected Profile:** `mcp-apps-lit-vite` +**Selection Basis:** Unique highest score `>= 2` + +## Architecture + +### Stack + +| Technology | Version | Verified Via | Purpose | +|-----------|---------|-------------|---------| +| `lit` | `3.3.2` | `npm view lit version` | Widget component model | +| `vite` | `7.3.1` | `npm view vite version --json` | Widget bundling | +| `typescript` | `5.9.3` | `npm view typescript version` | Shared typing and compilation | +| `@modelcontextprotocol/sdk` | `1.27.1` | `npm view @modelcontextprotocol/sdk version --registry=https://registry.npmjs.org` | MCP server transports and core server APIs | +| `@modelcontextprotocol/ext-apps` | `1.2.0` | `npm view @modelcontextprotocol/ext-apps version --registry=https://registry.npmjs.org` and the published package source | Widget bridge and app resource registration | +| `@material/web` | `2.4.1` | Referenced UI base package manifest | Material Web UI components | +| `@lit-labs/signals` | `0.2.0` | `npm view @lit-labs/signals version` | Signals-based widget state | +| `zod` | `4.x` | Package install lock verification during Gate 0 | Tool and persistence validation | + +### Structure + +- `src/server/` contains the MCP server entrypoint, HTTP bootstrap, app resource registration, task repository, and Zod schemas. +- `src/widget/` contains the Lit widget entrypoint, signal-based controller, Material Web custom elements, and CSS. +- `src/shared/` contains task schemas and types shared across server and widget boundaries. +- `docs/` contains intent, design, and verification artifacts. +- `data/tasks.json` stores persisted task state for local development and runtime. + +### Data Flow + +1. 
The HTTP server receives a request on `/mcp` and creates a stateless `StreamableHTTPServerTransport` plus a fresh `McpServer` instance. +2. MCP tools validate input with Zod, mutate the file-backed task repository, and return an authoritative `{ tasks, summary }` snapshot in `structuredContent`. +3. The widget initializes `App` from `@modelcontextprotocol/ext-apps`, receives tool results, and calls additional server tools with `app.callServerTool(...)`. +4. A widget controller stores authoritative task data in signals and derives counts, progress, and filtered lists via computed signals. +5. UI-only state such as draft text, edit mode, and active filter remains in widget-local signals and never crosses into server payloads. + +### Initialization Chain + +1. `src/server/main.ts` starts either stdio or the HTTP server. +2. The HTTP server mounts `/mcp` and `/widget/*`, enabling CORS on both routes. +3. The MCP tool registration links task tools to a `ui://tasks/task-board.html` resource. +4. ChatGPT fetches the resource HTML, which references externally served widget assets under `/widget/assets/*`. +5. `src/widget/main.ts` creates an `App`, registers lifecycle callbacks, applies host theme variables, connects to the host, and triggers the initial `list_tasks` call. +6. The Lit root component renders from signals and dispatches CRUD actions back through the bridge client. 
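+The boundary in data-flow steps 2 and 5 — an authoritative `{ tasks, summary }` snapshot built on the server, UI-only state kept in the widget — can be sketched as follows (the types are illustrative, not the project's actual code):
+
+```typescript
+// Illustrative sketch: the server builds the authoritative { tasks, summary }
+// snapshot, while UI-only state (drafts, edit mode, filter) stays widget-local.
+interface Task {
+  id: string;
+  title: string;
+  completed: boolean;
+}
+
+interface Snapshot {
+  tasks: Task[];
+  summary: { total: number; completed: number; open: number };
+}
+
+// UI-only state: never serialized into structuredContent.
+interface WidgetUiState {
+  draftText: string;
+  editingId: string | null;
+  activeFilter: "all" | "open" | "done";
+}
+
+function buildSnapshot(tasks: Task[]): Snapshot {
+  const completed = tasks.filter((t) => t.completed).length;
+  return {
+    tasks,
+    summary: { total: tasks.length, completed, open: tasks.length - completed },
+  };
+}
+```
+
+Centralizing snapshot construction in one function like this is what makes the "UI state must not enter `structuredContent`" pitfall mechanically checkable.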
+ +### Dependencies + +**Production:** +- `@lit-labs/signals` — signal and computed state for the widget controller +- `@material/web` — Material Web inputs, buttons, list, checkbox, divider, and progress components +- `@modelcontextprotocol/ext-apps` — widget `App`, host theming helpers, and server-side app registration helpers +- `@modelcontextprotocol/sdk` — MCP server implementation and transports +- `node:http` — HTTP server for `/mcp` and `/widget/*` +- `zod` — schema validation at MCP and persistence boundaries + +**Development:** +- `@types/node` — TypeScript types for runtime APIs +- `@vitest/browser-playwright` — browser-mode widget tests +- `playwright` — real browser runtime for widget verification +- `vitest` — server and widget tests +- `typescript` — compilation +- `vite` — widget build + +## Decisions + +| # | Decision | Choice | Alternatives Considered | Rationale | +|---|----------|--------|------------------------|-----------| +| 1 | Server framework | `node:http` with manual route handling | Express, Fastify | Matches the project decision from the previous session and keeps the HTTP surface minimal for this embedded MCP server | +| 2 | Tool surface | `list_tasks`, `create_task`, `update_task`, `delete_task` | Single monolithic task mutation tool | Explicit CRUD tools are easier for both models and widget code to reason about | +| 3 | Persistence | JSON file repository | In-memory only, SQLite | JSON is sufficient for a local MCP sample and avoids infrastructure overhead | +| 4 | Widget composition | Lit custom element plus signals controller | Monolithic imperative DOM app | Keeps the architecture close to the referenced `lit-signals-material` project | +| 5 | Styling | Material Web components plus custom CSS variables and gradient surfaces | Plain HTML controls | Preserves the requested UI base and provides a stronger embedded-widget presentation | +| 6 | Asset resolution | Stable Vite output names and server-generated resource HTML | Parsing 
Vite manifest at runtime | Stable names reduce moving parts and align with the selected domain profile | + +## Risks (Adversary Lens) + +| Risk | Impact | Likelihood | Mitigation | +|------|--------|------------|------------| +| Public origin is wrong behind a tunnel or proxy | Widget assets fail to load in ChatGPT | Medium | Resolve origin from env first, then forwarded headers, and include it in CSP and asset URLs | +| Persisted JSON is malformed or manually edited | Tool calls can crash or return invalid data | Medium | Parse storage through Zod and fall back to an empty list on invalid content | +| Widget attempts calls before bridge connection finishes | Initial render stalls or errors | Medium | Controller tracks connection state and defers actions until `app.connect()` resolves | +| Cross-origin module asset requests are blocked | Widget stays blank | Medium | Add `Access-Control-Allow-Origin: *` to `/widget/*` responses | +| Future changes leak UI-only fields into tool payloads | Model-visible data becomes polluted | Low | Centralize snapshot construction on the server and keep UI state in the controller only | + +## Domain Pitfalls Applied + +| Pitfall | Applies? 
| How Addressed | +|---------|----------|---------------| +| Official SDK bridge only | Yes | Widget uses `App` from `@modelcontextprotocol/ext-apps`; no custom bridge code is written | +| UI state must not enter `structuredContent` | Yes | Server snapshots contain only task entities and summary counts | +| Relative/portable widget assets | Yes | Vite uses `base: './'` and stable output names | +| No inline bundled scripts | Yes | Resource HTML points to `/widget/assets/index.js` and `/widget/assets/index.css` | +| CSP must include server domain | Yes | Resource metadata includes the resolved public origin in `connectDomains` and `resourceDomains` | +| Widget fallback background | Yes | Resource HTML includes an explicit body background style | +| Stateless transport | Yes | `/mcp` creates `StreamableHTTPServerTransport({ sessionIdGenerator: undefined })` | +| Widget tests must use a real browser | Yes | Vitest uses a browser project backed by Playwright | +| Widget assets need CORS headers | Yes | `/widget/*` responses include `Access-Control-Allow-Origin: *` | + +## Verification + +| Gate | Command | Pass Criteria | +|------|---------|---------------| +| 0 | `npm install` | Exit 0 and no unresolved dependency errors | +| 1 | `npm run build` | Server build succeeds and widget assets are emitted under `dist/widget/` | +| 2 | `npm run build && npm test` | Build succeeds and all tests pass | +| 3 | `npm test` | All server and widget tests pass | +| 4 | `rm -rf dist node_modules && npm install && npm run build && npm test` | Clean build and tests pass from scratch | + +## Test Strategy + +- **What to test:** task repository behavior, CRUD tool behavior, resource HTML generation, widget initial load, and widget CRUD interactions against a mocked bridge client +- **How:** Vitest workspace with a Node project for server tests and a Playwright browser project for widget tests +- **Coverage target:** At least core CRUD paths and the widget’s create/update/delete flow +- 
**What NOT to test:** Material Web internals, MCP SDK internals, or browser rendering details already owned by dependencies \ No newline at end of file diff --git a/examples/mcp-task-widget/mcp-task-widget-intent.md b/examples/mcp-task-widget/mcp-task-widget-intent.md new file mode 100644 index 0000000..1bfa863 --- /dev/null +++ b/examples/mcp-task-widget/mcp-task-widget-intent.md @@ -0,0 +1,70 @@ +# Intent: MCP Task Widget + +**Date:** 2026-03-09 +**Size:** Full +**Domain Profile:** agent-kit/domains/mcp-apps-lit-vite.md +**Supersedes:** — + +## Goal + +Build a ChatGPT-compatible MCP App that turns the Lit todo UI into an embedded widget backed by MCP tools for task CRUD operations. The widget should use Lit 3, signals, Material Web, and the MCP Apps bridge so task management works directly inside the conversation UI. + +## Behavior + +- **Given** ChatGPT renders the widget resource for the task tool, **when** the widget loads, **then** it connects through the official MCP Apps bridge and renders the authoritative task list returned by the server. +- **Given** a user enters a new task in the widget, **when** they submit it, **then** the widget calls an MCP server tool and re-renders from the returned task snapshot. +- **Given** a rendered task, **when** the user edits its title or completion state, **then** the widget updates the task through an MCP tool and reflects the updated server state. +- **Given** a rendered task, **when** the user deletes it, **then** the widget calls the delete MCP tool and removes the task from the rendered list using the returned snapshot. +- **Given** the MCP server is exposed over HTTP, **when** ChatGPT fetches the widget resource and assets, **then** the resource HTML and asset routes load correctly under the sandbox CSP and cross-origin rules. 
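The first behavior above, connecting before rendering, implies the widget must never issue tool calls while the bridge handshake is still in flight. A minimal sketch of that deferral logic follows; the `BridgeClient` interface, the `callTool` method name, and the `create_task` tool name are illustrative stand-ins, not the SDK's actual API:

```typescript
// Illustrative bridge interface; the real widget would use the official
// MCP Apps SDK client instead of this stand-in.
interface BridgeClient {
  connect(): Promise<void>;
  callTool(name: string, args: Record<string, unknown>): Promise<unknown>;
}

// Controller that defers every tool call until the bridge handshake has
// resolved, so the widget never races the host connection.
class TaskController {
  private ready: Promise<void>;

  constructor(private bridge: BridgeClient) {
    // Kick off the handshake once; every action awaits this promise.
    this.ready = bridge.connect();
  }

  async createTask(title: string): Promise<unknown> {
    await this.ready; // defer until the connection has resolved
    return this.bridge.callTool('create_task', { title });
  }
}
```

Because every action awaits the same `ready` promise, the first user interaction cannot race the handshake.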
+ +## Decisions + +| Decision | Choice | Rejected | Why | +|----------|--------|----------|-----| +| MCP UI bridge | `App` from `@modelcontextprotocol/ext-apps` | Custom `postMessage` bridge | The domain profile explicitly requires the official SDK and it handles host lifecycle safely | +| Server transport | Streamable HTTP plus stdio entrypoint | HTTP-only or stdio-only | HTTP is needed for ChatGPT embedding and stdio remains useful for local MCP testing | +| HTTP runtime | `node:http` | Express | This project already chose the native HTTP server and the reference implementation style fits the required routes | +| Task storage | File-backed JSON repository with Zod validation | Pure in-memory state | File-backed storage preserves tasks across server restarts without adding external infrastructure | +| Widget state model | Signals-based controller feeding Lit components | Ad hoc element-local state only | Matches the referenced Lit repo patterns and keeps server data authoritative | +| Widget asset delivery | External Vite-built assets served from `/widget/*` | Inline bundled HTML/JS | ChatGPT iframe CSP blocks inline bundled scripts | + +## Constraints + +**MUST:** +- Use MCP Apps, Lit, Vite, and TypeScript. +- Use components and patterns from the `lit-signals-material` repo as the UI base. +- Expose MCP tools for task CRUD operations. +- Have the widget communicate through the MCP Apps bridge. +- Produce project artifacts in `docs/`. + +**MUST NOT:** +- Implement a custom iframe bridge instead of the official MCP Apps SDK. +- Inline bundled widget JavaScript into the MCP resource HTML. +- Leak UI-only transient state into tool `structuredContent`. + +**SHOULD:** +- Keep the widget visually close to the referenced Lit + signals + Material style. +- Preserve deterministic build output for widget asset URLs. +- Include automated tests for both server and widget flows. 
+ +## Scope + +**IN:** +- MCP server with task list, create, update, and delete tools +- Embedded Lit widget for creating, editing, completing, filtering, and deleting tasks +- File-backed task persistence and schema validation +- Vite widget build, TypeScript server build, and browser-based widget tests +- Verification log entries for all required gates + +**OUT:** +- Multi-user synchronization +- Authentication and per-user task separation +- Rich due dates, labels, reminders, or notifications +- Remote database integration + +## Acceptance + +- `npm install`, `npm run build`, and `npm test` pass. +- The MCP server serves `/mcp` and widget assets with a resource HTML document that ChatGPT can load. +- The widget uses MCP tool calls to create, read, update, and delete tasks. +- The widget styling and component approach reflect Lit 3 + signals + Material Web patterns from the referenced repo. \ No newline at end of file diff --git a/examples/mcp-task-widget/mcp-task-widget-verification.md b/examples/mcp-task-widget/mcp-task-widget-verification.md new file mode 100644 index 0000000..1e3b0c1 --- /dev/null +++ b/examples/mcp-task-widget/mcp-task-widget-verification.md @@ -0,0 +1,275 @@ +# Verification Log: MCP Task Widget + +This log captures the actual output of every verification gate. It is the source of truth for project completion status. + +**Rule:** No entry may be written without executing the command and pasting real output. "Assumed to pass" is not an entry. + +--- + +## Progress + +**Current phase:** Complete +**Last updated:** 2026-03-09 16:16 local + +| Step | Status | +|------|--------| +| Intent | PASS | +| Design | PASS | +| Gate 0: Dependencies | PASS | +| Gate 1: Scaffold | PASS | +| Gate 2: Feature | PASS | +| Gate 3: Tests | PASS | +| Gate 4: Clean build | PASS | +| Self-Review | PASS | +| Domain update | PASS | + +**Update this section after every gate or phase transition. 
When resuming interrupted work, read this section first.** + +--- + +## Gate 0: Dependencies +**Executed:** 2026-03-09 16:15 local +**Command:** `npm install` +**Exit code:** 0 +**Status:** PASS + +
+Output + +```text +removed 11 packages, and audited 159 packages in 632ms + +44 packages are looking for funding + run `npm fund` for details + +found 0 vulnerabilities +``` + +
+ +**Notes:** This gate was re-run after migrating the HTTP runtime from Express to `node:http`; the install removed direct Express packages from the root manifest while keeping the rest of the dependency graph healthy. + +--- + +## Gate 1: Scaffold Verification +**Executed:** 2026-03-09 16:15 local +**Command:** `npm run build` +**Exit code:** 0 +**Status:** PASS + +
+Output + +```text +> mcp-task-widget@1.0.0 build +> tsc --noEmit && vite build && tsc -p tsconfig.server.json + +vite v7.3.1 building client environment for production... +✓ 243 modules transformed. +dist/widget/index.html 0.45 kB │ gzip: 0.27 kB +dist/widget/assets/index.css 0.21 kB │ gzip: 0.16 kB +dist/widget/assets/index.js 546.29 kB │ gzip: 131.15 kB + +(!) Some chunks are larger than 500 kB after minification. Consider: +- Using dynamic import() to code-split the application +- Use build.rollupOptions.output.manualChunks to improve chunking: https://rollupjs.org/configuration-options/#output-manualchunks +- Adjust chunk size limit for this warning via build.chunkSizeWarningLimit. +✓ built in 555ms +``` + +
+ +**Notes:** Revalidated after replacing Express with `node:http`. Build output remains deterministic and still emits both `assets/index.js` and `assets/index.css` for the MCP resource HTML. + +--- + +## Gate 2: Feature Verification +**Executed:** 2026-03-09 16:16 local +**Command:** `npm run build && npm test` +**Exit code:** 0 +**Status:** PASS + +
+Output + +```text +> mcp-task-widget@1.0.0 build +> tsc --noEmit && vite build && tsc -p tsconfig.server.json + +vite v7.3.1 building client environment for production... +✓ 243 modules transformed. +dist/widget/index.html 0.45 kB │ gzip: 0.27 kB +dist/widget/assets/index.css 0.21 kB │ gzip: 0.16 kB +dist/widget/assets/index.js 546.29 kB │ gzip: 131.15 kB + +(!) Some chunks are larger than 500 kB after minification. Consider: +- Using dynamic import() to code-split the application +- Use build.rollupOptions.output.manualChunks to improve chunking: https://rollupjs.org/configuration-options/#output-manualchunks +- Adjust chunk size limit for this warning via build.chunkSizeWarningLimit. +✓ built in 552ms + +> mcp-task-widget@1.0.0 test +> vitest run + + + RUN v4.0.18 /Users/oscarmarina/Projects/AGENTS/openai-apps-sdk + + ✓ server src/server/task-service.test.ts (1 test) 7ms + ✓ server src/server/widget-resource.test.ts (2 tests) 1ms +stderr | unknown test +Lit is in dev mode. Not recommended for production! See https://lit.dev/msg/dev-mode for more information. + ✓ widget (chromium) src/widget/task-widget-app.test.ts (2 tests) 133ms + + Test Files 3 passed (3) + Tests 5 passed (5) + Start at 16:16:23 + Duration 1.39s (transform 47ms, setup 0ms, import 833ms, tests 141ms, environment 0ms) +``` + +
+ +**Notes:** Revalidated on the `node:http` implementation. The Lit dev-mode stderr is expected in test runs and did not affect correctness. + +--- + +## Gate 3: Test Verification +**Executed:** 2026-03-09 16:15 local +**Command:** `npm test` +**Exit code:** 0 +**Status:** PASS +**Tests passed:** 5/5 +**Coverage:** not measured + +
+Output + +```text +> mcp-task-widget@1.0.0 test +> vitest run + + + RUN v4.0.18 /Users/oscarmarina/Projects/AGENTS/openai-apps-sdk + +4:15:35 PM [vite] (client) Re-optimizing dependencies because lockfile has changed + ✓ server src/server/task-service.test.ts (1 test) 16ms + ✓ server src/server/widget-resource.test.ts (2 tests) 1ms +stderr | unknown test +Lit is in dev mode. Not recommended for production! See https://lit.dev/msg/dev-mode for more information. + ✓ widget (chromium) src/widget/task-widget-app.test.ts (2 tests) 149ms + + Test Files 3 passed (3) + Tests 5 passed (5) + Start at 16:15:35 + Duration 1.56s (transform 84ms, setup 0ms, import 1.11s, tests 166ms, environment 0ms) +``` + +
+ +**Notes:** Browser-mode widget tests executed in Chromium through `@vitest/browser-playwright` after the lockfile and runtime migration. + +--- + +## Gate 4: Final Verification (Clean Build) +**Executed:** 2026-03-09 16:16 local +**Clean command:** `rm -rf dist node_modules && npm install && npm run build && npm test` +**Exit code:** 0 +**Status:** PASS +**Tests passed:** 5/5 + +
+Output + +```text +added 158 packages, and audited 159 packages in 1s + +44 packages are looking for funding + run `npm fund` for details + +found 0 vulnerabilities + +> mcp-task-widget@1.0.0 build +> tsc --noEmit && vite build && tsc -p tsconfig.server.json + +vite v7.3.1 building client environment for production... +✓ 243 modules transformed. +dist/widget/index.html 0.45 kB │ gzip: 0.27 kB +dist/widget/assets/index.css 0.21 kB │ gzip: 0.16 kB +dist/widget/assets/index.js 546.29 kB │ gzip: 131.15 kB + +(!) Some chunks are larger than 500 kB after minification. Consider: +- Using dynamic import() to code-split the application +- Use build.rollupOptions.output.manualChunks to improve chunking: https://rollupjs.org/configuration-options/#output-manualchunks +- Adjust chunk size limit for this warning via build.chunkSizeWarningLimit. +✓ built in 602ms + +> mcp-task-widget@1.0.0 test +> vitest run + + + RUN v4.0.18 /Users/oscarmarina/Projects/AGENTS/openai-apps-sdk + + ✓ server src/server/task-service.test.ts (1 test) 25ms + ✓ server src/server/widget-resource.test.ts (2 tests) 5ms +stderr | unknown test +Lit is in dev mode. Not recommended for production! See https://lit.dev/msg/dev-mode for more information. + ✓ widget (chromium) src/widget/task-widget-app.test.ts (2 tests) 118ms + + Test Files 3 passed (3) + Tests 5 passed (5) + Start at 16:16:31 + Duration 1.58s (transform 81ms, setup 0ms, import 1.05s, tests 148ms, environment 0ms) +``` + +
+ +**Notes:** Clean verification confirms the `node:http` server bootstrap, start script target, and full reinstall path all work from scratch. + +--- + +## Self-Review (Full projects only) + +### Domain Checklist Results + +| Check | Command | Result | Pass? | +|-------|---------|--------|-------| +| Uses official SDK bridge | `grep -R "@modelcontextprotocol/ext-apps" -n src/widget ; grep -R "document\.referrer" -n src/widget ; true` | `src/widget/mcp-task-bridge.ts:7` imports the SDK; `document.referrer` had no matches | YES | +| No UI-only fields in structured content | `grep -R "selected\|expanded\|draft" -n src/server ; true` | No matches in `src/server` | YES | +| Relative Vite assets configured | `grep -n "base:\|entryFileNames" vite.config.ts` | Found `base: './'` and `entryFileNames: 'assets/index.js'` | YES | +| CSP includes server domain | `grep -R "connectDomains\|resourceDomains\|sessionIdGenerator\|access-control-allow-origin" -n src/server` | Found CSP domains in `src/server/widget-resource.ts`, CORS headers in `src/server/main.ts`, and stateless transport config in `src/server/main.ts` | YES | +| Widget tests use browser env | `grep -n "browser:" vitest.config.ts && grep -n "jsdom" vitest.config.ts` | Browser config present at `vitest.config.ts`; `jsdom` had no matches | YES | +| External script in resource HTML | `grep -R "widget/assets/index.js\|widget/assets/index.css" -n src/server` | Absolute asset references present in `src/server/widget-resource.ts` | YES | +| CSS asset exists when linked | `test -f dist/widget/assets/index.css && test -f dist/widget/assets/index.js` | Both files exist after build | YES | +| Startup path matches build output | `test -f dist/server/server/main.js && node -e "console.log(require('./package.json').scripts.start)"` | Built entry exists and script is `node dist/server/server/main.js` | YES | + +### Devil's Advocate +1. 
**What happens when:** `PUBLIC_ORIGIN` is not set in a remote deployment, ChatGPT loads the widget over a slower network with the current 546 kB JS bundle, or multiple users share the same server-side JSON task file. +2. **The weakest link is:** runtime deployment configuration. The local fallback origin is correct for development, but remote ChatGPT usage still depends on the operator setting a public HTTPS origin correctly. +3. **If I had to break this, I would:** deploy the server behind a tunnel without setting `PUBLIC_ORIGIN`, then let the resource HTML point back to `http://localhost:3001`, which would make widget asset loading fail in the host iframe. + +### Findings + +| # | Severity | Finding | Impact | +|---|----------|---------|--------| +| 1 | P2 | The built widget bundle is `546.29 kB` minified, which triggers Vite's chunk-size warning. | Embedded widget startup may be slower than necessary in ChatGPT and other MCP hosts. | +| 2 | P2 | Remote deployments still require `PUBLIC_ORIGIN` to be set correctly; the default fallback is development-only. | A misconfigured deployment can return valid MCP responses while the iframe fails to load widget assets. | +| 3 | P3 | Task persistence is global and file-backed rather than user-scoped. | A multi-user deployment would share one task list across all users unless a per-user storage layer is added. | + +--- + +## Failure History + +### 2026-03-09 Gate 0 FAILED +**Error:** `npm install` failed with `ETARGET` for `@modelcontextprotocol/ext-apps@1.27.1`. +**Root Cause:** Initial version verification was contaminated by a shared terminal that had drifted into a cloned `/tmp` repo, so the package versions recorded in `package.json` were not the actual npm registry versions. +**Fix:** Re-verified package versions in fresh shells against `https://registry.npmjs.org`, updated the manifest to `@modelcontextprotocol/ext-apps@1.2.0` and `@modelcontextprotocol/sdk@1.27.1`, and re-ran `npm install` successfully. 
+**Re-run result:** PASS — see Gate 0 above for passing output. + +--- + +## Domain Profile Updates + +| What Changed | Section Updated | Trigger | +|---|---|---| +| Added Pitfall 11 for fixed stylesheet links without an emitted CSS asset | Common Pitfalls | Resource HTML referenced `assets/index.css` before the build emitted that file | +| Added a CSS asset existence automated check | Automated Checks | Needed a repeatable way to verify that a fixed stylesheet link is valid after build | +| Added a decision to import a real global stylesheet when linking `/widget/assets/index.css` | Decision History / Review Checklist | Final widget resource design now depends on a stable emitted CSS asset | \ No newline at end of file diff --git a/examples/test-habit-tracker/PROMPT.md b/examples/test-habit-tracker/PROMPT.md new file mode 100644 index 0000000..5406af1 --- /dev/null +++ b/examples/test-habit-tracker/PROMPT.md @@ -0,0 +1,112 @@ +# Prompt para el Orchestrator: Habit Tracker (Angular + Lit) + +> **Propósito:** Este prompt está diseñado para probar el framework de desarrollo en `.github/`. Antes de empezar, lee el archivo `.github/agents/orchestrator.agent.md` y sigue su Execution Flow completo. Usa todos los templates de `.github/templates/`, carga el domain profile de `.github/domains/` si existe uno que coincida con el stack, y genera todos los artefactos que el framework requiere (PRD, Tech Spec, Implementation Plan, VERIFICATION_LOG.md, PROJECT_STATUS.md, Review). + +--- + +## El proyecto + +Quiero construir **Habit Tracker** — una aplicación web SPA para hacer seguimiento de hábitos diarios con rachas (streaks) y progreso visual. 
+ +### Stack tecnológico + +- **Angular 19+** (standalone components, sin NgModules) para la estructura de la app, routing, formularios y estado +- **Lit 3+** (Web Components) para los componentes visuales reutilizables (tarjetas, indicadores de progreso, badges) +- **Vite** como bundler (con el plugin de Angular correspondiente) +- **TypeScript** en modo strict +- **Vitest** para tests + +Intenté usar NgModules pero añaden demasiado boilerplate para una app de este tamaño — quiero solo standalone components. + +### Funcionalidades + +#### 1. Lista de hábitos +- Mostrar todos los hábitos del usuario en una cuadrícula responsive +- Cada hábito se muestra en una **tarjeta visual** (componente Lit ``) que incluye: + - Nombre del hábito + - Icono (emoji seleccionable) + - Anillo de progreso circular mostrando el porcentaje de completitud semanal + - Badge de racha (streak) si lleva 3+ días consecutivos + - Botón de check para marcar el hábito como completado hoy +- Cuando un hábito está marcado como completado hoy, la tarjeta debe **deshabilitar sus controles** y mostrar un estado visual de "done" + +#### 2. Formulario de creación/edición +- Modal Angular para crear o editar un hábito +- Campos: nombre (requerido, max 30 chars), icono (emoji picker simplificado con 10 opciones predefinidas), frecuencia objetivo (diario, 5/semana, 3/semana) +- Validación reactiva con Angular Reactive Forms + +#### 3. Vista de progreso +- Componente Lit `` que muestra un anillo SVG con porcentaje animado +- Componente Lit `` que muestra la racha actual con efectos visuales (fuego 🔥 para 7+, estrella ⭐ para 30+) +- Componente Angular para la página de estadísticas: mejor racha, tasa de completitud del mes, hábito más consistente + +#### 4. 
Persistencia +- LocalStorage para almacenar hábitos y registros de completitud +- Servicio Angular con signals para gestionar el estado reactivo +- Al cargar la app, el servicio lee de localStorage y popula los signals +- Cada cambio de estado se persiste automáticamente + +#### 5. Temas +- Soporte para tema claro y oscuro +- Toggle en el header (componente Lit ``) +- El tema se persiste en localStorage +- CSS custom properties para los colores del tema + +### Requisitos no funcionales + +- La app debe funcionar offline (no hay backend, todo es local) +- El anillo de progreso debe animar suavemente la transición de porcentaje (CSS transitions) +- La cuadrícula de hábitos debe ser responsive: 3 columnas en desktop, 2 en tablet, 1 en mobile +- Accesible: los botones deben tener aria-labels, los colores deben tener contraste suficiente + +### Restricciones + +- **No usar NgModules** — solo standalone components con `imports` directos +- **No usar librerías de componentes externas** (Material, PrimeNG, etc.) 
— los componentes Lit son el sistema de diseño +- **No usar backend ni Firebase** — solo localStorage +- El proyecto debe poder hacerse `npm run build` y `npm run dev` sin errores desde cero +- **No usar Tailwind CSS** + +### Tests esperados + +- Test unitario del servicio de hábitos (crear, completar, calcular racha) +- Test unitario de al menos un componente Lit (habit-card: renderizado, eventos) +- Test unitario de al menos un componente Angular (formulario: validación) +- Mínimo 70% de cobertura en el servicio de hábitos + +### Estructura sugerida (no obligatoria) + +``` +habit-tracker/ +├── src/ +│ ├── app/ # Angular +│ │ ├── core/services/ # HabitService, StorageService +│ │ ├── features/ +│ │ │ ├── habits/ # Lista, formulario +│ │ │ └── stats/ # Página de estadísticas +│ │ └── shared/ # Tipos, utilidades +│ ├── components/ # Lit Web Components +│ │ ├── habit-card.ts +│ │ ├── progress-ring.ts +│ │ ├── streak-badge.ts +│ │ └── theme-toggle.ts +│ ├── styles/ # CSS global, theme variables +│ ├── main.ts # Entry point +│ └── index.html +├── package.json +├── tsconfig.json +├── vite.config.ts +└── vitest.config.ts +``` + +### Lo que espero al final + +1. Que pueda hacer `npm install && npm run build && npm run dev` y ver la app funcionando +2. Que los tests pasen con `npm test` +3. Que los componentes Lit funcionen correctamente dentro de las páginas Angular +4. Que el estado sea reactivo — marcar un hábito como completado actualice inmediatamente la UI +5. Todos los artefactos del framework: PRD, Tech Spec, Implementation Plan, VERIFICATION_LOG.md con evidencia real, PROJECT_STATUS.md y Review + +--- + +**Nota para el LLM:** Este proyecto usa Angular + Lit. Comprueba si existe un domain profile en `.github/domains/` para este stack antes de empezar. Si existe, cárgalo y aplica sus reglas de integración, pitfalls y verification commands. Si no existe, créalo como parte de la fase de Tech Spec. 
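La lógica de rachas descrita arriba se puede esbozar así (boceto mínimo e ilustrativo; los nombres `computeStreak` y `streakBadge` y el formato de fecha `YYYY-MM-DD` son suposiciones, no requisitos del prompt):

```typescript
// Cálculo de racha: días consecutivos completados contando hacia atrás
// desde hoy. Las fechas se asumen en formato ISO 'YYYY-MM-DD' (UTC).
export function computeStreak(completions: string[], today: string): number {
  const done = new Set(completions);
  const d = new Date(`${today}T00:00:00Z`);
  let streak = 0;
  while (done.has(d.toISOString().slice(0, 10))) {
    streak += 1;
    d.setUTCDate(d.getUTCDate() - 1);
  }
  return streak;
}

// Badge según la racha: estrella para 30+, fuego para 7+ (ver requisitos).
export function streakBadge(streak: number): string {
  if (streak >= 30) return '⭐';
  if (streak >= 7) return '🔥';
  return '';
}
```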
diff --git a/agent-kit/BUILDER.md b/framework/BUILDER.md similarity index 65% rename from agent-kit/BUILDER.md rename to framework/BUILDER.md index b9477e8..a366f6a 100644 --- a/agent-kit/BUILDER.md +++ b/framework/BUILDER.md @@ -46,30 +46,33 @@ Apply the domain profile's accumulated knowledge: - Check every Common Pitfall against the current design - Answer every Adversary Question from the profile against the current plan - Follow Integration Rules exactly -- Use domain-specific terminology and verification commands -- Run Automated Checks from the domain profile mechanically -- If no domain profile exists, create one using `agent-kit/domains/_template.md` +- Use domain-specific terminology +- Prepare the codebase to pass the Automated Checks when the GateKeeper runs them +- If no domain profile exists, create one (see the Creating section under Domain Profiles below) ## Process (Scaled by Size) ### Quick (< 3 files, clear intent) -1. Understand intent → Code → Verify (Gate 2 minimum) → Done +1. Understand intent → Code → Verify (Gate 2 minimum) → If fix reveals a missing pitfall or wrong assumption, update domain profile → Done +2. If skills exist in `.github/skills/` or `.agents/skills/` and one clearly matches the task, load it before coding. Do not search if the task is trivial. **Escalation rule:** If a Quick task touches more than 3 files or uncovers bugs beyond the original scope, stop and escalate to Standard. Capture a retroactive INTENT before continuing. The cost of pausing is low; the cost of an unscoped debugging spiral is high. ### Standard (feature-sized) 1. Capture Intent (extract or create CIR from prompt → `docs/[project]-intent.md`) -2. Load domain profile from `agent-kit/domains/` — then **read every Common Pitfall and Adversary Question** before proceeding. These inform the design; loading without reading is useless. -3. Design — all lenses in one pass → `docs/[project]-design.md`. 
The design MUST include both "Adversary Questions Applied" (answers to each profile question against this specific design) and "Domain Pitfalls Applied" (how each pitfall is addressed). These are separate sections — checking pitfalls does not replace answering adversary questions. -4. Gated build (Gate 0 → Gate 1 → Gate 2 per feature phase) -5. Tests + verification (Gate 3 → Gate 4) -6. Self-review (Adversary Lens on finished code + domain checklist) -7. Domain learning (verify domain profile was updated during implementation — if any fix or discovery was missed, update now) +2. Load domain profile from `framework/domains/` — then **read every Common Pitfall and Adversary Question** before proceeding. These inform the design; loading without reading is useless. +3. Load relevant skills — scan `.github/skills/` and `.agents/skills/` for `SKILL.md` files. Read each `description` field. If a skill matches the task domain, load it as design guidance. If no skills exist or none match, skip this step. Record loaded skills in the Design document. +4. Design — all lenses in one pass → `docs/[project]-design.md`. The design MUST include both "Adversary Questions Applied" (answers to each profile question against this specific design) and "Domain Pitfalls Applied" (how each pitfall is addressed). These are separate sections — checking pitfalls does not replace answering adversary questions. +5. Gated build (Gate 0 → Gate 1 → Gate 2 per feature phase) +6. Tests + verification (Gate 3 → Gate 4) +7. Self-review (Adversary Lens on finished code + domain checklist) +8. 
Domain learning (verify domain profile was updated during implementation — if any fix or discovery was missed, update now) ### Full (new project or major rearchitecture) Same as Standard, but: - Design doc is more detailed (ADRs for each major decision) - Build is phased with Gate 2 after each phase +- Skill loading (step 3) is mandatory — even if no skills match, document that none were found - Self-review follows the full adversarial protocol: - Devil's Advocate section (3 uncovered scenarios, weakest link, attack vector) - Minimum 3 genuine findings (don't invent — if fewer, document what you checked) @@ -87,32 +90,18 @@ When in doubt, go one size up. The cost of slight over-documentation is lower th For Standard and Full tasks, produce the Intent document **before** continuing to investigate. Research enough to understand the prompt, then commit decisions to `docs/[project]-intent.md`. If a decision is unclear, write it as an open question in the Intent and ASK the human — don't keep researching hoping the answer will appear. The Intent is where decisions are recorded; internal deliberation without an artifact is wasted work that disappears on context loss. -## Verification Protocol +## Verification Protocol (Adversarial Handoff) -Gates are MANDATORY for Standard and Full. For Quick, Gate 2 minimum. +**You NEVER verify your own work.** You are strictly the Implementer. +Once you have prepared the code for a specific Gate phase, your role pauses. You must signal the completion of the phase and hand control over to the **GateKeeper**. -### Gate Definitions +1. Finish writing the code/scaffolding. +2. Formally declare that the codebase is ready for `Gate X`. +3. Stop executing. Wait for the GateKeeper to run the validation commands. +4. If the GateKeeper returns an error (Exit Code != 0), read the raw log it provides. +5. Perform a root cause analysis, fix the code, and resubmit to the GateKeeper. 
-Each gate: execute command → check pass criteria → paste real output to `docs/[project]-verification.md`. Commands come from the domain profile. If no profile, define them in the Design doc. - -**"Assumed to pass" is never valid evidence. Paste real output or it didn't happen.** - -| Gate | When | What | Pass Criteria | -|------|------|------|---------------| -| Gate 0 | After dependency install | Install command | Exit 0, zero unresolved dependency errors | -| Gate 1 | After scaffold | Build/compile command | Exit 0, no errors, output artifacts exist | -| Gate 2 | After each feature phase | Build + existing tests | Build succeeds, no regressions | -| Gate 3 | After tests written | Full test suite | All tests pass, coverage meets target | -| Gate 4 | Before declaring done | Clean build from scratch | All gates pass from clean state | - -### On Failure - -1. Record failure output in the verification log (Failure History section) -2. Root cause analysis — don't fix symptoms, find the cause -3. Fix → re-run the SAME gate from scratch (not just the failing part) -4. Record passing output in the verification log -5. Update Progress section -6. Only then proceed to the next phase +**"Assumed to pass" is never valid evidence. If the GateKeeper hasn't stamped it, it is not verified.** ### For Domains Without Traditional Build Commands @@ -123,40 +112,56 @@ Some domains (PLC, hardware, documentation) lack command-line builds. The domain Domain profiles are the learning mechanism. They accumulate knowledge across projects with the same stack. They are the most valuable artifact in this framework. ### Loading -Use this deterministic selection protocol: -1. Build candidates from `agent-kit/domains/*.md`, excluding `_template.md` and `README.md`. -2. Read each candidate's selection metadata: `Profile ID`, `Match Keywords`, `Use When`, `Do Not Use When`. -3. Exclude profiles where any `Do Not Use When` condition matches explicit user constraints. -4. 
Score remaining profiles by keyword overlap with the prompt and declared stack (`+1` per matched keyword). -5. Select the highest score only if it is unique and `>= 2`. -6. If tied or below threshold, ask the human which profile to use. If no clarification is available, create a new profile from `agent-kit/domains/_template.md` instead of forcing a weak match. -7. Record the selected profile and matching rationale in the Design document. +Profiles in `framework/domains/` can be either **standalone** (full profile) or **links** (extend a base from `catalog/`). Use this deterministic protocol: -If a profile is selected, it overrides generic assumptions for: terminology, verification commands, pitfalls, integration rules, automated checks, and decision history. +1. Build candidates from `framework/domains/*.md`, excluding `_template.md` and `README.md`. +2. For each candidate, check if it has an `extends` field: + - **If `extends` exists:** load the base profile from `catalog/[extends].md` and read its selection metadata. + - **If no `extends`:** read the selection metadata directly from the profile. +3. Read each candidate's selection metadata: `Profile ID`, `Match Keywords`, `Use When`, `Do Not Use When`. +4. Exclude profiles where any `Do Not Use When` condition matches explicit user constraints. +5. Score remaining profiles by keyword overlap with the prompt and declared stack (`+1` per matched keyword). +6. Select the highest score only if it is unique and `>= 2`. +7. If tied or below threshold, ask the human which profile to use. If no clarification is available, create a new profile (see Creating below). +8. Record the selected profile and matching rationale in the Design document. 
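The scoring and selection steps (5 through 7) can be sketched mechanically; the `Candidate` shape below is illustrative, not a prescribed data structure:

```typescript
interface Candidate {
  profileId: string;
  matchKeywords: string[];
}

// Score each candidate by keyword overlap (+1 per matched keyword) and
// select the top profile only if its score is unique and >= 2.
export function selectProfile(candidates: Candidate[], promptWords: Set<string>): string | null {
  const scored = candidates.map((c) => ({
    id: c.profileId,
    score: c.matchKeywords.filter((k) => promptWords.has(k.toLowerCase())).length,
  }));
  const best = Math.max(0, ...scored.map((s) => s.score));
  const winners = scored.filter((s) => s.score === best);
  // Tied or below threshold: no automatic selection, ask the human instead.
  if (best < 2 || winners.length !== 1) return null;
  return winners[0].id;
}
```

Returning `null` for ties or sub-threshold scores is what forces the "ask the human" branch instead of a weak automatic match.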
+ +**If the selected profile has `extends` (link), apply merge rules:** +- **Local Pitfalls** and **Local Adversary Questions** → append to the base lists +- **Local Overrides** → replace the corresponding base section +- **Local Decision History** → kept separate from base Decision History +- All other sections → inherit from base unchanged + +If the selected profile is standalone, use it directly. + +The selected profile overrides generic assumptions for: terminology, verification commands, pitfalls, integration rules, automated checks, and decision history. ### Creating -If no profile exists for the stack, create one using `agent-kit/domains/_template.md`. Minimum viable profile: Terminology Mapping + Verification Commands + at least 2 Common Pitfalls. +**When a matching base profile exists in `catalog/`:** create a profile link in `framework/domains/` using `framework/domains/_template.md` with the `extends` field pointing to the catalog profile. + +**When no base profile exists:** create a standalone full profile in `framework/domains/` using `framework/templates/DOMAIN_PROFILE-template.md`. Minimum viable profile: Terminology Mapping + Verification Commands + at least 2 Common Pitfalls. When the user later wants to reuse this profile across projects, they can move it to `catalog/` and replace it with a link. -### Updating +### Updating the Domain Profile (Dual Responsibility) -**Trigger: any code fix, gate failure, or new discovery — not just end-of-project.** +The Domain Profile is a shared knowledge structure, but the Builder and the GateKeeper update strictly separate knowledge vectors according to their nature. -When you fix a bug, discover a root cause, or learn something new about the stack during ANY task (Quick, Standard, or Full), update the domain profile **as part of the same change**. Don't defer to a final step. 
+**Builder Updates (Design-Time & Architectural Memory):** +- **Trigger:** Reading documentation, finding deprecated APIs, or making systemic architectural choices. +- **Integration rules:** New rules discovered while planning how libraries interact. +- **Decision History:** Decisions that should apply to ALL future projects with this stack. +- **Terminology:** New domain-specific concepts modeled. -What to update: -- New pitfalls discovered during implementation, debugging, or gate failures → add to Common Pitfalls with What/Correct/Detection -- Verification commands that needed adjustment → update Verification Commands -- Integration rules that were missing or incorrect → update Integration Rules -- Decisions that should apply to ALL future projects with this stack → add to Decision History with date, decision, context, and constraint -- New detection patterns → add to Automated Checks -- Review Checklist items that were missing → add to Review Checklist +**GateKeeper Updates (Runtime & Mechanical Memory):** +- **Trigger:** Gate failures, test regressions, build errors. +- **Common Pitfalls:** Bugs discovered during strict verification. +- **Verification Commands:** Commands that needed adjustment to pass the environment. +- **Automated Checks:** New bash detection patterns to strictly enforce. **The domain profile is a living document. Every bug fix that reveals a gap is a learning opportunity — capture it immediately or it's lost.** ## Artifacts ### Change Intent Record (`docs/[project]-intent.md`) -Captures WHY, expected BEHAVIOR, and CONSTRAINTS. Human-authored or extracted from prompt. Source of truth for scope and decisions. Template: `agent-kit/templates/INTENT.md` +Captures WHY, expected BEHAVIOR, and CONSTRAINTS. Human-authored or extracted from prompt. Source of truth for scope and decisions. 
Template: `framework/templates/INTENT.md` Key sections: - **Goal** — What and why (1-2 sentences) @@ -166,10 +171,10 @@ Key sections: - **Supersedes** — If this reworks an existing feature, reference the previous Intent. The old Intent remains as historical record but is no longer active ### Design Document (`docs/[project]-design.md`) -Architecture + decisions + risks + domain profile selection rationale in one document. Replaces separate PRD, Tech Spec, Review, and Implementation Plan. Template: `agent-kit/templates/DESIGN.md` +Architecture + decisions + risks + domain profile selection rationale in one document. Replaces separate PRD, Tech Spec, Review, and Implementation Plan. Template: `framework/templates/DESIGN.md` ### Verification Log (`docs/[project]-verification.md`) -Mechanical proof. Every gate execution with real output. Source of truth for "does it work?" and "where did we stop?". One file per project — completed logs remain as historical evidence. Template: `agent-kit/templates/VERIFICATION_LOG-template.md` +Mechanical proof. Every gate execution with real output. Source of truth for "does it work?" and "where did we stop?". One file per project — completed logs remain as historical evidence. Template: `framework/templates/VERIFICATION_LOG-template.md` Key sections: - **Progress** — Summary table at the top. Updated after every gate or phase transition. This is how interrupted work resumes — read Progress first, then continue from the last completed step. @@ -184,7 +189,7 @@ Key sections: After implementation, shift to Adversary Lens: 1. Re-read the Intent. Does the code deliver every Behavior described? Does it respect every Constraint? -2. Run the domain profile's Automated Checks (execute each command, verify results) +2. Run every Automated Check from the domain profile (execute command, verify result) 3. Check every Common Pitfall from the domain profile against the codebase 4. Verify every Review Checklist item 5. 
**For Full projects only:** add to the verification log: @@ -225,5 +230,6 @@ If no verification log exists, look for intent and design docs in `docs/` to und - When the human says "never X", that's a CONSTRAINT — capture it in the Intent AND propose a domain profile update if it applies stack-wide. - Don't create files that aren't needed. Prefer editing existing files. - Don't over-engineer. The right amount of complexity is the minimum needed for the current task. -- **Project code goes in its own directory** — not at the repository root. Name the directory after the project (e.g., `mcp-task-widget/`). Config files (`package.json`, `tsconfig.json`, `vite.config.ts`, etc.), source code, and build output belong inside this directory. The repo root is reserved for `AGENTS.md`, `agent-kit/`, and `docs/`. +- **Project code goes in its own directory** — not at the repository root. Name the directory after the project (e.g., `mcp-task-widget/`). Config files (`package.json`, `tsconfig.json`, `vite.config.ts`, etc.), source code, and build output belong inside this directory. The repo root is reserved for `AGENTS.md`, `framework/`, and `docs/`. - **Every bug fix must update the domain profile.** If you fix a problem caused by a missing pitfall, incorrect integration rule, or wrong assumption — add it to the domain profile in the same commit. A fix without a domain profile update means the same mistake can happen again in the next project. +- **Skills are guidance, not process.** A skill can inform how you design and implement (aesthetic direction, API conventions, documentation style) but cannot override gates, skip artifacts, or replace domain profile pitfalls. If a skill contradicts the domain profile, the profile wins for technical correctness; the skill wins for domain-specific quality. 
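+The self-review protocol asks for every Automated Check to be executed mechanically, with real output recorded. A minimal runner for that step might look like the sketch below; it assumes checks are plain shell commands, and the log format shown is illustrative rather than the framework's verification-log template:
+
+```python
+# Hypothetical runner for a profile's Automated Checks: execute each command,
+# append the real output and exit code to a log, and report how many failed.
+import subprocess
+
+def run_automated_checks(commands, log_path):
+    failures = 0
+    with open(log_path, "a", encoding="utf-8") as log:
+        for cmd in commands:
+            result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
+            log.write(f"### Check: {cmd}\n")
+            log.write(result.stdout + result.stderr)
+            log.write(f"Exit code: {result.returncode}\n\n")
+            if result.returncode != 0:
+                failures += 1
+    return failures  # any failure means the self-review cannot pass
+```
+
+Real output lands in the log either way — a non-zero count is evidence to analyze, never evidence to omit.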
diff --git a/framework/GATEKEEPER.md b/framework/GATEKEEPER.md
new file mode 100644
index 0000000..a28268e
--- /dev/null
+++ b/framework/GATEKEEPER.md
@@ -0,0 +1,68 @@
+---
+name: gatekeeper
+description: Executes verification commands, reads logs, and enforces the deterministic build flow.
+---
+
+# GateKeeper
+
+You are the strict Verification Agent. Your sole responsibility is to mechanically prove whether the implementation produced by the **Builder** works. You DO NOT write or modify implementation code. You represent the harsh reality of the runtime environment.
+
+## 1. Core Mandate
+
+**You are an execution environment interface, not a software developer.**
+You exist to execute commands, read their outputs, and relay the truth back to the verification log. If code fails, you do not fix it yourself. You provide the raw failure log to the Builder. This separation of powers prevents any agent from approving its own unverified work.
+
+## 2. Strict Permissions & Restrictions
+
+- **ALLOWED:** You have absolute authority to run terminal commands (`bash`, `sh`).
+- **ALLOWED:** You can read the entire codebase (`view_file`), specifically evaluating test files, build configurations, and the `docs/[project]-verification.md` file.
+- **FORBIDDEN:** You **MUST NEVER** modify implementation code or configuration files. If you find a bug, your job is to report it, not to patch it.
+- **FORBIDDEN:** You **MUST NEVER** alter tests to make them pass.
+
+## 3. The Verification Loop
+
+When the Builder signals that a phase is complete and requests a Gate check:
+
+1. **Read the Rules:** Check the `Domain Profile` for the specific verification commands required.
+2. **Execute:** Run the exact commands mechanically in the terminal.
+3. **Capture:** Record the raw `STDOUT/STDERR` output and the Exit Code.
+4. **Log:** Write the raw output into the `docs/[project]-verification.md` file under the appropriate Gate section. "Assumed to pass" is never valid evidence. Paste real output or it didn't happen.
+5. **Report & Reject:** If the Exit Code is anything other than `0`, you must formally reject the Gate phase. Pass the entire error block back to the Builder context for analysis.
+
+## 4. Domain Profile Learning (The Flywheel)
+
+You are the guardian of **Runtime and Mechanical Memory**. While the Builder is permitted to update the Domain Profile with semantic design discoveries (Decision History, Integration Rules), you strictly update the profile when code crashes against the runtime environment:
+
+- When an implementation repeatedly fails a Gate due to an architectural misconception, a missing dependency, or a bad configuration, it is **your** exclusive responsibility to update the `Domain Profile` with a new `Pitfall` or a new `Verification Command`.
+- The Builder generates code; the GateKeeper generates the strict barrier rules to govern future builds.
+- By documenting mechanical failures in the Domain Profile, you ensure the next project on this stack inherits the hard-learned lesson.
+
+## 5. Gate Execution Matrix
+
+You mechanically apply the pass criteria for the Gates:
+
+| Gate | What to Verify | Pass Criteria |
+|------|----------------|---------------|
+| **Gate 0** | Dependencies | Command exits 0. Zero unresolved dependency warnings. |
+| **Gate 1** | Scaffolding | Build/compile command exits 0. Expected default artifacts exist on disk. |
+| **Gate 2** | Feature Phase | Build + tests. Exits 0. No existing tests regress. |
+| **Gate 3** | Full Coverage | Full test suite executes cleanly. Coverage percentage meets targets if specified. |
+| **Gate 4** | Clean Build | From-scratch clean install and build. Everything passes. |
+
+## 6. Communication with Builder
+
+When you communicate back to the Builder after a failed Gate, use a structured format:
+```
+[GATE REJECTED]
+Gate:
+Exit Code:
+Raw Output:
+
+Action Required: Builder must analyze the root cause and resubmit.
+``` + +If it passes: +``` +[GATE PASSED] +The verification log has been updated with the mechanical proof. +``` diff --git a/agent-kit/README.md b/framework/README.md similarity index 63% rename from agent-kit/README.md rename to framework/README.md index 3490f50..b826ea0 100644 --- a/agent-kit/README.md +++ b/framework/README.md @@ -5,33 +5,40 @@ | File | Purpose | |------|---------| | `BUILDER.md` | Process contract — lenses, gates, pre-implementation checkpoint, rules | -| `domains/_template.md` | Template for creating new domain profiles | -| `domains/*.md` | Domain profiles — accumulated knowledge per technology stack | +| `GATEKEEPER.md` | Verification agent — executes commands, enforces gates, updates pitfalls | +| `domains/_template.md` | Template for creating profile links that extend catalog profiles | +| `domains/README.md` | How domain profiles work — types, matching, merge rules | +| `domains/*.md` | Domain profiles — standalone or links extending catalog base profiles | | `templates/INTENT.md` | Template for Change Intent Records | | `templates/DESIGN.md` | Template for Design documents | | `templates/VERIFICATION_LOG-template.md` | Template for gate execution logs with progress tracking | +| `templates/DOMAIN_PROFILE-template.md` | Template for creating standalone full domain profiles | ## Setup -1. Copy `agent-kit/` into the target repo. +1. Copy `framework/` into the target repo. 2. Create `AGENTS.md` at the repo root: ```markdown # Agent Instructions - Read and follow `agent-kit/BUILDER.md` for all tasks. + The framework operates on a strict Adversarial Verification Loop: + - For design, planning, and code implementation tasks: Read and follow `framework/BUILDER.md`. + - For execution, testing, and mechanical verification tasks: Read and follow `framework/GATEKEEPER.md`. Project artifacts (intent, design, verification) go in `docs/`. ``` 3. Create `docs/` for project-specific artifacts. -4. 
Either bring an existing domain profile or let the Builder create one during the first project. +4. Create a profile link in `framework/domains/` that extends a relevant profile from the [catalog](../catalog/), or let the Builder create one during the first project. See `domains/_template.md` for the link format. Resulting structure: ``` repo/ ├── AGENTS.md ← entry point (5 lines) -├── agent-kit/ ← framework (reusable) +├── framework/ ← framework (reusable) │ ├── BUILDER.md +│ ├── GATEKEEPER.md +│ ├── VERSION │ ├── domains/ │ └── templates/ ├── docs/ ← project artifacts (intent, design, verification) @@ -60,6 +67,8 @@ Project code goes in its own directory — never at the repo root. | Standard | Feature-sized | Intent + Design + Code + Verification Log | | Full | New project / major rearchitecture | Standard + ADRs + Devil's Advocate self-review | +All sizes load relevant skills when they exist (see Skills section below). For Full, skill loading is mandatory — document that none were found if that's the case. + Quick escalates to Standard if it touches > 3 files or uncovers bugs beyond the original scope. When in doubt, the Builder goes one size up. @@ -125,6 +134,7 @@ Architecture + decisions + risks. Template: `templates/DESIGN.md` | Section | Content | |---------|---------| | Domain Profile Selection Rationale | Candidates, scores, exclusions, selected profile | +| Skills Loaded | Skills matched and loaded, or "none" if no skills apply | | Stack | Technologies with verified versions | | Structure | Code organization | | Data Flow | How data moves, especially across technology boundaries | @@ -155,13 +165,19 @@ One log per project. Completed logs remain as historical evidence. ## Domain profile selection contract -1. Build candidates from `agent-kit/domains/*.md`, excluding `_template.md` and `README.md` -2. Read each candidate's metadata: `Profile ID`, `Match Keywords`, `Use When`, `Do Not Use When` -3. 
Exclude profiles where `Do Not Use When` matches explicit user constraints -4. Score remaining profiles by keyword overlap with prompt/stack (`+1` per keyword hit) -5. Select only if highest score is unique and `>= 2` -6. If tied or below threshold: ask the human. If no clarification, create a new profile from `_template.md` -7. Record selected profile and reason in the Design document +Profiles in `framework/domains/` can be standalone or links that extend base profiles from `catalog/`: + +1. Build candidates from `framework/domains/*.md`, excluding `_template.md` and `README.md` +2. For each candidate, check for an `extends` field: + - If `extends` exists: load the base profile's metadata from `catalog/` + - If no `extends`: read the metadata directly from the profile +3. Read selection metadata: `Profile ID`, `Match Keywords`, `Use When`, `Do Not Use When` +4. Exclude profiles where `Do Not Use When` matches explicit user constraints +5. Score remaining profiles by keyword overlap with prompt/stack (`+1` per keyword hit) +6. Select only if highest score is unique and `>= 2` +7. If tied or below threshold: ask the human. If no clarification, create a new profile (see Creating) +8. If selected profile has `extends`: merge base + local pitfalls (appended) + local overrides (replaced) + local decisions (separate). If standalone: use directly +9. Record selected profile and reason in the Design document ## Domain profile sections @@ -180,9 +196,9 @@ One log per project. Completed logs remain as historical evidence. ### Creating a new profile -When no profile matches, the Builder creates one from `domains/_template.md` during the Design phase. +When a matching base profile exists in `catalog/`, the Builder creates a profile link in `domains/` using `domains/_template.md` with the `extends` field pointing to the catalog profile. -Minimum viable profile: Terminology Mapping + Verification Commands + 2 Common Pitfalls. 
+When no base profile exists, the Builder creates a standalone full profile in `domains/` using `templates/DOMAIN_PROFILE-template.md`. Minimum viable profile: Terminology Mapping + Verification Commands + 2 Common Pitfalls. When the profile proves reusable across projects, move it to `catalog/` and replace it with a link. ### Updating a profile @@ -190,7 +206,9 @@ Minimum viable profile: Terminology Mapping + Verification Commands + 2 Common P Updates happen immediately, as part of the same change that discovered the gap. A fix without a profile update means the same mistake can happen again. -What to update: +**Where updates go** depends on their scope: + +**Stack-wide discoveries → base profile in `catalog/`:** - New pitfalls → Common Pitfalls (What/Correct/Detection) - Adjusted commands → Verification Commands - Missing or incorrect rules → Integration Rules @@ -198,6 +216,11 @@ What to update: - New detection patterns → Automated Checks - Missing review items → Review Checklist +**Project-specific discoveries → profile link in `framework/domains/`:** +- Pitfalls unique to this project's context → Local Pitfalls +- Gate overrides for this project → Local Overrides +- Decisions that only apply here → Local Decision History + ## Resuming interrupted work 1. Read `AGENTS.md` → `BUILDER.md` @@ -208,6 +231,36 @@ What to update: If no verification log exists, look for intent and design docs in `docs/`. If nothing exists, start fresh. +## Skills + +Skills are optional guidance documents that inform design and implementation quality without replacing the framework process or domain profiles. 
+ +### Where skills live + +| Location | Scope | Example | +|----------|-------|---------| +| `.github/skills/` | Repo-level — workflow and composition rules specific to this repository | `repo-frontend-workflow` | +| `.agents/skills/` | Agent-level — external skills installed via tools like [skills.sh](https://skills.sh) | `frontend-design` | + +Each skill is a `SKILL.md` file with a `name` and `description` in its frontmatter. + +### When skills are loaded + +During Standard/Full step 3 (after domain profile, before design), the Builder scans both directories, reads each skill's `description`, and loads skills that match the task domain. For Quick tasks, a skill is loaded only if one clearly matches. Skills are recorded in the Design document. + +### Boundary rules + +- Skills inform quality (aesthetic direction, API conventions, documentation style) +- Skills cannot override gates, skip artifacts, or replace domain profile correctness +- If a skill contradicts the domain profile, the profile wins for technical correctness; the skill wins for domain-specific quality +- Technical learnings (build failures, pitfalls, integration rules) go in the domain profile, not in skills +- Process learnings (how this repo executes tasks) go in repo-level skills +- Project-specific learnings go in `docs/` + +### Skills and domain profiles are independent + +Domain profiles do not declare which skills to use. Skills do not declare which profiles to select. The Builder evaluates each independently based on the task. This keeps both systems composable — a profile works with any combination of skills, and a skill works with any profile. 
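+The scan in that loading step can be sketched as follows. This is a minimal illustration: it assumes each `SKILL.md` uses simple `key: value` frontmatter, and the directory layout matches the table above; real skills may carry richer metadata:
+
+```python
+# Hypothetical scan of the two skill locations: read each SKILL.md frontmatter
+# and collect name/description pairs for the Builder to match against the task.
+from pathlib import Path
+
+def scan_skills(roots=(".github/skills", ".agents/skills")):
+    skills = []
+    for root in roots:
+        for skill_file in Path(root).glob("*/SKILL.md"):
+            text = skill_file.read_text(encoding="utf-8")
+            meta = {}
+            if text.startswith("---"):
+                # take everything between the first pair of --- markers
+                header = text.split("---", 2)[1]
+                for line in header.splitlines():
+                    if ":" in line:
+                        key, _, value = line.partition(":")
+                        meta[key.strip()] = value.strip()
+            if "name" in meta and "description" in meta:
+                skills.append({"name": meta["name"],
+                               "description": meta["description"],
+                               "path": str(skill_file)})
+    return skills
+```
+
+Matching against the task description stays a judgment call for the Builder; the scan only makes the candidate set explicit and auditable.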
+ ## Self-review protocol After implementation, the Builder shifts to Adversary Lens: diff --git a/framework/VERSION b/framework/VERSION new file mode 100644 index 0000000..3eefcb9 --- /dev/null +++ b/framework/VERSION @@ -0,0 +1 @@ +1.0.0 diff --git a/framework/domains/README.md b/framework/domains/README.md new file mode 100644 index 0000000..2255ea9 --- /dev/null +++ b/framework/domains/README.md @@ -0,0 +1,113 @@ +# Domain Profiles + +Domain profiles provide stack-specific knowledge that the Builder loads based on the project's technology stack. They are the framework's learning mechanism — each project can enrich them with new pitfalls, decisions, and automated checks. + +## Two Types of Profile + +This directory can contain two types of profile: + +**Standalone profiles** — full profiles with all sections (Selection Metadata, Terminology, Verification Commands, Pitfalls, etc.). Created when no base profile exists in `catalog/`. Template: `framework/templates/DOMAIN_PROFILE-template.md`. + +**Profile links** — lightweight files that extend a base profile from `catalog/` with project-specific additions. Created when a matching base exists in `catalog/`. Template: `_template.md` in this directory. + +``` +framework/domains/ ← Profiles for this project +├── _template.md ← Template for profile links +├── README.md ← This file +├── web-react-nextjs.md ← Example: standalone (no catalog base) +└── my-widget.md ← Example: link (extends a catalog base) +``` + +## How It Works + +### When a catalog profile exists for your stack + +1. Create a profile link in this directory using `_template.md` +2. Set `extends:` to the Profile ID of a catalog profile +3. Add only project-specific pitfalls, overrides, and decisions +4. 
The Builder loads the base from `catalog/` and applies your local additions on top + +**Resolution rules for links:** +- **Local Pitfalls** and **Local Adversary Questions** are **appended** to the base profile's lists +- **Local Overrides** **replace** the corresponding section in the base profile +- **Local Decision History** is kept separate from the base profile's Decision History +- Sections not mentioned locally inherit from the base unchanged + +### When no catalog profile exists + +1. Create a standalone profile in this directory using `framework/templates/DOMAIN_PROFILE-template.md` +2. Fill in at minimum: Terminology Mapping + Verification Commands + 2 Common Pitfalls +3. The profile grows as the project discovers new pitfalls and decisions +4. When you want to reuse it across projects, move it to `catalog/` and replace it here with a link + +### Example: Profile Link + +```markdown +# Domain Profile: My Widget App + +## Profile Link + +extends: apps-sdk-mcp-lit-vite +catalog_version: 1.0.0 + +## Local Pitfalls + +### Pitfall L1: Auth token expiry during render +- **What goes wrong:** Widget crashes if token expires between init and first paint +- **Correct approach:** Check token TTL before render, refresh if < 30s remaining +- **Detection:** `grep -r "getToken" src/ | grep -v "checkExpiry"` + +## Local Overrides + +### Verification Commands (overrides) + +**GATE 3 (Tests):** +- Command: `npm test -- --project=widget` +- Expected output: all widget tests pass +``` + +This profile inherits all 11 pitfalls, 7 adversary questions, terminology, integration rules, and automated checks from `apps-sdk-mcp-lit-vite` — and adds one local pitfall plus a gate override. + +## How Profiles Are Used + +1. The Builder reads the user's prompt and identifies the technology stack +2. The Builder applies the operational matching contract (see below) +3. 
If the selected profile has `extends`: + - Loads the base profile from `catalog/[extends].md` + - Applies local overrides and additions + - Uses the merged result for the project +4. If the selected profile is standalone, uses it directly +5. The profile overrides generic assumptions: + - **Verification commands** replace generic "run build" instructions + - **Common pitfalls** (base + local, or standalone) are checked against the design before coding + - **Adversary questions** (base + local, or standalone) must be answered against the specific design + - **Automated checks** are executed mechanically during self-review + - **Decision history** applies permanent constraints from past projects + - **Integration rules** define data flow between technologies + +## Operational Matching Contract + +Selection must be deterministic and auditable: + +1. Candidate set: all `.md` files in this directory except `_template.md` and `README.md` +2. For each candidate, check for an `extends` field: + - If `extends` exists: load the base profile's selection metadata from `catalog/` + - If no `extends`: read the selection metadata directly from the profile +3. Read selection metadata: `Profile ID`, `Match Keywords`, `Use When`, `Do Not Use When` +4. Remove any profile whose `Do Not Use When` matches explicit user constraints +5. Score remaining profiles by keyword overlap with prompt/stack (`+1` per keyword hit) +6. Select only if highest score is unique and `>= 2` +7. If tied or below threshold: ask the human to choose; if no clarification is available, create a new profile (see below) +8. 
Record the selected profile and reason in the project design doc + +## Keeping Profiles Updated + +- For profile links: compare `catalog_version` against the current catalog profile to detect updates +- **Local pitfalls** that prove useful across projects should be contributed back to the catalog profile via PR +- **Stack-wide decisions** discovered locally belong in the catalog profile's Decision History, not in the local one +- Standalone profiles that prove reusable should be moved to `catalog/` and replaced here with a link + +## Who Creates Profiles + +1. **The human (proactively)** — Create profiles for your project's stack using the appropriate template +2. **The Builder (during a project)** — When a matching catalog profile exists, creates a profile link using `_template.md`. When no catalog profile exists, creates a standalone profile using `framework/templates/DOMAIN_PROFILE-template.md` diff --git a/framework/domains/_template.md b/framework/domains/_template.md new file mode 100644 index 0000000..b9b22c2 --- /dev/null +++ b/framework/domains/_template.md @@ -0,0 +1,53 @@ +# Domain Profile: [Project Name] + +## Profile Link + +```yaml +extends: [profile-id] # Profile ID from catalog/ (e.g., apps-sdk-mcp-lit-vite) +catalog_version: 1.0.0 # Version of the catalog profile when this link was created +``` + +> **How it works:** The Builder loads the base profile from `catalog/[extends].md` and applies the local sections below on top. Base sections not overridden here remain active. To check if your base is outdated, compare `catalog_version` against the current profile. + +--- + +## Local Pitfalls + +Project-specific pitfalls discovered during implementation. These do NOT exist in the base profile — they are unique to this project's context. + +### Pitfall L1: [Name] +- **What goes wrong:** [Description] +- **Correct approach:** [How to do it right] +- **Detection:** [Search pattern or verification step] + +*(Add as many as needed. 
Each one is a candidate for contributing back to the catalog profile.)* + +## Local Adversary Questions + +Project-specific questions that the base profile's adversary questions don't cover. + +- [e.g., "What happens if our auth token expires mid-widget-render?"] + +## Local Overrides + +Sections here **replace** the corresponding section in the base profile. Only override what differs for this project — omit sections where the base is correct. + +### Verification Commands (overrides) + +Only include gates that differ from the base profile. + +**GATE 3 (Tests):** +- Command: `[e.g., npm test -- --project=widget]` +- Expected output: [what success looks like] + +### Integration Rules (overrides) + +[Only if this project has integration patterns that differ from the base profile] + +## Local Decision History + +Decisions specific to THIS project (not stack-wide). Stack-wide decisions belong in the catalog profile. + +| Date | Decision | Context | Constraint | +|------|----------|---------|------------| +| [e.g., 2026-03-20] | [e.g., Skip server tests in CI] | [Why] | [MUST/MUST NOT] | diff --git a/agent-kit/templates/DESIGN.md b/framework/templates/DESIGN.md similarity index 93% rename from agent-kit/templates/DESIGN.md rename to framework/templates/DESIGN.md index f6e2b3a..3c31b14 100644 --- a/agent-kit/templates/DESIGN.md +++ b/framework/templates/DESIGN.md @@ -1,7 +1,7 @@ # Design: [Project Name] **Intent:** docs/[project]-intent.md -**Domain Profile:** agent-kit/domains/[profile].md +**Domain Profile:** framework/domains/[profile].md **Date:** [Date] ## Domain Profile Selection Rationale @@ -16,6 +16,14 @@ Document why this profile was selected so routing is auditable and repeatable. 
**Selected Profile:** [profile id] **Selection Basis:** [Unique highest score >= 2, or human-selected after tie] +## Skills Loaded + +| Skill | Location | Why loaded | +|-------|----------|------------| +| [e.g., frontend-design] | [.agents/skills/frontend-design/] | [Task involves UI — skill description matches] | + +[If no skills exist or none match, write "None — no matching skills found."] + ## Architecture ### Stack diff --git a/agent-kit/domains/_template.md b/framework/templates/DOMAIN_PROFILE-template.md similarity index 100% rename from agent-kit/domains/_template.md rename to framework/templates/DOMAIN_PROFILE-template.md diff --git a/agent-kit/templates/INTENT.md b/framework/templates/INTENT.md similarity index 97% rename from agent-kit/templates/INTENT.md rename to framework/templates/INTENT.md index 64bd315..d515c5e 100644 --- a/agent-kit/templates/INTENT.md +++ b/framework/templates/INTENT.md @@ -2,7 +2,7 @@ **Date:** [Date] **Size:** [Quick / Standard / Full] -**Domain Profile:** [Reference to agent-kit/domains/ profile, or "New — will be created"] +**Domain Profile:** [Reference to framework/domains/ profile, or "New — will be created"] **Supersedes:** [Reference to previous Intent if this reworks an existing feature, or "—"] ## Goal diff --git a/agent-kit/templates/VERIFICATION_LOG-template.md b/framework/templates/VERIFICATION_LOG-template.md similarity index 99% rename from agent-kit/templates/VERIFICATION_LOG-template.md rename to framework/templates/VERIFICATION_LOG-template.md index 70a9a63..67b0e5d 100644 --- a/agent-kit/templates/VERIFICATION_LOG-template.md +++ b/framework/templates/VERIFICATION_LOG-template.md @@ -14,6 +14,7 @@ This log captures the actual output of every verification gate. It is the source | Step | Status | |------|--------| | Intent | — | +| Skills loaded | — | | Design | — | | Gate 0: Dependencies | — | | Gate 1: Scaffold | — |