diff --git a/definitions/create-expert/perstack.toml b/definitions/create-expert/perstack.toml
index a12e5789..d894ac1d 100644
--- a/definitions/create-expert/perstack.toml
+++ b/definitions/create-expert/perstack.toml
@@ -15,7 +15,7 @@
 
 [experts."create-expert"]
 defaultModelTier = "high"
-version = "1.0.12"
+version = "1.0.14"
 description = "Creates and modifies Perstack expert definitions in perstack.toml"
 instruction = """
 You are the coordinator for creating and modifying Perstack expert definitions. perstack.toml is the single source of truth — your job is to produce or modify it according to the user's request.
@@ -60,7 +60,7 @@ pick = ["readTextFile", "exec", "attemptCompletion"]
 
 [experts."@create-expert/plan"]
 defaultModelTier = "high"
-version = "1.0.12"
+version = "1.0.14"
 description = """
 Analyzes the user's request and produces plan.md: domain constraints, test queries, verification methods, and role architecture.
 Provide: (1) what the expert should do, (2) path to existing perstack.toml if one exists.
@@ -96,10 +96,12 @@ For each test query:
 - What commands to run to verify it works
 
 ### Failure Conditions
-Conditions derived from domain constraints that mean the work must be rejected. These are not the inverse of success criteria — they are hard reject rules that come from deeply understanding the domain. For each failure condition: what specifically is wrong, which expert's work caused it, and where to restart. Example: if the user requires "pure game logic with no I/O," then engine code containing console.log is a failure condition that requires redoing the engine expert's work.
+Conditions derived from domain constraints that mean the work must be rejected. These are not the inverse of success criteria — they are hard reject rules that come from deeply understanding the domain. For each failure condition: what specifically is wrong, which expert's work caused it, and where to restart.
 
 ### Architecture
-Delegation tree with role assignments. Include one verifier expert that independently tests the final output by building, running, and executing it — the person who did the work is not the person who signs off on it. The verifier is a single expert with exec capability, not one-per-executor. For each expert: name, one-line purpose, executor or verifier.
+Delegation tree with role assignments. Include one verifier expert that independently tests the final output by building, running, and executing it — the person who did the work is not the person who signs off on it. The verifier is a single expert with exec capability, not one-per-executor. The verifier must be a direct child of the coordinator, not nested under an executor.
+
+For each expert, write ONLY: name, one-line purpose, and role (executor or verifier). Do not write deliverables, constraints, or implementation details — that is write-definition's job.
 
 After writing plan.md, attemptCompletion with the file path.
 """
@@ -124,7 +126,7 @@ pick = [
 
 [experts."@create-expert/build"]
 defaultModelTier = "low"
-version = "1.0.12"
+version = "1.0.14"
 description = """
 Orchestrates the write → test → verify → improve cycle for perstack.toml.
 Provide: path to plan.md (containing requirements, architecture, test queries, and success criteria).
@@ -186,7 +188,7 @@ pick = ["readTextFile", "exec", "todo", "attemptCompletion"]
 
 [experts."@create-expert/write-definition"]
 defaultModelTier = "low"
-version = "1.0.12"
+version = "1.0.14"
 description = """
 Writes or modifies a perstack.toml definition from plan.md requirements and architecture.
 Provide: (1) path to plan.md, (2) optionally path to existing perstack.toml to preserve, (3) optionally feedback from a failed test to address.
@@ -194,6 +196,13 @@ Provide: (1) path to plan.md, (2) optionally path to existing perstack.toml to p
 instruction = """
 You are a Perstack definition author. You translate requirements and architecture from plan.md into a valid perstack.toml. If feedback from a failed test is provided, you modify the definition to address it.
 
+## How to Use plan.md
+
+Plan.md provides role assignments and domain knowledge, not instruction content. Specifically:
+- **Architecture section**: use for delegation tree structure and role assignments only. Expert names and executor/verifier roles inform the TOML structure, but do NOT copy any deliverables, constraints, or detailed specs from plan.md into instruction fields.
+- **Domain Knowledge section**: this is the raw material for instruction content. Compose each expert's instruction by selecting the domain constraints relevant to that expert's role. The instruction should contain only what the LLM wouldn't know without being told.
+- **Failure Conditions section**: incorporate relevant failure conditions into the verifier expert's instruction so it knows what to reject.
+
 ## perstack.toml Schema Reference
 
 ```toml
@@ -241,19 +250,19 @@ The instruction field is the most impactful part of the definition. Apply these
 - Priority rules for when constraints conflict
 
 ### What does NOT belong in an instruction
-- **Code snippets and implementation templates** — the LLM knows how to write code. Never include inline code blocks, JSON schema examples, TypeScript interfaces, or mock patterns. State the constraint ("output JSON Lines with one object per turn") and let the LLM implement it. A code snippet in an instruction is always a sign that the author didn't trust the LLM enough.
-- **General programming knowledge** — ECS patterns, A* search, collision detection, terminal ANSI codes, Jest configuration, tsconfig settings, package.json structure. These are well within the LLM's training. Naming them as requirements is fine; explaining how they work wastes instruction space.
-- **Step-by-step procedures** — "first do X, then Y, then Z." Define the goal and constraints; the LLM will figure out the steps. Numbered implementation checklists and ordered task lists are procedures in disguise.
-- **File-by-file output specifications** — "create src/engine/ecs.ts, src/engine/state.ts, ..." Let the LLM decide the file structure based on the requirements. Specifying exact file paths constrains the LLM without adding value.
-- **Library selection guides** — "prefer ink for React-like, blessed for widgets, chalk as fallback." The LLM can choose appropriate libraries. State the requirement ("interactive TUI with keyboard input"), not the implementation choice.
+- **Implementation details the LLM already knows** — code snippets, file structure specifications, tool/library recommendations, configuration boilerplate. The LLM has broad training across programming, writing, design, analysis, and other domains. State the constraint or requirement; trust the LLM to choose the implementation. An instruction that explains *how* to do something the LLM already knows is wasted space.
+- **General domain knowledge** — well-known techniques, standard practices, textbook algorithms. Naming them as requirements is fine ("use seedable RNG", "follow APA citation style"); explaining how they work is not.
+- **Step-by-step procedures** — "first do X, then Y, then Z." Define the goal and constraints; the LLM will figure out the steps. Numbered checklists and ordered task lists are procedures in disguise.
+- **Specific output structures** — exact file paths, section templates, schema definitions. Describe what the output must contain and its quality bar, not its exact shape. The LLM will organize the output appropriately for the task.
 
 ### Self-check before writing
 Before finalizing perstack.toml, verify:
-1. **Instruction content**: for every sentence, ask "If I removed this, would the LLM produce a worse result?" If no — the LLM already knows it — remove it.
+1. **Instruction content**: for every sentence, ask "If I removed this, would the LLM produce a worse result?" If no — the LLM already knows it — remove it. This applies equally to all domains: coding, writing, research, design, analysis, operations.
 2. **Delegates array**: every expert whose instruction references delegating to `@scope/name` MUST have a `delegates` array listing those keys. Without it, delegation silently fails at runtime.
 3. **Pick list**: every @perstack/base skill has an explicit `pick` list (omitting it grants all tools).
 4. **defaultModelTier**: every expert has this set.
 5. **Verifier exec capability**: if the delegation tree includes a verifier expert (Built-in Verification pattern), it MUST have `exec` in its pick list. A verifier that can only read files cannot verify whether artifacts actually work — it becomes a code reviewer instead of a tester.
+6. **Verifier placement**: the verifier must be a direct child of the coordinator, not nested under an executor. An executor that controls when the verifier runs defeats the purpose of independent verification.
 
 ## Description Rules
 
@@ -296,7 +305,7 @@ pick = [
 
 [experts."@create-expert/verify-test"]
 defaultModelTier = "low"
-version = "1.0.12"
+version = "1.0.14"
 description = """
 Verifies test-expert results by inspecting produced artifacts, executing them, and reviewing the definition against plan.md.
 Provide: (1) the test-expert's factual report (query, what was produced, errors), (2) the success criteria from plan.md, (3) path to plan.md (for semantic review of instructions), (4) path to perstack.toml.
@@ -316,11 +325,7 @@ Read test-expert's result, then independently inspect every artifact it referenc
 
 ## Step 2: Artifact Execution (MANDATORY)
 
-Use exec to verify that produced artifacts actually work. What to run depends on what was produced:
-- Code projects: build (e.g., `bun install && bun run build`), run tests if they exist, run lint if configured
-- Scripts: execute them and verify output
-- Configuration files: validate syntax (e.g., `toml-lint`, `json5 --validate`)
-- If the artifact type has no meaningful execution step, document why and proceed
+Use exec to verify that produced artifacts actually work. What to run depends on what was produced — build it, run it, validate it. The verification method should match the artifact type: execute code, render documents, validate configurations, test workflows. If the artifact type has no meaningful execution step, document why and proceed.
 
 A success criterion is not met if the artifact looks correct on paper but fails to build, run, or pass its own tests.
 
@@ -328,7 +333,7 @@ A success criterion is not met if the artifact looks correct on paper but fails
 
 Read plan.md's Domain Knowledge section and the perstack.toml's instruction fields. Verify:
 - Every domain-specific constraint from plan.md is reflected in the instruction. Missing constraints mean the expert will not enforce them at runtime.
-- No instruction contains content the LLM already knows (code snippets, general programming knowledge, step-by-step procedures, library selection guides). These dilute the domain knowledge.
+- No instruction contains content the LLM already knows (implementation details, general domain knowledge, step-by-step procedures, specific tool/library recommendations). These dilute the domain-specific constraints.
 - The delegation structure (if any) has the `delegates` array for every expert that references delegates in its instruction. Without it, delegation silently fails at runtime.
 - Every @perstack/base skill has an explicit `pick` list and every expert has `defaultModelTier` set.
 - Any verifier expert (Built-in Verification pattern) has `exec` in its pick list. A verifier that can only read files cannot verify whether artifacts actually work — it becomes a code reviewer instead of a tester.
@@ -357,7 +362,7 @@ pick = ["readTextFile", "exec", "todo", "attemptCompletion"]
 
 [experts."@create-expert/test-expert"]
 defaultModelTier = "low"
-version = "1.0.12"
+version = "1.0.14"
 description = """
 Executes a single test query against a Perstack expert definition and reports what happened.
 Provide: (1) path to perstack.toml, (2) the test query to execute, (3) the coordinator expert name to test.