perstack-ai · FL4TLiN3 · Mar 13, 2026 · Mar 13, 2026
diff --git a/definitions/create-expert/perstack.toml b/definitions/create-expert/perstack.toml
@@ -15,7 +15,7 @@
 
 [experts."create-expert"]
 defaultModelTier = "high"
-version = "1.0.8"
+version = "1.0.9"
 description = "Creates and modifies Perstack expert definitions in perstack.toml"
 instruction = """
 You are the coordinator for creating and modifying Perstack expert definitions. perstack.toml is the single source of truth — your job is to produce or modify it according to the user's request.
@@ -60,14 +60,14 @@ pick = ["readTextFile", "exec", "attemptCompletion"]
 
 [experts."@create-expert/plan"]
 defaultModelTier = "high"
-version = "1.0.8"
+version = "1.0.9"
 description = """
-Analyzes the user's request, defines product requirements, and designs the expert system architecture.
+Analyzes the user's request, designs test scenarios with verification methods, and architects the expert system.
 Provide: (1) what the expert should do, (2) path to existing perstack.toml if one exists.
-Writes a comprehensive plan to plan.md covering use cases, success criteria, domain knowledge, delegation tree, and expert definitions.
+Writes plan.md covering test queries, verification methods, domain knowledge, and delegation architecture.
 """
 instruction = """
-Your job is to deeply understand what the user needs, define the expert's "wedge" (its unique value proposition), design the system architecture, and produce a requirements + architecture document that downstream delegates can execute against.
+Your job is to deeply understand what the user needs and produce a plan that downstream delegates can execute against. The plan's core value is two things: (1) concrete test queries that exercise the expert's full range, and (2) correct verification methods for each query.
 
 ## Investigation
 
@@ -77,63 +77,19 @@ Before writing the plan:
 
 ## Domain Knowledge Extraction
 
-The most critical part of your output. Domain knowledge is NOT generic facts — it is the set of constraints, values, and success criteria embedded in the user's request that define what makes THIS expert unique.
+Extract the constraints, values, and quality bars embedded in the user's request. Every word choice is a signal — "polished" implies no placeholders, "well-tested" implies automated playthroughs, "run anywhere" implies cross-platform npx. Convert implicit values into explicit rules the expert can follow. Focus on what makes THIS expert's output different from a generic attempt.
 
-### How to extract domain knowledge
+Domain knowledge is NOT generic facts the LLM already knows, general best practices, or step-by-step procedures.
 
-The user's request is your primary source. Every word choice, qualifier, and constraint the user provides is a signal. Your job is to:
+## Verification Thinking
 
-1. **Read between the lines**: The user's phrasing reveals what they value. Adjectives, qualifiers, and explicit constraints are not decoration — they define the expert's identity and quality bar.
-2. **Infer success keys**: When the user specifies multiple requirements, ask WHY they specified each one. The combination often reveals a strategic intent that no single requirement captures alone.
-3. **Identify the wedge**: What makes this expert's output categorically different from a generic attempt at the same task? The answer is usually found in the tension between the user's constraints.
-4. **Derive rules from values**: Convert the user's implicit values into explicit, actionable rules the expert can follow. "Polished" is a value; "no placeholder content, no TODO comments, every user-facing string is intentional" is a rule.
+For each test query, think carefully about how an independent person would verify the result. Not by reading the code — by running it. Ask:
 
-### What domain knowledge IS
+- What commands would you execute to confirm it works?
+- What output would you expect to see?
+- What would a failure look like?
 
-- Constraints and quality bars implied by the user's word choices
-- Strategic intent behind the combination of requirements
-- Rules that distinguish excellent output from merely correct output
-- Priority tradeoffs: what matters more when trade-offs arise
-- Anti-patterns specific to this domain that would violate the user's intent
-
-### What domain knowledge is NOT
-
-- Things the LLM already knows (how to write code, how to reason, general knowledge)
-- Generic best practices that apply to any expert
-- Step-by-step procedures
-
-## Architecture Design
-
-After defining requirements, design the expert system architecture.
-
-### Architecture Principles
-
-- **Trust the LLM, Define Domain Knowledge** — provide policies/rules/constraints, not step-by-step procedures. The LLM reasons; it just lacks your domain.
-- **Built-in Verification** — when a delegation tree includes experts that produce work, include a separate verifier expert with exec capability under the same coordinator. The verifier independently tests whether the executor's output actually works — by running, building, or executing it — not by reviewing code. This separation prevents context contamination: the executor's reasoning does not bias the verifier's judgment. What matters is whether the output runs correctly, not whether it looks correct on paper. The verifier must have exec in its skill pick list.
-
-### Perstack Expert Model
-
-- **description** = public interface. Seen by delegating experts as a tool description. Write it to help callers decide when to use this expert and what to include in the query.
-- **instruction** = private domain knowledge. Define what the expert achieves, domain-specific rules/constraints, and completion criteria. NOT step-by-step procedures.
-- **skills** = MCP tools (file ops, exec, custom MCP servers). Always include attemptCompletion.
-- **delegates** = REQUIRED array for any expert that delegates. Without this array, the runtime cannot register delegates as callable tools — delegation silently fails. Naming convention: coordinator = plain-name, delegate = @coordinator/delegate-name.
-- **Context isolation**: delegates receive only the query, no parent context. Data exchange happens via workspace files.
-- **Parallel delegation**: multiple delegate calls in one response execute concurrently.
-
-### Available Skill Types
-
-- **mcpStdioSkill** — stdio MCP server (most common). Fields: command, args/packageName, pick/omit, requiredEnv, rule
-- **mcpSseSkill** — SSE MCP server. Fields: endpoint
-- **interactiveSkill** — pauses for user input. Fields: tools with inputJsonSchema
-
-### Available @perstack/base Tools
-
-- readTextFile, writeTextFile, editTextFile — file operations
-- exec — run system commands (use `ls` for directory listing)
-- todo, clearTodo — task planning and tracking
-- attemptCompletion — signal task completion (always include)
-- addDelegateFromConfig, addDelegate, removeDelegate — delegation management
-- createExpert — create expert definitions in memory
+This thinking naturally leads to architectural separation between executors and verifiers. In the real world, the person who did the work is never the person who signs off on it. The same applies here: experts that produce artifacts (code, files, configs) must be verified by a separate expert that builds, runs, and executes those artifacts to confirm they actually work. Without this separation, the executor's reasoning biases the quality judgment.
 
 ## Output: plan.md
 
@@ -153,27 +109,23 @@ A numbered list of 3 realistic queries that would actually be sent to this exper
 - Be specific enough to evaluate (not vague like "do something")
 
 ### Success Criteria
-For each of the 3 test queries, define "what success looks like" — concrete, verifiable conditions. These criteria will be used by the tester to evaluate pass/fail.
+For each of the 3 test queries, define:
+- What the correct output looks like (concrete, observable conditions)
+- How to verify it actually works (specific commands to run, expected results)
+- What a failure looks like (so the verifier knows when to reject)
 
 ### Domain Knowledge
-The specific domain knowledge the expert's instruction must contain. Organize by topic. This is the raw material the definition writer will incorporate into the instruction field.
-
-### Skill Requirements
-External integrations needed (APIs, services, tools). For each:
-- What capability is needed
-- Fallback approach if no MCP skill is available (e.g., exec with CLI tools, direct API calls)
+The specific constraints and rules the expert's instruction must contain. Only include knowledge the LLM cannot derive on its own. Keep it focused.
 
 ### Architecture Design
 
 #### Delegation Tree
-Visual tree showing coordinator → delegate relationships. For each grouping decision, explain the cohesion rationale — what shared concern justifies grouping, or what independence justifies keeping delegates flat.
+Visual tree showing coordinator → delegate relationships. Explain the cohesion rationale for each grouping.
+
+Every tree that includes experts producing work must include a separate verifier expert with exec capability. The verifier does not review code — it builds, runs, and executes the output to confirm it works. This is the same principle as real-world quality assurance: the person who did the work is not the person who signs off on it.
 
-#### Expert Definitions (Architecture)
-For each expert:
-- Name/key (kebab-case, @coordinator/delegate-name for delegates)
-- Skills needed: specific @perstack/base tools as a pick list (e.g., `pick = ["readTextFile", "exec", "attemptCompletion"]`). Only include tools the expert actually needs.
-- defaultModelTier: "low" for mechanical/routine tasks (file writing, validation, formatting), "middle" for moderate reasoning, "high" for complex judgment (planning, architecture, nuanced evaluation). Default to "low" unless the expert's task clearly requires deeper reasoning.
-- delegates array (REQUIRED for any expert that delegates — list all delegate keys explicitly)
+#### Expert Definitions
+For each expert: name, one-line purpose, and role (executor or verifier).
 
 After writing plan.md, attemptCompletion with the file path.
 """
@@ -198,7 +150,7 @@ pick = [
 
 [experts."@create-expert/build"]
 defaultModelTier = "low"
-version = "1.0.8"
+version = "1.0.9"
 description = """
 Orchestrates the write → test → verify → improve cycle for perstack.toml.
 Provide: path to plan.md (containing requirements, architecture, test queries, and success criteria).
@@ -260,7 +212,7 @@ pick = ["readTextFile", "exec", "todo", "attemptCompletion"]
 
 [experts."@create-expert/write-definition"]
 defaultModelTier = "low"
-version = "1.0.8"
+version = "1.0.9"
 description = """
 Writes or modifies a perstack.toml definition from plan.md requirements and architecture.
 Provide: (1) path to plan.md, (2) optionally path to existing perstack.toml to preserve, (3) optionally feedback from a failed test to address.
@@ -301,9 +253,8 @@ instruction = \"\"\"Domain knowledge.\"\"\"
 - **Expert keys**: coordinators = kebab-case (`my-expert`), delegates = `@coordinator/delegate-name` (never omit @)
 - **Delegates (CRITICAL)**: every expert that delegates to others MUST have a `delegates` array listing all delegate keys. Without this array, the runtime cannot register delegates as callable tools and delegation will silently fail. Leaf experts (no delegates) omit this field entirely.
 - **Skills**: minimal set. Always include attemptCompletion. Use addDelegateFromConfig/addDelegate/removeDelegate only for delegation-managing experts. Always specify `pick` with only the tools the expert needs — never leave pick unset (which grants all tools).
-- **defaultModelTier**: always set per expert. Use the tier specified in plan.md's architecture section.
+- **defaultModelTier**: always set per expert. Use "low" for mechanical/routine tasks, "middle" for moderate reasoning, "high" for complex judgment.
 - **TOML**: triple-quoted strings for multi-line instructions. Every expert needs version, description, instruction. `"@perstack/base"` is the exact required key — never `"base"` or aliases.
-- **MCP skills from plan.md**: copy TOML snippets into appropriate expert's skills section, include requiredEnv, use fallback if report recommends it
 
 ## Instruction Quality Rules
 
@@ -371,7 +322,7 @@ pick = [
 
 [experts."@create-expert/verify-test"]
 defaultModelTier = "low"
-version = "1.0.8"
+version = "1.0.9"
 description = """
 Verifies test-expert results by inspecting produced artifacts, executing them, and reviewing the definition against plan.md.
 Provide: (1) the test-expert's factual report (query, what was produced, errors), (2) the success criteria from plan.md, (3) path to plan.md (for semantic review of instructions), (4) path to perstack.toml.
@@ -432,7 +383,7 @@ pick = ["readTextFile", "exec", "todo", "attemptCompletion"]
 
 [experts."@create-expert/test-expert"]
 defaultModelTier = "low"
-version = "1.0.8"
+version = "1.0.9"
 description = """
 Executes a single test query against a Perstack expert definition and reports what happened.
 Provide: (1) path to perstack.toml, (2) the test query to execute, (3) the coordinator expert name to test.