From edd4ec81e3cfc1b402eab1ef7e5912873df32570 Mon Sep 17 00:00:00 2001
From: ProfSynapse <jrosenbaum9689@gmail.com>
Date: Tue, 19 May 2026 17:26:29 -0400
Subject: [PATCH 1/3] docs: audit eval harness tool schemas vs production

Catalogues field-level drift between tests/eval/fixtures/tools.ts NEXUS_TOOLS
and current production getParameterSchema() returns. 8 of 11 fixture entries
drift; documents the v5.9.0 contentManager_replace hard-break and several
production-side tools missing from the fixture.

Filed under docs/research/ because docs/eval/ is gitignored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/research/eval-harness-schema-audit.md | 212 +++++++++++++++++++++
 1 file changed, 212 insertions(+)
 create mode 100644 docs/research/eval-harness-schema-audit.md
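
Reviewer note (not part of the committed file): the field-level comparison in the audit's Method section can be mechanized. Below is a minimal sketch that assumes the fixture and production schemas are both available as plain JSON-schema-like objects with `properties`, `required`, and optional `enum`. `ParamSchema`, `Drift`, and `diffSchemas` are hypothetical names and do not exist in the harness. It reports missing, extra, type, required-set, and enum drift. It ignores descriptions, defaults, and numeric bounds.

```typescript
// Hypothetical, simplified shapes; real getParameterSchema() returns carry
// more keywords (description, default, minimum/maximum, items, ...).
interface PropSchema {
  type: string;
  enum?: string[];
}

interface ParamSchema {
  type: 'object';
  properties: Record<string, PropSchema>;
  required?: string[];
}

type Drift =
  | { kind: 'missing'; field: string } // in production, absent from fixture
  | { kind: 'extra'; field: string } // in fixture, absent from production
  | { kind: 'type'; field: string; fixture: string; production: string }
  | { kind: 'required'; field: string; fixtureRequired: boolean }
  | { kind: 'enum'; field: string };

function diffSchemas(fixture: ParamSchema, production: ParamSchema): Drift[] {
  const drifts: Drift[] = [];
  const fixtureReq = new Set(fixture.required ?? []);
  const productionReq = new Set(production.required ?? []);
  // Compare enum lists order-insensitively; a missing enum becomes ''.
  const enumKey = (p: PropSchema): string => [...(p.enum ?? [])].sort().join('|');

  for (const [field, prod] of Object.entries(production.properties)) {
    const fix = fixture.properties[field];
    if (fix === undefined) {
      drifts.push({ kind: 'missing', field });
      continue;
    }
    if (fix.type !== prod.type) {
      drifts.push({ kind: 'type', field, fixture: fix.type, production: prod.type });
    }
    if (fixtureReq.has(field) !== productionReq.has(field)) {
      drifts.push({ kind: 'required', field, fixtureRequired: fixtureReq.has(field) });
    }
    if (enumKey(fix) !== enumKey(prod)) {
      drifts.push({ kind: 'enum', field });
    }
  }
  // Fields the fixture declares that production does not know about.
  for (const field of Object.keys(fixture.properties)) {
    if (!(field in production.properties)) {
      drifts.push({ kind: 'extra', field });
    }
  }
  return drifts;
}

// storageManager_move, as described in the audit.
const fixtureMove: ParamSchema = {
  type: 'object',
  properties: { path: { type: 'string' }, destination: { type: 'string' } },
  required: ['path', 'destination'],
};

const productionMove: ParamSchema = {
  type: 'object',
  properties: {
    path: { type: 'string' },
    newPath: { type: 'string' },
    overwrite: { type: 'boolean' },
  },
  required: ['path', 'newPath'],
};

console.log(diffSchemas(fixtureMove, productionMove));
// → missing newPath, missing overwrite, extra destination
```

Running the same function over the `searchManager_directory` pair would surface the `paths` required-set mismatch and the missing `searchType` enum as separate entries, which is why the audit lists them apart from the missing fields.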

diff --git a/docs/research/eval-harness-schema-audit.md b/docs/research/eval-harness-schema-audit.md
new file mode 100644
index 000000000..2a0ab0e57
--- /dev/null
+++ b/docs/research/eval-harness-schema-audit.md
@@ -0,0 +1,212 @@
+# Eval Harness Tool Schema Audit
+
+Audit of `tests/eval/fixtures/tools.ts` (`NEXUS_TOOLS` array) against current production `getParameterSchema()` returns. Produced under Task #10 of the `fix/eval-harness-cli-schema` branch.
+
+## Method
+
+For each fixture entry in `NEXUS_TOOLS`, the corresponding production tool class under `src/agents/*/tools/*.ts` was located, its `getParameterSchema()` method read, and its required fields, optional fields, types, enums, and descriptions compared against the fixture. Production schemas are wrapped by `getMergedSchema()`, which merges in `CommonParameters` (workspaceId/sessionId/memory/goal/constraints). This audit compares the pre-merge toolSchema, because the harness fixture does not include the common parameters.
+
+## Summary Table
+
+| Tool name in fixture | Production class | Status | Drifted fields |
+|----------------------|------------------|--------|---------------|
+| `contentManager_read` | `ContentManager` `ReadTool` (read.ts) | match | 0 |
+| `contentManager_write` | `ContentManager` `WriteTool` (write.ts) | drift | 1 (missing `overwrite`) |
+| `contentManager_insert` | `ContentManager` `InsertTool` (insert.ts) | drift | 4 (`position`/`lineNumber` instead of `startLine`, different semantics) |
+| `contentManager_replace` | `ContentManager` `ReplaceTool` (replace.ts) | drift (HARD) | 4 (`search`/`replace` two-field vs `start`/`end`/`content` three-field anchor model) |
+| `storageManager_move` | `StorageManager` `MoveTool` (move.ts) | drift | 2 (`destination` vs `newPath`; missing `overwrite`) |
+| `storageManager_copy` | `StorageManager` `CopyTool` (copy.ts) | drift | 2 (`destination` vs `newPath`; missing `overwrite`) |
+| `storageManager_archive` | `StorageManager` `ArchiveTool` (archive.ts) | match | 0 |
+| `storageManager_createFolder` | `StorageManager` `CreateFolderTool` (createFolder.ts) | match | 0 |
+| `storageManager_list` | `StorageManager` `ListTool` (list.ts) | drift | 2 (`path` listed as required, missing `filter`) |
+| `searchManager_content` | `SearchManager` `SearchContentTool` (searchContent.ts) | drift | 4 (missing `semantic`/`includeContent`/`snippetLength`/`paths`) |
+| `searchManager_directory` | `SearchManager` `SearchDirectoryTool` (searchDirectory.ts) | drift | 8 (`paths` should be required, missing `fileTypes`/`depth`/`pattern`/`dateRange`/`limit`/`includeContent`; `searchType` lacks enum) |
+
+11 entries audited: 3 match, 8 drift. The HARD drift on `contentManager_replace` is the v5.9.0 schema break called out in CLAUDE.md.
+
+## Per-Drift Detail
+
+### contentManager_write (drift, 1 field)
+
+Production at `src/agents/contentManager/tools/write.ts:174-196`:
+- `path` (string, required)
+- `content` (string, required)
+- `overwrite` (boolean, optional, default: false)
+
+Fixture at `tests/eval/fixtures/tools.ts:33-47`:
+- `path` (string, required)
+- `content` (string, required)
+
+Drift: `overwrite` is missing from the fixture. This is not breaking (production defaults it to false), but the harness LLM cannot exercise the overwrite path.
+
+---
+
+### contentManager_insert (drift, semantic redesign)
+
+Production at `src/agents/contentManager/tools/insert.ts:128-149`:
+- `path` (string, required)
+- `content` (string, required)
+- `startLine` (number, required) — line-based: `1` to prepend, `-1` to append, `N` to insert before line N
+
+Fixture at `tests/eval/fixtures/tools.ts:48-64`:
+- `path` (string, required)
+- `content` (string, required)
+- `position` (string, required) — generic "position" string
+- `lineNumber` (number, optional)
+
+Drift: The fixture uses a `position` string plus an optional `lineNumber`, a shape that does not exist in production. Production instead takes a single integer `startLine` with sentinel values (`1` to prepend, `-1` to append, `N` to insert before line N). The fixture would mislead the model into emitting a `position: "append"` shape that production rejects. Per the CLAUDE.md pin, `append`/`prepend` actions in executePrompts also route to `insert` and follow the same single-integer convention.
+
+---
+
+### contentManager_replace (drift, HARD — v5.9.0 break)
+
+Production at `src/agents/contentManager/tools/replace.ts:202-227`:
+- `path` (string, required)
+- `start` (string, required) — content-anchor opening line(s), must be globally unique
+- `end` (string, required) — content-anchor closing line(s), must be after `start`
+- `content` (string, required) — replacement text; empty string deletes the range
+
+Fixture at `tests/eval/fixtures/tools.ts:65-80`:
+- `path` (string, required)
+- `search` (string, required) — text to find
+- `replace` (string, required) — replacement text
+
+Drift: In v5.9.0 (per the CLAUDE.md pin), production switched from search/replace semantics to pattern-anchored range replacement. The new model identifies a contiguous range using `start`/`end` line anchors and replaces it with `content`; line numbers are never required. The fixture's `search`/`replace` shape matches neither the pre-v5.9.0 field names (`oldContent`/`newContent`) nor the current ones, and shares only `path` with production. This is the most severe drift in the fixture.
+
+Evidence: CLAUDE.md pinned context — "v5.9.0 — Pattern-anchored content replace (PR #206): hard schema break from `{path, oldContent, newContent, startLine, endLine}` to 4-field `{path, start, end, content}` on both `ContentManager.replace` and `executePrompts.replace`."
+
+---
+
+### storageManager_move (drift, 2 fields)
+
+Production at `src/agents/storageManager/tools/move.ts:93-117`:
+- `path` (string, required)
+- `newPath` (string, required)
+- `overwrite` (boolean, optional, default: false)
+
+Fixture at `tests/eval/fixtures/tools.ts:81-95`:
+- `path` (string, required)
+- `destination` (string, required)
+
+Drift: Production names the field `newPath`, not `destination`, and the fixture also omits `overwrite`. A model emitting `{ path, destination }` would have `destination` ignored by production, leaving the required `newPath` unset.
+
+---
+
+### storageManager_copy (drift, 2 fields)
+
+Production at `src/agents/storageManager/tools/copy.ts:84-107`:
+- `path` (string, required)
+- `newPath` (string, required)
+- `overwrite` (boolean, optional, default: false)
+
+Fixture at `tests/eval/fixtures/tools.ts:96-110`:
+- `path` (string, required)
+- `destination` (string, required)
+
+Drift: Same `destination` vs `newPath` mismatch as move. Same missing `overwrite`.
+
+---
+
+### storageManager_list (drift, 2 fields)
+
+Production at `src/agents/storageManager/tools/list.ts:167-185`:
+- `path` (string, optional, default: '') — empty string / `/` / `.` is vault root
+- `filter` (string, optional)
+- `required: []` — both fields optional
+
+Fixture at `tests/eval/fixtures/tools.ts:139-152`:
+- `path` (string, required)
+
+Drift: Production has `path` as optional with vault-root default; fixture marks it required. Fixture also missing the `filter` option. A model calling `storageManager_list` with no args is valid in production but rejected by the fixture schema.
+
+---
+
+### searchManager_content (drift, 4 fields)
+
+Production at `src/agents/searchManager/tools/searchContent.ts:472-517`:
+- `query` (string, required)
+- `semantic` (boolean, optional, default: false) — true for vector search
+- `limit` (number, optional, default: 10, min 1, max 50)
+- `includeContent` (boolean, optional, default: true)
+- `snippetLength` (number, optional, default: 200, min 50, max 1000)
+- `paths` (array of string, optional) — folder paths or glob patterns
+
+Fixture at `tests/eval/fixtures/tools.ts:153-167`:
+- `query` (string, required)
+- `limit` (number, optional)
+
+Drift: Fixture is missing `semantic`, `includeContent`, `snippetLength`, and `paths`. The `semantic` flag in particular is significant — production exposes both keyword and AI-powered semantic search through this one tool; the fixture only exposes the keyword path.
+
+---
+
+### searchManager_directory (drift, 8 fields: 6 missing, plus required-list and enum mismatches)
+
+Production at `src/agents/searchManager/tools/searchDirectory.ts:208-290`:
+- `query` (string, required, minLength 1)
+- `paths` (array of string, required, minItems 1)
+- `searchType` (string enum `'files'|'folders'|'both'`, optional, default: `'both'`)
+- `fileTypes` (array of string, optional)
+- `depth` (number, optional, 1–10)
+- `pattern` (string, optional) — regex filter
+- `dateRange` (object with start/end YYYY-MM-DD, optional)
+- `limit` (number, optional, default: 20, 1–100)
+- `includeContent` (boolean, optional, default: true)
+
+Fixture at `tests/eval/fixtures/tools.ts:168-183`:
+- `query` (string, required)
+- `paths` (array of string, optional) — listed in properties but NOT in required
+- `searchType` (string, optional) — no enum constraint
+
+Drift: (a) `paths` is required in production but optional in the fixture, so the two required sets disagree. (b) The fixture's `searchType` lacks the `files|folders|both` enum. (c) Six fields are missing from the fixture (`fileTypes`, `depth`, `pattern`, `dateRange`, `limit`, `includeContent`).
+
+## Production-side tools NOT in the fixture
+
+The fixture covers `contentManager` (4 of 5 tools), `storageManager` (5 of 6 tools), and `searchManager` (2 of 3 tools). The following production tools have no fixture representation:
+
+| Production tool | Agent | Source |
+|-----------------|-------|--------|
+| `contentManager_setProperty` | ContentManager | `src/agents/contentManager/tools/setProperty.ts` — set frontmatter property, replace/merge modes |
+| `storageManager_open` | StorageManager | `src/agents/storageManager/tools/open.ts` — open file in Obsidian editor |
+| `searchManager_memory` | SearchManager | `src/agents/searchManager/tools/searchMemory.ts` — search memory traces / states / conversations |
+| `memoryManager_*` (full agent) | MemoryManager | createSession, loadSession, createWorkspace, createState, etc. |
+| `canvasManager_*` (full agent) | CanvasManager | read, write, update, list |
+| `taskManager_*` (full agent) | TaskManager | createProject, listProjects, createTask, listTasks, updateTask, moveTask, queryTasks, linkNote |
+| `promptManager_*` (full agent) | PromptManager | listModels, executePrompts, createPrompt, updatePrompt, deletePrompt, listPrompts, getPrompt, generateImage |
+| `ingestManager_*` (full agent) | IngestManager | ingest, listCapabilities |
+| App agents (webTools, composer) | apps/ | openWebpage, capturePagePdf, capturePagePng, captureToMarkdown, extractLinks, compose, listFormats |
+
+The Task #8 eval run showed the LLM calling `searchManager_memory`, `canvasManager_list`, and similar tools. These names exist in production but were rejected by the harness as hallucinations because they are absent from the fixture. After the Task #11 schema swap this concern goes away: the LLM will see only `getTools`/`useTools` and discover available tools dynamically.
+
+## Fixture-side tools NOT in production
+
+None. Every fixture entry maps to a production tool class. The drifts above are field-level / shape-level mismatches, not phantom tools.
+
+## Implications for Task #11 (schema swap)
+
+The audit confirms the team-lead's framing: the harness fixture has substantial drift across 8 of 11 entries plus 6+ missing production tools, but Task #11's plan is to swap the entire `NEXUS_TOOLS` array for the two-tool surface (`getTools` + `useTools`). After the swap:
+
+- The harness LLM only sees the two-tool MCP shape.
+- The executor parses the `useTools.tool` CLI string via the real `ToolCliNormalizer`.
+- Drifts above stop mattering for the LLM-facing surface.
+- They still matter for the executor: when it parses `content replace --path foo.md --start "..." --end "..." --content "..."`, it must route to the production 4-field schema, not the obsolete 3-field one. This audit is the reference for getting that routing right.
+
+## Evidence Index
+
+| Tool | Production file:line |
+|------|----------------------|
+| read | `src/agents/contentManager/tools/read.ts:110-131` |
+| write | `src/agents/contentManager/tools/write.ts:174-196` |
+| insert | `src/agents/contentManager/tools/insert.ts:128-149` |
+| replace | `src/agents/contentManager/tools/replace.ts:202-227` |
+| setProperty | `src/agents/contentManager/tools/setProperty.ts:161-193` |
+| list | `src/agents/storageManager/tools/list.ts:167-185` |
+| move | `src/agents/storageManager/tools/move.ts:93-117` |
+| copy | `src/agents/storageManager/tools/copy.ts:84-107` |
+| archive | `src/agents/storageManager/tools/archive.ts:110-125` |
+| createFolder | `src/agents/storageManager/tools/createFolder.ts:68-83` |
+| open | `src/agents/storageManager/tools/open.ts` |
+| searchContent | `src/agents/searchManager/tools/searchContent.ts:472-517` |
+| searchDirectory | `src/agents/searchManager/tools/searchDirectory.ts:208-290` |
+| searchMemory | `src/agents/searchManager/tools/searchMemory.ts` |
+
+Fixture under audit: `tests/eval/fixtures/tools.ts` lines 16-184 (`NEXUS_TOOLS`). The file also contains `META_TOOLS` (`getTools`/`useTools`, lines 189-234) and `SIMPLE_TOOLS` (weather/time mocks, lines 239-268), neither of which is in scope for this audit.

From 0290631ad3615e11e132b1af9597a4b571bcf3a9 Mon Sep 17 00:00:00 2001
From: ProfSynapse <jrosenbaum9689@gmail.com>
Date: Tue, 19 May 2026 17:33:59 -0400
Subject: [PATCH 2/3] test(eval-harness): convert 5 nexus-mode scenarios to
 two-tool meta architecture
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The eval harness exposed direct domain tool schemas (NEXUS_TOOLS) to the LLM
even though production only ships getTools/useTools. The contradiction inflated
failure rates for models that picked the meta tools (correct production
behavior) while the harness asserted on direct domain calls.

Convert all toolSet: nexus scenarios — adversarial, basic-tool-call, multi-turn,
provider-parity, system-prompt — to toolSet: meta. Each direct domain call
becomes a getTools selector turn + useTools CLI-command turn. Inner domain
mock responses are kept alongside the getTools/useTools responses so that
both LiveToolExecutor (mirror-parse capture) and EvalToolExecutor
(inner-unwrap capture) match.

After: 0 scenarios use toolSet: nexus; all 12 scenario files use toolSet: meta
exclusively. Harness now matches production two-tool MCP contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 tests/eval/scenarios/adversarial.eval.yaml    | 127 ++++++++++--
 .../eval/scenarios/basic-tool-call.eval.yaml  | 132 +++++++++++--
 tests/eval/scenarios/multi-turn.eval.yaml     | 182 ++++++++++++++++--
 .../eval/scenarios/provider-parity.eval.yaml  | 118 +++++++++++-
 tests/eval/scenarios/system-prompt.eval.yaml  | 122 ++++++++++--
 5 files changed, 627 insertions(+), 54 deletions(-)
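
Reviewer note (not part of the commit): the conversion rule in the message above is mechanical enough to express as a helper. Below is a hypothetical TypeScript sketch. It assumes the YAML shapes used in these scenario files: `expectedTools`, `mockResponses`, and the `results[]` wrapper returned by useTools. `toMetaTurns`, `DomainCall`, and the other type names are illustrative and do not exist in the harness. It emits only the plain two-turn pattern used by `read-single-note`: a getTools selector turn, then a useTools turn. It does not emit the single-turn variant with an optional getTools that some multi-turn steps use.

```typescript
// Hypothetical types mirroring the scenario YAML; not the harness's own.
interface MockResponse {
  success: boolean;
  result?: unknown;
  error?: string;
}

interface ExpectedTool {
  name: string;
  params?: Record<string, string>;
  optional?: boolean;
}

interface Turn {
  userMessage?: string;
  expectedTools: ExpectedTool[];
  mockResponses: Record<string, MockResponse>;
}

interface DomainCall {
  agent: string; // e.g. 'contentManager'
  tool: string; // e.g. 'read'
  selector: string; // getTools selector, e.g. 'content'
  command: string; // useTools CLI command, e.g. 'content read'
  catalog: unknown[]; // entries returned by the mocked getTools call
  mock: MockResponse; // the original direct-domain mock
}

function toMetaTurns(userMessage: string, call: DomainCall): Turn[] {
  const domainName = `${call.agent}_${call.tool}`;
  // On success, useTools wraps the inner result in results[]; on failure the
  // scenarios surface the inner error at the top level instead.
  const wrapped: MockResponse = call.mock.success
    ? { success: true, result: { results: [{ tool: domainName, ...call.mock }] } }
    : { success: false, error: call.mock.error };
  return [
    {
      userMessage,
      expectedTools: [{ name: 'getTools', params: { tool: call.selector } }],
      mockResponses: { getTools: { success: true, result: { tools: call.catalog } } },
    },
    {
      expectedTools: [{ name: 'useTools', params: { tool: call.command } }],
      // The inner domain mock stays next to the wrapped one.
      mockResponses: { useTools: wrapped, [domainName]: call.mock },
    },
  ];
}

const turns = toMetaTurns('Read the file at notes/meeting.md', {
  agent: 'contentManager',
  tool: 'read',
  selector: 'content',
  command: 'content read',
  catalog: [],
  mock: { success: true, result: { path: 'notes/meeting.md', totalLines: 4 } },
});
console.log(JSON.stringify(turns, null, 2));
```

Per the message above, keeping the raw `contentManager_read` mock beside the wrapped `useTools` mock is what lets both capture strategies find a matching response.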

diff --git a/tests/eval/scenarios/adversarial.eval.yaml b/tests/eval/scenarios/adversarial.eval.yaml
index e0cc3a015..0fec4250f 100644
--- a/tests/eval/scenarios/adversarial.eval.yaml
+++ b/tests/eval/scenarios/adversarial.eval.yaml
@@ -1,12 +1,43 @@
 - name: ambiguous-prompt
-  description: Ambiguous user request — model should search or ask for clarification
-  toolSet: nexus
+  description: Ambiguous user request — model should search or ask for clarification (via two-tool meta)
+  toolSet: meta
   turns:
     - userMessage: "Do something with my notes about the project"
       expectedTools:
-        - name: searchManager_searchContent
+        - name: getTools
+          optional: true
+        - name: useTools
           optional: true
       mockResponses:
+        getTools:
+          success: true
+          result:
+            tools:
+              - agent: searchManager
+                tool: searchContent
+                description: Search for notes by content keyword
+                command: "search content"
+                usage: "search content <query>"
+                arguments:
+                  - name: query
+                    flag: --query
+                    type: string
+                    required: true
+                    positional: true
+                examples:
+                  - 'search content "query-value"'
+        useTools:
+          success: true
+          result:
+            results:
+              - tool: searchManager_searchContent
+                success: true
+                result:
+                  results:
+                    - path: notes/project.md
+                      score: 0.88
+                      snippet: "Project overview and status..."
+                  totalResults: 1
         searchManager_searchContent:
           success: true
           result:
@@ -17,22 +48,48 @@
             totalResults: 1
 
 - name: tool-returns-error
-  description: Tool call returns an error — model should handle gracefully
-  toolSet: nexus
+  description: Tool call returns an error — model should handle gracefully (via two-tool meta)
+  toolSet: meta
   turns:
     - userMessage: "Read the file at notes/secret.md"
       expectedTools:
-        - name: contentManager_read
+        - name: getTools
+          params:
+            tool: "content"
+      mockResponses:
+        getTools:
+          success: true
+          result:
+            tools:
+              - agent: contentManager
+                tool: read
+                description: Read content from a file
+                command: "content read"
+                usage: "content read <path> <startLine>"
+                arguments:
+                  - name: path
+                    flag: --path
+                    type: string
+                    required: true
+                    positional: true
+                examples:
+                  - 'content read "path-value" 1'
+
+    - expectedTools:
+        - name: useTools
           params:
-            path: notes/secret.md
+            tool: "content read"
       mockResponses:
+        useTools:
+          success: false
+          error: "Permission denied: cannot access protected file"
         contentManager_read:
           success: false
           error: "Permission denied: cannot access protected file"
 
 - name: model-refuses-tools
   description: Model responds with text instead of calling tools — acceptable fallback
-  toolSet: nexus
+  toolSet: meta
   temperature: 0.3
   turns:
     - userMessage: "What do you think about the weather today?"
@@ -40,13 +97,61 @@
       mockResponses: {}
 
 - name: large-tool-response
-  description: Tool returns a large payload — continuation should not break
-  toolSet: nexus
+  description: Tool returns a large payload — continuation should not break (via two-tool meta)
+  toolSet: meta
   turns:
     - userMessage: "List all files in the vault root"
       expectedTools:
-        - name: storageManager_list
+        - name: getTools
+          params:
+            tool: "storage"
       mockResponses:
+        getTools:
+          success: true
+          result:
+            tools:
+              - agent: storageManager
+                tool: list
+                description: List files and folders in a directory
+                command: "storage list"
+                usage: "storage list [--path <path>]"
+                arguments:
+                  - name: path
+                    flag: --path
+                    type: string
+                    required: false
+                    positional: false
+                examples:
+                  - 'storage list'
+
+    - expectedTools:
+        - name: useTools
+          params:
+            tool: "storage list"
+      mockResponses:
+        useTools:
+          success: true
+          result:
+            results:
+              - tool: storageManager_list
+                success: true
+                result:
+                  path: /
+                  files:
+                    - {name: "note1.md", type: "file"}
+                    - {name: "note2.md", type: "file"}
+                    - {name: "note3.md", type: "file"}
+                    - {name: "note4.md", type: "file"}
+                    - {name: "note5.md", type: "file"}
+                    - {name: "note6.md", type: "file"}
+                    - {name: "note7.md", type: "file"}
+                    - {name: "note8.md", type: "file"}
+                    - {name: "note9.md", type: "file"}
+                    - {name: "note10.md", type: "file"}
+                  folders:
+                    - {name: "archive", type: "folder"}
+                    - {name: "notes", type: "folder"}
+                    - {name: "projects", type: "folder"}
         storageManager_list:
           success: true
           result:
diff --git a/tests/eval/scenarios/basic-tool-call.eval.yaml b/tests/eval/scenarios/basic-tool-call.eval.yaml
index f12cad3ad..2acf9399c 100644
--- a/tests/eval/scenarios/basic-tool-call.eval.yaml
+++ b/tests/eval/scenarios/basic-tool-call.eval.yaml
@@ -1,13 +1,48 @@
 - name: read-single-note
-  description: Basic single tool call — read a note by path
-  toolSet: nexus
+  description: Basic single tool call via two-tool meta — read a note by path
+  toolSet: meta
   turns:
     - userMessage: "Read the file at notes/meeting.md"
       expectedTools:
-        - name: contentManager_read
+        - name: getTools
           params:
-            path: notes/meeting.md
+            tool: "content"
       mockResponses:
+        getTools:
+          success: true
+          result:
+            tools:
+              - agent: contentManager
+                tool: read
+                description: Read content from a file
+                command: "content read"
+                usage: "content read <path> <startLine>"
+                arguments:
+                  - name: path
+                    flag: --path
+                    type: string
+                    required: true
+                    positional: true
+                examples:
+                  - 'content read "path-value" 1'
+
+    - expectedTools:
+        - name: useTools
+          params:
+            tool: "content read"
+      mockResponses:
+        useTools:
+          success: true
+          result:
+            results:
+              - tool: contentManager_read
+                success: true
+                result:
+                  content: "# Q2 Meeting\n- Roadmap reviewed\n- Budget approved\n- Launch date: June 15"
+                  path: notes/meeting.md
+                  totalLines: 4
+                  startLine: 1
+                  endLine: 4
         contentManager_read:
           success: true
           result:
@@ -18,15 +53,52 @@
             endLine: 4
 
 - name: write-new-note
-  description: Basic single tool call — write a new note
-  toolSet: nexus
+  description: Basic single tool call via two-tool meta — write a new note
+  toolSet: meta
   turns:
     - userMessage: "Create a note at notes/todo.md with the content 'Buy groceries'"
       expectedTools:
-        - name: contentManager_write
+        - name: getTools
           params:
-            path: notes/todo.md
+            tool: "content"
       mockResponses:
+        getTools:
+          success: true
+          result:
+            tools:
+              - agent: contentManager
+                tool: write
+                description: Write content to a file
+                command: "content write"
+                usage: "content write <path> <content>"
+                arguments:
+                  - name: path
+                    flag: --path
+                    type: string
+                    required: true
+                    positional: true
+                  - name: content
+                    flag: --content
+                    type: string
+                    required: true
+                    positional: true
+                examples:
+                  - 'content write "path-value" "content-value"'
+
+    - expectedTools:
+        - name: useTools
+          params:
+            tool: "content write"
+      mockResponses:
+        useTools:
+          success: true
+          result:
+            results:
+              - tool: contentManager_write
+                success: true
+                result:
+                  path: notes/todo.md
+                  created: true
         contentManager_write:
           success: true
           result:
@@ -34,15 +106,51 @@
             created: true
 
 - name: search-notes
-  description: Basic single tool call — search for notes
-  toolSet: nexus
+  description: Basic single tool call via two-tool meta — search for notes
+  toolSet: meta
   turns:
     - userMessage: "Find notes about project roadmap"
       expectedTools:
-        - name: searchManager_searchContent
+        - name: getTools
           params:
-            query: project roadmap
+            tool: "search"
       mockResponses:
+        getTools:
+          success: true
+          result:
+            tools:
+              - agent: searchManager
+                tool: searchContent
+                description: Search for notes by content keyword
+                command: "search content"
+                usage: "search content <query>"
+                arguments:
+                  - name: query
+                    flag: --query
+                    type: string
+                    required: true
+                    positional: true
+                examples:
+                  - 'search content "query-value"'
+
+    - expectedTools:
+        - name: useTools
+          params:
+            tool: "search content"
+      mockResponses:
+        useTools:
+          success: true
+          result:
+            results:
+              - tool: searchManager_searchContent
+                success: true
+                result:
+                  results:
+                    - path: notes/roadmap-q2.md
+                      score: 0.95
+                      snippet: "Q2 roadmap priorities..."
+                  totalResults: 1
+                  query: project roadmap
         searchManager_searchContent:
           success: true
           result:
diff --git a/tests/eval/scenarios/multi-turn.eval.yaml b/tests/eval/scenarios/multi-turn.eval.yaml
index a123df3e4..38c92ad5a 100644
--- a/tests/eval/scenarios/multi-turn.eval.yaml
+++ b/tests/eval/scenarios/multi-turn.eval.yaml
@@ -1,49 +1,174 @@
 - name: read-then-write-then-move
-  description: Read a note, write a summary, move it to archive
-  toolSet: nexus
+  description: Read a note, write a summary, move it to archive — via two-tool meta
+  toolSet: meta
   allowReorder: true
   turns:
     - userMessage: "Read notes/meeting.md, write a summary to notes/summary.md, then move it to archive/"
       expectedTools:
-        - name: contentManager_read
+        - name: getTools
           params:
-            path: notes/meeting.md
+            tool: "content"
       mockResponses:
+        getTools:
+          success: true
+          result:
+            tools:
+              - agent: contentManager
+                tool: read
+                description: Read content from a file
+                command: "content read"
+                usage: "content read <path> <startLine>"
+                arguments:
+                  - name: path
+                    flag: --path
+                    type: string
+                    required: true
+                    positional: true
+                examples:
+                  - 'content read "path-value" 1'
+              - agent: contentManager
+                tool: write
+                description: Write content to a file
+                command: "content write"
+                usage: "content write <path> <content>"
+                arguments:
+                  - name: path
+                    flag: --path
+                    type: string
+                    required: true
+                    positional: true
+                  - name: content
+                    flag: --content
+                    type: string
+                    required: true
+                    positional: true
+                examples:
+                  - 'content write "path-value" "content-value"'
+
+    - expectedTools:
+        - name: useTools
+          params:
+            tool: "content read"
+      mockResponses:
+        useTools:
+          success: true
+          result:
+            results:
+              - tool: contentManager_read
+                success: true
+                result:
+                  content: "# Q2 Meeting\n- Roadmap reviewed\n- Budget approved\n- Launch date: June 15"
         contentManager_read:
           success: true
           result:
             content: "# Q2 Meeting\n- Roadmap reviewed\n- Budget approved\n- Launch date: June 15"
 
     - expectedTools:
-        - name: contentManager_write
+        - name: useTools
           params:
-            path: notes/summary.md
+            tool: "content write"
       mockResponses:
+        useTools:
+          success: true
+          result:
+            results:
+              - tool: contentManager_write
+                success: true
+                result:
+                  path: notes/summary.md
         contentManager_write:
           success: true
           result:
             path: notes/summary.md
 
     - expectedTools:
-        - name: storageManager_move
+        - name: getTools
           params:
-            path: notes/summary.md
-            destination: archive/
+            tool: "storage"
+          optional: true
+        - name: useTools
+          params:
+            tool: "storage move"
       mockResponses:
+        getTools:
+          success: true
+          result:
+            tools:
+              - agent: storageManager
+                tool: move
+                description: Move a file to a new location
+                command: "storage move"
+                usage: "storage move <path> <destination>"
+                arguments:
+                  - name: path
+                    flag: --path
+                    type: string
+                    required: true
+                    positional: true
+                  - name: destination
+                    flag: --destination
+                    type: string
+                    required: true
+                    positional: true
+                examples:
+                  - 'storage move "path-value" "destination-value"'
+        useTools:
+          success: true
+          result:
+            results:
+              - tool: storageManager_move
+                success: true
+                result:
+                  newPath: archive/summary.md
         storageManager_move:
           success: true
           result:
             newPath: archive/summary.md
 
 - name: search-and-read
-  description: Search for notes about a topic, then read the top result
-  toolSet: nexus
+  description: Search for notes about a topic, then read the top result — via two-tool meta
+  toolSet: meta
   allowReorder: true
   turns:
     - userMessage: "Find notes about project roadmap and show me the full content of the best match"
       expectedTools:
-        - name: searchManager_searchContent
+        - name: getTools
+          params:
+            tool: "search"
+      mockResponses:
+        getTools:
+          success: true
+          result:
+            tools:
+              - agent: searchManager
+                tool: searchContent
+                description: Search for notes by content keyword
+                command: "search content"
+                usage: "search content <query>"
+                arguments:
+                  - name: query
+                    flag: --query
+                    type: string
+                    required: true
+                    positional: true
+                examples:
+                  - 'search content "query-value"'
+
+    - expectedTools:
+        - name: useTools
+          params:
+            tool: "search content"
       mockResponses:
+        useTools:
+          success: true
+          result:
+            results:
+              - tool: searchManager_searchContent
+                success: true
+                result:
+                  results:
+                    - path: notes/roadmap-q2.md
+                      score: 0.95
         searchManager_searchContent:
           success: true
           result:
@@ -52,10 +177,39 @@
                 score: 0.95
 
     - expectedTools:
-        - name: contentManager_read
+        - name: getTools
           params:
-            path: notes/roadmap-q2.md
+            tool: "content"
+          optional: true
+        - name: useTools
+          params:
+            tool: "content read"
       mockResponses:
+        getTools:
+          success: true
+          result:
+            tools:
+              - agent: contentManager
+                tool: read
+                description: Read content from a file
+                command: "content read"
+                usage: "content read <path> <startLine>"
+                arguments:
+                  - name: path
+                    flag: --path
+                    type: string
+                    required: true
+                    positional: true
+                examples:
+                  - 'content read "path-value" 1'
+        useTools:
+          success: true
+          result:
+            results:
+              - tool: contentManager_read
+                success: true
+                result:
+                  content: "# Q2 Roadmap\n1. Mobile launch\n2. Plugin store\n3. Performance"
         contentManager_read:
           success: true
           result:
diff --git a/tests/eval/scenarios/provider-parity.eval.yaml b/tests/eval/scenarios/provider-parity.eval.yaml
index fc48444a4..d63d20695 100644
--- a/tests/eval/scenarios/provider-parity.eval.yaml
+++ b/tests/eval/scenarios/provider-parity.eval.yaml
@@ -1,13 +1,50 @@
 - name: parity-single-tool-call
   description: Same basic tool call scenario — should pass across all configured providers
-  toolSet: nexus
+  toolSet: meta
   turns:
     - userMessage: "Read the file notes/meeting.md"
       expectedTools:
-        - name: contentManager_read
+        - name: getTools
           params:
-            path: notes/meeting.md
+            tool: "content"
+      mockResponses:
+        getTools:
+          success: true
+          result:
+            tools:
+              - agent: contentManager
+                tool: read
+                description: Read content from a file
+                command: "content read"
+                usage: "content read <path> <startLine>"
+                arguments:
+                  - name: path
+                    flag: --path
+                    type: string
+                    required: true
+                    positional: true
+                  - name: startLine
+                    flag: --start-line
+                    type: number
+                    required: true
+                    positional: true
+                examples:
+                  - 'content read "path-value" 1'
+
+    - expectedTools:
+        - name: useTools
+          params:
+            tool: "content read"
       mockResponses:
+        useTools:
+          success: true
+          result:
+            results:
+              - tool: contentManager_read
+                success: true
+                result:
+                  content: "# Meeting Notes\n- Discussed roadmap\n- Budget approved"
+                  path: notes/meeting.md
         contentManager_read:
           success: true
           result:
@@ -16,13 +53,50 @@
 
 - name: parity-search-then-read
   description: Two-turn flow — search then read — should work across all providers
-  toolSet: nexus
+  toolSet: meta
   allowReorder: true
   turns:
     - userMessage: "Search for notes about meetings and read the first result"
       expectedTools:
-        - name: searchManager_searchContent
+        - name: getTools
+          params:
+            tool: "search"
       mockResponses:
+        getTools:
+          success: true
+          result:
+            tools:
+              - agent: searchManager
+                tool: searchContent
+                description: Search for notes by content keyword
+                command: "search content"
+                usage: "search content <query>"
+                arguments:
+                  - name: query
+                    flag: --query
+                    type: string
+                    required: true
+                    positional: true
+                examples:
+                  - 'search content "query-value"'
+
+    - expectedTools:
+        - name: useTools
+          params:
+            tool: "search content"
+      mockResponses:
+        useTools:
+          success: true
+          result:
+            results:
+              - tool: searchManager_searchContent
+                success: true
+                result:
+                  results:
+                    - path: notes/meeting.md
+                      score: 0.92
+                      snippet: "Meeting notes..."
+                  totalResults: 1
         searchManager_searchContent:
           success: true
           result:
@@ -33,8 +107,40 @@
             totalResults: 1
 
     - expectedTools:
-        - name: contentManager_read
+        - name: getTools
+          params:
+            tool: "content"
+          optional: true
+        - name: useTools
+          params:
+            tool: "content read"
       mockResponses:
+        getTools:
+          success: true
+          result:
+            tools:
+              - agent: contentManager
+                tool: read
+                description: Read content from a file
+                command: "content read"
+                usage: "content read <path> <startLine>"
+                arguments:
+                  - name: path
+                    flag: --path
+                    type: string
+                    required: true
+                    positional: true
+                examples:
+                  - 'content read "path-value" 1'
+        useTools:
+          success: true
+          result:
+            results:
+              - tool: contentManager_read
+                success: true
+                result:
+                  content: "# Meeting\n- Items discussed"
+                  path: notes/meeting.md
         contentManager_read:
           success: true
           result:
diff --git a/tests/eval/scenarios/system-prompt.eval.yaml b/tests/eval/scenarios/system-prompt.eval.yaml
index d0893e903..6c25816af 100644
--- a/tests/eval/scenarios/system-prompt.eval.yaml
+++ b/tests/eval/scenarios/system-prompt.eval.yaml
@@ -1,13 +1,45 @@
 - name: default-prompt-routes-correctly
-  description: Production prompt routes a read request to contentManager_read
-  toolSet: nexus
+  description: Production prompt routes a read request via the two-tool meta architecture
+  toolSet: meta
   turns:
     - userMessage: "Read the file notes/meeting.md"
       expectedTools:
-        - name: contentManager_read
+        - name: getTools
           params:
-            path: notes/meeting.md
+            tool: "content"
       mockResponses:
+        getTools:
+          success: true
+          result:
+            tools:
+              - agent: contentManager
+                tool: read
+                description: Read content from a file
+                command: "content read"
+                usage: "content read <path> <startLine>"
+                arguments:
+                  - name: path
+                    flag: --path
+                    type: string
+                    required: true
+                    positional: true
+                examples:
+                  - 'content read "path-value" 1'
+
+    - expectedTools:
+        - name: useTools
+          params:
+            tool: "content read"
+      mockResponses:
+        useTools:
+          success: true
+          result:
+            results:
+              - tool: contentManager_read
+                success: true
+                result:
+                  content: "# Meeting Notes\n- Reviewed roadmap"
+                  path: notes/meeting.md
         contentManager_read:
           success: true
           result:
@@ -15,15 +47,47 @@
             path: notes/meeting.md
 
 - name: minimal-prompt-still-uses-tools
-  description: Production prompt triggers tool usage for a direct request
-  toolSet: nexus
+  description: Production prompt triggers two-tool flow for a direct read request
+  toolSet: meta
   turns:
     - userMessage: "Show me what's in notes/daily.md"
       expectedTools:
-        - name: contentManager_read
+        - name: getTools
           params:
-            path: notes/daily.md
+            tool: "content"
       mockResponses:
+        getTools:
+          success: true
+          result:
+            tools:
+              - agent: contentManager
+                tool: read
+                description: Read content from a file
+                command: "content read"
+                usage: "content read <path> <startLine>"
+                arguments:
+                  - name: path
+                    flag: --path
+                    type: string
+                    required: true
+                    positional: true
+                examples:
+                  - 'content read "path-value" 1'
+
+    - expectedTools:
+        - name: useTools
+          params:
+            tool: "content read"
+      mockResponses:
+        useTools:
+          success: true
+          result:
+            results:
+              - tool: contentManager_read
+                success: true
+                result:
+                  content: "# Daily Notes\n- Standup at 10am"
+                  path: notes/daily.md
         contentManager_read:
           success: true
           result:
@@ -31,13 +95,49 @@
             path: notes/daily.md
 
 - name: restrictive-prompt-limits-tools
-  description: Production prompt routes search request to searchManager
-  toolSet: nexus
+  description: Production prompt routes search request via the two-tool meta architecture
+  toolSet: meta
   turns:
     - userMessage: "Search for any notes about the quarterly review"
       expectedTools:
-        - name: searchManager_searchContent
+        - name: getTools
+          params:
+            tool: "search"
       mockResponses:
+        getTools:
+          success: true
+          result:
+            tools:
+              - agent: searchManager
+                tool: searchContent
+                description: Search for notes by content keyword
+                command: "search content"
+                usage: "search content <query>"
+                arguments:
+                  - name: query
+                    flag: --query
+                    type: string
+                    required: true
+                    positional: true
+                examples:
+                  - 'search content "query-value"'
+
+    - expectedTools:
+        - name: useTools
+          params:
+            tool: "search content"
+      mockResponses:
+        useTools:
+          success: true
+          result:
+            results:
+              - tool: searchManager_searchContent
+                success: true
+                result:
+                  results:
+                    - path: notes/q2-review.md
+                      score: 0.91
+                  totalResults: 1
         searchManager_searchContent:
           success: true
           result:

From 3b294e7de25c78f9a5dfebf41ee461371bc705c3 Mon Sep 17 00:00:00 2001
From: ProfSynapse <jrosenbaum9689@gmail.com>
Date: Tue, 19 May 2026 20:30:20 -0400
Subject: [PATCH 3/3] test(eval): refresh eval harness CLI commands

---
 docs/research/eval-harness-schema-audit.md    |  6 +--
 tests/eval/fixtures/tools.ts                  | 40 +++++++++------
 tests/eval/scenarios/adversarial.eval.yaml    |  6 +--
 .../eval/scenarios/basic-tool-call.eval.yaml  |  6 +--
 .../scenarios/content-operations.eval.yaml    | 32 ++++++------
 .../eval/scenarios/debug-multi-turn.eval.yaml | 10 ++--
 tests/eval/scenarios/multi-turn.eval.yaml     | 14 +++---
 .../eval/scenarios/provider-parity.eval.yaml  |  6 +--
 .../scenarios/search-variations.eval.yaml     | 50 +++++++++----------
 .../scenarios/storage-operations.eval.yaml    | 16 +++---
 tests/eval/scenarios/system-prompt.eval.yaml  |  6 +--
 tests/eval/scenarios/tool-discovery.eval.yaml | 40 +++++++--------
 tests/eval/scenarios/vague-prompts.eval.yaml  | 34 ++++++-------
 13 files changed, 139 insertions(+), 127 deletions(-)

diff --git a/docs/research/eval-harness-schema-audit.md b/docs/research/eval-harness-schema-audit.md
index 2a0ab0e57..8567a16d7 100644
--- a/docs/research/eval-harness-schema-audit.md
+++ b/docs/research/eval-harness-schema-audit.md
@@ -175,7 +175,7 @@ The fixture covers `contentManager` (4 of 5 tools), `storageManager` (5 of 6 too
 | `ingestManager_*` (full agent) | IngestManager | ingest, listCapabilities |
 | App agents (webTools, composer) | apps/ | openWebpage, capturePagePdf, capturePagePng, captureToMarkdown, extractLinks, compose, listFormats |
 
-The Task #8 eval run showed the LLM calling `searchManager_memory`, `canvasManager_list`, etc. — names that exist in production but were rejected as hallucinations by the harness because they are not in the fixture. After the Task #11 schema swap, this concern goes away: the LLM will only see `getTools`/`useTools` and discover available tools dynamically.
+The Task #8 eval run showed the LLM calling `searchManager_memory`, `canvasManager_list`, etc. These names exist in production, but the harness rejected them as hallucinations because they are not in the fixture. The Task #11 schema swap removes those names from the callable tool schema, yet the production system prompt can still mention agent/tool catalog entries. Live meta evals must therefore still treat direct `agent_tool` calls as prompt-leak or model-behavior failures rather than assume they cannot happen.
 
 ## Fixture-side tools NOT in production
 
@@ -185,9 +185,9 @@ None. Every fixture entry maps to a production tool class. The drifts above are
 
 The audit confirms the team-lead's framing: the harness fixture has substantial drift across 8 of 11 entries plus 6+ missing production tools, but Task #11's plan is to swap the entire `NEXUS_TOOLS` array for the two-tool surface (`getTools` + `useTools`). After the swap:
 
-- The harness LLM only sees the two-tool MCP shape.
+- The callable tool schema exposes only the two-tool MCP shape (`getTools` and `useTools`).
 - The executor parses the `useTools.tool` CLI string via the real `ToolCliNormalizer`.
-- Drifts above stop mattering for the LLM-facing surface.
+- The drifts above no longer matter for the callable function surface, but stale CLI examples and prompt catalog text can still steer models toward invalid command names.
 - They still matter for the executor: when it parses `content replace --path foo.md --start "..." --end "..." --content "..."` it must route to the production 4-field schema, not the obsolete 3-field one. This audit is the reference for getting that routing right.
 
 ## Evidence Index
diff --git a/tests/eval/fixtures/tools.ts b/tests/eval/fixtures/tools.ts
index 2a1b6d8de..b68462ec4 100644
--- a/tests/eval/fixtures/tools.ts
+++ b/tests/eval/fixtures/tools.ts
@@ -55,10 +55,9 @@ export const NEXUS_TOOLS: Tool[] = [
         properties: {
           path: { type: 'string', description: 'Path to the file to update' },
           content: { type: 'string', description: 'Content to insert' },
-          position: { type: 'string', description: 'Insertion position' },
-          lineNumber: { type: 'number', description: 'Optional line number' },
+          startLine: { type: 'number', description: 'Where to insert content: 1 prepends, -1 appends, any other value inserts before that line' },
         },
-        required: ['path', 'content', 'position'],
+        required: ['path', 'content', 'startLine'],
       },
     },
   },
@@ -71,10 +70,11 @@ export const NEXUS_TOOLS: Tool[] = [
         type: 'object',
         properties: {
           path: { type: 'string', description: 'Path to the file to update' },
-          search: { type: 'string', description: 'Text to find' },
-          replace: { type: 'string', description: 'Replacement text' },
+          start: { type: 'string', description: 'Opening anchor line or lines copied verbatim from the file' },
+          end: { type: 'string', description: 'Closing anchor line or lines copied verbatim from the file' },
+          content: { type: 'string', description: 'Replacement text for the anchored range' },
         },
-        required: ['path', 'search', 'replace'],
+        required: ['path', 'start', 'end', 'content'],
       },
     },
   },
@@ -87,9 +87,10 @@ export const NEXUS_TOOLS: Tool[] = [
         type: 'object',
         properties: {
           path: { type: 'string', description: 'Current path of the file or folder' },
-          destination: { type: 'string', description: 'Destination path' },
+          newPath: { type: 'string', description: 'Destination path' },
+          overwrite: { type: 'boolean', description: 'Overwrite if destination exists' },
         },
-        required: ['path', 'destination'],
+        required: ['path', 'newPath'],
       },
     },
   },
@@ -102,9 +103,10 @@ export const NEXUS_TOOLS: Tool[] = [
         type: 'object',
         properties: {
           path: { type: 'string', description: 'Current path of the file or folder' },
-          destination: { type: 'string', description: 'Destination path' },
+          newPath: { type: 'string', description: 'Destination path' },
+          overwrite: { type: 'boolean', description: 'Overwrite if destination exists' },
         },
-        required: ['path', 'destination'],
+        required: ['path', 'newPath'],
       },
     },
   },
@@ -144,9 +146,10 @@ export const NEXUS_TOOLS: Tool[] = [
       parameters: {
         type: 'object',
         properties: {
-          path: { type: 'string', description: 'Path to the directory to list' },
+          path: { type: 'string', description: 'Path to the directory to list; omit for vault root' },
+          filter: { type: 'string', description: 'Optional glob-style filter' },
         },
-        required: ['path'],
+        required: [],
       },
     },
   },
@@ -160,6 +163,10 @@ export const NEXUS_TOOLS: Tool[] = [
         properties: {
           query: { type: 'string', description: 'Search query text' },
           limit: { type: 'number', description: 'Maximum number of results to return' },
+          semantic: { type: 'boolean', description: 'Use semantic vector search when available' },
+          includeContent: { type: 'boolean', description: 'Include matched content in results' },
+          snippetLength: { type: 'number', description: 'Maximum snippet length' },
+          paths: { type: 'array', items: { type: 'string' }, description: 'Folder paths or glob patterns to search within' },
         },
         required: ['query'],
       },
@@ -175,9 +182,14 @@ export const NEXUS_TOOLS: Tool[] = [
         properties: {
           query: { type: 'string', description: 'Directory search query text' },
           paths: { type: 'array', items: { type: 'string' }, description: 'Paths to search within' },
-          searchType: { type: 'string', description: 'Search type filter' },
+          searchType: { type: 'string', enum: ['files', 'folders', 'both'], description: 'Search type filter' },
+          fileTypes: { type: 'array', items: { type: 'string' }, description: 'File extensions to include' },
+          depth: { type: 'number', description: 'Maximum directory depth' },
+          pattern: { type: 'string', description: 'Optional regex pattern filter' },
+          limit: { type: 'number', description: 'Maximum number of results' },
+          includeContent: { type: 'boolean', description: 'Include content snippets in results' },
         },
-        required: ['query'],
+        required: ['query', 'paths'],
       },
     },
   },
diff --git a/tests/eval/scenarios/adversarial.eval.yaml b/tests/eval/scenarios/adversarial.eval.yaml
index 0fec4250f..285ad1929 100644
--- a/tests/eval/scenarios/adversarial.eval.yaml
+++ b/tests/eval/scenarios/adversarial.eval.yaml
@@ -14,7 +14,7 @@
           result:
             tools:
               - agent: searchManager
-                tool: searchContent
+                tool: content
                 description: Search for notes by content keyword
                 command: "search content"
                 usage: "search content <query>"
@@ -30,7 +30,7 @@
           success: true
           result:
             results:
-              - tool: searchManager_searchContent
+              - tool: searchManager_content
                 success: true
                 result:
                   results:
@@ -38,7 +38,7 @@
                       score: 0.88
                       snippet: "Project overview and status..."
                   totalResults: 1
-        searchManager_searchContent:
+        searchManager_content:
           success: true
           result:
             results:
diff --git a/tests/eval/scenarios/basic-tool-call.eval.yaml b/tests/eval/scenarios/basic-tool-call.eval.yaml
index 2acf9399c..74bfda02d 100644
--- a/tests/eval/scenarios/basic-tool-call.eval.yaml
+++ b/tests/eval/scenarios/basic-tool-call.eval.yaml
@@ -120,7 +120,7 @@
           result:
             tools:
               - agent: searchManager
-                tool: searchContent
+                tool: content
                 description: Search for notes by content keyword
                 command: "search content"
                 usage: "search content <query>"
@@ -142,7 +142,7 @@
           success: true
           result:
             results:
-              - tool: searchManager_searchContent
+              - tool: searchManager_content
                 success: true
                 result:
                   results:
@@ -151,7 +151,7 @@
                       snippet: "Q2 roadmap priorities..."
                   totalResults: 1
                   query: project roadmap
-        searchManager_searchContent:
+        searchManager_content:
           success: true
           result:
             results:
diff --git a/tests/eval/scenarios/content-operations.eval.yaml b/tests/eval/scenarios/content-operations.eval.yaml
index e02730d18..374b606c9 100644
--- a/tests/eval/scenarios/content-operations.eval.yaml
+++ b/tests/eval/scenarios/content-operations.eval.yaml
@@ -136,7 +136,7 @@
                 tool: insert
                 description: Insert content at a specific position
                 command: "content insert"
-                usage: "content insert <path> <content> <position> [--line-number <lineNumber>]"
+                usage: "content insert <path> <content> <startLine>"
                 arguments:
                   - name: path
                     flag: --path
@@ -148,18 +148,13 @@
                     type: string
                     required: true
                     positional: true
-                  - name: position
-                    flag: --position
-                    type: string
+                  - name: startLine
+                    flag: --start-line
+                    type: number
                     required: true
                     positional: true
-                  - name: lineNumber
-                    flag: --line-number
-                    type: number
-                    required: false
-                    positional: false
                 examples:
-                  - 'content insert "path-value" "content-value" "position-value"'
+                  - 'content insert "path-value" "content-value" 1'
 
     - expectedTools:
         - name: useTools
@@ -198,25 +193,30 @@
                 tool: replace
                 description: Replace or delete content in a file
                 command: "content replace"
-                usage: "content replace <path> <search> <replace>"
+                usage: "content replace <path> <start> <end> <content>"
                 arguments:
                   - name: path
                     flag: --path
                     type: string
                     required: true
                     positional: true
-                  - name: search
-                    flag: --search
+                  - name: start
+                    flag: --start
                     type: string
                     required: true
                     positional: true
-                  - name: replace
-                    flag: --replace
+                  - name: end
+                    flag: --end
+                    type: string
+                    required: true
+                    positional: true
+                  - name: content
+                    flag: --content
                     type: string
                     required: true
                     positional: true
                 examples:
-                  - 'content replace "path-value" "search-value" "replace-value"'
+                  - 'content replace "path-value" "start-value" "end-value" "content-value"'
 
     - expectedTools:
         - name: useTools
diff --git a/tests/eval/scenarios/debug-multi-turn.eval.yaml b/tests/eval/scenarios/debug-multi-turn.eval.yaml
index 14a1c2dc5..0456cf208 100644
--- a/tests/eval/scenarios/debug-multi-turn.eval.yaml
+++ b/tests/eval/scenarios/debug-multi-turn.eval.yaml
@@ -55,10 +55,10 @@
           result:
             tools:
               - agent: searchManager
-                tool: searchContent
+                tool: content
                 description: Search for notes containing specific content
-                command: "search search-content"
-                usage: "search search-content <query>"
+                command: "search content"
+                usage: "search content <query>"
                 arguments:
                   - name: query
                     flag: --query
@@ -66,7 +66,7 @@
                     required: true
                     positional: true
                 examples:
-                  - 'search search-content "query-value"'
+                  - 'search content "query-value"'
 
     - expectedTools:
         - name: useTools
@@ -76,7 +76,7 @@
           result:
             results:
               - agent: searchManager
-                tool: searchContent
+                tool: content
                 success: true
                 data:
                   results:
diff --git a/tests/eval/scenarios/multi-turn.eval.yaml b/tests/eval/scenarios/multi-turn.eval.yaml
index 38c92ad5a..8d266fc14 100644
--- a/tests/eval/scenarios/multi-turn.eval.yaml
+++ b/tests/eval/scenarios/multi-turn.eval.yaml
@@ -98,20 +98,20 @@
                 tool: move
                 description: Move a file to a new location
                 command: "storage move"
-                usage: "storage move <path> <destination>"
+                usage: "storage move <path> <newPath>"
                 arguments:
                   - name: path
                     flag: --path
                     type: string
                     required: true
                     positional: true
-                  - name: destination
-                    flag: --destination
+                  - name: newPath
+                    flag: --new-path
                     type: string
                     required: true
                     positional: true
                 examples:
-                  - 'storage move "path-value" "destination-value"'
+                  - 'storage move "path-value" "newPath-value"'
         useTools:
           success: true
           result:
@@ -141,7 +141,7 @@
           result:
             tools:
               - agent: searchManager
-                tool: searchContent
+                tool: content
                 description: Search for notes by content keyword
                 command: "search content"
                 usage: "search content <query>"
@@ -163,13 +163,13 @@
           success: true
           result:
             results:
-              - tool: searchManager_searchContent
+              - tool: searchManager_content
                 success: true
                 result:
                   results:
                     - path: notes/roadmap-q2.md
                       score: 0.95
-        searchManager_searchContent:
+        searchManager_content:
           success: true
           result:
             results:
diff --git a/tests/eval/scenarios/provider-parity.eval.yaml b/tests/eval/scenarios/provider-parity.eval.yaml
index d63d20695..bec9f794b 100644
--- a/tests/eval/scenarios/provider-parity.eval.yaml
+++ b/tests/eval/scenarios/provider-parity.eval.yaml
@@ -67,7 +67,7 @@
           result:
             tools:
               - agent: searchManager
-                tool: searchContent
+                tool: content
                 description: Search for notes by content keyword
                 command: "search content"
                 usage: "search content <query>"
@@ -89,7 +89,7 @@
           success: true
           result:
             results:
-              - tool: searchManager_searchContent
+              - tool: searchManager_content
                 success: true
                 result:
                   results:
@@ -97,7 +97,7 @@
                       score: 0.92
                       snippet: "Meeting notes..."
                   totalResults: 1
-        searchManager_searchContent:
+        searchManager_content:
           success: true
           result:
             results:
diff --git a/tests/eval/scenarios/search-variations.eval.yaml b/tests/eval/scenarios/search-variations.eval.yaml
index 20fb0df4b..d66641f5b 100644
--- a/tests/eval/scenarios/search-variations.eval.yaml
+++ b/tests/eval/scenarios/search-variations.eval.yaml
@@ -2,7 +2,7 @@
 # for different flavors of "find/search/list" user requests.
 
 - name: search-by-content-keyword
-  description: User asks to find notes about a topic — should use searchManager_searchContent
+  description: User asks to find notes about a topic — should use searchManager_content
   toolSet: meta
   turns:
     - userMessage: "Find all my notes about machine learning"
@@ -16,10 +16,10 @@
           result:
             tools:
               - agent: searchManager
-                tool: searchContent
+                tool: content
                 description: Search for notes containing specific content
-                command: "search search-content"
-                usage: "search search-content <query> [--limit <limit>]"
+                command: "search content"
+                usage: "search content <query> [--limit <limit>]"
                 arguments:
                   - name: query
                     flag: --query
@@ -32,18 +32,18 @@
                     required: false
                     positional: false
                 examples:
-                  - 'search search-content "query-value"'
+                  - 'search content "query-value"'
 
     - expectedTools:
         - name: useTools
           params:
-            tool: "search search-content"
+            tool: "search content"
       mockResponses:
         useTools:
           success: true
           result:
             results:
-              - tool: searchContent
+              - tool: content
                 success: true
                 data:
                   results:
@@ -70,7 +70,7 @@
                 tool: list
                 description: List files and folders in a directory
                 command: "storage list"
-                usage: "storage list <path>"
+                usage: "storage list [--path <path>]"
                 arguments:
                   - name: path
                     flag: --path
@@ -114,10 +114,10 @@
           result:
             tools:
               - agent: searchManager
-                tool: searchContent
+                tool: content
                 description: Search for notes containing specific content
-                command: "search search-content"
-                usage: "search search-content <query>"
+                command: "search content"
+                usage: "search content <query>"
                 arguments:
                   - name: query
                     flag: --query
@@ -125,7 +125,7 @@
                     required: true
                     positional: true
                 examples:
-                  - 'search search-content "query-value"'
+                  - 'search content "query-value"'
               - agent: contentManager
                 tool: read
                 description: Read content from a file
@@ -148,13 +148,13 @@
     - expectedTools:
         - name: useTools
           params:
-            tool: "search search-content"
+            tool: "search content"
       mockResponses:
         useTools:
           success: true
           result:
             results:
-              - tool: searchContent
+              - tool: content
                 success: true
                 data:
                   results:
@@ -176,7 +176,7 @@
                   content: "# Q2 Budget\n\nTotal: $1.2M\n- Engineering: $800K\n- Marketing: $400K"
 
 - name: directory-search-not-content-search
-  description: User asks to find a specific file by name — should use searchManager_searchDirectory not searchContent
+  description: User asks to find a specific file by name, so it should use searchManager_directory, not searchManager_content
   toolSet: meta
   turns:
     - userMessage: "Where is the file called meeting-notes.md?"
@@ -190,10 +190,10 @@
           result:
             tools:
               - agent: searchManager
-                tool: searchDirectory
+                tool: directory
                 description: Search for files and folders by name pattern
-                command: "search search-directory"
-                usage: "search search-directory <query>"
+                command: "search directory"
+                usage: "search directory <query>"
                 arguments:
                   - name: query
                     flag: --query
@@ -201,12 +201,12 @@
                     required: true
                     positional: true
                 examples:
-                  - 'search search-directory "query-value"'
+                  - 'search directory "query-value"'
               - agent: searchManager
-                tool: searchContent
+                tool: content
                 description: Search for notes containing specific content
-                command: "search search-content"
-                usage: "search search-content <query>"
+                command: "search content"
+                usage: "search content <query>"
                 arguments:
                   - name: query
                     flag: --query
@@ -214,18 +214,18 @@
                     required: true
                     positional: true
                 examples:
-                  - 'search search-content "query-value"'
+                  - 'search content "query-value"'
 
     - expectedTools:
         - name: useTools
           params:
-            tool: "search search-directory"
+            tool: "search directory"
       mockResponses:
         useTools:
           success: true
           result:
             results:
-              - tool: searchDirectory
+              - tool: directory
                 success: true
                 data:
                   results:
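The search scenarios drop the redundant `search` prefix from tool names: `searchContent` becomes `content` and `searchDirectory` becomes `directory`. As a result, the namespaced id (`searchManager_content`) and the CLI command (`search content`) no longer repeat the agent. The sketch below illustrates that naming scheme. It assumes the CLI group is the agent name without its `Manager` suffix; that is consistent with `search` and `content` in these fixtures but is not confirmed by the diff. All helper names are hypothetical and are not the harness's actual code.

```typescript
// Hypothetical helpers illustrating the naming scheme visible in the fixtures.

// camelCase -> kebab-case, e.g. "searchContent" -> "search-content".
function camelToKebab(name: string): string {
  return name.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
}

// Namespaced id used as a mockResponses key, e.g. "searchManager_content".
function toolId(agent: string, tool: string): string {
  return `${agent}_${tool}`;
}

// CLI command. ASSUMPTION: the group is the agent name minus a trailing "Manager".
function cliCommand(agent: string, tool: string): string {
  const group = agent.replace(/Manager$/, "");
  return `${group} ${camelToKebab(tool)}`;
}

// useTools "tool" param: several commands joined with ", ".
function useToolsParam(commands: string[]): string {
  return commands.join(", ");
}
```

Under this scheme, the old name `searchContent` yields `search search-content`, and the renamed `content` yields `search content`, matching the before and after lines above.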
diff --git a/tests/eval/scenarios/storage-operations.eval.yaml b/tests/eval/scenarios/storage-operations.eval.yaml
index ee6d370ed..586b179ac 100644
--- a/tests/eval/scenarios/storage-operations.eval.yaml
+++ b/tests/eval/scenarios/storage-operations.eval.yaml
@@ -18,20 +18,20 @@
                 tool: move
                 description: Move a file or folder
                 command: "storage move"
-                usage: "storage move <path> <destination>"
+                usage: "storage move <path> <newPath>"
                 arguments:
                   - name: path
                     flag: --path
                     type: string
                     required: true
                     positional: true
-                  - name: destination
-                    flag: --destination
+                  - name: newPath
+                    flag: --new-path
                     type: string
                     required: true
                     positional: true
                 examples:
-                  - 'storage move "path-value" "destination-value"'
+                  - 'storage move "path-value" "newPath-value"'
 
     - expectedTools:
         - name: useTools
@@ -103,20 +103,20 @@
                 tool: copy
                 description: Copy a file or folder
                 command: "storage copy"
-                usage: "storage copy <path> <destination>"
+                usage: "storage copy <path> <newPath>"
                 arguments:
                   - name: path
                     flag: --path
                     type: string
                     required: true
                     positional: true
-                  - name: destination
-                    flag: --destination
+                  - name: newPath
+                    flag: --new-path
                     type: string
                     required: true
                     positional: true
                 examples:
-                  - 'storage copy "path-value" "destination-value"'
+                  - 'storage copy "path-value" "newPath-value"'
 
     - expectedTools:
         - name: useTools
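The storage-operations changes rename the move/copy argument `destination` to `newPath`, and the flag, usage line, and example change with it. The fixtures suggest these strings follow mechanically from the argument list. The flag is the kebab-cased name (`newPath` becomes `--new-path`). Positional arguments render as `<name>` in the usage line. Optional non-positional ones render as `[--flag <name>]`. Each positional string argument appears in the example as `"<name>-value"`. The minimal sketch below assumes that derivation and covers string arguments only. The `FixtureArg` type and the `toFlag`, `usageFor`, and `exampleFor` helpers are hypothetical, as is the rendering of required flags.

```typescript
// Hypothetical shape of one fixture argument entry (string arguments only).
interface FixtureArg {
  name: string; // camelCase argument name, e.g. "newPath"
  required: boolean;
  positional: boolean;
}

// camelCase -> kebab-case, e.g. "newPath" -> "new-path".
function toKebab(name: string): string {
  return name.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
}

// Flag for an argument, e.g. "newPath" -> "--new-path".
function toFlag(name: string): string {
  return `--${toKebab(name)}`;
}

// Usage line: positionals as <name>; optional flags wrapped in [ ].
// ASSUMPTION: required non-positional flags render without brackets.
function usageFor(command: string, args: FixtureArg[]): string {
  const parts = args.map((a) => {
    if (a.positional) return `<${a.name}>`;
    const flagged = `${toFlag(a.name)} <${a.name}>`;
    return a.required ? flagged : `[${flagged}]`;
  });
  return [command, ...parts].join(" ");
}

// Example invocation: each positional string argument becomes "<name>-value".
function exampleFor(command: string, args: FixtureArg[]): string {
  const values = args.filter((a) => a.positional).map((a) => `"${a.name}-value"`);
  return [command, ...values].join(" ");
}
```

Deriving all three strings from a single argument list keeps the usage line, the flag, and the example in sync when an argument is renamed.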
diff --git a/tests/eval/scenarios/system-prompt.eval.yaml b/tests/eval/scenarios/system-prompt.eval.yaml
index 6c25816af..2241def78 100644
--- a/tests/eval/scenarios/system-prompt.eval.yaml
+++ b/tests/eval/scenarios/system-prompt.eval.yaml
@@ -109,7 +109,7 @@
           result:
             tools:
               - agent: searchManager
-                tool: searchContent
+                tool: content
                 description: Search for notes by content keyword
                 command: "search content"
                 usage: "search content <query>"
@@ -131,14 +131,14 @@
           success: true
           result:
             results:
-              - tool: searchManager_searchContent
+              - tool: searchManager_content
                 success: true
                 result:
                   results:
                     - path: notes/q2-review.md
                       score: 0.91
                   totalResults: 1
-        searchManager_searchContent:
+        searchManager_content:
           success: true
           result:
             results:
diff --git a/tests/eval/scenarios/tool-discovery.eval.yaml b/tests/eval/scenarios/tool-discovery.eval.yaml
index be57fd864..df147df74 100644
--- a/tests/eval/scenarios/tool-discovery.eval.yaml
+++ b/tests/eval/scenarios/tool-discovery.eval.yaml
@@ -60,10 +60,10 @@
           result:
             tools:
               - agent: searchManager
-                tool: searchContent
+                tool: content
                 description: Search for notes containing specific content
-                command: "search search-content"
-                usage: "search search-content <query>"
+                command: "search content"
+                usage: "search content <query>"
                 arguments:
                   - name: query
                     flag: --query
@@ -71,18 +71,18 @@
                     required: true
                     positional: true
                 examples:
-                  - 'search search-content "query-value"'
+                  - 'search content "query-value"'
 
     - expectedTools:
         - name: useTools
           params:
-            tool: "search search-content"
+            tool: "search content"
       mockResponses:
         useTools:
           success: true
           result:
             results:
-              - tool: searchContent
+              - tool: content
                 success: true
                 data:
                   results:
@@ -153,20 +153,20 @@
                 tool: copy
                 description: Copy a file or folder
                 command: "storage copy"
-                usage: "storage copy <path> <destination>"
+                usage: "storage copy <path> <newPath>"
                 arguments:
                   - name: path
                     flag: --path
                     type: string
                     required: true
                     positional: true
-                  - name: destination
-                    flag: --destination
+                  - name: newPath
+                    flag: --new-path
                     type: string
                     required: true
                     positional: true
                 examples:
-                  - 'storage copy "path-value" "destination-value"'
+                  - 'storage copy "path-value" "newPath-value"'
 
     - expectedTools:
         - name: useTools
@@ -198,10 +198,10 @@
           result:
             tools:
               - agent: searchManager
-                tool: searchContent
+                tool: content
                 description: Search for notes
-                command: "search search-content"
-                usage: "search search-content <query>"
+                command: "search content"
+                usage: "search content <query>"
                 arguments:
                   - name: query
                     flag: --query
@@ -209,18 +209,18 @@
                     required: true
                     positional: true
                 examples:
-                  - 'search search-content "query-value"'
+                  - 'search content "query-value"'
 
     - expectedTools:
         - name: useTools
           params:
-            tool: "search search-content"
+            tool: "search content"
       mockResponses:
         useTools:
           success: true
           result:
             results:
-              - tool: searchContent
+              - tool: content
                 success: true
                 data:
                   results:
@@ -286,20 +286,20 @@
                 tool: move
                 description: Move a file or folder
                 command: "storage move"
-                usage: "storage move <path> <destination>"
+                usage: "storage move <path> <newPath>"
                 arguments:
                   - name: path
                     flag: --path
                     type: string
                     required: true
                     positional: true
-                  - name: destination
-                    flag: --destination
+                  - name: newPath
+                    flag: --new-path
                     type: string
                     required: true
                     positional: true
                 examples:
-                  - 'storage move "path-value" "destination-value"'
+                  - 'storage move "path-value" "newPath-value"'
 
     - expectedTools:
         - name: useTools
diff --git a/tests/eval/scenarios/vague-prompts.eval.yaml b/tests/eval/scenarios/vague-prompts.eval.yaml
index fb218b545..b49d78c3b 100644
--- a/tests/eval/scenarios/vague-prompts.eval.yaml
+++ b/tests/eval/scenarios/vague-prompts.eval.yaml
@@ -16,10 +16,10 @@
           result:
             tools:
               - agent: searchManager
-                tool: searchContent
+                tool: content
                 description: Search for notes containing specific content
-                command: "search search-content"
-                usage: "search search-content <query>"
+                command: "search content"
+                usage: "search content <query>"
                 arguments:
                   - name: query
                     flag: --query
@@ -27,18 +27,18 @@
                     required: true
                     positional: true
                 examples:
-                  - 'search search-content "query-value"'
+                  - 'search content "query-value"'
 
     - expectedTools:
         - name: useTools
           params:
-            tool: "search search-content"
+            tool: "search content"
       mockResponses:
         useTools:
           success: true
           result:
             results:
-              - tool: searchContent
+              - tool: content
                 success: true
                 data:
                   results:
@@ -109,7 +109,7 @@
                 tool: list
                 description: List files and folders
                 command: "storage list"
-                usage: "storage list <path>"
+                usage: "storage list [--path <path>]"
                 arguments:
                   - name: path
                     flag: --path
@@ -122,20 +122,20 @@
                 tool: move
                 description: Move a file or folder
                 command: "storage move"
-                usage: "storage move <path> <destination>"
+                usage: "storage move <path> <newPath>"
                 arguments:
                   - name: path
                     flag: --path
                     type: string
                     required: true
                     positional: true
-                  - name: destination
-                    flag: --destination
+                  - name: newPath
+                    flag: --new-path
                     type: string
                     required: true
                     positional: true
                 examples:
-                  - 'storage move "path-value" "destination-value"'
+                  - 'storage move "path-value" "newPath-value"'
 
     - expectedTools:
         - name: useTools
@@ -204,10 +204,10 @@
                 examples:
                   - 'content read "path-value" 1'
               - agent: searchManager
-                tool: searchContent
+                tool: content
                 description: Search for notes containing specific content
-                command: "search search-content"
-                usage: "search search-content <query>"
+                command: "search content"
+                usage: "search content <query>"
                 arguments:
                   - name: query
                     flag: --query
@@ -215,12 +215,12 @@
                     required: true
                     positional: true
                 examples:
-                  - 'search search-content "query-value"'
+                  - 'search content "query-value"'
 
     - expectedTools:
         - name: useTools
           params:
-            tool: "content read, search search-content"
+            tool: "content read, search content"
       mockResponses:
         useTools:
           success: true
@@ -230,7 +230,7 @@
                 success: true
                 data:
                   content: "# Todo\n- Fix login bug\n- Update docs"
-              - tool: searchContent
+              - tool: content
                 success: true
                 data:
                   results: