fix: snapshot git-pollution (#625), structural-tool steering (#626), convergence governor (#624) #631
justrach wants to merge 4 commits into
The index was written to {root}/codedb.snapshot and showed up in
`git status` (22.8 MB in one real repo), so it was easy to commit by
accident and it corrupted any tooling that diffs the working tree
(a plain `git add -A && git diff` swept the binary into the patch).
After writing the in-tree snapshot, append `codedb.snapshot` to the
repo's `.git/info/exclude` — a local, untracked ignore file — so git
never sees it, without touching the user's tracked `.gitignore`.
Best-effort and idempotent: not-a-git-repo, worktrees where `.git` is a
file, or any I/O error are silently skipped so indexing never fails.
An isRootSnapshot check guards the call, so only the in-tree write
triggers it, not the central ~/.codedb store.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
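For readers who want the exclude logic spelled out, here is a minimal sketch in Python rather than the project's Zig. The function name `add_to_git_exclude`, its boolean return value, and the decision to create a missing `info` directory are assumptions made for illustration; only the file name `codedb.snapshot`, the `.git/info/exclude` target, and the skip conditions come from the commit message.

```python
from pathlib import Path

SNAPSHOT_NAME = "codedb.snapshot"  # name taken from the commit message


def add_to_git_exclude(root: str, entry: str = SNAPSHOT_NAME) -> bool:
    """Append `entry` to <root>/.git/info/exclude; return True only if a line was written."""
    git_dir = Path(root) / ".git"
    # Not a repo, or a worktree where `.git` is a file: skip silently.
    if not git_dir.is_dir():
        return False
    exclude = git_dir / "info" / "exclude"
    try:
        exclude.parent.mkdir(exist_ok=True)  # assumption: create `info` if it is missing
        existing = exclude.read_text(encoding="utf-8") if exclude.exists() else ""
        # Idempotent: do nothing when the entry is already listed on its own line.
        if entry in (line.strip() for line in existing.splitlines()):
            return False
        separator = "" if existing == "" or existing.endswith("\n") else "\n"
        with exclude.open("a", encoding="utf-8") as fh:
            fh.write(f"{separator}{entry}\n")
        return True
    except (OSError, UnicodeError):
        # Best-effort: an I/O or decoding problem must never fail indexing.
        return False
```

Calling it twice on the same repository writes the line once; the second call returns `False` and leaves the file unchanged.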
Agents on the codedb MCP surface default to search -> read -> edit and
skip the structural tools (symbol/callers/deps/outline), so the code
graph goes unexercised. Make the structural path the path of least
resistance.
- Reframe tool descriptions + server instructions to prescribe the
  structural tools first and cast codedb_search as a substring/phrase
  fallback.
- Runtime nudge on search: a bare identifier that resolves to an indexed
  symbol prepends a one-line pointer to codedb_symbol/codedb_callers
  (text output only, skipped for format=json).
- Runtime nudge on read: a whole-file read (>=400 lines, no range)
  prepends a pointer to codedb_outline; wired into both the cached and
  uncached paths.
- Tests (issue-626) cover the gating logic: isBareIdentifier and
  fullFileReadHint.
Closes #626. (#623 closed separately as a duplicate; its distinct
loop/redundancy-detection guardrail is not addressed here.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
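The two gates named in this commit can be sketched as follows, in Python rather than the project's Zig. The identifier rule (a single `[A-Za-z_][A-Za-z0-9_]*` token), the hint wording, and the `search_hint` wrapper are assumptions; the 400-line threshold, the "no range" condition, and the text-only rule are from the commit message.

```python
import re
from typing import Optional

FULL_READ_MIN_LINES = 400  # threshold quoted in the commit message
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")  # assumed identifier shape


def is_bare_identifier(query: str) -> bool:
    """True when the query is one identifier-shaped token (no spaces, dots, or operators)."""
    return _IDENT.fullmatch(query) is not None


def search_hint(query: str, indexed_symbols: set, fmt: str = "text") -> Optional[str]:
    # Text output only, and the identifier must resolve to an indexed symbol.
    if fmt == "json" or not is_bare_identifier(query) or query not in indexed_symbols:
        return None
    return f"[hint] '{query}' is an indexed symbol: try codedb_symbol or codedb_callers."


def full_file_read_hint(line_count: int, has_range: bool) -> Optional[str]:
    # Only whole-file reads of large files get the pointer to codedb_outline.
    if has_range or line_count < FULL_READ_MIN_LINES:
        return None
    return "[hint] large whole-file read: codedb_outline lists the symbols first."
```

Keeping both gates as pure functions is what makes them testable without a running MCP session, which matches the commit's note that the tests cover the gating logic.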
#626)
Follow-up to the #626 structural steering. Auditing the tool surface
showed mcpGenerateGuidance already steers most graph tools as "-> next"
hints (callers->callpath, edit->changes, hot->outline, the
symbol/search/outline/word chain). The single genuine gap is
codedb_deps: nothing points to it and it has no next-hint.
- Add depsHint: after a single-definition codedb_symbol hit (the moment
  before an edit, when blast radius matters), prepend a one-line pointer
  to codedb_deps. Pure + count-gated (results.len == 1), text-only,
  mirrors fullFileReadHint.
- Upgrade three passive differentiator descriptions to prescriptive:
  codedb_deps (impact/blast-radius), codedb_hot (orientation),
  codedb_changes (what-changed).
No callpath nudge: codedb_callers already emits "-> next:
codedb_callpath", so an inline one would duplicate it.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
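A count-gated hint of this kind is small enough to sketch in full. This is a Python illustration, not the Zig implementation: `deps_hint`, `render_symbol_output`, the `fmt` parameter, and the hint wording are hypothetical names chosen here, while the `results.len == 1` gate, the text-only rule, and the prepend behaviour come from the commit message.

```python
from typing import Optional, Sequence


def deps_hint(results: Sequence[str], fmt: str = "text") -> Optional[str]:
    """Return a one-line pointer to codedb_deps for an unambiguous symbol hit, else None."""
    if fmt != "text" or len(results) != 1:
        return None
    return "[hint] single definition found: codedb_deps shows what depends on it before you edit."


def render_symbol_output(results: Sequence[str], fmt: str = "text") -> str:
    body = "\n".join(results)
    hint = deps_hint(results, fmt)
    # Prepend, so the pointer is read before the result body.
    return body if hint is None else f"{hint}\n{body}"
```

Zero hits and multiple hits both return `None`, so the pointer only appears when there is exactly one definition whose dependents are worth checking.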
Large-repo trajectories showed high variance with occasional non-convergent runaways (3–5× tokens) — an agent firing the same search/read over and over without progress. Add a per-session ConvergenceGovernor: an 8-deep ring buffer of recent navigation call signatures (tool name + argument values). When the same nav call (search/find/word/read/outline) recurs >= 3 times in the window, handleCall appends a one-line in-band nudge steering the agent to a structural tool (symbol/callers/deps), a direct read, or a refined query. The nudge is appended to the assistant-visible output only — it never changes a tool's result, and write/admin tools are not governed. Session-less callers pass a null governor (no-op). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
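The governor described above can be approximated in a few lines. This is a Python sketch under stated assumptions, not the Zig code: the short tool names in `GOVERNED`, the JSON-based signature, and the abbreviated nudge text are illustrative, while the window of 8, the threshold of 3, the append-only nudge, and the null-governor no-op come from the commit message.

```python
from collections import deque
import json

WINDOW = 8   # ring-buffer depth from the commit message
WARN_AT = 3  # repeat count that triggers the nudge
GOVERNED = {"search", "find", "word", "read", "outline"}  # short names, for illustration


class ConvergenceGovernor:
    def __init__(self) -> None:
        # deque(maxlen=...) drops the oldest signature automatically, like a ring buffer.
        self._recent = deque(maxlen=WINDOW)

    def record(self, tool: str, args: dict) -> int:
        """Record a call; return how often its signature is now in the window (0 if ungoverned)."""
        if tool not in GOVERNED:
            return 0
        signature = (tool, json.dumps(args, sort_keys=True))
        self._recent.append(signature)
        return self._recent.count(signature)


def handle_call(governor, tool: str, args: dict, result: str) -> str:
    # Session-less callers pass None, which makes the governor a no-op.
    if governor is not None and governor.record(tool, args) >= WARN_AT:
        return result + "\n\n[codedb] Repeated call: use a structural tool, read the file, or refine the query."
    return result
```

Because the window is bounded, an old repeat ages out after eight other navigation calls, so the nudge only fires on tight loops rather than on a query the agent legitimately revisits much later.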
👋 Thanks for the contribution! Quick heads-up: this repo lands changes on the current Please retarget this PR via Edit → base branch to the active release branch (currently (Automated hint: reply here if you need a hand.)
Benchmark Regression Report
Thresholds: 10.00% and 50,000 ns absolute delta
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6e30cc2676
    if (occurrences >= ConvergenceGovernor.WARN_AT) {
        out.appendSlice(alloc, "\n\n[codedb] You have issued this exact call several times — repeating it will not surface anything new. Change strategy: use a structural tool (codedb_symbol for a definition, codedb_callers for usages, codedb_deps for impact), open the file directly with codedb_read, or refine the query.") catch {};
Preserve JSON responses when adding loop hints
When a governed tool that supports structured output is repeated with format=json (for example the third identical codedb_search call), this appends a plain-text convergence hint after the handler has already written the JSON payload. The MCP response is still marked successful, but the assistant-visible text is no longer parseable JSON, defeating the advertised format=json contract; the governor should skip or separate hints for JSON-formatted tool calls.
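One way to honour the `format=json` contract is to skip the hint for JSON calls, which is the first of the two options the review names. The sketch below is Python, not the project's Zig; `finalize_output`, its parameters, and the placeholder `HINT` text are hypothetical, and the reading of `format` from the call arguments is an assumption about how the handler receives it.

```python
HINT = "[codedb] Repeated call: change strategy."  # placeholder, not the real message


def finalize_output(payload: str, args: dict, occurrences: int, warn_at: int = 3) -> str:
    """Append the loop hint only when the caller did not ask for JSON."""
    if occurrences < warn_at:
        return payload
    if args.get("format") == "json":
        # Skip: any trailing text would make the payload unparseable.
        return payload
    return f"{payload}\n\n{HINT}"
```

The alternative the review mentions, separating the hint from the payload, would keep the nudge for JSON callers too, but it depends on how the server structures its response content and is not sketched here.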
    defer info_dir.close(io);

    const needle = "codedb.snapshot";
    const existing: ?[]u8 = info_dir.readFileAlloc(io, "exclude", allocator, .limited(1024 * 1024)) catch null;
Do not overwrite unreadable exclude files
If .git/info/exclude already exists but cannot be read here (for example it exceeds the 1 MiB limit), existing becomes null and the later createFile path rewrites the file as if it were absent, dropping the user's existing local ignore rules. Since this helper is best-effort, it should only create a new exclude on FileNotFound and otherwise return without modifying the file.
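The distinction the review asks for is between "file absent" and "file present but unreadable". A Python sketch of that three-way classification follows; it is not the Zig fix. `read_exclude`, `ensure_excluded`, and the state labels are hypothetical names, and treating undecodable bytes as unreadable is an added assumption. The 1 MiB limit and the `codedb.snapshot` needle are from the diff.

```python
from pathlib import Path
from typing import Optional, Tuple

MAX_EXCLUDE_BYTES = 1024 * 1024  # the 1 MiB read limit visible in the diff


def read_exclude(path: Path) -> Tuple[str, Optional[str]]:
    """Classify the exclude file as 'absent', 'ok' (with its text), or 'unreadable'."""
    try:
        with path.open("rb") as fh:
            data = fh.read(MAX_EXCLUDE_BYTES + 1)
    except FileNotFoundError:
        return "absent", None      # the only case where creating the file is safe
    except OSError:
        return "unreadable", None
    if len(data) > MAX_EXCLUDE_BYTES:
        return "unreadable", None  # too large to inspect: leave it alone
    try:
        return "ok", data.decode("utf-8")
    except UnicodeDecodeError:
        return "unreadable", None


def ensure_excluded(path: Path, needle: str = "codedb.snapshot") -> bool:
    state, text = read_exclude(path)
    if state == "unreadable":
        return False               # never rewrite a file we could not read
    if state == "ok" and needle in (line.strip() for line in text.splitlines()):
        return False
    prefix = "" if not text or text.endswith("\n") else "\n"
    try:
        with path.open("a", encoding="utf-8") as fh:  # append mode creates the file if absent
            fh.write(f"{prefix}{needle}\n")
    except OSError:
        return False               # best-effort: indexing must not fail
    return True
```

Opening in append mode means existing local ignore rules are never truncated, even if the classification step were to misjudge a readable file.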
Fixes the remaining cluster of issues filed in the last 2 days. Each carries a test; full `zig build test` is green.

#625: `codedb.snapshot` pollutes git
The 22.8 MB in-tree index showed up in `git status` and corrupted working-tree diffs. After writing the in-tree snapshot, `codedb.snapshot` is appended to `.git/info/exclude` (local and untracked, so the user's `.gitignore` is left alone). Best-effort + idempotent; only the in-tree write triggers it, not the central `~/.codedb` store.
- `issue-625: in-tree snapshot is added to .git/info/exclude` (`test_snapshot.zig`)

#626: agents skip the structural tools
Tool descriptions + the MCP `initialize` instructions now steer agents to `symbol`/`callers`/`deps`/`outline` first and frame `search` as a fallback; `codedb_deps` is surfaced as the impact/blast-radius tool. (Cherry-picked from the `issue-626-structural-steering` work.)

#624: non-convergent nav runaways (3–5× tokens)
New per-session `ConvergenceGovernor`: an 8-deep ring buffer of recent nav call signatures. When the same `search`/`find`/`word`/`read`/`outline` call recurs ≥3× in the window, an in-band nudge is appended steering the agent to a structural tool, a direct read, or a refined query. The nudge never alters a tool's result; write/admin tools aren't governed.
- `issue-624: convergence governor flags a repeated identical call` (`test_mcp.zig`)

Closes #624
Closes #625
Closes #626
🤖 Generated with Claude Code