
search/nav: high trajectory variance; occasional non-convergent runaways (3-5x tokens) on large repos #624

Description

@justrach

Summary

The index file codedb.snapshot is written into the indexed project's root directory, where it shows up in git status and can be committed by accident. It is also large: in one real project the root codedb.snapshot is 22.8 MB.

Evidence

  • A codedb.snapshot (22.8 MB) sits at the root of an actively-developed repo after normal codedb use.
  • In an automated harness, a plain git add -A && git diff swept codedb.snapshot into the captured patch (it began with diff --git a/codedb.snapshot ... Binary files differ), corrupting the diff. The workaround was to add it to .git/info/exclude in every checkout.
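
For anyone hitting the same thing in a harness, the per-checkout workaround can be scripted. This is a minimal sketch, not part of codedb: the function name `add_local_exclude` is hypothetical, and it assumes `.git` is a regular directory (not a worktree or submodule pointer file).

```python
from pathlib import Path


def add_local_exclude(repo_root: str, pattern: str = "codedb.snapshot") -> bool:
    """Append `pattern` to .git/info/exclude unless it is already listed.

    Returns True if the file was modified, False if the pattern was present.
    Hypothetical helper; assumes .git is a directory.
    """
    git_dir = Path(repo_root) / ".git"
    if not git_dir.is_dir():
        raise FileNotFoundError(f"{git_dir} is not a directory")

    exclude = git_dir / "info" / "exclude"
    exclude.parent.mkdir(exist_ok=True)

    text = exclude.read_text() if exclude.exists() else ""
    if pattern in (line.strip() for line in text.splitlines()):
        return False  # already excluded, nothing to do

    # Make sure the new entry starts on its own line.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with exclude.open("a") as fh:
        fh.write(prefix + pattern + "\n")
    return True
```

Calling it twice on the same checkout is safe; the second call leaves the file untouched.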

Impact

  • Pollutes git status; easy to commit a multi-MB binary by accident.
  • Breaks any tooling that diffs/snapshots the working tree.

Suggested direction

  • Store the index outside the working tree (e.g. an XDG cache dir keyed by repo path), or
  • Write it to a .codedb/ directory and auto-append codedb.snapshot/.codedb/ to the repo's .gitignore (or .git/info/exclude) on first index.
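
To make the first option concrete, here is a rough sketch of how a cache location keyed by repo path could be derived. Everything in it is an assumption for illustration: the function name `snapshot_cache_path`, the `codedb/<key>/` layout, and the choice of a truncated SHA-256 of the resolved path as the key are not taken from codedb.

```python
import hashlib
import os
from pathlib import Path


def snapshot_cache_path(repo_root: str) -> Path:
    """Return a per-repo snapshot location under the XDG cache directory.

    Hypothetical layout: <cache>/codedb/<key>/codedb.snapshot
    """
    # XDG Base Directory spec: fall back to ~/.cache when the variable
    # is unset or empty.
    base = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")

    # Key on the resolved absolute path so that relative paths and
    # symlinked paths to the same checkout map to the same entry.
    resolved = str(Path(repo_root).resolve())
    key = hashlib.sha256(resolved.encode("utf-8")).hexdigest()[:16]

    return Path(base) / "codedb" / key / "codedb.snapshot"
```

With this shape nothing is written into the working tree, so git status and tree-diffing tools never see the index. One trade-off is that moving or renaming the checkout changes the key and forces a re-index.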

Found via an independent SWE-bench Lite token-efficiency benchmark: identical agent (`claude -p`, Sonnet 4.6) and tasks, only the tool surface differs - native Read/Grep/Edit vs codedb MCP tools. N=51 paired instances; full harness + data available.
