codedb: byte-exact raw read (#632) + first-class index command (#633) #634
justrach wants to merge 5 commits into release/0.2.5826
…ber-prefixed, not byte-exact

`codedb_read` with a line range emits `extractLines(..., line_numbers=true)` output plus a `hash:` header, so it is not a verbatim copy of the source. Agents that located code via codedb then fall back to a native read for the exact pre-edit span (see justrach/codegraff#66), so codedb never serves read+edit, only locate. The test asks for a raw/byte-exact ranged read (`raw:true`) and fails on main. Not fixing in this commit per repo policy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a `raw` arg to `codedb_read`. In raw mode, `handleRead` passes `line_numbers=false` to `extractLines` and suppresses the `hash:` header and the full-file read hint, so the body is a verbatim copy of the source, usable as the `old_string` for an exact-match edit. The default (line-number prefixes plus hash header) is unchanged. This unblocks routing in-repo reads through codedb (justrach/codegraff#66), so codedb can serve read+edit, not just locate. Closes #632.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`codedb index` (no root) is a usage error, and `codedb <root> index` falls through the dispatch to 'unknown command: index' (exit 1) even though the cold-load path has already scanned and persisted. The test asserts that `parsePositional` treats `index` as a first-class command; it fails on main. No fix in this commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`index` triggers the scan/persist path (the cold-load step keys on `cmd == "index"`) but was never registered in `isCommand` and had no dispatch branch. It therefore printed 'unknown command: index' and exited 1 after silently doing the work, and `codedb index` with no root was a usage error. Register it in `isCommand` and add an `index` dispatch branch that reports 'index ready' and exits 0. Closes #633.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
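A minimal sketch of the two changes described in that commit: registering `index` in `isCommand` and giving it a dispatch branch that reports success. The command table, the `Positional` struct, and defaulting the root to `.` for a bare `codedb index` are assumptions made for illustration; codedb's actual `parsePositional` and dispatch code are not shown in this PR.

```zig
const std = @import("std");

// Hypothetical command table: codedb's real list is longer and not shown
// in this PR; only `index` matters for #633.
const commands = [_][]const u8{"index"};

fn isCommand(arg: []const u8) bool {
    for (commands) |c| {
        if (std.mem.eql(u8, arg, c)) return true;
    }
    return false;
}

const Positional = struct {
    root: []const u8,
    cmd: []const u8,
};

/// Accepts `codedb <cmd>` and `codedb <root> <cmd>`. Defaulting the root to
/// "." in the one-argument form is an assumption of this sketch.
fn parsePositional(args: []const []const u8) error{Usage}!Positional {
    if (args.len == 1 and isCommand(args[0]))
        return Positional{ .root = ".", .cmd = args[0] };
    if (args.len == 2 and isCommand(args[1]))
        return Positional{ .root = args[0], .cmd = args[1] };
    return error.Usage;
}

/// Cold load has already scanned and persisted by the time dispatch runs,
/// so `index` only reports success. Output goes to stderr via
/// std.debug.print to keep the sketch short.
fn dispatch(cmd: []const u8) u8 {
    if (std.mem.eql(u8, cmd, "index")) {
        std.debug.print("index ready\n", .{});
        return 0;
    }
    std.debug.print("unknown command: {s}\n", .{cmd});
    return 1;
}
```

Before the fix, the equivalent of `isCommand("index")` returned false, so the same arguments hit the usage/unknown-command paths even though the scan had already run.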
Benchmark Regression Report
Thresholds: 10.00% and 50,000 ns absolute delta
💡 Codex Review
Lines 2988 to 2992 in 2c130a1
When the file is still in `Explorer.contents` (the normal path for small MCP projects and freshly indexed files), this early `renderCachedRead` return runs before the new raw rendering below; `ReadRenderOptions` has no `raw` field, and `renderReadBytes` always emits the `hash:` header and numbered ranges. As a result, `codedb_read` with `raw=true` still returns non-raw output for cached files, so the new feature only works after contents have been released or on the disk fallback path.
 const end: u32 = if (line_end_raw) |n| @intCast(@min(@max(1, n), std.math.maxInt(u32))) else std.math.maxInt(u32);
 const lang = explore_mod.detectLanguage(path);
-const extracted = explore_mod.extractLines(content, start, end, true, compact, lang, alloc) catch {
+const extracted = explore_mod.extractLines(content, start, end, !raw, compact, lang, alloc) catch {
Preserve EOF without a final newline
For raw ranged reads that end at a file without a trailing newline, this still goes through `extractLines`, whose unnumbered path reprints every selected line with an added `\n`. For example, contents `alpha\nbeta` with `line_start=1, line_end=2, raw=true` returns `alpha\nbeta\n`. That extra byte makes the advertised byte-exact output unusable for exact-string edits at EOF, so raw mode should slice the original bytes for the requested range instead of formatting lines.
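The suggested approach, slicing the original bytes rather than reformatting lines, could look roughly like this. It is a sketch, not codedb's code: `sliceLineRange` is a hypothetical helper, and it assumes 1-based inclusive `line_start`/`line_end` with an out-of-range end clamped to EOF. Because it returns a sub-slice of `content`, a final line without a trailing newline comes back without one.

```zig
const std = @import("std");

/// Hypothetical helper: returns lines [start, end] (1-based, inclusive) as a
/// sub-slice of `content`, so newlines (or their absence at EOF) are kept
/// exactly. `end` past the last line clamps to EOF; `start` past EOF or
/// `end < start` yields an empty slice.
fn sliceLineRange(content: []const u8, start: u32, end: u32) []const u8 {
    const first = @max(start, 1);
    if (end < first) return content[0..0];

    var line: u32 = 1;
    var i: usize = 0;
    // Advance to the first byte of line `first`.
    while (line < first) : (line += 1) {
        const nl = std.mem.indexOfScalarPos(u8, content, i, '\n') orelse return content[content.len..];
        i = nl + 1;
    }
    const begin = i;
    // Advance past the newline that ends line `end`, or stop at EOF.
    while (line <= end) : (line += 1) {
        const nl = std.mem.indexOfScalarPos(u8, content, i, '\n') orelse return content[begin..];
        i = nl + 1;
    }
    return content[begin..i];
}
```

Returning a sub-slice also avoids an allocation; the trade-off is that the result is only valid while `content` is alive.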
Add cases pinning the exact contract: a raw full-file read is a byte-exact copy (no hash header, no line-number prefix, no full-file hint); the default ranged read still carries both the hash header and the 'N | ' prefix (regression guard); a raw ranged read drops both.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d35804e6de
const compact = getBool(args, "compact");
// #632: byte-exact ranged read — no line-number prefixes, no hash header — so
// the output can feed an exact-string editor instead of forcing a native read.
const raw = getBool(args, "raw");
Honor raw mode on cached reads
When the requested file is already in `Explorer.contents` (the normal post-scan MCP path, since indexed files are stored in the content cache), this new `raw` flag is ignored because `renderCachedRead` returns before the raw-aware rendering below and `ReadRenderOptions` has no `raw` field. In that scenario `codedb_read` with `raw:true` still emits the `hash:` header and `N | ` line prefixes, so the byte-exact mode only works on the disk fallback path; the added tests miss this because they never index the file before reading it.
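One way to close this gap, sketched under assumptions: give `ReadRenderOptions` a `raw` field and check it at the top of the cached render, so the cache fast path can no longer bypass it. The struct below is a hypothetical stand-in (the real fields are not shown in this PR), the `hash:` value is computed with `std.hash.Wyhash` purely for illustration, and only full-file reads are covered to keep the sketch short.

```zig
const std = @import("std");

// Hypothetical stand-in for codedb's ReadRenderOptions. The review notes the
// real struct has no `raw` field; this sketch assumes one is added.
const ReadRenderOptions = struct {
    raw: bool = false,
};

/// Full-file cached render that checks `raw` before any formatting, so the
/// cache fast path cannot bypass it. `buf` is caller-owned scratch space for
/// the formatted case; the header text is illustrative, not codedb's format.
fn renderCachedRead(content: []const u8, opts: ReadRenderOptions, buf: []u8) ![]const u8 {
    // Raw: return the cached bytes untouched.
    if (opts.raw) return content;

    var pos: usize = 0;
    const header = try std.fmt.bufPrint(buf, "hash: {x}\n", .{std.hash.Wyhash.hash(0, content)});
    pos += header.len;

    // Drop one trailing newline so it doesn't produce an extra empty line.
    const body = if (std.mem.endsWith(u8, content, "\n")) content[0 .. content.len - 1] else content;
    var it = std.mem.splitScalar(u8, body, '\n');
    var n: usize = 1;
    while (it.next()) |line| : (n += 1) {
        const w = try std.fmt.bufPrint(buf[pos..], "{d} | {s}\n", .{ n, line });
        pos += w.len;
    }
    return buf[0..pos];
}
```

A regression test for this path would need to populate the cache first (here, simply pass the cached bytes), which is the case the review says the added tests never exercise.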
Two codedb fixes surfaced while benchmarking codedb as an agent tool (`claude -p` / graff on SWE-bench). Each was filed with a failing `zig test` first (repo policy), then fixed in a separate commit. Full suite green; #632 was also production-verified through the sandbox gateway on react (7,243 files).

#632: `codedb_read` raw mode (byte-exact ranged read)

Ranged `codedb_read` emitted line-number-prefixed output (`extractLines(line_numbers=true)`) plus a `hash:` header, so it was not a verbatim copy of the source; agents couldn't feed it to an exact-string editor and fell back to a native read (the codedb side of justrach/codegraff#66). Adds a `raw` arg: `line_numbers=false`, no hash header, no full-file hint. Default unchanged. 12c5555 (failing test) → 1009c64 (fix). Closes #632 (codedb_read: ranged output is line-number-prefixed (not byte-exact); add a raw read mode).

#633: `index` is a first-class command

`codedb <root> index` triggered the scan/persist path (cold load keys on `cmd == "index"`), but `index` wasn't in `isCommand` and had no dispatch branch, so it printed `unknown command: index` and exited 1 after doing the work; `codedb index` (no root) was a usage error. Registers `index` and adds a dispatch branch that reports `index ready` and exits 0. 2256739 (failing test) → 2c130a1 (fix). Closes #633 (codedb `index` is a phantom command: scans + persists but exits 1 with 'unknown command: index').

Base: release/0.2.5826.