codedb: byte-exact raw read (#632) + first-class index command (#633) #634

Closed
justrach wants to merge 5 commits into release/0.2.5826 from fix/codedb-632-633

@justrach

Owner

Two codedb fixes surfaced while benchmarking codedb as an agent tool (claude -p / graff on SWE-bench), each filed with a failing zig test first (repo policy) then fixed in a separate commit. Full suite green; #632 also production-verified through the sandbox gateway on react (7,243 files).

#632: codedb_read raw mode (byte-exact ranged read)

Ranged codedb_read emitted line-number-prefixed output (extractLines(line_numbers=true)) + a hash: header, so it was not a verbatim copy of the source — agents couldn't feed it to an exact-string editor and fell back to a native read (the codedb side of justrach/codegraff#66). Adds a raw arg: line_numbers=false, no hash header, no full-file hint. Default unchanged.

#633: index is a first-class command

codedb <root> index triggered the scan/persist path (cold-load keys on cmd=="index") but wasn't in isCommand and had no dispatch branch, so it printed unknown command: index + exit 1 after doing the work; codedb index (no root) was a usage error. Registers index and adds a dispatch branch that reports index ready + exit 0.
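
The registration described above can be sketched in Zig roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `known_commands`, the `Parsed` struct, `exitCodeFor`, and the default root of `.` when no root is given are hypothetical. Only the names `isCommand` and `parsePositional`, and the exit 0 / exit 1 behavior, come from the description.

```zig
const std = @import("std");

// Hypothetical command table. `status` is a placeholder entry, not taken from the PR.
const known_commands = [_][]const u8{ "index", "status" };

fn isCommand(arg: []const u8) bool {
    for (known_commands) |name| {
        if (std.mem.eql(u8, arg, name)) return true;
    }
    return false;
}

// Hypothetical result type for positional parsing.
const Parsed = struct { root: []const u8, cmd: []const u8 };

/// Accepts `<cmd>` or `<root> <cmd>`; returns null for a usage error.
/// Assumption: a missing root defaults to the current directory.
fn parsePositional(args: []const []const u8) ?Parsed {
    if (args.len == 0) return null;
    if (isCommand(args[0])) return Parsed{ .root = ".", .cmd = args[0] };
    if (args.len >= 2 and isCommand(args[1])) return Parsed{ .root = args[0], .cmd = args[1] };
    return null;
}

/// Returns the process exit code for `cmd`.
fn exitCodeFor(cmd: []const u8) u8 {
    // The cold-load step has already scanned and persisted by the time
    // dispatch runs, so `index` only needs to report success.
    if (std.mem.eql(u8, cmd, "index")) return 0;
    return 1; // unknown command
}

test "index is accepted with a root" {
    const parsed = parsePositional(&.{ "/repo", "index" }).?;
    try std.testing.expectEqualStrings("/repo", parsed.root);
    try std.testing.expectEqual(@as(u8, 0), exitCodeFor(parsed.cmd));
}
```

The point of the sketch is that the command has to be known in both places: without the `isCommand` entry the no-root form is a usage error, and without the dispatch branch the rooted form does the work and then exits 1.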

Base: release/0.2.5826.

justrach and others added 4 commits June 21, 2026 17:12
…ber-prefixed, not byte-exact

codedb_read with a line range emits extractLines(..., line_numbers=true) output
plus a hash: header, so it is not a verbatim copy of the source. Agents that
located code via codedb then fall back to a native read for the exact pre-edit
span (see justrach/codegraff#66), so codedb never serves read+edit, only locate.

The test asks for a raw/byte-exact ranged read (raw:true) and fails on main.
Not fixing in this commit per repo policy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add a `raw` arg to codedb_read. In raw mode handleRead passes line_numbers=false
to extractLines and suppresses the hash: header and the full-file read hint, so
the body is a verbatim copy of the source — usable as the old_string for an
exact-match edit. Default (prefixed, hash header) is unchanged.

This unblocks routing in-repo reads through codedb (justrach/codegraff#66) so
codedb can serve read+edit, not just locate.

Closes #632.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`codedb index` (no root) is a usage error and `codedb <root> index` falls
through the dispatch to 'unknown command: index' (exit 1) even though the
cold-load path already scanned + persisted. The test asserts parsePositional
treats `index` as a first-class command; fails on main. No fix in this commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`index` triggers the scan/persist path (the cold-load step keys on cmd==index)
but was never registered in isCommand and had no dispatch branch, so it printed
'unknown command: index' + exit 1 after silently doing the work, and `codedb
index` with no root was a usage error. Register it in isCommand and add an
`index` dispatch branch that reports 'index ready' + exits 0.

Closes #633.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
|---|---|---|---|---|---|
| codedb_bundle | 70710 | 73009 | +3.25% | +2299 | OK |
| codedb_changes | 11914 | 10061 | -15.55% | -1853 | OK |
| codedb_context | 754876 | 736967 | -2.37% | -17909 | OK |
| codedb_deps | 326 | 320 | -1.84% | -6 | OK |
| codedb_edit | 43465 | 40570 | -6.66% | -2895 | OK |
| codedb_find | 3089 | 2893 | -6.35% | -196 | OK |
| codedb_hot | 25728 | 23964 | -6.86% | -1764 | OK |
| codedb_outline | 15473 | 15391 | -0.53% | -82 | OK |
| codedb_read | 13169 | 12036 | -8.60% | -1133 | OK |
| codedb_search | 61927 | 62278 | +0.57% | +351 | OK |
| codedb_snapshot | 69698 | 72076 | +3.41% | +2378 | OK |
| codedb_status | 12743 | 10682 | -16.17% | -2061 | OK |
| codedb_symbol | 55711 | 57981 | +4.07% | +2270 | OK |
| codedb_tree | 22743 | 22447 | -1.30% | -296 | OK |
| codedb_word | 12005 | 11170 | -6.96% | -835 | OK |

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

codedb/src/mcp.zig

Lines 2988 to 2992 in 2c130a1

if (explorer.renderCachedRead(path, alloc, out, .{
    .if_hash = if_hash,
    .line_start = line_start_raw,
    .line_end = line_end_raw,
    .compact = compact,

[P2] Honor raw reads on cached content

When the file is still in Explorer.contents (the normal path for small MCP projects and freshly indexed files), this early renderCachedRead return runs before the new raw rendering below; ReadRenderOptions has no raw field, and renderReadBytes always emits the hash: header and numbered ranges. As a result, codedb_read with raw=true still returns non-raw output for cached files, so the new feature only works after contents have been released or on disk fallback.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/mcp.zig
  const end: u32 = if (line_end_raw) |n| @intCast(@min(@max(1, n), std.math.maxInt(u32))) else std.math.maxInt(u32);
  const lang = explore_mod.detectLanguage(path);
- const extracted = explore_mod.extractLines(content, start, end, true, compact, lang, alloc) catch {
+ const extracted = explore_mod.extractLines(content, start, end, !raw, compact, lang, alloc) catch {

[P2] Preserve EOF without a final newline

For raw ranged reads that end at a file without a trailing newline, this still goes through extractLines, whose unnumbered path reprints every selected line with an added \n; for example, contents alpha\nbeta with line_start=1,line_end=2,raw=true returns alpha\nbeta\n. That extra byte makes the advertised byte-exact output unusable for exact-string edits at EOF, so raw mode should slice the original bytes for the requested range instead of formatting lines.
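
The suggestion to slice the original bytes instead of reformatting lines could look roughly like this. It is a minimal sketch, assuming 1-based inclusive line numbers and `\n` line endings; `sliceLineRange` is a hypothetical helper name, not a function from codedb.

```zig
const std = @import("std");

/// Hypothetical helper (not from the PR): returns the bytes of lines
/// [start, end] (1-based, inclusive) as a sub-slice of `content`.
/// Nothing is copied or re-emitted, so a missing final newline stays missing.
fn sliceLineRange(content: []const u8, start: u32, end: u32) []const u8 {
    if (start == 0 or end < start) return content[0..0];

    var line: u32 = 1;
    var pos: usize = 0;

    // Advance `pos` to the first byte of line `start`.
    while (line < start) : (line += 1) {
        const nl = std.mem.indexOfScalarPos(u8, content, pos, '\n') orelse return content[0..0];
        pos = nl + 1;
    }
    const begin = pos;

    // Advance past line `end`, keeping its newline only if the file has one there.
    while (line <= end) : (line += 1) {
        const nl = std.mem.indexOfScalarPos(u8, content, pos, '\n') orelse {
            pos = content.len;
            break;
        };
        pos = nl + 1;
    }
    return content[begin..pos];
}

test "raw range does not invent a trailing newline" {
    // The case from the review comment: `alpha\nbeta`, lines 1..2.
    try std.testing.expectEqualStrings("alpha\nbeta", sliceLineRange("alpha\nbeta", 1, 2));
}
```

Because the result is a view into `content`, it is byte-identical to the source for the requested range, which is the property an exact-string edit depends on.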

Add cases pinning the exact contract: raw full-file read is a byte-exact copy
(no hash header, no line-number prefix, no full-file hint); the default ranged
read still carries both the hash header and the 'N | ' prefix (regression
guard); a raw ranged read drops both.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d35804e6de

Comment thread src/mcp.zig
const compact = getBool(args, "compact");
// #632: byte-exact ranged read — no line-number prefixes, no hash header — so
// the output can feed an exact-string editor instead of forcing a native read.
const raw = getBool(args, "raw");

[P2] Honor raw mode on cached reads

When the requested file is already in Explorer.contents (the normal post-scan MCP path, since indexed files are stored in the content cache), this new raw flag is ignored because renderCachedRead returns before the raw-aware rendering below and ReadRenderOptions has no raw field. In that scenario codedb_read with raw:true still emits the hash: header and N | line prefixes, so the byte-exact mode only works on the disk fallback path; the added tests miss this because they never index the file before reading it.

@github-actions

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
|---|---|---|---|---|---|
| codedb_bundle | 73765 | 76914 | +4.27% | +3149 | OK |
| codedb_changes | 11144 | 11509 | +3.28% | +365 | OK |
| codedb_context | 785754 | 760915 | -3.16% | -24839 | OK |
| codedb_deps | 412 | 335 | -18.69% | -77 | OK |
| codedb_edit | 41232 | 42616 | +3.36% | +1384 | OK |
| codedb_find | 2905 | 2815 | -3.10% | -90 | OK |
| codedb_hot | 25989 | 26122 | +0.51% | +133 | OK |
| codedb_outline | 17671 | 16394 | -7.23% | -1277 | OK |
| codedb_read | 14302 | 14204 | -0.69% | -98 | OK |
| codedb_search | 67512 | 67348 | -0.24% | -164 | OK |
| codedb_snapshot | 75107 | 74237 | -1.16% | -870 | OK |
| codedb_status | 9632 | 10471 | +8.71% | +839 | OK |
| codedb_symbol | 53730 | 55754 | +3.77% | +2024 | OK |
| codedb_tree | 23225 | 25923 | +11.62% | +2698 | NOISE |
| codedb_word | 12286 | 12126 | -1.30% | -160 | OK |
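
The two-threshold rule stated in the report header can be expressed as a small classifier. This is a sketch inferred from the report text and the table rows, not the CI script itself: the `Status` enum, the `fail` name, the strict `>` on the percentage, and the `>=` on the absolute delta are assumptions. Improvements (negative deltas) are treated as OK because rows such as codedb_deps at -18.69% are reported that way.

```zig
const std = @import("std");

// Hypothetical status type; the report itself only shows OK and NOISE.
const Status = enum { ok, noise, fail };

const pct_threshold: u64 = 10; // 10.00%
const abs_threshold_ns: u64 = 50_000;

/// Classifies one benchmark row from its base and head timings in nanoseconds.
fn classify(base_ns: u64, head_ns: u64) Status {
    // Faster or equal is never a regression.
    if (head_ns <= base_ns) return Status.ok;

    const delta = head_ns - base_ns;
    // delta / base > 10%  is checked as  delta * 100 > base * 10  to stay in integers.
    const pct_exceeded = delta * 100 > base_ns * pct_threshold;
    if (!pct_exceeded) return Status.ok;

    // Percentage exceeded: only fail when the absolute delta is also large.
    return if (delta >= abs_threshold_ns) Status.fail else Status.noise;
}

test "codedb_tree row is NOISE" {
    // +11.62% but only +2698 ns, below the 50,000 ns absolute threshold.
    try std.testing.expectEqual(Status.noise, classify(23225, 25923));
}
```

Requiring both thresholds keeps very fast tools, where a few microseconds is a large percentage, from failing CI on timer jitter.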

@justrach

Owner Author

Landed on release/0.2.5826 via fast-forward (rebased onto the post-split tip; full suite green). Commits 375989e..c72a129. GitHub won't badge this 'merged' since it didn't go through the merge button, but the patches are verifiably on the release tip.

@justrach justrach closed this Jun 22, 2026
@justrach justrach deleted the fix/codedb-632-633 branch June 22, 2026 12:30