Skip to content

[codex] Tighten malformed index loader diagnostics#153

Merged
Navi Bot (project-navi-bot) merged 4 commits into
mainfrom
codex/fail-closed-loader-contract
Jun 3, 2026
Merged

[codex] Tighten malformed index loader diagnostics#153
Navi Bot (project-navi-bot) merged 4 commits into
mainfrom
codex/fail-closed-loader-contract

Conversation

@Fieldnote-Echo

Copy link
Copy Markdown
Member

Summary

  • adds contextual fixed-header reads for all persisted index formats so truncated magic/version/field reads identify the format and field
  • splits exact-length validation into distinct truncated-payload and trailing-byte errors for TVR1, TVRQ, TVBM, and TVSB
  • shares checked payload-size helpers between probe, load, and write paths while preserving the existing allocation and MAX_PAYLOAD guards
  • pins probe and public loader behavior with malformed payload/header tests, including the RankQuant::load and SignBitmap::load paths consumed by ordgrep

Closes #151.

Scope notes

This is intentionally limited to the primitive persisted-index loader/prober boundary. It does not add the persisted format compatibility document/fixtures from #118, backend determinism semantics from #127, manifest schema changes, or ordgrep store policy.

Downstream check

ordgrep-rs full-loads .ordgrep/index.ordvec with RankQuant::load and .ordgrep/sign.ordvec with SignBitmap::load, then adds only loading <path> context. That means the primitive loader error string is the operator-facing reason for malformed binary artifacts, so this PR keeps the error context in ordvec proper.

Validation

  • cargo fmt --check
  • cargo test --lib rank_io::tests
  • cargo test --test index loader
  • cargo test --test redteam_delta delta_a
  • cargo test -p ordvec --all-features
  • cargo test --workspace --exclude ordvec-python --all-features
  • cargo check -p ordvec-python --all-features
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cargo deny check
  • git diff --check

cargo test --workspace --all-features was also attempted and failed while linking the ordvec-python Rust test harness, with unresolved Python C API symbols such as PyObject_Str and Py_InitializeEx. The Python binding compile check listed above passed; this PR does not touch Python code.

Review

Adversarial review found no functional blockers. One stale SignBitmap loader doc comment was corrected before publishing.

@codecov

codecov Bot commented Jun 2, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 99.20000% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/rank_io.rs 99.20% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the index file metadata probing and loading logic in src/rank_io.rs to improve error reporting and reduce code duplication. It introduces helper functions for reading exact fields and calculating payload sizes, providing detailed context on truncated headers and payload mismatches. The test suite has been updated to verify these new error messages. Feedback on the changes highlights a potential integer overflow on 32-bit targets in rankquant_payload_bytes due to premature intermediate multiplication, and suggests refactoring the calculation using the pre-computed bytes per vector to avoid this issue.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/rank_io.rs
@qodo-code-review

Copy link
Copy Markdown

Review Summary by Qodo

Tighten malformed index loader diagnostics with field context

✨ Enhancement 🧪 Tests

Grey Divider

Walkthroughs

Description
• Refactors malformed index loader diagnostics with contextual field labels
  - Adds label parameter to all header read functions for precise error context
  - Splits exact-length validation into distinct truncated vs trailing-byte errors
• Extracts shared payload-size calculation helpers across all formats
  - rank_payload_bytes, rankquant_payload_bytes, bitmap_payload_bytes
  - Preserves existing MAX_PAYLOAD guards and allocation safety
• Enhances error messages with format and field identification
  - Truncated headers now report which field failed (magic, version, dim, etc.)
  - Payload mismatches distinguish between truncation and trailing bytes
• Adds comprehensive malformed payload tests for all index formats
  - Tests probe and public loader paths (RankQuant::load, SignBitmap::load)
  - Validates both truncated header and payload error contexts
Diagram
flowchart LR
  A["Header Read Functions"] -->|add label param| B["Contextual Error Messages"]
  C["Payload Validation"] -->|split logic| D["Truncated vs Trailing Errors"]
  E["Duplicate Calculations"] -->|extract helpers| F["Shared Size Functions"]
  B --> G["Enhanced Diagnostics"]
  D --> G
  F --> G
  G --> H["Malformed Payload Tests"]

Loading

Grey Divider

File Changes

1. src/rank_io.rs Enhancement, error handling, tests +279/-164

Refactor header reads and payload validation with contextual diagnostics

• Adds label parameter to check_payload_matches_file and splits validation into separate
 truncated/trailing-byte error cases
• Introduces truncated_field, read_exact_field, read_u8_field, read_u32_le, read_magic
 helpers with contextual error reporting
• Extracts payload-size calculation helpers: rank_payload_bytes, rankquant_payload_bytes,
 rankquant_bytes_per_vec, bitmap_payload_bytes
• Updates all probe and load functions to use new helpers and pass format labels through the call
 chain
• Adds test helpers assert_err_contains, rank_header, rankquant_header, bitmap_header,
 sign_bitmap_header
• Adds two new test cases: probe_reports_header_field_context_for_truncated_headers and
 probe_reports_distinct_payload_truncation_and_trailing_bytes_for_all_formats

src/rank_io.rs


2. tests/index/loader_validation.rs 🧪 Tests +123/-0

Add public loader malformed payload error context tests

• Adds helper functions assert_load_err_contains, rank_header, rankquant_header,
 bitmap_header, sign_bitmap_header for test setup
• Adds new test public_loaders_report_stable_malformed_payload_context that validates all four
 public loader paths (Rank::load, RankQuant::load, Bitmap::load, SignBitmap::load) report
 distinct truncated vs trailing-byte errors

tests/index/loader_validation.rs


3. tests/index/main.rs Tests, documentation +5/-4

Update test comments for exact-length validation semantics

• Updates comments in rank_io_loaders_reject_malformed_files_without_panicking test to reflect new
 exact-length validation behavior instead of relying on read_exact EOF
• Clarifies that truncated payloads are rejected by validation logic, not by allocation attempts

tests/index/main.rs


Grey Divider

Qodo Logo

@qodo-code-review

qodo-code-review Bot commented Jun 3, 2026

Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0) 📎 Requirement gaps (0) 🔗 Cross-repo conflicts (0)

Context used

Grey Divider


Remediation recommended

1. Hardcoded version in tests 🐞 Bug ⚙ Maintainability
Description
tests/index/loader_validation.rs forges headers with a literal version byte (1), so if the
persisted format VERSION changes these tests will start failing with “unsupported version” instead
of exercising the malformed-payload/trailing-byte diagnostics they intend to pin.
Code

tests/index/loader_validation.rs[R43-79]

Evidence
The integration tests hardcode the persisted version as 1, while the actual format version is
defined in src/rank_io.rs as const VERSION: u8 = 1;. This makes the integration tests fragile to
future version bumps and can invalidate the diagnostic assertions by failing earlier on version
mismatch.

tests/index/loader_validation.rs[43-79]
src/rank_io.rs[62-67]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Integration test header helpers hardcode the version byte (`v.push(1)`), which couples the tests to a specific persisted format version and causes misleading failures (“unsupported version”) if/when the library bumps `VERSION`.

### Issue Context
The library’s persisted index version is defined internally in `src/rank_io.rs` as `const VERSION: u8 = 1;`, and unit tests in that module already use `VERSION` when forging headers.

### How to fix
Avoid hardcoding the version in integration tests by deriving headers from real `write()` output, then mutating only the fields needed to create the malformed cases.

Concretely:
- For each format (TVR1/TVRQ/TVBM/TVSB), create an empty index via the public type (`Rank::new`, `RankQuant::new`, `Bitmap::new`, `SignBitmap::new`), write it to a temp file, and read back the first N header bytes.
- Modify the `n_vectors` field bytes in that header buffer to craft the “truncated payload” case (e.g., set `n_vectors=1` but don’t include payload).
- For “trailing bytes”, keep `n_vectors=0` and append an extra byte to the valid empty file bytes.

This keeps the tests aligned with whatever version byte the crate actually emits, without requiring `VERSION` to be made public.

### Fix Focus Areas
- tests/index/loader_validation.rs[43-79]
- src/rank_io.rs[62-67]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tightens persisted-index loader/prober diagnostics for malformed .tvr* formats by adding field-aware truncated-header errors and by splitting exact-length payload validation into distinct “truncated payload” vs “trailing bytes” failures. This improves operator-facing error context (notably for downstream consumers like ordgrep) while preserving existing size/allocation guards.

Changes:

  • Add contextual fixed-header reads so truncated magic/version/field reads identify the format + specific field.
  • Split exact-length validation into separate truncated-payload vs trailing-byte errors across TVR1/TVRQ/TVBM/TVSB.
  • Add/adjust tests to pin public loader and probe error strings for malformed headers/payload lengths.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
src/rank_io.rs Introduces field-aware header reads and more specific payload-length mismatch diagnostics; factors payload-size computations into shared helpers.
tests/index/loader_validation.rs Adds integration tests asserting stable, format-labeled error context for truncated payloads and trailing bytes across public loaders.
tests/index/main.rs Updates malformed-file fuzz test comments to reflect the new exact-length validation behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
@Fieldnote-Echo Nelson Spence (Fieldnote-Echo) force-pushed the codex/fail-closed-loader-contract branch from fb73040 to 5f76feb Compare June 3, 2026 14:23
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>

Copy link
Copy Markdown
Member Author

Addressed Qodo's hardcoded-version test finding in 4988948.

The integration malformed-payload helpers now derive empty-file headers through the public writers (Rank::write, RankQuant::write, Bitmap::write, SignBitmap::write) and mutate only the n_vectors field for the truncated-payload cases. That keeps the emitted version byte aligned with the crate writer without making the private VERSION constant public.

Local validation:

  • cargo fmt --all --check
  • cargo test --test index public_loaders_report_stable_malformed_payload_context
  • cargo test --test index
  • git diff --check

No unresolved inline Qodo thread was present to resolve; the finding was emitted as a PR conversation comment.

@project-navi-bot Navi Bot (project-navi-bot) merged commit e3bb89f into main Jun 3, 2026
36 checks passed
@project-navi-bot Navi Bot (project-navi-bot) deleted the codex/fail-closed-loader-contract branch June 3, 2026 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

persistence: fail-closed malformed index loader contract

3 participants