Skip to content

[codex] Document persisted format compatibility#154

Merged
Navi Bot (project-navi-bot) merged 4 commits into
mainfrom
codex/persisted-format-contract
Jun 3, 2026
Merged

[codex] Document persisted format compatibility#154
Navi Bot (project-navi-bot) merged 4 commits into
mainfrom
codex/persisted-format-contract

Conversation

@Fieldnote-Echo

Copy link
Copy Markdown
Member

Summary

Closes #118.

  • Add docs/PERSISTED_FORMAT.md as the v1 compatibility contract for .tvr, .tvrq, .tvbm, and .tvsb persisted index artifacts.
  • Document IndexMetadata as a manifest-friendly byte-shape descriptor, plus an external segment manifest example.
  • Pin the versioning policy for byte layout, row invariants, primitive score semantics, unsupported versions, trailing bytes, and migration/rejection expectations.
  • Add standalone persisted-format compatibility tests with tiny committed byte expectations for Rank, RankQuant, Bitmap, and SignBitmap.
  • Cross-link the persisted format contract from README and provenance docs.

Scope Notes

This stays on origin/main and does not stack on PR #153. It intentionally does not pin malformed-loader diagnostic strings; it only documents rejection behavior. It also does not absorb #127 backend determinism/tie-order policy: the new contract covers primitive per-row score interpretation for identical persisted bytes, while composed retrieval policy remains separate.

I also checked /home/ndspence/GitHub/ordgrep-rs read-only. Its current loader path uses RankQuant::load and SignBitmap::load directly and binds those files through its own store manifest/row-count checks, which matches this PR's primitive-format boundary.

Validation

  • cargo fmt --check
  • cargo test --test persistence_compat
  • cargo test -p ordvec --all-features
  • cargo test --workspace --exclude ordvec-python --all-features
  • cargo check -p ordvec-python --all-features
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cargo deny check
  • git diff --check

Adversarial Review

A read-only adversarial review found one medium docs gap around score-semantics versioning. I fixed it by documenting that format_version also covers primitive score interpretation of persisted bytes, while equal-score ordering/backend policy remains out of the byte-format contract. The reviewer found no blockers in byte layout or fixture bytes after that scope was addressed.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request documents the compatibility contract and byte layouts for ordvec persisted index files in a new PERSISTED_FORMAT.md file, and links to it from existing documentation. It also introduces integration tests in tests/persistence_compat.rs to ensure the stability of the serialized formats. The review feedback suggests improving the tests by using an RAII helper to safely manage temporary files and prevent leaks during panics, and adding an explicit length check in a helper function to avoid potential out-of-bounds panics.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread tests/persistence_compat.rs Outdated
Comment thread tests/persistence_compat.rs
@codecov

codecov Bot commented Jun 2, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@qodo-code-review

Copy link
Copy Markdown

Review Summary by Qodo

Document persisted format compatibility contract and add fixture tests

📝 Documentation 🧪 Tests

Grey Divider

Walkthroughs

Description
• Add docs/PERSISTED_FORMAT.md documenting v1 compatibility contract for persisted index formats
• Define byte layouts, versioning policy, and metadata descriptor for .tvr, .tvrq, .tvbm,
  .tvsb artifacts
• Document score semantics versioning and breaking-change criteria for future format evolution
• Add standalone tests/persistence_compat.rs with committed byte fixtures for all four index types
• Validate version rejection, trailing-byte rejection, and metadata probing for each format
• Cross-link persisted format contract from README and INDEX_PROVENANCE.md
Diagram
flowchart LR
  A["Persisted Format Contract"] -->|defines| B["Byte Layouts<br/>Versioning Policy<br/>Score Semantics"]
  B -->|covers| C["Rank TVR1<br/>RankQuant TVRQ<br/>Bitmap TVBM<br/>SignBitmap TVSB"]
  D["Fixture Tests"] -->|validate| E["Header Parsing<br/>Payload Invariants<br/>Version Rejection<br/>Trailing Bytes"]
  E -->|ensure| F["Format Stability"]
  A -.->|referenced by| G["README<br/>INDEX_PROVENANCE"]

Loading

Grey Divider

File Changes

1. docs/PERSISTED_FORMAT.md 📝 Documentation +199/-0

Persisted format compatibility contract and layouts

• New documentation file establishing v1 compatibility contract for all four persisted index formats
• Defines fixed header layouts with magic values, format version, and format-specific parameters for
 each index type
• Documents metadata descriptor structure returned by probe_index_metadata() with example external
 segment manifest entry
• Specifies compatibility policy, breaking-change criteria, and score-semantics versioning tied to
 format version
• Distinguishes between probe validation (header and byte-count checks) and full loader validation
 (row invariants)

docs/PERSISTED_FORMAT.md


2. tests/persistence_compat.rs 🧪 Tests +202/-0

Persisted format compatibility fixture tests

• New test file with committed byte fixtures for Rank, RankQuant, Bitmap, and SignBitmap v1 formats
• Each test verifies fixture bytes match expected layout, validates metadata probing, and
 round-trips through load
• Tests reject unsupported format versions (version byte changed to 2) and trailing bytes for all
 four index types
• Helper functions for temporary file management, byte writing, metadata assertion, and
 version/trailing-byte rejection validation

tests/persistence_compat.rs


3. README.md 📝 Documentation +2/-1

Cross-link persisted format documentation

• Add cross-link to docs/PERSISTED_FORMAT.md in the provenance and threat model section
• Reorder documentation references to list persisted format contract before INDEX_PROVENANCE

README.md


View more (1)
4. docs/INDEX_PROVENANCE.md 📝 Documentation +3/-1

Reference persisted format contract

• Add introductory cross-reference to PERSISTED_FORMAT.md for byte layout and versioning policy
• Clarify that INDEX_PROVENANCE covers loader guarantees while PERSISTED_FORMAT covers byte-level
 contract

docs/INDEX_PROVENANCE.md


Grey Divider

Qodo Logo

@qodo-code-review

qodo-code-review Bot commented Jun 3, 2026

Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0) 📎 Requirement gaps (0) 🔗 Cross-repo conflicts (0)

Context used

Grey Divider


Remediation recommended

1. Clock-based temp filenames 🐞 Bug ☼ Reliability
Description
tests/persistence_compat.rs::tmp() derives filenames from wall-clock time and unwraps
duration_since(UNIX_EPOCH), which can panic on misconfigured clocks and can theoretically collide on
coarse-resolution timers. A panic/collision here would fail or flake the new persisted-format
compatibility test suite.
Code

tests/persistence_compat.rs[R11-22]

Evidence
The temp path helper uses SystemTime::now().duration_since(...).unwrap() and is used as the basis
for all files created in the new compatibility tests, so panics/collisions in tmp() directly
impact test stability.

tests/persistence_compat.rs[11-22]
tests/persistence_compat.rs[55-70]
tests/persistence_compat.rs[72-202]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`tmp()` builds temp file paths using `SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_nanos()`. This introduces an unnecessary panic edge case (system time before UNIX_EPOCH) and relies on clock resolution for uniqueness.

### Issue Context
These paths are used by multiple integration tests in `tests/persistence_compat.rs`, so any panic/collision can break/flakify CI.

### Fix Focus Areas
- tests/persistence_compat.rs[11-22]

### Suggested fix
Use a monotonic, process-local source of uniqueness instead of wall-clock time:
- Prefer `tempfile` (e.g., `tempfile::Builder::new().prefix(...).tempfile()` / `NamedTempFile`) and use its generated path.
- Or keep std-only: add a `static AtomicU64` counter and build filenames from `{pid}_{counter}` (optionally also include thread id). Remove the `.unwrap()` on `duration_since` entirely.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR establishes and tests a stable v1 compatibility contract for ordvec’s persisted index artifacts (.tvr, .tvrq, .tvbm, .tvsb), so downstream systems can safely cache/replicate/probe these files and know when bytes will be accepted vs rejected.

Changes:

  • Adds a new docs/PERSISTED_FORMAT.md document defining the v1 byte layouts, metadata contract (IndexMetadata), and compatibility/versioning policy.
  • Adds golden (non-roundtrip) fixture tests that assert exact persisted bytes for Rank, RankQuant, Bitmap, and SignBitmap, plus rejection of unsupported versions and trailing bytes.
  • Cross-links the persisted-format contract from README and provenance documentation.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
tests/persistence_compat.rs Adds golden byte fixtures + rejection tests to lock in v1 persisted-format compatibility behavior.
README.md Links to the persisted-format compatibility contract alongside provenance/threat-model docs.
docs/PERSISTED_FORMAT.md New canonical v1 persisted-format contract: layout tables, metadata meaning, versioning and score-semantics policy.
docs/INDEX_PROVENANCE.md Cross-links to the persisted-format contract for byte layout/versioning details.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6a1aaa2566

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

Comment thread docs/PERSISTED_FORMAT.md Outdated
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
@Fieldnote-Echo Nelson Spence (Fieldnote-Echo) force-pushed the codex/persisted-format-contract branch from 6a1aaa2 to a1bc2be Compare June 3, 2026 14:23
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>

Copy link
Copy Markdown
Member Author

Addressed Qodo's clock-based temp filename finding in aa7969d4e03e649f61a53d516f8c344ec560a217.

tests/persistence_compat.rs::tmp() now uses a process-local AtomicU64 counter plus the process id for temp-file uniqueness, so it no longer depends on wall-clock time or duration_since(UNIX_EPOCH). The existing RAII TempFile cleanup remains in place.

Local validation:

  • cargo fmt --all --check
  • cargo test --test persistence_compat
  • git diff --check

The Qodo finding was emitted as a PR conversation comment rather than an unresolved inline review thread.

@project-navi-bot Navi Bot (project-navi-bot) merged commit a0db67e into main Jun 3, 2026
29 checks passed
@project-navi-bot Navi Bot (project-navi-bot) deleted the codex/persisted-format-contract branch June 3, 2026 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

persistence: stable segment metadata and file-format compatibility policy

3 participants