fix(engine): optimize survives a cross-process write race on the same table by ragnorc · Pull Request #297 · ModernRelay/omnigraph

ragnorc · 2026-06-22T09:21:38Z

What & why

A prod-discovered correctness bug (R2, while running #291's optimize): a direct CLI optimize racing a served-server write on the same table fails, with two signatures:

Rewrite … preempted by concurrent transaction Update at version N — optimize's Lance compact_files commit losing the OCC race.
stale view of 'node:Media': expected manifest table version 14 but current is 15 — optimize's manifest-publish equality CAS losing.

Root cause (validated in code + Lance 7.0.0 source): the data-table optimize path (optimize_one_table) trusted the in-process write queue and skipped a retry — but a CLI optimizer and the server are separate processes with separate WriteQueueManagers, so the queue serializes nothing between them. Lance cannot absorb the conflict either: commit_compaction commits a fixed Rewrite via apply_commit direct, there is no compaction RetryExecutor, and a semantic RetryableCommitConflict escapes commit_transaction's retry loop (io/commit.rs:979). So compaction fundamentally needs reopen + replan — which the internal-table path already did and the data path did not.

This is the maintenance op-class face of the same process-boundary flaw as the logical-class corruption in RFC §6.5. It needs the opposite fix, because compaction commutes (content-preserving): not strict OCC + a lock, but Lance rebase + reopen/replan + a monotonic manifest fast-forward. No writer_epoch.

The fix

Unify both compaction paths on one reopen+replan shape, with a two-level retry matching the two failure points:

Outer loop (reopen+replan): a genuine Lance Rewrite-vs-Update/Delete same-fragment overlap means our compaction did not commit — reopen at the new HEAD and re-plan. Lance rebases the common disjoint case (a concurrent insert/update/delete on other fragments) for free, so this fires only on real overlap.
Inner loop (Phase C, monotonic publish): the manifest advanced between our compaction and our publish. The compaction is committed at Lance HEAD N, so we must NOT reopen (that would trip the HEAD>manifest drift guard on our own work). Re-read the current manifest version C: if C ≥ N the manifest already includes our compaction (versions are linear) → no-op; else fast-forward to N. Monotonic, never the strict equality CAS that manufactured the conflict.

The Optimize recovery sidecar is written once and reused across attempts (every Phase-B commit is content-preserving, so recovery rolls the observed HEAD forward or safely rolls the compaction back). The in-process queue is kept — now an in-process contention reducer, not the cross-process guard. Shares COMPACTION_RETRY_BUDGET + is_retryable_lance_conflict with the internal path; adds is_retryable_manifest_conflict for the publish loop.

Design note (liability)

The lower-liability shape: the monotonic decision is kept local to optimize.rs and reuses the existing equality-CAS publish unchanged — no new publisher mode, no signature ripple to load/mutate/recovery/repair (they keep strict equality). One engine file changed.

This is the first correct instance of the maintenance-class commit model (RFC §6.6), not a parallel band-aid: when the planned step-5 PublishPlan unification lands, it relocates this (a TableAction::Rewrite carries the same commit model into the single publish authority) rather than reinventing it — the two regression tests are the contract that stays green across that refactor.

Tests (test-first, red→green)

optimize_survives_concurrent_insert_advancing_manifest
optimize_survives_concurrent_delete_before_compaction

Both reproduce the prod manifest-CAS signature via a two-handle (cross-process-simulated) race + the optimize.post_phase_b_pre_manifest_commit / new optimize.before_compact failpoints. RED in the first commit, GREEN after the fix.

Verification

cargo test --workspace --locked — green (65 suites, 0 failed).
maintenance / composite_flow / full failpoints (recovery) suites — no regressions.

Out of scope (tracked separately)

Persistent node:Media drift_needs_repair (a narrower index-op recovery deferral).
Blob compaction (node:Artifact) — upstream Lance blob-v2 blocker.
The full writer unification + writer_epoch for the logical class — RFC step 5.

Note

Medium Risk
Changes core optimize manifest publish and Lance retry behavior on a live maintenance path; regression tests and sidecar recovery mitigate, but incorrect drift classification could still strand HEAD/manifest gaps until repair.

Overview
Fixes a prod bug where CLI optimize and a served write in another process could fail on the same table: Lance Rewrite conflicts or a manifest publish equality CAS (expected X but current Y) because the in-process write queue does not serialize across processes.

optimize_one_table now uses a two-level retry: an outer reopen-and-replan loop on retryable Lance conflicts during Phase B (strip, compact, reindex, index build); an inner Phase C loop that monotonic fast-forwards the manifest (no-op when current >= compacted version, else CAS against fresh current) instead of pinning to the pre-compaction version. A head_advanced flag keeps own Phase-B HEAD-ahead-of-manifest drift from being treated as external drift (sidecar retained, publish instead of skipped_for_drift). The Optimize sidecar is written once and reused across reopen attempts. COMPACTION_RETRY_BUDGET is shared with the internal-table path; is_retryable_manifest_conflict and current_manifest_version support the publish loop.

Tests add cross-process-simulated races (concurrent insert/delete before compact), own-HEAD drift on reindex retry, and #[serial(optimize)] on related failpoint tests. RFC-013 §6.6 documents maintenance vs logical op-class commit models.

^{Reviewed by Cursor Bugbot for commit e9d16a2. Bugbot is set up for automated code reviews on this repo. Configure here.}

Greptile Summary

Fixes a prod correctness bug where optimize (CLI) racing a served writer on the same table could fail with either a Lance Rewrite-vs-Update OCC loss or a manifest equality-CAS mismatch — both caused by the in-process write queue not serialising across processes.

optimize_one_table gains a two-level retry: an outer reopen+replan loop for genuine Lance Rewrite overlap conflicts, and an inner Phase-C monotonic fast-forward replacing the strict equality-CAS publish (no-op when manifest ≥ N, else CAS against freshly read current). The recovery sidecar is written once and reused across attempts; head_advanced prevents the drift guard from misclassifying our own prior Phase-B commits as external drift.
Constant rename INTERNAL_COMPACTION_RETRY_BUDGET → COMPACTION_RETRY_BUDGET unifies the budget across both compaction paths; adds is_retryable_manifest_conflict and current_manifest_version helpers.
Three regression tests cover the concurrent-insert race, concurrent-delete race, and own-HEAD drift misclassification, each reproducing the prod failure signatures as RED before the fix and GREEN after.

Confidence Score: 5/5

Safe to merge; the change is mechanically narrow (one engine file, one test file, one RFC doc), all three new tests reproduce the prod failure signatures with explicit RED→GREEN proof, and the previous two blocking review threads have been fully addressed in this head commit.

The root cause is accurately identified and closed by design: commutative maintenance ops now use a monotonic fast-forward instead of an equality CAS, so a concurrent writer that builds on the compaction causes a convergent no-op rather than a spurious error. The outer reopen+replan loop mirrors the already-correct internal-table path. The head_advanced flag correctly distinguishes our own prior Phase-B commits from external drift, preventing the sidecar from being prematurely deleted on retry. The post-budget convergence check (re-read current before surfacing an error) plugs the Phase-C exhaustion gap from the prior thread. Recovery semantics are preserved: the sidecar is written before any HEAD-advancing op and deleted after successful publish, and its loose-match classifier handles the stale expected_version across retries as documented.

No files require special attention; optimize.rs change is self-contained and the affected code paths are all exercised by the new failpoint tests.

Important Files Changed

Filename	Overview
crates/omnigraph/src/db/omnigraph/optimize.rs	Core fix: adds outer reopen+replan loop and inner monotonic Phase-C fast-forward to optimize_one_table, replacing the single-shot equality-CAS publish; renames INTERNAL_COMPACTION_RETRY_BUDGET → COMPACTION_RETRY_BUDGET; adds is_retryable_manifest_conflict and current_manifest_version helpers; fixes post-budget exhaustion path to check convergence before surfacing error. Well-reasoned and correct by design.
crates/omnigraph/tests/failpoints.rs	Adds three regression tests (concurrent insert, concurrent delete, own-HEAD drift misclassification) plus wait_for_sidecar polling helper; adds #[serial(optimize)] to existing optimize test. Test logic is sound and covers the two key failure signatures from the prod incident.
docs/dev/rfc-013-write-path-latency.md	Adds §6.6 documenting the two op-class distinction (maintenance vs logical write), the root cause, the fix, and its relationship to the planned step-5 unification; adds a table row marking the maintenance-class commit model as LANDED. Clear and accurate.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant CLI as CLI optimize (separate process)
    participant SRV as Served writer (same table)
    participant Lance as Lance dataset
    participant Manifest as __manifest

    CLI->>CLI: acquire in-process write queue
    CLI->>Lance: "open_dataset_head_for_write (HEAD=V₀)"
    CLI->>Manifest: "fresh_snapshot → expected_version=V₀"
    CLI->>CLI: "Phase A — write sidecar (expected=V₀, head_advanced=false)"
    CLI->>CLI: failpoint: optimize.before_compact (pause)
    SRV->>Lance: "commit insert/delete → HEAD=V₀+1"
    SRV->>Manifest: publish V₀+1
    CLI->>CLI: failpoint released
    CLI->>Lance: "compact_files (on ds@V₀, Lance rebases over V₀+1)"
    Lance-->>CLI: Ok(metrics) — HEAD now V₀+3 (reserve+rewrite)
    CLI->>CLI: "head_advanced=true"
    CLI->>Lance: optimize_indices, build_indices
    CLI->>Manifest: "Phase C: read current=V₀+1"
    Note over CLI,Manifest: current(V₀+1) < state.version(V₀+3) → fast-forward
    CLI->>Manifest: "CAS expected=V₀+1 → publish V₀+3"
    Manifest-->>CLI: Ok
    CLI->>CLI: Phase D — delete sidecar

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant CLI as CLI optimize (separate process)
    participant SRV as Served writer (same table)
    participant Lance as Lance dataset
    participant Manifest as __manifest

    CLI->>CLI: acquire in-process write queue
    CLI->>Lance: "open_dataset_head_for_write (HEAD=V₀)"
    CLI->>Manifest: "fresh_snapshot → expected_version=V₀"
    CLI->>CLI: "Phase A — write sidecar (expected=V₀, head_advanced=false)"
    CLI->>CLI: failpoint: optimize.before_compact (pause)
    SRV->>Lance: "commit insert/delete → HEAD=V₀+1"
    SRV->>Manifest: publish V₀+1
    CLI->>CLI: failpoint released
    CLI->>Lance: "compact_files (on ds@V₀, Lance rebases over V₀+1)"
    Lance-->>CLI: Ok(metrics) — HEAD now V₀+3 (reserve+rewrite)
    CLI->>CLI: "head_advanced=true"
    CLI->>Lance: optimize_indices, build_indices
    CLI->>Manifest: "Phase C: read current=V₀+1"
    Note over CLI,Manifest: current(V₀+1) < state.version(V₀+3) → fast-forward
    CLI->>Manifest: "CAS expected=V₀+1 → publish V₀+3"
    Manifest-->>CLI: Ok
    CLI->>CLI: Phase D — delete sidecar

_{Reviews (3): Last reviewed commit: "fix(engine): optimize publish converges ..." | Re-trigger Greptile}

Two regression tests for the prod bug: a direct `optimize` process racing a served write on the same table fails, because the in-process write queue does not serialize across processes and the data-table optimize path has no retry. - optimize_survives_concurrent_insert_advancing_manifest: a concurrent insert advances the manifest while optimize is paused between compact and publish; optimize's equality-CAS publish then fails "expected X but current Y". - optimize_survives_concurrent_delete_before_compaction: a concurrent delete commits before optimize compacts; Lance rebases the compaction past it cleanly, so optimize again fails the publish CAS (the genuine Lance Rewrite-vs-Rewrite overlap is rarer and shares the internal path's retry). Both fail today with ExpectedVersionMismatch. Adds the `optimize.before_compact` failpoint seam + a wait_for_sidecar helper; serializes the optimize failpoint tests (shared failpoint name). The fix lands next.

… table The data-table optimize path trusted the in-process write queue and skipped a retry, so a CLI `optimize` racing a served write (separate processes = separate queues) failed: either the Lance Rewrite lost ("preempted by concurrent Update") or the manifest publish lost the strict equality CAS ("expected X but current Y"). Unify both compaction paths on the internal path's reopen+replan shape, with a two-level retry that matches the two failure points: - Outer loop (reopen+replan): a genuine Lance Rewrite-vs-Update/Delete same- fragment conflict means our compaction did not commit — reopen at the new HEAD and re-plan. Lance rebases the common disjoint case (a concurrent insert/delete on other fragments) for free, so this fires only on real overlap. - Inner loop (Phase C, monotonic publish): the manifest advanced between our compaction and our publish. The compaction is already committed at Lance HEAD N, so we must NOT reopen (that trips the HEAD>manifest drift guard on our own work). Re-read the current manifest version C: if C >= N the manifest already includes our compaction (versions are linear) — no-op; else fast-forward to N. Monotonic, not the strict equality CAS that manufactured the conflict. The Phase-A sidecar is written once and reused across reopen attempts (every Phase-B commit is content-preserving, so recovery rolls the observed HEAD forward or safely rolls the compaction back). The in-process queue is kept — it is now an in-process contention reducer, not the cross-process correctness guard. Shares the COMPACTION_RETRY_BUDGET constant + is_retryable_lance_conflict with the internal path; adds is_retryable_manifest_conflict for the publish loop. No writer_epoch. Turns the prior commit's two race tests green.

…rite race §6.6 records the maintenance vs logical op-class distinction (maintenance commutes → Lance rebase + reopen/replan + monotonic manifest fast-forward, no writer_epoch; logical → strict cross-process OCC + epoch) and the prod optimize-vs-served-write race that motivated it, now landed. Adds the matching mechanic row to §4.2.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 903ede2f95

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-22T09:26:21Z

+            Err(e) if attempt < COMPACTION_RETRY_BUDGET && is_retryable_lance_conflict(&e) => {
+                continue;


Preserve the sidecar when retrying after a Phase-B commit

When optimize_indices hits a retryable Lance conflict after compact_files has already succeeded, this continue re-enters the outer loop with Lance HEAD advanced by our own compaction while the manifest is still at the old version. The next iteration then takes the lance_head_version > expected_version drift branch, deletes the recovery sidecar as if the drift were external, and returns skipped_for_drift, leaving an uncovered HEAD>manifest gap that can no longer be recovered automatically. This also applies to upgraded graphs if the auto-cleanup config scrub commits before a later retryable conflict, so the retry path needs to distinguish/publish or roll back its own Phase-B commits instead of looping into the generic drift skip.

Useful? React with 👍 / 👎.

Same finding as the cursor High at optimize.rs:443 — confirmed and fixed in 56d004e0. The retry now tracks head_advanced; once one of our Phase-B ops (auto_cleanup strip / compact_files) has committed, a reopened HEAD > manifest is treated as our own sidecar-covered work (the monotonic Phase-C publish fast-forwards it), not external drift, so the sidecar is not deleted and skipped_for_drift is not returned. Regression test optimize_retry_does_not_misclassify_own_head_drift injects exactly the "compact commits, reindex conflicts" sequence you describe.

Review catch on the cross-process optimize fix: the outer retry loop re-ran the `lance_head > manifest` drift guard every iteration. After a partial Phase-B commit (the auto_cleanup strip or compaction commits, then a later op hits a retryable conflict), the reopened attempt saw HEAD ahead of the manifest — from OUR own sidecar-covered work, not an external writer — and deleted the sidecar + returned `skipped_for_drift`, stranding uncovered drift that then needs `repair`. Track `head_advanced` (did one of our Phase-B ops already commit). The drift guard now fires only when `!head_advanced` (genuine pre-existing external drift); once we have advanced HEAD, a reopened HEAD>manifest is our work that the monotonic publish fast-forwards. The no-op early-return likewise publishes prior committed work instead of dropping it when `head_advanced`. Regression test `optimize_retry_does_not_misclassify_own_head_drift` injects one retryable reindex conflict after the compaction commits (new `optimize.inject_ reindex_conflict` seam); red→green verified by negative control (reverting the gate reproduces `skipped_for_drift: Some(DriftNeedsRepair)`). Also de-flake `optimize_survives_concurrent_insert_advancing_manifest`: pause at `before_compact` (not post-compact) so the concurrent insert lands while HEAD== manifest — otherwise it could race optimize's committed-but-unpublished compaction and hit the write-path "HEAD ahead of manifest" guard.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 56d004e. Configure here.}

Review catch (greptile): the monotonic Phase-C publish loop returned an error on its final iteration's retryable manifest conflict, even though that conflict can itself mean a concurrent writer published a version that already includes our (content- preserving) compaction — i.e. the postcondition ("the manifest reflects our compaction") is already met. Recovery covered it (no data loss), but the operator saw a spurious error and had to re-run. Restructure the loop to re-read `current` on every retryable conflict and, on budget exhaustion, do a final `current >= state.version` convergence check before surfacing the error — the §6.6 "postcondition is the state, not winning the CAS" principle. Factor the repeated current-version read into `current_manifest_version`.

ragnorc added 3 commits June 22, 2026 10:53

cursor Bot reviewed Jun 22, 2026

View reviewed changes

Comment thread crates/omnigraph/src/db/omnigraph/optimize.rs

chatgpt-codex-connector Bot reviewed Jun 22, 2026

View reviewed changes

greptile-apps Bot reviewed Jun 22, 2026

View reviewed changes

Comment thread crates/omnigraph/src/db/omnigraph/optimize.rs Outdated

Comment thread crates/omnigraph/src/db/omnigraph/optimize.rs Outdated

Comment thread crates/omnigraph/tests/failpoints.rs

cursor Bot reviewed Jun 22, 2026

View reviewed changes

Comment thread crates/omnigraph/src/db/omnigraph/optimize.rs

ragnorc merged commit 6d4606a into main Jun 22, 2026
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(engine): optimize survives a cross-process write race on the same table#297

fix(engine): optimize survives a cross-process write race on the same table#297
ragnorc merged 5 commits into
mainfrom
fix-optimize-concurrency-race

ragnorc commented Jun 22, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 22, 2026

Uh oh!

ragnorc Jun 22, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		Err(e) if attempt < COMPACTION_RETRY_BUDGET && is_retryable_lance_conflict(&e) => {
		continue;

Conversation

ragnorc commented Jun 22, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What & why

The fix

Design note (liability)

Tests (test-first, red→green)

Verification

Out of scope (tracked separately)

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

ragnorc Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ragnorc commented Jun 22, 2026 •

edited by greptile-apps Bot

Loading