write-path cost gate + opener bypass by ragnorc · Pull Request #288 · ModernRelay/omnigraph

ragnorc · 2026-06-19T18:17:16Z

What & why

Write-path table opens routed through Lance's DatasetBuilder::from_namespace, whose describe_table opened the whole dataset just to return a location and then re-resolved the latest version — an O(commit-history depth) double latest-resolution per table open that missed Lance's O(1) version-hint fast path. On an object store this dominated write/branch latency (~70%, measured; RFC-013 §2.4) and got worse as a graph aged. Reads already bypassed this (PR #268); this completes the same open-by-location migration on the write side.

This PR lands RFC-013 step 1 (the cost gate + shared harness) and step 3a (the opener bypass — the dominant fix).

What's in this PR

perf(engine) — opener bypass + namespace retirement

TableStore::open_dataset_head_for_write now delegates to the direct opener (Dataset::open by URI + checkout_branch, routed through the tracked opener so cost tests can count it; a no-op in production). ensure_expected_version still validates head == pinned for strict ops.
With both reads (perf(engine): remove the per-query metadata re-derivation tax on warm reads #268) and now writes bypassing it, nothing in production routes through the per-table Lance namespace. The dead open chain (load_table_from_namespace, open_table_head_for_write) is deleted and the StagedTableNamespace contract apparatus is retired to a #[cfg(test)] mod (kept only to validate the LanceNamespace contract). __manifest commit coordination (GraphNamespacePublisher) is a separate component and is unaffected.

test(engine) — cost-budget gate on a shared, path-classifying harness

New tests/helpers/cost.rs: a store-agnostic harness (IoCounts/StagedCounts, measure/measure_with_staged, assert_flat/assert_grows, local_graph/s3_graph) — the single home for the IO-counting plumbing. Its data-table wrapper is a PrefixCounter that attributes each read to the opener term (_versions//.manifest) vs the scan term (data//*.lance), so cost tests isolate the opener directly.
write_cost.rs (local, every-PR): the internal-table scan term flat in depth (a RED #[ignore]'d LOCK — step 2's red→green target) + green guards (bounded data writes; a per-write read-op ceiling; a keyed insert routes through stage_merge_insert once) + a validator proving the opener/scan split (opener flat, scan grows).
write_cost_s3.rs (bucket-gated, RustFS CI): asserts data_opener_reads flat across depth — the proof of the step-3a win on a real object store. Wired into the rustfs_integration job + path filter.
warm_read_cost.rs migrated onto the shared harness (removing its duplicate local probes()).

docs — RFC-013 §9 marks step 1 / 3a landed and the namespace retirement; §7.1/§6/§11/§5.5/§10 correct the cross-table write-skew analysis (a prototype proved the scoped fix is a no-op without a shared graph_head contention row → coupled to Phase 7; concurrent-face vs sequential-face split). testing.md documents the harness + the local-vs-S3 backend split.

Measured impact (RFC §2.4 `[M]`; the S3 gate verifies it in CI)

	per-write depth slope	depth-80ish edge write
Before	O(depth): data + internal	~1618–1720 ops
This PR (step 3a)	data term flat; internal still ~+5/depth	~593 ops (2.7×)
After step 2 (+3b)	flat	~198 ops (~8.7×), flat

Data-table opener: 31 + 12·depth → flat 4. The key shape change: per-write cost no longer grows with the data table's version history.
Object-store deployments only (local FS resolves latest with one read_dir either way); the win grows with graph history and with the number of tables a write touches (load/merge/branch amplify it).

Test status

cargo test --workspace --locked green locally (writes, branch-merge, fork-on-first-write, overwrite, recovery — the paths the namespace open was suspected to be load-bearing on; the suite refutes it).
write_cost_s3 verified green on RustFS — data_opener_reads flat across commit depth on a real object store (a prior run, pre-fix, failed here with opener grew 20 → 50, empirically confirming the fix is load-bearing).
PR-gating checks (aws-feature, agents-md, container, Greptile, Cursor Bugbot) all pass. The full Test Workspace + RustFS S3 suites are PR-skipped by deliberate CI design and run post-merge on main.

Follow-on PRs (separate, after this merges)

Step 2 — internal-table compaction (__manifest/_graph_commits into all_table_keys), which un-ignores write_cost.rs's internal-table LOCK and delivers the remaining ~3× / full flatness. Must land with the Q8 boundary watermark (Lance version-CAS is cleanup-resurrection-vulnerable, RFC §12 Q8).
Step 3b — WriteTxn (capture-once + pinned-version opens + handle reuse + shared Session), retiring the flat-46 schema-read constant.
§7.1 write-skew — concurrent face (read-set + shared graph_head, coupled to Phase 7) and sequential face (inbound-RI validation on node removal) — two separate fixes.
Step 1c — an always-on prod storage.ops span metric (the test gate is the durable guard).

See docs/dev/rfc-013-write-path-latency.md for the full design and sequencing.

Note

Medium Risk
Changes the hot write open path and removes production namespace opens; mitigated by full workspace tests, S3 cost gate in CI, and retained test-only namespace contract coverage.

Overview
This PR lands RFC-013 step 3a: write-path table opens no longer go through DatasetBuilder::from_namespace (which did O(commit-depth) latest-version resolution). TableStore::open_dataset_head_for_write now delegates to a direct open_dataset_head (Dataset::open by URI + branch checkout), routed through open_dataset_tracked so cost tests can count opens (no-op when probes are unset in production).

With reads already opening by location, production no longer uses the per-table Lance namespace. The dead chain (load_table_from_namespace, open_table_head_for_write) is removed; StagedTableNamespace and related manifest namespace wiring move behind #[cfg(test)] for contract tests only. GraphNamespacePublisher for __manifest is unchanged.

Testing: New tests/helpers/cost.rs centralizes measure / IoCounts / PrefixCounter (opener vs scan reads). write_cost.rs runs on local FS (green guards + a RED #[ignore] internal-table flatness lock for step 2). write_cost_s3.rs asserts data_opener_reads stay flat on S3. warm_read_cost.rs drops duplicate probe wiring. CI path filters and the RustFS job run write_cost_s3.

Docs: RFC-013 §9 marks steps 1/3a landed; testing.md and index.md document the harness and backend split.

^{Reviewed by Cursor Bugbot for commit c9183b3. Bugbot is set up for automated code reviews on this repo. Configure here.}

Greptile Summary

Completes RFC-013 steps 1 and 3a: the write path's O(depth) DatasetBuilder::from_namespace open is replaced by a direct Dataset::open-by-URI call (O(1)), and the Lance namespace layer is retired to #[cfg(test)]-only contract validation since nothing in production routes through it anymore.

table_store.rs: open_dataset_head_for_write now calls open_dataset_head (tracked direct open) instead of the namespace builder; table_key is retained for signature stability but silently discarded.
namespace.rs / manifest.rs: load_table_from_namespace and open_table_head_for_write deleted; entire namespace module gated to #[cfg(test)], leaving GraphNamespacePublisher as the only production path through the manifest layer.
Test harness: PrefixCounter splits data-table reads into opener vs scan terms; write_cost.rs runs every-PR on local FS; write_cost_s3.rs asserts data_opener_reads flat across depth on a real object store — the step 3a acceptance test.

Confidence Score: 5/5

Safe to merge — the write-path change is a drop-in open strategy swap backed by a passing test suite and an S3 cost gate confirming the fix is load-bearing.

The core change replaces the namespace-builder open with a direct URI open that already existed and was proven on the read side; the namespace retirement is mechanical and consistently gated at the module level. The test harness provides both local every-PR guards and a real-S3 acceptance test.

No files require special attention.

Important Files Changed

Filename	Overview
crates/omnigraph/src/table_store.rs	Core fix: `open_dataset_head_for_write` now delegates to `open_dataset_head` (direct URI, O(1)) instead of the lance-namespace builder. `table_key` retained as a dead param for signature stability.
crates/omnigraph/src/db/manifest/namespace.rs	Dead `load_table_from_namespace` and `open_table_head_for_write` functions removed; per-item `#[cfg(test)]` guards dropped since the whole module is now `#[cfg(test)]` in manifest.rs. Logically consistent.
crates/omnigraph/tests/helpers/cost.rs	New shared cost-test harness: `PrefixCounter` wrapper splits reads into opener vs scan terms; `measure`/`assert_flat`/`assert_grows`/`local_graph`/`s3_graph` give all cost tests a single vocabulary. S3 init/seed failures now panic when bucket is configured.
crates/omnigraph/tests/write_cost.rs	New local write cost gate: green guards (ceiling, bounded writes, merge-insert routing) + opener/scan split validator + a `#[ignore]`'d RED LOCK for the step-2 internal-table flatness target.
crates/omnigraph/tests/write_cost_s3.rs	S3 opener gate: asserts `data_opener_reads` flat across depth [10, 50] — the RFC-013 step 3a acceptance test. Skips when bucket unset; panics on misconfigured bucket.
.github/workflows/ci.yml	Adds `write_cost_s3.rs` and `table_store.rs`/`instrumentation.rs` to the RustFS CI path filter and wires in the new S3 cost-gate step.

_{Reviews (6): Last reviewed commit: "Merge branch 'main' into rfc-013-write-p..." | Re-trigger Greptile}

…e builder Write opens routed through DatasetBuilder::from_namespace, whose describe_table opened the whole dataset just to return a location and then re-resolved the latest version — an O(commit-depth) double latest-resolution per table open that missed Lance's O(1) version-hint fast path. On an object store this dominated write latency (~70%, RFC-013 section 2.4). TableStore::open_dataset_head_for_write now delegates to the direct opener (open_dataset_head: Dataset::open by URI + checkout_branch, routed through the tracked opener so cost tests can count it; a no-op in production). The manifest already holds every sub-table's location, so the namespace catalog lookup was redundant; ensure_expected_version still validates head == pinned for strict ops. This completes PR #268's open-by-location migration on the write side. With both reads (PR #268) and now writes bypassing it, nothing in production routes through the per-table Lance namespace. The dead open chain (load_table_from_namespace, open_table_head_for_write) is deleted and the StagedTableNamespace contract apparatus is gated #[cfg(test)], mirroring the already-test-only read namespace; __manifest commit coordination (GraphNamespacePublisher) is a separate component and is unaffected. See docs/dev/rfc-013-write-path-latency.md sections 2.4 and 9 (step 3a).

Adds tests/helpers/cost.rs, a store-agnostic cost harness (IoCounts/StagedCounts, measure/measure_with_staged, assert_flat, local_graph/s3_graph) that the read-side warm_read_cost.rs, write_cost.rs, and write_cost_s3.rs share, so the IOTracker / task-local plumbing lives in exactly one place instead of duplicated per test. write_cost.rs (local, every-PR) gates the internal-table scan term flat in commit-history depth (a RED #[ignore]'d LOCK, the acceptance for bringing the internal tables into compaction) plus green guards: a single insert's data writes are bounded, a per-write read-op ceiling fails the moment a round-trip is added, and a keyed insert routes through stage_merge_insert once with no stage_append or vector-index build. write_cost_s3.rs (bucket-gated, rustfs CI) gates the data-table opener term flat across depth — the object-store-RPC phenomenon local FS cannot reproduce, and the red->green proof of the opener bypass. Wired into the rustfs_integration CI job and its path filter. Guards the "hot-path cost is bounded by work, not history" invariant on writes. See docs/dev/rfc-013-write-path-latency.md section 5.1, docs/dev/testing.md.

…t map - Section 9: mark step 1 (gate + harness) and step 3a (opener bypass) landed; record the per-table namespace retirement to test-only and the corrected measurement note (the opener win is S3-only; the local data-table growth is the merge-insert/RI fragment scan, a compaction term, not the opener). - Sections 7.1/6/11/5.5/10: correct the cross-table write-skew analysis after a prototype proved the scoped expected-set fix is a no-op against the per-object_id manifest (disjoint writers never share a row, so Lance never conflicts, the publisher never retries, and the expected check is a non-atomic pre-check evaluated once against stale state). The fix needs a shared contention row (Phase-7 graph_head / a minimal head row / commit-time re-validation), so it is coupled to that row, not standalone; that contention is load-bearing for correctness, not a drawback. Split the concurrent face (read-set + head) from the sequential face (inbound-RI validation on node removal) -- two different fixes. - testing.md: add write_cost.rs / helpers/cost.rs / write_cost_s3.rs to the test map; document the local-vs-S3 backend split; extend the cost-budget checklist item to the write/open path and point at the shared harness.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 65195c9903

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-19T18:24:13Z

+
+    // The data-table opener is O(1) after step 3a (direct-by-URI). Slack absorbs
+    // object-store variance; the pre-3a builder grew this ~+12/depth (RFC §2.4 [M]).
+    assert_flat(&curve, |c| c.data_reads, 8, "S3 data-table opener");


Isolate opener IO before asserting flatness

With this insert_person/commit_many fixture, data_reads is not opener-only: the table wrapper is attached to the Dataset used by stage_merge_insert, so reads from scanning unoptimized fragments are included as history grows. In the S3 RustFS gate this assertion can fail (or misattribute fragment-scan growth to the opener) even when the direct opener is flat; isolate _versions/open IO, compact the fixture, or use an append-only operation if the intent is to gate only the opener term.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-19T18:24:13Z

+    let mut db = Omnigraph::init(&uri, TEST_SCHEMA).await.ok()?;
+    load_jsonl(&mut db, TEST_DATA, LoadMode::Overwrite).await.ok()?;


Fail configured S3 setup errors instead of skipping

When OMNIGRAPH_S3_TEST_BUCKET is set (the RustFS CI job this PR wires up), these .ok()? conversions turn any real Omnigraph::init or seed-load failure into None, so write_cost_s3 logs a skip and passes without exercising the new gate. The existing s3_storage pattern only skips when the bucket env var is absent and then unwraps setup failures; this helper should likewise return a Result or panic once the bucket is configured.

Useful? React with 👍 / 👎.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit a966271. Configure here.}

… setup errors Addresses two PR review findings on the bucket-gated write_cost_s3 gate: - The data-table opener was not isolated: `data_reads` also counts the merge-insert/RI scan, which reads O(fragment-count) and so grows with history for a different reason (compaction's domain, not the opener) -- the same term that made the local data-table count grow. The flat assertion would false-RED or misattribute scan growth to the opener on rustfs. Fix: compact (db.optimize) before each measurement so the table holds ~1 fragment, bounding the scan and leaving the opener's latest-version resolution as the only history-varying term. Compaction preserves version history, so the opener still faces a deep _versions/ chain -- the thing under test. - s3_graph used `.ok()?`, so when OMNIGRAPH_S3_TEST_BUCKET was set but the store was down/misconfigured, init/seed failures collapsed to None and the gate skipped + passed vacuously. Fix: skip only when the bucket env var is absent; once it is set, init/seed failures panic (mirrors tests/s3_storage.rs).

…ct-by-design) Replaces the fixture-bounded isolation (compact-before-measure) from the prior commit with the root fix: a path-classifying ObjectStore wrapper (PrefixCounter) that attributes each data-table read to the opener term (_versions/.manifest) vs the scan term (data/*.lance). IoCounts now exposes data_opener_reads / data_scan_reads, so write_cost_s3 asserts the opener flat *directly* -- no compaction or fixture massaging, and the assertion measures the opener, not the conflated total. Closes the "harness conflates two IO terms" class: any cost test (read or write) can now isolate the opener. PrefixCounter implements only the object_store 0.13 core ObjectStore methods; the convenience surface (get/put/head/...) routes through get_opts/put_opts via ObjectStoreExt's blanket impl, so every read/write is still counted. Validated locally (every-PR) by write_cost::data_table_reads_split_into_flat_opener_ and_growing_scan: opener stays flat (7 -> 3) while scan grows (11 -> 91) and opener + scan == data_total exactly -- proving the classifier and confirming the local data-table growth is the fragment scan, not the opener. warm_read_cost (12 tests) stays green under the shared-harness change.

…st) noise Branch self-review (no behavior change) — pay down three liabilities the write-path work left: - warm_read_cost.rs kept its own probes() (three IOTrackers + a QueryIoProbes + a probe counter) and read raw .stats().read_iops — duplicating the shared helpers::cost harness this branch introduced. Migrated all 12 tests onto measure()/IoCounts; deleted the local probes(). (This also makes IoCounts' version_probes field used rather than dead.) - insert_cost was copy-pasted verbatim into write_cost.rs and write_cost_s3.rs. Hoisted to helpers::cost::measure_insert so the measured write is defined once. - The per-table Lance namespace (namespace.rs) became entirely test-only after step 3a, but was gated with ~22 per-item #[cfg(test)] attributes. Collapsed to a single `#[cfg(test)] mod namespace;` and stripped the per-item attributes; merged the import groups the gating had split. Verified: lib in-source 162 passed; write_cost 4 + warm_read_cost 12 passed; forbidden_apis passed.

…#291) * feat(engine): compact the internal __manifest/_graph_commits tables in optimize `optimize` iterated node/edge catalog tables only, so the two internal system tables (`__manifest`, `_graph_commits`) accumulated one fragment per commit and were never compacted -- making every write's metadata scan O(fragments), which grows forever on a long-lived graph (RFC-013 step 2). `optimize_all_tables` now also compacts both internal tables via a new `compact_internal_table`. They are not catalog-tracked (readers open them at their latest Lance HEAD), so it is a much simpler path than `optimize_one_table`: compact in place, no manifest publish (nothing to publish to), no recovery sidecar (a single atomic Lance commit -- no HEAD-before-publish gap), and no optimize_indices (they carry no Lance index, only object_id's unenforced-PK metadata). No application lock: Lance's compact_files auto-retries its Rewrite against any concurrent writer (the canonical LanceDB pattern; Rewrite vs Append is compatible, vs Update a retryable same-fragment conflict Lance rebases), and a coordinator refresh afterwards makes the warm handle observe the compacted HEAD. Compacts both tables even though Phase 7 (iss-991) will later fold _graph_commits into __manifest -- a one-call throwaway for the full interim win; __manifest compaction is also the prerequisite for Phase 7's graph_head contention. Cleanup (version GC) of the internal tables is deliberately NOT included here: it needs the Q8 cleanup-resurrection watermark first (deferred). maintenance.rs: optimize now returns 6 stats (4 data + 2 internal); adds optimize_compacts_internal_tables (sheds fragments, leaks no recovery sidecar, graph coherent for reads + strict writes after). * test(engine): un-ignore the internal-table scan LOCK (step 2 acceptance) `internal_table_scans_are_flat_in_history` was the RED, #[ignore]'d acceptance gate staged in PR #288. With internal-table compaction landed, a write's __manifest/_graph_commits scan is flat in commit-history depth on a compacted graph (measured __manifest 4->2, _graph_commits 7->3 across depth 10->100, vs the pre-step-2 RED 34->214 / 29->207). The test now compacts at each depth before measuring and runs green every-PR. * docs: RFC-013 step 2 internal-table compaction landed - invariants.md: close the compaction half of the read-path-rederivation known gap (optimize now compacts the internal tables; cleanup half still deferred). - maintenance.md: optimize covers __manifest/_graph_commits (no publish, no sidecar); not yet in cleanup. - rfc-013 §9: split step 2 into 2a (compaction, landed) and 2b (cleanup + Q8 watermark, deferred — debated; MTT-overlap + hot-path liability). - testing.md: the internal-table LOCK is now green every-PR. * fix(engine): guard absent _graph_commits + always compact internal tables Addresses PR #291 review findings: - Greptile (P1): optimize unconditionally opened `_graph_commits` for compaction, but a graph can validly have none (the coordinator opens it as `Option`, gated on `storage.exists`, for graphs predating the commit graph). `Dataset::open` on the absent table errored and failed the whole optimize. Guard the `_graph_commits` compaction with the same `storage_adapter().exists()` check the coordinator uses; `__manifest` always exists so it stays unguarded. Regression test `optimize_tolerates_absent_graph_commits_table` (empty graph so no publish recreates the table before the guard). - Cursor (low): the `table_tasks.is_empty()` early return skipped internal-table compaction for a schema with no node/edge types. Removed it so the internal tables are compacted regardless of the data-table set. - Codex (auto-cleanup, P1): documented — `compact_files` commits with a default `CommitConfig` (no skip_auto_cleanup) and `CompactionOptions` exposes no override, so on a graph storing an *on* auto_cleanup config the commit would fire version GC. Both internal tables are created with `auto_cleanup: None`, so new graphs are safe; the only exposure is pre-fix upgraded graphs, identical to the existing data-table optimize path, with step 2b's watermark as the comprehensive guard. Added a comment in `compact_internal_table` recording this. * fix(engine): retry publish on RetryableCommitConflict (compaction vs publish) Step 2 compacts `__manifest` with no app-level lock (Lance OCC arbitrates, validated against LanceDB + the lance-7.0.0 conflict resolver). compact_files' `Operation::Rewrite` auto-retries 20x (CommitConfig default num_retries=20), so a live publish usually wins the race and the compaction rebases. But the publish runs its merge-insert with conflict_retries(0) = one rebase attempt; if the compaction commits first AND the merge touched a fragment the Rewrite rewrote, Lance preempts the publish with `Error::RetryableCommitConflict` — a DIFFERENT variant from the row-level `TooMuchWriteContention` the publisher already retries. Left unhandled, that surfaces a transient error to the caller, i.e. a maintenance compaction (physical op) failing a live write (logical op) — invariant 7. Map `LanceError::RetryableCommitConflict` to a new `ManifestConflictDetails::RetryableCommitConflict` and treat it as retryable in the publisher's outer loop (reload fresh state + re-merge), alongside RowLevelCasContention. `ExpectedVersionMismatch` still propagates (a genuine expectation break must not be blindly retried). This also hardens multi-process concurrent writers generally, not just compaction. Normal publishes are insert-only (new object_ids -> new fragments, disjoint from rewritten old ones), so the conflict is rare; the guard covers the same-fragment-update edge and multi-process writers. Unit tests in publisher.rs pin the mapping + the retry-predicate contract. * revert: publisher RetryableCommitConflict handling (it was the wrong side) Reverts d138902. Validated against lance-7.0.0: the publisher's merge-insert runs with conflict_retries(0), and execute_with_retry converts an exhausted retryable commit conflict to TooMuchWriteContention before the caller sees it (write/retry.rs ~95-130). So map_lance_publish_error NEVER receives RetryableCommitConflict from merge_rows — it receives TooMuchWriteContention, which the publisher already maps to RowLevelCasContention and retries. The reverted mapping was therefore dead on the real path and its unit test was synthetic. The actual exposure is the *compaction* side: compact_files -> commit_compaction -> apply_commit directly (no execute_with_retry), so a Rewrite-vs-Merge check_txn conflict propagates raw and optimize can fail on a live graph. That is fixed app-side in compact_internal_table in the following commit. * fix(engine): make internal-table compaction correct by construction Address three findings from review of the step-2 internal-table compaction: - Non-destructive by construction: before compacting an internal table, strip any stored `lance.auto_cleanup.*` config off it. `compact_files` commits with a default `CommitConfig` (skip_auto_cleanup=false) and `CompactionOptions` exposes no override, so on a graph created by an older binary (on-by-default GC hook) the compaction commit would fire Lance's auto-cleanup and silently prune `__manifest`-pinned versions. Current binaries store no such config; the strip is the upgrade-path safety net so `optimize` can never GC versions. - App-level compaction retry: `compact_files` does NOT auto-retry a semantic conflict against a concurrent live writer (Rewrite vs Update/Merge/Delete propagates raw from apply_commit; Lance prescribes app-rerun). Wrap the internal-table compaction in a bounded retry loop that reopens fresh and replans on a retryable Lance conflict, so a maintenance compaction (a physical op) never fails a live write (a logical op) — invariant 7. - Compact all three internal tables, not two: `_graph_commit_actors` grows one fragment per commit on the authenticated write path, the same O(depth) scan as `__manifest`/`_graph_commits`. Drive the sweep from one source-of-truth list with per-table existence guards (the two commit-graph tables are optional). Make `graph_commit_actors_uri` pub(crate). Tests: the `internal_table_scans_are_flat_in_history` LOCK now runs the authenticated (actorful) write path so it covers `_graph_commit_actors` via the shared commit-graph IO wrapper (new `commit_many_as`/`measure_insert_as` helpers); `optimize_clears_stale_auto_cleanup_and_preserves_versions` pins the non-destructive guarantee (config cleared + no version GC); a unit test pins the retryable-conflict classifier; the empty-graph stats count is 7 (the actor table is created at init). * docs: internal-table compaction covers all 3 tables, non-destructive, retried Sync the RFC-013 step-2a section and the maintenance guide with the correctness-by-design refinements: - optimize compacts `__manifest`, `_graph_commits`, AND `_graph_commit_actors` (the actor table grows on the authenticated write path). - optimize is non-destructive by construction — it never GCs versions, and strips stale `lance.auto_cleanup.*` config so an upgraded graph's commit-time GC hook cannot fire during compaction. - internal-table compaction rebases and retries against concurrent live writers rather than failing the operator's optimize or the live write. - the cost LOCK is the authenticated-path acceptance test. * fix(engine): refresh coordinator after a config-strip with no compaction work `compact_internal_table` returns early when `plan_compaction` finds no work, but `clear_stale_auto_cleanup_config` may have already committed a config-strip that advanced Lance HEAD. The early return skipped the coordinator refresh that the successful-compaction path performs, leaving warm `__manifest`/commit-graph handles pinned to the pre-strip version until the next read's version probe healed them. No correctness bug (the probe self-heals, and a stale-handle write would retry via publisher CAS), but the refresh makes coherence deterministic rather than probe-dependent. Refresh iff the config-strip actually committed. * docs(engine): correct compact_internal_table doc — compact_files does not auto-retry The function doc claimed "Lance's compact_files auto-retries its Operation::Rewrite against any concurrent writer" — wrong, and contradicting the is_retryable_lance_conflict doc just below it and the explicit retry loop that exists precisely because compact_files does NOT auto-retry semantic conflicts (Rewrite vs Update/Merge/Delete propagates raw through apply_commit). Also move the orphaned description from above the retry-budget const onto the function, and include the third internal table. * test(engine): optimize must clear stale auto_cleanup on DATA tables too (red) Regression test for a destructive bug on the data-table optimize path: on an upgraded graph whose node/edge table still carries pre-v7 lance.auto_cleanup.* config, `optimize`'s compact_files/optimize_indices commits fire Lance's version GC and prune __manifest-pinned data-table versions. Mirrors the internal-table auto_cleanup test on a Person table (force-repair realigns the config-induced drift so optimize doesn't skip the table). Red against the current code: the data-table path does not strip the config. The fix lands in the next commit. * fix(engine): clear stale auto_cleanup on the data-table optimize path too The auto_cleanup scrub previously only protected the internal tables; the data-table path (optimize_one_table) ran compact_files/optimize_indices with a default CommitConfig (skip_auto_cleanup=false) and no override, so on an upgraded graph those commits could fire Lance's version-GC hook and prune __manifest-pinned node/edge versions — making the "non-destructive" contract false for data tables. Strip the config before the HEAD-advancing commits, capturing version_before first so the strip's own commit still triggers the Phase-C manifest publish (no uncovered drift). No retry loop needed: the data-table path holds the per-table write queue. Covered by the existing Optimize recovery sidecar. Turns the prior commit's test green. Also: switch clear_stale_auto_cleanup_config off the deprecated delete_config_keys to update_config(None values), and correct two now-inaccurate doc comments — compaction is "one or more content-preserving commits" (compact_files can emit a ReserveFragments before the Rewrite), not "a single atomic commit"; the sidecar-free property rests on content-preservation + read-at-HEAD, not single-commit atomicity. * docs: optimize is non-destructive on all tables; correct atomicity/retry claims - non-destructive guarantee now spans data + internal tables (the auto_cleanup strip runs on both paths), not just the internal ones. - "single atomic Lance commit" was inaccurate: compaction can emit a ReserveFragments commit before the Rewrite; the no-sidecar property rests on content-preservation + read-at-HEAD, not single-commit atomicity. - "retries rather than failing" softened to the truth: a *bounded* retry on the internal path; sustained contention surfaces a loud conflict error (bounded + observable, not an infinite loop). The data path holds the per-table queue and never contends.

…rency-cap correction + Lance-metadata comparison (#292) * feat(engine): compact the internal __manifest/_graph_commits tables in optimize `optimize` iterated node/edge catalog tables only, so the two internal system tables (`__manifest`, `_graph_commits`) accumulated one fragment per commit and were never compacted -- making every write's metadata scan O(fragments), which grows forever on a long-lived graph (RFC-013 step 2). `optimize_all_tables` now also compacts both internal tables via a new `compact_internal_table`. They are not catalog-tracked (readers open them at their latest Lance HEAD), so it is a much simpler path than `optimize_one_table`: compact in place, no manifest publish (nothing to publish to), no recovery sidecar (a single atomic Lance commit -- no HEAD-before-publish gap), and no optimize_indices (they carry no Lance index, only object_id's unenforced-PK metadata). No application lock: Lance's compact_files auto-retries its Rewrite against any concurrent writer (the canonical LanceDB pattern; Rewrite vs Append is compatible, vs Update a retryable same-fragment conflict Lance rebases), and a coordinator refresh afterwards makes the warm handle observe the compacted HEAD. Compacts both tables even though Phase 7 (iss-991) will later fold _graph_commits into __manifest -- a one-call throwaway for the full interim win; __manifest compaction is also the prerequisite for Phase 7's graph_head contention. Cleanup (version GC) of the internal tables is deliberately NOT included here: it needs the Q8 cleanup-resurrection watermark first (deferred). maintenance.rs: optimize now returns 6 stats (4 data + 2 internal); adds optimize_compacts_internal_tables (sheds fragments, leaks no recovery sidecar, graph coherent for reads + strict writes after). * test(engine): un-ignore the internal-table scan LOCK (step 2 acceptance) `internal_table_scans_are_flat_in_history` was the RED, #[ignore]'d acceptance gate staged in PR #288. With internal-table compaction landed, a write's __manifest/_graph_commits scan is flat in commit-history depth on a compacted graph (measured __manifest 4->2, _graph_commits 7->3 across depth 10->100, vs the pre-step-2 RED 34->214 / 29->207). The test now compacts at each depth before measuring and runs green every-PR. * docs: RFC-013 step 2 internal-table compaction landed - invariants.md: close the compaction half of the read-path-rederivation known gap (optimize now compacts the internal tables; cleanup half still deferred). - maintenance.md: optimize covers __manifest/_graph_commits (no publish, no sidecar); not yet in cleanup. - rfc-013 §9: split step 2 into 2a (compaction, landed) and 2b (cleanup + Q8 watermark, deferred — debated; MTT-overlap + hot-path liability). - testing.md: the internal-table LOCK is now green every-PR. * fix(engine): guard absent _graph_commits + always compact internal tables Addresses PR #291 review findings: - Greptile (P1): optimize unconditionally opened `_graph_commits` for compaction, but a graph can validly have none (the coordinator opens it as `Option`, gated on `storage.exists`, for graphs predating the commit graph). `Dataset::open` on the absent table errored and failed the whole optimize. Guard the `_graph_commits` compaction with the same `storage_adapter().exists()` check the coordinator uses; `__manifest` always exists so it stays unguarded. Regression test `optimize_tolerates_absent_graph_commits_table` (empty graph so no publish recreates the table before the guard). - Cursor (low): the `table_tasks.is_empty()` early return skipped internal-table compaction for a schema with no node/edge types. Removed it so the internal tables are compacted regardless of the data-table set. - Codex (auto-cleanup, P1): documented — `compact_files` commits with a default `CommitConfig` (no skip_auto_cleanup) and `CompactionOptions` exposes no override, so on a graph storing an *on* auto_cleanup config the commit would fire version GC. Both internal tables are created with `auto_cleanup: None`, so new graphs are safe; the only exposure is pre-fix upgraded graphs, identical to the existing data-table optimize path, with step 2b's watermark as the comprehensive guard. Added a comment in `compact_internal_table` recording this. * docs(rfc-013): serial-hop correction — wall-clock is the ~110-hop backbone, not op count Latency-slope measurement on the deployed edge binary (f6d2cc0, steps 1+3a landed; rustfs + per-op latency proxy, depth 1..85) shows wall-clock is set by a ~110-hop SERIAL backbone that is depth-invariant. Total ops grow +~7/depth but PARALLELIZE (parallelism 1->6), so the depth term adds little wall-clock. - New §0(c): the serial-hop vs total-op finding + branch-op backbones (create ~77, delete ~87, branch-write ~258/1777-ops/21s floor = fork-on-first-write). - §2.4: correct the '1720->198 ops => 258s->30s' op-count->wall-clock conversion. - §5.1: promote serial-hop/num_stages to the PRIMARY latency LOCK; op-count flatness demoted to a cost/compute-floor gate. - §9 step 2: reprioritized as Phase-7 prerequisite + compute-floor/space, NOT the wall-clock fix; step 3b (parallel capture-once WriteTxn) is the headline latency lever; branch-write moved under step 3b + fork seam. - Summary: serial-backbone correction up front. Vindicates the §3/§4.1 design; corrects the op-count latency framing. * docs(rfc-013): concurrency-cap correction + Lance-metadata comparison Fold in two measured findings from the deployed edge binary (f6d2cc0) on rustfs behind a latency+concurrency proxy: - §0(d): concurrency-cap A/B. Under unlimited concurrency the internal-table scan parallelizes (backbone ~110); under an R2-realistic cap (8) it serializes and an UNCOMPACTED graph runs away (per-write ops 1273->3505, wall 6->16s), while #291's internal compaction cuts it ~6x and bounds it (137->1 frag). The latency model is (serial_hops + ops/effective_concurrency)*RTT + compute. - Reframe step 2 across Summary/§2.4/§9: NOT de-ranked — on R2 (capped) it is a primary latency lever + the anti-runaway fix + Phase-7 prereq. The earlier 'step 2 is parallel, irrelevant to latency' was an unlimited-concurrency artifact. Deployed f6d2cc0 optimize is node/edge-only; #291 (undeployed) is the prod win. - §5.1: the cost-gate ThrottledStore must cap concurrency AND inject latency; assert serial_hops flat AND ops flat in history. - §2.3 + §8: Lance/LanceDB comparison from 7.0.0 source — Lance metadata is a single-file per-version manifest read O(1) (latest_version_hint), pruned by default; omnigraph's __manifest-as-Lance-dataset scan is self-inflicted by the cross-table-atomicity choice. Adds explicit defense of Lance-dataset __manifest (MTT seam) vs a flat-file CAS'd manifest (cheaper, off the MTT path). Design (§3/§4.1) unchanged and vindicated; corrections are measurement framing, step sizing, and one design-choice that was implicit.

ragnorc added 5 commits June 19, 2026 17:24

docs(rfc): RFC-013 write-path latency design + index link

57c350a

Merge remote-tracking branch 'origin/main' into investigate-pasted-text

6d386c3

chatgpt-codex-connector Bot reviewed Jun 19, 2026

View reviewed changes

Merge branch 'main' into rfc-013-write-path-latency

a966271

cursor Bot reviewed Jun 19, 2026

View reviewed changes

Comment thread crates/omnigraph/tests/helpers/cost.rs

ragnorc and others added 4 commits June 20, 2026 11:18

Merge branch 'main' into rfc-013-write-path-latency

c9183b3

ragnorc changed the title ~~RFC-013: write-path cost gate + opener bypass (step 1 + step 3a)~~ write-path cost gate + opener bypass Jun 20, 2026

ragnorc merged commit f6d2cc0 into main Jun 20, 2026
8 checks passed

ragnorc mentioned this pull request Jun 20, 2026

(feat): compact the internal manifest/commit-graph tables in optimize #291

Merged

aaltshuler mentioned this pull request Jun 20, 2026

docs(rfc): RFC-013 step 3b — capture-once WriteTxn (unify the write opener) #295

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

write-path cost gate + opener bypass#288

write-path cost gate + opener bypass#288
ragnorc merged 10 commits into
mainfrom
rfc-013-write-path-latency

ragnorc commented Jun 19, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 19, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 19, 2026

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		let mut db = Omnigraph::init(&uri, TEST_SCHEMA).await.ok()?;
		load_jsonl(&mut db, TEST_DATA, LoadMode::Overwrite).await.ok()?;

Conversation

ragnorc commented Jun 19, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What & why

What's in this PR

Measured impact (RFC §2.4 [M]; the S3 gate verifies it in CI)

Test status

Follow-on PRs (separate, after this merges)

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ragnorc commented Jun 19, 2026 •

edited by greptile-apps Bot

Loading

Measured impact (RFC §2.4 `[M]`; the S3 gate verifies it in CI)