Skip to content

(feat): compact the internal manifest/commit-graph tables in optimize#291

Merged
ragnorc merged 14 commits into
mainfrom
rfc-013-internal-table-compaction
Jun 21, 2026
Merged

(feat): compact the internal manifest/commit-graph tables in optimize#291
ragnorc merged 14 commits into
mainfrom
rfc-013-internal-table-compaction

Conversation

@ragnorc

@ragnorc ragnorc commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

What & why

optimize compacted node/edge data tables only — the two internal system tables __manifest (visibility authority) and _graph_commits (lineage DAG) were never compacted (all_table_keys is catalog-keys-only). They accumulate one fragment per commit, so every write's metadata scan of them is O(fragments) and grows forever on a long-lived graph. That was the residual internal-table term after PR #288 flattened the data-table opener, and the RED #[ignore]'d internal_table_scans_are_flat_in_history LOCK that #288 staged.

This PR brings both internal tables into db.optimize().

Changes

  • compact_internal_table (new, in optimize.rs) — a deliberately simpler path than optimize_one_table. The internal tables aren't catalog-tracked (readers open them at their latest Lance HEAD), so compaction just advances that HEAD: no manifest publish (nothing to publish to), no recovery sidecar (a single atomic Lance compact_files commit — no HEAD-before-publish gap), no optimize_indices (they carry no Lance index, only object_id's unenforced-PK metadata). optimize_all_tables runs it for __manifest and _graph_commits after the data tables.
  • Concurrency: no application lock — validated against the canonical Lance consumer. LanceDB holds no lock around compaction; Lance's compact_files auto-retries its Operation::Rewrite against any concurrent writer (Rewrite vs Append is compatible, vs Update/merge-insert a retryable same-fragment conflict Lance rebases — lance/io/commit.rs, conflict_resolver.rs). A coordinator refresh after makes the warm __manifest/_graph_commits handle observe the compacted HEAD. (Within one process optimize &self and mutate &mut self can't overlap anyway; multi-process is the existing one-winner-CAS exposure.)
  • Both tables compacted, even though Phase 7 (iss-991) will later fold _graph_commits into __manifest — a one-call throwaway for the full interim win. __manifest compaction is also the hard prerequisite for Phase 7 (its graph_head CAS contention is only acceptable once compaction bounds the publisher's load_publish_state scan).

Measured result (the LOCK, now green)

internal_table_scans_are_flat_in_history (un-ignored): on a compacted graph a write's internal-table scan is flat in commit-history depth:

depth 10 depth 100
__manifest reads 4 2
_graph_commits reads 7 3

vs the pre-step-2 RED curve (__manifest 34→214, _graph_commits 29→207).

Scope — compaction only (cleanup deferred)

This is step 2a. The cleanup (version-GC) half — step 2b — is deliberately not here: cleanup is the version-deleting op that hits the Q8 cleanup-resurrection hole (Lance's version CAS has no monotonic guard, validated in lance/dataset/transaction.rs/cleanup.rs), so it must land with a durable monotonic watermark (a Lance boundary tag). Deferred because it touches the read/open path, is the part most likely made redundant by Lance MTT, and only buys the secondary version-count/space term — whereas 2a delivers the dominant per-write scan win with zero resurrection risk. So the internal _versions/ chain still grows until 2b; run optimize on a cadence to keep per-write metadata scans flat. See docs/dev/rfc-013-write-path-latency.md §9.

Tests

  • cargo test --workspace --locked green (65 suites). (One pre-existing flaky server test, schema_apply_route_hard_drops_property_with_allow_data_loss, intermittently fails under full parallel load — passes 4/4 in isolation, on clean main, and on the re-run; unrelated to this optimize-only change, which the server test never exercises.)
  • internal_table_scans_are_flat_in_history un-ignored and green; optimize_compacts_internal_tables (sheds fragments, no sidecar leak, coherent after); optimize_on_empty_graph updated (6 stats); idempotent test already covers internal no-op.

Note

Medium Risk
Changes the maintenance path and coordinator refresh behavior under concurrent writes, but compaction is non-destructive (no version GC) and is heavily covered by integration and cost-budget tests.

Overview
optimize now compacts internal system tables (__manifest, _graph_commits, _graph_commit_actors) after node/edge tables, via a new compact_internal_table path: in-place Lance compaction only—no __manifest publish and no recovery sidecar—plus coordinator refresh for warm handles. The early return when the schema has no data tables is removed so internal tables are still swept.

Safety and concurrency: clear_stale_auto_cleanup_config runs before compaction on internal tables and on data tables inside the existing optimize sidecar, so upgraded graphs cannot trigger Lance commit-time version GC during optimize. Internal compaction uses a bounded retry on retryable Lance write conflicts (no per-table write queue); missing _graph_commits / _graph_commit_actors datasets are skipped via storage existence checks.

Tests and gates: Integration tests cover internal compaction, absent commit-graph tables, and auto-cleanup stripping on internal and data tables. The internal_table_scans_are_flat_in_history cost LOCK is enabled and uses actorful writes with optimize at each depth checkpoint. User and RFC docs describe the new behavior; internal version cleanup remains deferred (step 2b).

Reviewed by Cursor Bugbot for commit edfc73d. Bugbot is set up for automated code reviews on this repo. Configure here.

Greptile Summary

This PR brings the two (now three) internal system tables — __manifest, _graph_commits, and _graph_commit_actors — into db.optimize() as RFC-013 step 2a, fixing the O(fragments) per-write metadata scan that grew unboundedly on long-lived graphs. It also un-ignores the internal_table_scans_are_flat_in_history cost LOCK and adds clear_stale_auto_cleanup_config to both the new internal-table path and the existing data-table path, making optimize non-destructive by construction on pre-v7 upgraded graphs.

  • compact_internal_table (new) — a simplified variant of optimize_one_table with no manifest publish, no recovery sidecar, and a bounded app-level retry loop for Rewrite-vs-live-write conflicts; existence-guarded for the two optional commit-graph tables; followed by a coordinator refresh for cache coherence.
  • clear_stale_auto_cleanup_config (new, shared) — strips any stored lance.auto_cleanup.* config before compacting any table (data or internal) so the commit-time Lance GC hook cannot prune __manifest-pinned versions on upgraded graphs.
  • Test coverage — four new integration tests in maintenance.rs covering internal compaction, absent _graph_commits, auto-cleanup stripping on internal tables, and the same on data tables; the cost LOCK is now green on the authenticated write path.

Confidence Score: 5/5

Safe to merge — compaction is content-preserving, internal tables are read at HEAD so no manifest publish is needed, the bounded retry loop correctly handles concurrent writers, and the existence guard prevents failures on older graphs without commit-graph tables.

All changes are on the maintenance path (optimize), not the live write or read path. Compaction preserves all versions (no GC), the auto-cleanup strip prevents the one destructive upgrade hazard on older graphs, and the new compact_internal_table path is thoroughly tested.

The retry loop in compact_internal_table in optimize.rs is worth a second look: plan_compaction errors exit the loop entirely via ? rather than going through the retryable-conflict check, which is an asymmetry with the surrounding retry logic.

Important Files Changed

Filename Overview
crates/omnigraph/src/db/omnigraph/optimize.rs Core change: adds compact_internal_table, clear_stale_auto_cleanup_config, and is_retryable_lance_conflict; wires all three internal system tables into optimize_all_tables; removes the empty-table-tasks early return. The retry loop correctly handles clear_stale_auto_cleanup_config and compact_files conflicts, but plan_compaction errors exit the retry loop entirely via ?. Multiple coordinator refreshes (one per internal table instead of one at the end) are correct but redundant.
crates/omnigraph/tests/maintenance.rs Adds four new integration tests covering internal-table compaction, absent _graph_commits tolerance, stale auto-cleanup config stripping on internal tables, and the same on data tables; updates optimize_on_empty_graph to assert 7 stats. Comprehensive coverage.
crates/omnigraph/tests/write_cost.rs Un-ignores internal_table_scans_are_flat_in_history LOCK, switches to actorful writes via commit_many_as/measure_insert_as so _graph_commit_actors is exercised, and inserts db.optimize() at each depth checkpoint to prove per-write scan is flat on a periodically-compacted graph.
crates/omnigraph/src/db/commit_graph.rs One-line visibility change: graph_commit_actors_uri promoted from fn to pub(crate) fn so optimize.rs can call it directly.
crates/omnigraph/src/db/manifest.rs Re-exports manifest_uri from layout as pub(crate) so optimize.rs can reference it without going through the internal layout module path.
crates/omnigraph/src/db/manifest/layout.rs One-line visibility change: manifest_uri promoted from pub(super) to pub(crate) to support the re-export in manifest.rs.
crates/omnigraph/tests/helpers/cost.rs Adds measure_insert_as helper for the authenticated write path, mirroring measure_insert but passing an actor so _graph_commit_actors is exercised in IO cost measurements.
crates/omnigraph/tests/helpers/mod.rs Adds commit_many_as helper for building history on the authenticated write path, mirroring commit_many.
docs/dev/rfc-013-write-path-latency.md Updates RFC-013 step 2 to mark 2a landed and 2b deferred; documents the three-table compaction, no-sidecar rationale, bounded retry, and auto-cleanup strip in detail.
docs/dev/invariants.md Updates invariant 15 to note optimize now covers internal tables and that the version chain still grows until step 2b lands.
docs/user/operations/maintenance.md Adds two new bullet points describing internal-table compaction (simplified path, no sidecar, no manifest publish) and the non-destructive guarantee (auto-cleanup strip, bounded retry).
docs/dev/testing.md Updates the write_cost.rs row to reflect the LOCK is now green every-PR and describes the compaction-at-depth-checkpoint pattern.

Comments Outside Diff (1)

  1. crates/omnigraph/src/db/omnigraph/optimize.rs, line 131-132 (link)

    P2 The committed doc-comment says "published to __manifest", but for internal tables committed = true only means Lance advanced its HEAD — there is no manifest publish for these tables. Callers who read committed as "a manifest entry changed" will be confused for the two new table_key values. Consider a short clarifying note here.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

    Fix in Claude Code

Fix All in Claude Code

Reviews (6): Last reviewed commit: "Merge branch 'main' into rfc-013-interna..." | Re-trigger Greptile

ragnorc added 3 commits June 20, 2026 17:28
…n optimize

`optimize` iterated node/edge catalog tables only, so the two internal system
tables (`__manifest`, `_graph_commits`) accumulated one fragment per commit and
were never compacted -- making every write's metadata scan O(fragments), which
grows forever on a long-lived graph (RFC-013 step 2).

`optimize_all_tables` now also compacts both internal tables via a new
`compact_internal_table`. They are not catalog-tracked (readers open them at
their latest Lance HEAD), so it is a much simpler path than `optimize_one_table`:
compact in place, no manifest publish (nothing to publish to), no recovery
sidecar (a single atomic Lance commit -- no HEAD-before-publish gap), and no
optimize_indices (they carry no Lance index, only object_id's unenforced-PK
metadata). No application lock: Lance's compact_files auto-retries its Rewrite
against any concurrent writer (the canonical LanceDB pattern; Rewrite vs Append
is compatible, vs Update a retryable same-fragment conflict Lance rebases), and a
coordinator refresh afterwards makes the warm handle observe the compacted HEAD.

Compacts both tables even though Phase 7 (iss-991) will later fold _graph_commits
into __manifest -- a one-call throwaway for the full interim win; __manifest
compaction is also the prerequisite for Phase 7's graph_head contention. Cleanup
(version GC) of the internal tables is deliberately NOT included here: it needs
the Q8 cleanup-resurrection watermark first (deferred).

maintenance.rs: optimize now returns 6 stats (4 data + 2 internal); adds
optimize_compacts_internal_tables (sheds fragments, leaks no recovery sidecar,
graph coherent for reads + strict writes after).
`internal_table_scans_are_flat_in_history` was the RED, #[ignore]'d acceptance
gate staged in PR #288. With internal-table compaction landed, a write's
__manifest/_graph_commits scan is flat in commit-history depth on a compacted
graph (measured __manifest 4->2, _graph_commits 7->3 across depth 10->100, vs the
pre-step-2 RED 34->214 / 29->207). The test now compacts at each depth before
measuring and runs green every-PR.
- invariants.md: close the compaction half of the read-path-rederivation known
  gap (optimize now compacts the internal tables; cleanup half still deferred).
- maintenance.md: optimize covers __manifest/_graph_commits (no publish, no
  sidecar); not yet in cleanup.
- rfc-013 §9: split step 2 into 2a (compaction, landed) and 2b (cleanup + Q8
  watermark, deferred — debated; MTT-overlap + hot-path liability).
- testing.md: the internal-table LOCK is now green every-PR.
Comment thread crates/omnigraph/src/db/omnigraph/optimize.rs

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8db8937a6a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +578 to +580
let metrics = compact_files(&mut ds, options, None)
.await
.map_err(|e| OmniError::Lance(e.to_string()))?;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve auto-cleanup skips during internal compaction

On upgraded graphs whose internal Lance tables still carry the pre-v7 lance.auto_cleanup.* config, this compact_files commit has no skip_auto_cleanup equivalent, so running optimize can trigger Lance's automatic version GC on __manifest / _graph_commits. Those versions are the snapshot/time-travel authority; docs/dev/lance.md and skip_auto_cleanup_suppresses_version_gc call out that every commit path must skip this hook for pre-bump graphs. Please disable/clear the legacy auto-cleanup config before committing internal compaction, or use a commit path that can set the skip.

Useful? React with 👍 / 👎.

Comment thread crates/omnigraph/src/db/omnigraph/optimize.rs Outdated
…bles

Addresses PR #291 review findings:

- Greptile (P1): optimize unconditionally opened `_graph_commits` for compaction,
  but a graph can validly have none (the coordinator opens it as `Option`, gated on
  `storage.exists`, for graphs predating the commit graph). `Dataset::open` on the
  absent table errored and failed the whole optimize. Guard the `_graph_commits`
  compaction with the same `storage_adapter().exists()` check the coordinator uses;
  `__manifest` always exists so it stays unguarded. Regression test
  `optimize_tolerates_absent_graph_commits_table` (empty graph so no publish
  recreates the table before the guard).

- Cursor (low): the `table_tasks.is_empty()` early return skipped internal-table
  compaction for a schema with no node/edge types. Removed it so the internal
  tables are compacted regardless of the data-table set.

- Codex (auto-cleanup, P1): documented — `compact_files` commits with a default
  `CommitConfig` (no skip_auto_cleanup) and `CompactionOptions` exposes no override,
  so on a graph storing an *on* auto_cleanup config the commit would fire version
  GC. Both internal tables are created with `auto_cleanup: None`, so new graphs are
  safe; the only exposure is pre-fix upgraded graphs, identical to the existing
  data-table optimize path, with step 2b's watermark as the comprehensive guard.
  Added a comment in `compact_internal_table` recording this.
ragnorc added 4 commits June 20, 2026 20:49
…publish)

Step 2 compacts `__manifest` with no app-level lock (Lance OCC arbitrates,
validated against LanceDB + the lance-7.0.0 conflict resolver). compact_files'
`Operation::Rewrite` auto-retries 20x (CommitConfig default num_retries=20), so a
live publish usually wins the race and the compaction rebases. But the publish
runs its merge-insert with conflict_retries(0) = one rebase attempt; if the
compaction commits first AND the merge touched a fragment the Rewrite rewrote,
Lance preempts the publish with `Error::RetryableCommitConflict` — a DIFFERENT
variant from the row-level `TooMuchWriteContention` the publisher already retries.
Left unhandled, that surfaces a transient error to the caller, i.e. a maintenance
compaction (physical op) failing a live write (logical op) — invariant 7.

Map `LanceError::RetryableCommitConflict` to a new
`ManifestConflictDetails::RetryableCommitConflict` and treat it as retryable in the
publisher's outer loop (reload fresh state + re-merge), alongside
RowLevelCasContention. `ExpectedVersionMismatch` still propagates (a genuine
expectation break must not be blindly retried). This also hardens multi-process
concurrent writers generally, not just compaction.

Normal publishes are insert-only (new object_ids -> new fragments, disjoint from
rewritten old ones), so the conflict is rare; the guard covers the
same-fragment-update edge and multi-process writers. Unit tests in publisher.rs
pin the mapping + the retry-predicate contract.
…side)

Reverts d138902. Validated against lance-7.0.0: the publisher's merge-insert runs
with conflict_retries(0), and execute_with_retry converts an exhausted retryable
commit conflict to TooMuchWriteContention before the caller sees it
(write/retry.rs ~95-130). So map_lance_publish_error NEVER receives
RetryableCommitConflict from merge_rows — it receives TooMuchWriteContention, which
the publisher already maps to RowLevelCasContention and retries. The reverted
mapping was therefore dead on the real path and its unit test was synthetic.

The actual exposure is the *compaction* side: compact_files -> commit_compaction ->
apply_commit directly (no execute_with_retry), so a Rewrite-vs-Merge check_txn
conflict propagates raw and optimize can fail on a live graph. That is fixed
app-side in compact_internal_table in the following commit.
Address three findings from review of the step-2 internal-table compaction:

- Non-destructive by construction: before compacting an internal table,
  strip any stored `lance.auto_cleanup.*` config off it. `compact_files`
  commits with a default `CommitConfig` (skip_auto_cleanup=false) and
  `CompactionOptions` exposes no override, so on a graph created by an older
  binary (on-by-default GC hook) the compaction commit would fire Lance's
  auto-cleanup and silently prune `__manifest`-pinned versions. Current
  binaries store no such config; the strip is the upgrade-path safety net so
  `optimize` can never GC versions.

- App-level compaction retry: `compact_files` does NOT auto-retry a semantic
  conflict against a concurrent live writer (Rewrite vs Update/Merge/Delete
  propagates raw from apply_commit; Lance prescribes app-rerun). Wrap the
  internal-table compaction in a bounded retry loop that reopens fresh and
  replans on a retryable Lance conflict, so a maintenance compaction (a
  physical op) never fails a live write (a logical op) — invariant 7.

- Compact all three internal tables, not two: `_graph_commit_actors` grows
  one fragment per commit on the authenticated write path, the same O(depth)
  scan as `__manifest`/`_graph_commits`. Drive the sweep from one
  source-of-truth list with per-table existence guards (the two commit-graph
  tables are optional). Make `graph_commit_actors_uri` pub(crate).

Tests: the `internal_table_scans_are_flat_in_history` LOCK now runs the
authenticated (actorful) write path so it covers `_graph_commit_actors` via
the shared commit-graph IO wrapper (new `commit_many_as`/`measure_insert_as`
helpers); `optimize_clears_stale_auto_cleanup_and_preserves_versions` pins
the non-destructive guarantee (config cleared + no version GC); a unit test
pins the retryable-conflict classifier; the empty-graph stats count is 7
(the actor table is created at init).
… retried

Sync the RFC-013 step-2a section and the maintenance guide with the
correctness-by-design refinements:

- optimize compacts `__manifest`, `_graph_commits`, AND `_graph_commit_actors`
  (the actor table grows on the authenticated write path).
- optimize is non-destructive by construction — it never GCs versions, and
  strips stale `lance.auto_cleanup.*` config so an upgraded graph's commit-time
  GC hook cannot fire during compaction.
- internal-table compaction rebases and retries against concurrent live
  writers rather than failing the operator's optimize or the live write.
- the cost LOCK is the authenticated-path acceptance test.
@ragnorc

ragnorc commented Jun 21, 2026

Copy link
Copy Markdown
Contributor Author

Correctness-by-design pass on internal-table compaction

Addressed four findings from review. All fixes are correct-by-construction (not symptom guards), each with a test at the matching boundary. Full workspace gate green (cargo test --workspace --locked: 65 suites ok, 0 failed).

F1 — optimize must be non-destructive, including on upgraded graphs. compact_files commits with a default CommitConfig (skip_auto_cleanup = false) and CompactionOptions exposes no override, so on a graph created by an older binary (Lance's auto_cleanup hook on by default) the compaction commit would fire version-GC and silently prune __manifest-pinned versions. Fix: clear_stale_auto_cleanup_config strips lance.auto_cleanup.* off the table before compacting — the resulting manifest lacks the keys, so the commit-time hook no-ops. New graphs already store auto_cleanup: None; this is the upgrade-path safety net. Test: optimize_clears_stale_auto_cleanup_and_preserves_versions (config cleared + no version GC; part (a) is non-vacuous since only this function removes config).

F2 — compact all three internal tables. _graph_commit_actors grows one fragment per commit on the authenticated write path (server/CLI always carry an actor), so it had the same O(depth) per-write scan as __manifest/_graph_commits but was excluded. Fix: one source-of-truth internal-table list with per-table existence guards (the two commit-graph tables are optional); graph_commit_actors_uripub(crate). The cost LOCK internal_table_scans_are_flat_in_history now runs the authenticated write path, so it covers the actor table via the shared commit-graph IO wrapper (__manifest 4→2, commit-graph+actors 10→2 across depth 10→100).

F3 — internal-table compaction must retry against live writers. compact_files does not auto-retry a semantic conflict: commit_compactionapply_commit propagates a Rewrite-vs-Update/Merge/Delete check_txn conflict raw (unlike the merge-insert path, it's not wrapped in execute_with_retry). Lance prescribes app-rerun. Fix: bounded retry loop in compact_internal_table that reopens fresh and replans on a retryable Lance conflict — a maintenance compaction (physical op) never fails a live write (logical op), per invariant 7. Test: retryable_lance_conflicts_are_classified.

F4 — reverted the wrong-side hardening (d138902e). That commit added a RetryableCommitConflict mapping to the __manifest publisher, but the publisher's merge-insert path uses conflict_retries(0), which already converts an exhausted retryable conflict to TooMuchWriteContention (already handled) — so the added mapping was dead. The real gap was F3 (the compaction side, which has no retry wrapper at all). Reverted in 4ee7186c.

@ragnorc ragnorc changed the title RFC-013 step 2a: compact the internal __manifest/_graph_commits tables RFC-013 step 2a: compact the internal manifest/commit-graph tables in optimize Jun 21, 2026

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit e034636. Configure here.

Comment thread crates/omnigraph/src/db/omnigraph/optimize.rs
Comment thread crates/omnigraph/src/db/omnigraph/optimize.rs
@ragnorc ragnorc changed the title RFC-013 step 2a: compact the internal manifest/commit-graph tables in optimize (feat): compact the internal manifest/commit-graph tables in optimize Jun 21, 2026
ragnorc added 5 commits June 21, 2026 13:50
…ion work

`compact_internal_table` returns early when `plan_compaction` finds no work,
but `clear_stale_auto_cleanup_config` may have already committed a config-strip
that advanced Lance HEAD. The early return skipped the coordinator refresh that
the successful-compaction path performs, leaving warm `__manifest`/commit-graph
handles pinned to the pre-strip version until the next read's version probe
healed them. No correctness bug (the probe self-heals, and a stale-handle write
would retry via publisher CAS), but the refresh makes coherence deterministic
rather than probe-dependent. Refresh iff the config-strip actually committed.
… not auto-retry

The function doc claimed "Lance's compact_files auto-retries its Operation::Rewrite
against any concurrent writer" — wrong, and contradicting the is_retryable_lance_conflict
doc just below it and the explicit retry loop that exists precisely because compact_files
does NOT auto-retry semantic conflicts (Rewrite vs Update/Merge/Delete propagates raw
through apply_commit). Also move the orphaned description from above the retry-budget
const onto the function, and include the third internal table.
…oo (red)

Regression test for a destructive bug on the data-table optimize path: on an
upgraded graph whose node/edge table still carries pre-v7 lance.auto_cleanup.*
config, `optimize`'s compact_files/optimize_indices commits fire Lance's version
GC and prune __manifest-pinned data-table versions. Mirrors the internal-table
auto_cleanup test on a Person table (force-repair realigns the config-induced
drift so optimize doesn't skip the table). Red against the current code: the
data-table path does not strip the config. The fix lands in the next commit.
… too

The auto_cleanup scrub previously only protected the internal tables; the
data-table path (optimize_one_table) ran compact_files/optimize_indices with a
default CommitConfig (skip_auto_cleanup=false) and no override, so on an upgraded
graph those commits could fire Lance's version-GC hook and prune __manifest-pinned
node/edge versions — making the "non-destructive" contract false for data tables.
Strip the config before the HEAD-advancing commits, capturing version_before first
so the strip's own commit still triggers the Phase-C manifest publish (no uncovered
drift). No retry loop needed: the data-table path holds the per-table write queue.
Covered by the existing Optimize recovery sidecar. Turns the prior commit's test green.

Also: switch clear_stale_auto_cleanup_config off the deprecated delete_config_keys
to update_config(None values), and correct two now-inaccurate doc comments —
compaction is "one or more content-preserving commits" (compact_files can emit a
ReserveFragments before the Rewrite), not "a single atomic commit"; the sidecar-free
property rests on content-preservation + read-at-HEAD, not single-commit atomicity.
…try claims

- non-destructive guarantee now spans data + internal tables (the auto_cleanup
  strip runs on both paths), not just the internal ones.
- "single atomic Lance commit" was inaccurate: compaction can emit a
  ReserveFragments commit before the Rewrite; the no-sidecar property rests on
  content-preservation + read-at-HEAD, not single-commit atomicity.
- "retries rather than failing" softened to the truth: a *bounded* retry on the
  internal path; sustained contention surfaces a loud conflict error (bounded +
  observable, not an infinite loop). The data path holds the per-table queue and
  never contends.
@ragnorc

ragnorc commented Jun 21, 2026

Copy link
Copy Markdown
Contributor Author

Second review pass — two more findings, both valid, both addressed

Validated against the current code and Lance 7.0.0 source. Full cargo test --workspace --locked green (65 suites, 0 failed).

Finding 1 (High) — optimize was destructive on upgraded DATA tables. Valid; fixed.
Confirmed: the data-table path (optimize_one_table) ran compact_files/optimize_indices with no auto-cleanup scrub, while Lance's commit_compactionapply_commit(.., &Default::default(), &Default::default()) (lance dataset/optimize.rs:1627) commits with skip_auto_cleanup=false and CompactionOptions exposes no override. So on a graph created by a pre-v7 binary (auto_cleanup ON), optimize could fire Lance's version-GC and prune __manifest-pinned node/edge versions — falsifying the "non-destructive" contract. This was a pre-existing gap on a path that predates this PR, but the PR strengthened that contract, so it's fixed here rather than left to lie.

Fix (864d296e): reuse clear_stale_auto_cleanup_config on the data path too, inside the existing Optimize sidecar window. version_before is captured before the strip so the strip's own commit still triggers the Phase-C manifest publish (no uncovered HEAD>manifest drift). No retry loop needed — the data path holds the per-table write queue, so it doesn't contend. Regression test optimize_clears_stale_auto_cleanup_on_data_tables_too landed as a red commit first (b2897c62), green after the fix.

Finding 2 (Medium) — docs overclaimed substrate guarantees. Valid; corrected.

  • "single atomic Lance commit" was wrong: compact_files emits a ReserveFragments commit before the Rewrite (lance dataset/optimize.rs:2364 test: "1 commit for reserve fragments and 1 for final commit"). Corrected to "one or more content-preserving commits"; the no-sidecar property is re-grounded on content-preservation + read-at-HEAD, not single-commit atomicity.
  • "retries rather than failing" softened to the truth: a bounded retry on the internal path; sustained contention past the budget surfaces a loud conflict error (bounded + observable per invariant 13, not an infinite loop). The data path holds the per-table queue and never contends.

Incidental cleanup: clear_stale_auto_cleanup_config was on the deprecated delete_config_keys; switched to update_config(None values).

Commits: b2897c62 (test, red) → 864d296e (fix, green) → 99d8358b (docs).

ragnorc added a commit that referenced this pull request Jun 21, 2026
Fold in two measured findings from the deployed edge binary (f6d2cc0) on
rustfs behind a latency+concurrency proxy:

- §0(d): concurrency-cap A/B. Under unlimited concurrency the internal-table
  scan parallelizes (backbone ~110); under an R2-realistic cap (8) it serializes
  and an UNCOMPACTED graph runs away (per-write ops 1273->3505, wall 6->16s),
  while #291's internal compaction cuts it ~6x and bounds it (137->1 frag). The
  latency model is (serial_hops + ops/effective_concurrency)*RTT + compute.
- Reframe step 2 across Summary/§2.4/§9: NOT de-ranked — on R2 (capped) it is a
  primary latency lever + the anti-runaway fix + Phase-7 prereq. The earlier
  'step 2 is parallel, irrelevant to latency' was an unlimited-concurrency
  artifact. Deployed f6d2cc0 optimize is node/edge-only; #291 (undeployed) is
  the prod win.
- §5.1: the cost-gate ThrottledStore must cap concurrency AND inject latency;
  assert serial_hops flat AND ops flat in history.
- §2.3 + §8: Lance/LanceDB comparison from 7.0.0 source — Lance metadata is a
  single-file per-version manifest read O(1) (latest_version_hint), pruned by
  default; omnigraph's __manifest-as-Lance-dataset scan is self-inflicted by the
  cross-table-atomicity choice. Adds explicit defense of Lance-dataset __manifest
  (MTT seam) vs a flat-file CAS'd manifest (cheaper, off the MTT path).

Design (§3/§4.1) unchanged and vindicated; corrections are measurement framing,
step sizing, and one design-choice that was implicit.
@ragnorc ragnorc merged commit f2b792e into main Jun 21, 2026
8 checks passed
ragnorc added a commit that referenced this pull request Jun 21, 2026
…rency-cap correction + Lance-metadata comparison (#292)

* feat(engine): compact the internal __manifest/_graph_commits tables in optimize

`optimize` iterated node/edge catalog tables only, so the two internal system
tables (`__manifest`, `_graph_commits`) accumulated one fragment per commit and
were never compacted -- making every write's metadata scan O(fragments), which
grows forever on a long-lived graph (RFC-013 step 2).

`optimize_all_tables` now also compacts both internal tables via a new
`compact_internal_table`. They are not catalog-tracked (readers open them at
their latest Lance HEAD), so it is a much simpler path than `optimize_one_table`:
compact in place, no manifest publish (nothing to publish to), no recovery
sidecar (a single atomic Lance commit -- no HEAD-before-publish gap), and no
optimize_indices (they carry no Lance index, only object_id's unenforced-PK
metadata). No application lock: Lance's compact_files auto-retries its Rewrite
against any concurrent writer (the canonical LanceDB pattern; Rewrite vs Append
is compatible, vs Update a retryable same-fragment conflict Lance rebases), and a
coordinator refresh afterwards makes the warm handle observe the compacted HEAD.

Compacts both tables even though Phase 7 (iss-991) will later fold _graph_commits
into __manifest -- a one-call throwaway for the full interim win; __manifest
compaction is also the prerequisite for Phase 7's graph_head contention. Cleanup
(version GC) of the internal tables is deliberately NOT included here: it needs
the Q8 cleanup-resurrection watermark first (deferred).

maintenance.rs: optimize now returns 6 stats (4 data + 2 internal); adds
optimize_compacts_internal_tables (sheds fragments, leaks no recovery sidecar,
graph coherent for reads + strict writes after).

* test(engine): un-ignore the internal-table scan LOCK (step 2 acceptance)

`internal_table_scans_are_flat_in_history` was the RED, #[ignore]'d acceptance
gate staged in PR #288. With internal-table compaction landed, a write's
__manifest/_graph_commits scan is flat in commit-history depth on a compacted
graph (measured __manifest 4->2, _graph_commits 7->3 across depth 10->100, vs the
pre-step-2 RED 34->214 / 29->207). The test now compacts at each depth before
measuring and runs green every-PR.

* docs: RFC-013 step 2 internal-table compaction landed

- invariants.md: close the compaction half of the read-path-rederivation known
  gap (optimize now compacts the internal tables; cleanup half still deferred).
- maintenance.md: optimize covers __manifest/_graph_commits (no publish, no
  sidecar); not yet in cleanup.
- rfc-013 §9: split step 2 into 2a (compaction, landed) and 2b (cleanup + Q8
  watermark, deferred — debated; MTT-overlap + hot-path liability).
- testing.md: the internal-table LOCK is now green every-PR.

* fix(engine): guard absent _graph_commits + always compact internal tables

Addresses PR #291 review findings:

- Greptile (P1): optimize unconditionally opened `_graph_commits` for compaction,
  but a graph can validly have none (the coordinator opens it as `Option`, gated on
  `storage.exists`, for graphs predating the commit graph). `Dataset::open` on the
  absent table errored and failed the whole optimize. Guard the `_graph_commits`
  compaction with the same `storage_adapter().exists()` check the coordinator uses;
  `__manifest` always exists so it stays unguarded. Regression test
  `optimize_tolerates_absent_graph_commits_table` (empty graph so no publish
  recreates the table before the guard).

- Cursor (low): the `table_tasks.is_empty()` early return skipped internal-table
  compaction for a schema with no node/edge types. Removed it so the internal
  tables are compacted regardless of the data-table set.

- Codex (auto-cleanup, P1): documented — `compact_files` commits with a default
  `CommitConfig` (no skip_auto_cleanup) and `CompactionOptions` exposes no override,
  so on a graph storing an *on* auto_cleanup config the commit would fire version
  GC. Both internal tables are created with `auto_cleanup: None`, so new graphs are
  safe; the only exposure is pre-fix upgraded graphs, identical to the existing
  data-table optimize path, with step 2b's watermark as the comprehensive guard.
  Added a comment in `compact_internal_table` recording this.

* docs(rfc-013): serial-hop correction — wall-clock is the ~110-hop backbone, not op count

Latency-slope measurement on the deployed edge binary (f6d2cc0, steps 1+3a
landed; rustfs + per-op latency proxy, depth 1..85) shows wall-clock is set by a
~110-hop SERIAL backbone that is depth-invariant. Total ops grow +~7/depth but
PARALLELIZE (parallelism 1->6), so the depth term adds little wall-clock.

- New §0(c): the serial-hop vs total-op finding + branch-op backbones
  (create ~77, delete ~87, branch-write ~258/1777-ops/21s floor = fork-on-first-write).
- §2.4: correct the '1720->198 ops => 258s->30s' op-count->wall-clock conversion.
- §5.1: promote serial-hop/num_stages to the PRIMARY latency LOCK; op-count
  flatness demoted to a cost/compute-floor gate.
- §9 step 2: reprioritized as Phase-7 prerequisite + compute-floor/space, NOT
  the wall-clock fix; step 3b (parallel capture-once WriteTxn) is the headline
  latency lever; branch-write moved under step 3b + fork seam.
- Summary: serial-backbone correction up front.

Vindicates the §3/§4.1 design; corrects the op-count latency framing.

* docs(rfc-013): concurrency-cap correction + Lance-metadata comparison

Fold in two measured findings from the deployed edge binary (f6d2cc0) on
rustfs behind a latency+concurrency proxy:

- §0(d): concurrency-cap A/B. Under unlimited concurrency the internal-table
  scan parallelizes (backbone ~110); under an R2-realistic cap (8) it serializes
  and an UNCOMPACTED graph runs away (per-write ops 1273->3505, wall 6->16s),
  while #291's internal compaction cuts it ~6x and bounds it (137->1 frag). The
  latency model is (serial_hops + ops/effective_concurrency)*RTT + compute.
- Reframe step 2 across Summary/§2.4/§9: NOT de-ranked — on R2 (capped) it is a
  primary latency lever + the anti-runaway fix + Phase-7 prereq. The earlier
  'step 2 is parallel, irrelevant to latency' was an unlimited-concurrency
  artifact. Deployed f6d2cc0 optimize is node/edge-only; #291 (undeployed) is
  the prod win.
- §5.1: the cost-gate ThrottledStore must cap concurrency AND inject latency;
  assert serial_hops flat AND ops flat in history.
- §2.3 + §8: Lance/LanceDB comparison from 7.0.0 source — Lance metadata is a
  single-file per-version manifest read O(1) (latest_version_hint), pruned by
  default; omnigraph's __manifest-as-Lance-dataset scan is self-inflicted by the
  cross-table-atomicity choice. Adds explicit defense of Lance-dataset __manifest
  (MTT seam) vs a flat-file CAS'd manifest (cheaper, off the MTT path).

Design (§3/§4.1) unchanged and vindicated; corrections are measurement framing,
step sizing, and one design-choice that was implicit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant