docs(rfc-013): latency = (serial_hops + ops/concurrency)·RTT — concurrency-cap correction + Lance-metadata comparison by ragnorc · Pull Request #292 · ModernRelay/omnigraph

ragnorc · 2026-06-20T17:19:51Z

Stacked on #291 (the branch that holds the RFC). Doc-only correction to RFC-013's latency measurement framing, from a latency+concurrency study of the deployed edge binary (f6d2cc03, steps 1+3a landed) on rustfs.

The corrected latency model

wall ≈ (serial_hops + ops / effective_concurrency) · RTT + compute

Two measured findings:

§0(c) — serial backbone. Under unlimited concurrency, wall-clock is a ~110-hop serial backbone, depth-invariant; the depth-driven ops (+~7/depth, 107→716) parallelize away (max_inflight hit 129). → step 3b (parallel capture-once WriteTxn) is the lever.

§0(d) — concurrency-cap A/B (the key result). Under an R2-realistic cap (8), the internal-table fragment scan can't fan out, so op count re-enters wall-clock. With fragment count as the only variable:

state	per-write ops	wall (cap=8, L=20)
uncompacted (`__manifest` 137 frags)	1273 → 1487 → 3505	5.9 → 8.4 → 16.4s (runaway)
after (feat): compact the internal manifest/commit-graph tables in optimize #291 `optimize` (137→1)	275 → 250 → 197	6.2 → 5.4 → 3.8s (bounded)

This is the prod 35s mechanism and its degradation over time (the O(N²)): every uncompacted write scans all __manifest/_graph_commits fragments and adds one.

What this corrects (vs. my first draft of this PR)

My initial commit de-ranked step 2 as "parallel, irrelevant to latency" — that was an unlimited-concurrency artifact. On R2 (capped), step 2a is a primary latency lever and the anti-runaway fix. Both step 3b (serial backbone) and step 2a (op count) are load-bearing; which dominates is set by effective_concurrency × fragment_count.

Prod note: the deployed f6d2cc03 optimize is node/edge-only, so an operator optimize on prod cannot compact the internal tables — #291 (undeployed) is the immediate prod win.

Changes (doc only)

New §0(c)+(d) — serial-hop + concurrency-cap A/B; branch-op backbones.
§2.4 / §9 step 2 / Summary — reframe step 2 as concurrency-dependent (primary latency lever on R2), not de-ranked.
§5.1 — the cost-gate ThrottledStore must cap concurrency AND inject latency; assert serial-hops flat AND op-count flat in history.
§2.3 + §8 (new) — Lance/LanceDB comparison from 7.0.0 source: Lance metadata is a single-file per-version manifest read O(1) (latest_version_hint), pruned by default; omnigraph's __manifest-as-Lance-dataset scan is self-inflicted by the cross-table-atomicity choice. Adds an explicit defense of the Lance-dataset __manifest (MTT seam) vs a flat-file CAS'd manifest (cheaper, off the MTT path).

The design (§3/§4.1) is unchanged and vindicated — every finding maps onto an existing pillar (step 3b, step 2a, graph_head one-CAS, MTT exit). Only the measurement framing, step sizing, and one previously-implicit design choice are corrected.

Note

Low Risk
Single markdown file only; no code, APIs, or CI behavior changes.

Overview
Doc-only update to RFC-013 — reframes write-path latency measurement after rustfs/proxy studies on the deployed edge binary; no runtime or test changes.

The RFC now states wall-clock as (serial_hops + ops / effective_concurrency) · RTT + compute, not total ops × RTT. New §0(c) adds serial-backbone tables (~110 hops, depth-invariant under unlimited concurrency); §0(d) adds an R2-realistic concurrency cap=8 A/B showing uncompacted internal-table scans run away while #291 compaction bounds latency (~6×).

§2.4, §9 step 2, and the summary restore step 2a as a primary latency lever on capped stores (not “parallel-only”). §5.1 requires a ThrottledStore with both latency injection and concurrency cap, plus two LOCKs (serial hops flat; op count flat in history under cap). §2.3 and §8 add Lance O(1) metadata vs omnigraph catalog-as-dataset, and defend keeping __manifest as a Lance dataset for the MTT seam vs a flat CAS file. §9 step 6 moves branch-write cost to step 3b / fork seam.

^{Reviewed by Cursor Bugbot for commit 8c27303. Bugbot is set up for automated code reviews on this repo. Configure here.}

Greptile Summary

This PR corrects RFC-013's latency framing from a simple op-count model to (serial_hops + ops/effective_concurrency)·RTT + compute, backed by two new measurements on the deployed edge binary (f6d2cc03) behind a latency+concurrency proxy.

§0(c)+(d): Adds a serial-hop sweep (direct-main backbone ~110 hops, depth-invariant) and a concurrency-cap A/B (cap=8 mimicking R2) proving that uncompacted internal-table fragment scans run away under a cap (6→16 s) while (feat): compact the internal manifest/commit-graph tables in optimize #291's compaction bounds them (~3.8 s); reprioritizes step 2a from "parallel afterthought" to primary R2 latency lever.
§2.3/§8: New Lance 7.0.0 source comparison explaining why omnigraph's __manifest O(fragments) scan is self-inflicted (cross-table-atomicity choice) and defending the Lance-dataset catalog over a flat-file CAS manifest due to the MTT seam (lance#7264).
§5.1: ThrottledStore now required to cap concurrency and inject latency, with two explicit LOCKs: serial-hop flat in depth and op-count flat in history under a capped store.

Confidence Score: 5/5

Documentation-only change with no runtime or test code affected; safe to merge.

All changes are confined to a single RFC markdown file. The new measurement sections (§0(c)/(d)), Lance comparison (§2.3/§8), and §5.1 gate corrections document existing measured behavior and design rationale without touching any source code. Two internal data-consistency questions in the §0(d) table are worth clarifying before this becomes the authoritative measurement record, but they do not affect any deployed code.

docs/dev/rfc-013-write-path-latency.md — §0(d) measurement table (lines 200–203) and the max_inflight cross-reference between §0(c) and §0(d) (lines 175 vs 194).

Important Files Changed

Filename	Overview
docs/dev/rfc-013-write-path-latency.md	Doc-only reframe of RFC-013's latency model from op-count-serial to (serial_hops + ops/concurrency)·RTT; adds §0(c)/(d) measurements, §2.3/§8 Lance comparison, §5.1 two-LOCK gate, and §9 step-2 reprioritization. Two internal measurement inconsistencies: the §0(d) "after optimize" op sequence is decreasing (275→250→197) where the per-fragment-append model predicts growth, and max_inflight is cited as 65 in §0(c) but 129 for the same measurement in §0(d).

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Write request"] --> B{Store concurrency?}
    B -->|"Unlimited (rustfs)"| C["Parallel fan-out\nmax_inflight up to 65–129"]
    B -->|"Capped at 8 (R2-realistic)"| D{"Fragment count?"}
    C --> E["Wall ≈ serial_backbone·RTT\n(~110 hops, depth-invariant)"]
    D -->|"Uncompacted\n(__manifest 137 frags)"| F["Ops 1273→3505\nwall 5.9→16.4s RUNAWAY"]
    D -->|"After #291 optimize\n(137→1 frag)"| G["Ops 275→197\nwall 6.2→3.8s BOUNDED"]
    E --> H["Lever: step 3b\ncapture-once WriteTxn"]
    F --> I["Lever: step 2a\ninternal-table compaction (#291)"]
    G --> J["Residual: step 3b\nserial backbone"]
    H --> K["§5.1 LOCKs:\nserial_hops flat in depth\nops flat in history under MAXCONC=8"]
    I --> K
    J --> K

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A["Write request"] --> B{Store concurrency?}
    B -->|"Unlimited (rustfs)"| C["Parallel fan-out\nmax_inflight up to 65–129"]
    B -->|"Capped at 8 (R2-realistic)"| D{"Fragment count?"}
    C --> E["Wall ≈ serial_backbone·RTT\n(~110 hops, depth-invariant)"]
    D -->|"Uncompacted\n(__manifest 137 frags)"| F["Ops 1273→3505\nwall 5.9→16.4s RUNAWAY"]
    D -->|"After #291 optimize\n(137→1 frag)"| G["Ops 275→197\nwall 6.2→3.8s BOUNDED"]
    E --> H["Lever: step 3b\ncapture-once WriteTxn"]
    F --> I["Lever: step 2a\ninternal-table compaction (#291)"]
    G --> J["Residual: step 3b\nserial backbone"]
    H --> K["§5.1 LOCKs:\nserial_hops flat in depth\nops flat in history under MAXCONC=8"]
    I --> K
    J --> K

_{Reviews (3): Last reviewed commit: "Merge remote-tracking branch 'origin/mai..." | Re-trigger Greptile}

…n optimize `optimize` iterated node/edge catalog tables only, so the two internal system tables (`__manifest`, `_graph_commits`) accumulated one fragment per commit and were never compacted -- making every write's metadata scan O(fragments), which grows forever on a long-lived graph (RFC-013 step 2). `optimize_all_tables` now also compacts both internal tables via a new `compact_internal_table`. They are not catalog-tracked (readers open them at their latest Lance HEAD), so it is a much simpler path than `optimize_one_table`: compact in place, no manifest publish (nothing to publish to), no recovery sidecar (a single atomic Lance commit -- no HEAD-before-publish gap), and no optimize_indices (they carry no Lance index, only object_id's unenforced-PK metadata). No application lock: Lance's compact_files auto-retries its Rewrite against any concurrent writer (the canonical LanceDB pattern; Rewrite vs Append is compatible, vs Update a retryable same-fragment conflict Lance rebases), and a coordinator refresh afterwards makes the warm handle observe the compacted HEAD. Compacts both tables even though Phase 7 (iss-991) will later fold _graph_commits into __manifest -- a one-call throwaway for the full interim win; __manifest compaction is also the prerequisite for Phase 7's graph_head contention. Cleanup (version GC) of the internal tables is deliberately NOT included here: it needs the Q8 cleanup-resurrection watermark first (deferred). maintenance.rs: optimize now returns 6 stats (4 data + 2 internal); adds optimize_compacts_internal_tables (sheds fragments, leaks no recovery sidecar, graph coherent for reads + strict writes after).

`internal_table_scans_are_flat_in_history` was the RED, #[ignore]'d acceptance gate staged in PR #288. With internal-table compaction landed, a write's __manifest/_graph_commits scan is flat in commit-history depth on a compacted graph (measured __manifest 4->2, _graph_commits 7->3 across depth 10->100, vs the pre-step-2 RED 34->214 / 29->207). The test now compacts at each depth before measuring and runs green every-PR.

- invariants.md: close the compaction half of the read-path-rederivation known gap (optimize now compacts the internal tables; cleanup half still deferred). - maintenance.md: optimize covers __manifest/_graph_commits (no publish, no sidecar); not yet in cleanup. - rfc-013 §9: split step 2 into 2a (compaction, landed) and 2b (cleanup + Q8 watermark, deferred — debated; MTT-overlap + hot-path liability). - testing.md: the internal-table LOCK is now green every-PR.

…bles Addresses PR #291 review findings: - Greptile (P1): optimize unconditionally opened `_graph_commits` for compaction, but a graph can validly have none (the coordinator opens it as `Option`, gated on `storage.exists`, for graphs predating the commit graph). `Dataset::open` on the absent table errored and failed the whole optimize. Guard the `_graph_commits` compaction with the same `storage_adapter().exists()` check the coordinator uses; `__manifest` always exists so it stays unguarded. Regression test `optimize_tolerates_absent_graph_commits_table` (empty graph so no publish recreates the table before the guard). - Cursor (low): the `table_tasks.is_empty()` early return skipped internal-table compaction for a schema with no node/edge types. Removed it so the internal tables are compacted regardless of the data-table set. - Codex (auto-cleanup, P1): documented — `compact_files` commits with a default `CommitConfig` (no skip_auto_cleanup) and `CompactionOptions` exposes no override, so on a graph storing an *on* auto_cleanup config the commit would fire version GC. Both internal tables are created with `auto_cleanup: None`, so new graphs are safe; the only exposure is pre-fix upgraded graphs, identical to the existing data-table optimize path, with step 2b's watermark as the comprehensive guard. Added a comment in `compact_internal_table` recording this.

…kbone, not op count Latency-slope measurement on the deployed edge binary (f6d2cc0, steps 1+3a landed; rustfs + per-op latency proxy, depth 1..85) shows wall-clock is set by a ~110-hop SERIAL backbone that is depth-invariant. Total ops grow +~7/depth but PARALLELIZE (parallelism 1->6), so the depth term adds little wall-clock. - New §0(c): the serial-hop vs total-op finding + branch-op backbones (create ~77, delete ~87, branch-write ~258/1777-ops/21s floor = fork-on-first-write). - §2.4: correct the '1720->198 ops => 258s->30s' op-count->wall-clock conversion. - §5.1: promote serial-hop/num_stages to the PRIMARY latency LOCK; op-count flatness demoted to a cost/compute-floor gate. - §9 step 2: reprioritized as Phase-7 prerequisite + compute-floor/space, NOT the wall-clock fix; step 3b (parallel capture-once WriteTxn) is the headline latency lever; branch-write moved under step 3b + fork seam. - Summary: serial-backbone correction up front. Vindicates the §3/§4.1 design; corrects the op-count latency framing.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3a560297a1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-20T17:22:59Z

+assertion (`serial_hops ≤ K`, flat in depth) via a `ThrottledStore` wrapper that
+injects per-op latency and reads the wall = `compute + serial_hops·L` slope (Lance's
+`test_commit_iops` setup) — this is what catches the ~110-hop backbone §0(c) measured.


Keep the Tier-1 serial-hop gate deterministic

If this RFC is implemented literally, the every-PR Tier-1 gate would infer serial_hops by injecting latency and fitting a wall-clock slope, which conflicts with the same section's requirement that Tier 1 be deterministic/no-wall-clock and with §5.2 reserving wall-clock trend checks for nightly. On shared CI runners this would either make the PR gate slow/flaky or force implementers to choose arbitrary sleep values; the PR gate should count num_stages/critical-path events directly and leave latency-injection slope measurements to the Tier-2 benchmark.

Useful? React with 👍 / 👎.

Fold in two measured findings from the deployed edge binary (f6d2cc0) on rustfs behind a latency+concurrency proxy: - §0(d): concurrency-cap A/B. Under unlimited concurrency the internal-table scan parallelizes (backbone ~110); under an R2-realistic cap (8) it serializes and an UNCOMPACTED graph runs away (per-write ops 1273->3505, wall 6->16s), while #291's internal compaction cuts it ~6x and bounds it (137->1 frag). The latency model is (serial_hops + ops/effective_concurrency)*RTT + compute. - Reframe step 2 across Summary/§2.4/§9: NOT de-ranked — on R2 (capped) it is a primary latency lever + the anti-runaway fix + Phase-7 prereq. The earlier 'step 2 is parallel, irrelevant to latency' was an unlimited-concurrency artifact. Deployed f6d2cc0 optimize is node/edge-only; #291 (undeployed) is the prod win. - §5.1: the cost-gate ThrottledStore must cap concurrency AND inject latency; assert serial_hops flat AND ops flat in history. - §2.3 + §8: Lance/LanceDB comparison from 7.0.0 source — Lance metadata is a single-file per-version manifest read O(1) (latest_version_hint), pruned by default; omnigraph's __manifest-as-Lance-dataset scan is self-inflicted by the cross-table-atomicity choice. Adds explicit defense of Lance-dataset __manifest (MTT seam) vs a flat-file CAS'd manifest (cheaper, off the MTT path). Design (§3/§4.1) unchanged and vindicated; corrections are measurement framing, step sizing, and one design-choice that was implicit.

…rrection # Conflicts: # crates/omnigraph/src/db/omnigraph/optimize.rs # crates/omnigraph/tests/maintenance.rs # crates/omnigraph/tests/write_cost.rs # docs/dev/rfc-013-write-path-latency.md # docs/user/operations/maintenance.md

ragnorc added 5 commits June 20, 2026 17:28

chatgpt-codex-connector Bot reviewed Jun 20, 2026

View reviewed changes

greptile-apps Bot reviewed Jun 20, 2026

View reviewed changes

Comment thread docs/dev/rfc-013-write-path-latency.md Outdated

Comment thread docs/dev/rfc-013-write-path-latency.md Outdated

ragnorc changed the title ~~docs(rfc-013): serial-hop correction — wall-clock is the ~110-hop backbone, not op count~~ docs(rfc-013): latency = (serial_hops + ops/concurrency)·RTT — concurrency-cap correction + Lance-metadata comparison Jun 21, 2026

ragnorc changed the base branch from rfc-013-internal-table-compaction to main June 21, 2026 19:48

ragnorc merged commit 5cfae9a into main Jun 21, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs(rfc-013): latency = (serial_hops + ops/concurrency)·RTT — concurrency-cap correction + Lance-metadata comparison#292

docs(rfc-013): latency = (serial_hops + ops/concurrency)·RTT — concurrency-cap correction + Lance-metadata comparison#292
ragnorc merged 7 commits into
mainfrom
rfc-013-serial-hop-correction

ragnorc commented Jun 20, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 20, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ragnorc commented Jun 20, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

The corrected latency model

What this corrects (vs. my first draft of this PR)

Changes (doc only)

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 20, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ragnorc commented Jun 20, 2026 •

edited by greptile-apps Bot

Loading