Skip to content

docs(rfc): RFC-013 step 3b — capture-once WriteTxn (unify the write opener)#295

Open
aaltshuler wants to merge 1 commit into
mainfrom
docs/rfc-013-step-3b
Open

docs(rfc): RFC-013 step 3b — capture-once WriteTxn (unify the write opener)#295
aaltshuler wants to merge 1 commit into
mainfrom
docs/rfc-013-step-3b

Conversation

@aaltshuler

@aaltshuler aaltshuler commented Jun 20, 2026

Copy link
Copy Markdown
Collaborator

Summary

Implementation-grade spec for RFC-013 step 3b — the opener-unification half of the WriteTxn (parent RFC §4.1/§4.2/§9-3b). Docs-only (Proposed).

Step 3a (PR #288) made the write open O(1) but left writes on a bespoke bare-URL-latest opener that diverges from the read path (pinned version + shared Session + handle cache). 3b collapses the two openers into one primitive — so the read/write open-equivalence question (raised in the #288 post-merge review) is answered by construction: there's only one opener.

Key design decisions

  • Open at the pinned base version, not HEAD (§2.3) — the consistent snapshot-isolation read base; the publish CAS is the staleness guard. Strict-op conflict detection moves from open-time fail-fast to commit-time CAS (identical ExpectedVersionMismatch error, more correct read base for Update/Delete — avoids the lost-update §4.2 warns about). Flagged explicitly for review ratification.
  • Validate-once (removes the flat-46 schema-read constant); intra-txn handle reuse (write-local, commit-invalidated — no stale cross-write cache, invariant 5 preserved).
  • Scope fenced: PublishPlan/Phase-7-lineage/epoch/internal-compaction are steps 4/5/2, explicitly out. 3b is a refactor under the existing publish machinery; no module retired (smaller blast radius than 3a).

Closes the two audit gaps

The test plan (§5) bakes in the two holes a step-3a edge-case audit surfaced:

  1. A differential equivalence guard — open via the #[cfg(test)] namespace path vs WriteTxn::open, assert identical version/branch/schema (converts "the suite passes" into a pinned contract).
  2. A branch-checkout-at-advanced-head strict-write test.

Plus an invariants check (2/3/5/15 hold or strengthen) and reversibility notes.

Standalone and shippable — unblocked by steps 2/4/5.

🤖 Generated with Claude Code

Greptile Summary

This docs-only RFC specifies the implementation plan for RFC-013 step 3b: collapsing the bespoke write-path opener onto the same pinned-version + shared-Session primitive the read path already uses, yielding WriteTxn — a capture-once struct that threads a pinned PinnedSnapshot, shared Session, and WriteHandleCache through all write stages.

  • WriteTxn design (§2): opens each touched table once by pinned version (from_uri(location).with_version(base.version).with_session(session)), eliminates per-stage re-opens and the flat-46 schema-read constant, and makes re-resolution of "latest" unrepresentable by type.
  • Behavioral change (§2.3): strict-op conflict detection moves from open-time fail-fast to commit-time CAS (ExpectedVersionMismatch). Explicitly flagged for review ratification; §7 lists it as a risk with a reversibility path.
  • Migration (§3): routes the four write entry points through WriteTxn::open at the existing snapshot capture seam; open_dataset_head_for_write can be retired; non-hot-path direct openers stay unchanged.

Confidence Score: 4/5

Docs-only change — no code ships yet. The index update is a clean one-liner. The RFC is well-structured and largely sound, with one test-plan design issue that would result in false assurance if implemented as written.

The RFC's design rationale, invariant analysis, and migration path are thorough and consistent with the parent RFC-013. The main concern is §5 test #1: the differential equivalence guard compares against StagedTableNamespace (old open-at-HEAD behavior), which diverges from WriteTxn::open whenever HEAD > base — the exact case 3b handles — providing false assurance rather than closing audit gap 1. Two additional omissions (WriteHandleCache threading and CAS-conflict retry contract) affect implementation fidelity but don't block the spec at Proposed status.

docs/dev/rfc-013-step-3b-writetxn.md — specifically §5 test #1, the WriteHandleCache API surface in §2.1, and the retry guidance in §2.3

Important Files Changed

Filename Overview
docs/dev/rfc-013-step-3b-writetxn.md New 278-line implementation spec for WriteTxn. Three design gaps: test §5/#1 is mis-specified (differential comparator diverges by design when HEAD > base, so won't close audit gap 1); WriteHandleCache interior mutability requirements are unspecified; the retry contract for strict-op CAS conflicts is absent.
docs/dev/index.md Single row added to the Active Implementation Plans table linking rfc-013-step-3b-writetxn.md. Entry correctly describes the scope and parent ticket. No issues.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant WE as Write Entry
    participant WT as WriteTxn
    participant WHC as WriteHandleCache
    participant LS as Lance Dataset
    participant PUB as ManifestBatchPublisher

    WE->>WT: WriteTxn::open(branch, snapshot)
    Note over WT: base = PinnedSnapshot, schema validated once
    loop each touched table
        WT->>WHC: get_or_open(table_key)
        WHC->>LS: from_uri(loc).with_version(N).with_session(s)
        LS-->>WHC: Dataset handle O(1)
        WHC-->>WT: cached handle
    end
    WE->>WT: stage via WriteTxn handles
    WT-->>WE: staged
    WE->>PUB: publish CAS
    alt CAS wins
        PUB-->>WE: PublishedSnapshot
    else CAS loses
        PUB-->>WE: ExpectedVersionMismatch
        Note over WE: Must open fresh WriteTxn to retry
    end
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant WE as Write Entry
    participant WT as WriteTxn
    participant WHC as WriteHandleCache
    participant LS as Lance Dataset
    participant PUB as ManifestBatchPublisher

    WE->>WT: WriteTxn::open(branch, snapshot)
    Note over WT: base = PinnedSnapshot, schema validated once
    loop each touched table
        WT->>WHC: get_or_open(table_key)
        WHC->>LS: from_uri(loc).with_version(N).with_session(s)
        LS-->>WHC: Dataset handle O(1)
        WHC-->>WT: cached handle
    end
    WE->>WT: stage via WriteTxn handles
    WT-->>WE: staged
    WE->>PUB: publish CAS
    alt CAS wins
        PUB-->>WE: PublishedSnapshot
    else CAS loses
        PUB-->>WE: ExpectedVersionMismatch
        Note over WE: Must open fresh WriteTxn to retry
    end
Loading

Fix All in Claude Code

Reviews (1): Last reviewed commit: "docs(rfc): RFC-013 step 3b — capture-onc..." | Re-trigger Greptile

Greptile also left 3 inline comments on this PR.

…pener)

Implementation-grade spec for RFC-013 step 3b, the opener-unification half of
the WriteTxn (parent §4.1/§4.2/§9-3b). Step 3a (PR #288) made the write open
O(1) but left writes on a bespoke bare-URL-latest opener that diverges from the
read path (pinned version + shared Session + handle cache). 3b collapses the two
into one primitive, so the read/write open-equivalence question is answered by
construction.

Key design points captured:
- Open at the pinned base version (consistent SI read base), with the publish
  CAS as the staleness guard — strict-op conflict detection moves from open-time
  fail-fast to commit-time CAS (same ExpectedVersionMismatch error; more correct
  read base for Update/Delete). Flagged for review ratification.
- Validate-once (removes the flat-46 schema-read constant); intra-txn handle
  reuse (write-local, commit-invalidated — no stale cross-write cache).
- Scope fenced: PublishPlan/Phase-7/epoch/internal-compaction are steps 4/5/2,
  explicitly out.
- Test plan closes the two gaps the step-3a edge-case audit found: a differential
  equivalence guard (test-only namespace vs WriteTxn::open) and a
  branch-checkout-at-advanced-head strict-write test.

Linked from the dev index under Active Implementation Plans.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FQ1Hf4eXLsJmeLUkTYBEw7
Comment on lines +191 to +198

## 4. Cost contract (what `write_cost.rs` asserts after 3b)

Extend the existing local gate (`tests/write_cost.rs`, on the shared
`helpers::cost` harness):

- **opens ≤ |touched tables|, and 0 on a re-open within the same txn** — a
multi-statement mutation that touches table T twice opens T **once**. New

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Differential equivalence test is mis-specified and won't close audit gap 1

The test-only StagedTableNamespace (retained from step 3a) implements the old open-at-HEAD behavior. WriteTxn::open under 3b opens at the pinned base version. When the branch HEAD has advanced past the base, the two paths diverge by design — the test would fail even for a correct 3b implementation. In the trivial case where HEAD == base (no concurrent writers), the test passes, but it proves nothing specific to 3b's pinning: HEAD == base is true for reads too, and doesn't validate that WriteTxn::open correctly returns the base snapshot when HEAD has moved.

To actually close audit gap 1, the test should: (1) open a graph, (2) advance the branch HEAD externally via a direct Lance write, then (3) assert that WriteTxn::open still returns the pinned base version, not HEAD. Comparing against StagedTableNamespace as a reference only works if it is also updated to open by pinned version — but the spec doesn't say it is, and if it is updated it's no longer a meaningful differential comparator.

This affects the audit-gap closure claim in §0 and the PR description.

Fix in Claude Code

Comment on lines +62 to +72
with table count × stage count.
- **The flat-46 schema-read constant** (§0/§6 of the parent): schema validation
re-reads/re-checks the catalog on each resolve instead of once per operation.

Neither scales with history (3a fixed that), but both scale with **work per
operation** (tables × stages), and both are *gone* once the snapshot and handle
are captured once and threaded.

## 2. Design

### 2.1 `WriteTxn`

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 WriteHandleCache::get_or_open with &self requires interior mutability — threading implications unspecified

The WriteTxn::open method signature takes &self but get_or_open must mutate the handle map. In async Rust, across await points in concurrent stage tasks (which all hold &WriteTxn simultaneously), this requires WriteHandleCache to be Send + Sync with interior mutability — concretely Mutex<HashMap<TableKey, Dataset>>. A RefCell-based implementation would compile but panic at runtime as soon as two stage futures poll concurrently. The spec doesn't address this.

Either change the open method to &mut self (simpler, no interior mutability, but concurrent parallel stages can't then share the same WriteTxn) or explicitly require WriteHandleCache: Send + Sync and specify the locking strategy. The deny-list item "In-process-only Dataset impls — Send + Sync" is directly relevant here.

Fix in Claude Code

Comment on lines +110 to +133
// identical to SubTableEntry::open (db/manifest.rs:200) — the O(1),
// session-warm, version-pinned opener reads already use.
instrumentation::open_table_dataset(
&e.location, Some(e.version), &self.session,
)
})
}
}
```

The result: **reads and writes share one open primitive.** The "are the two
openers equivalent?" question (the audit gap, §6) is answered *by construction* —
there is only one.

### 2.3 The load-bearing decision: open at **pinned version**, not HEAD

This is the one behavioral change, and it must be deliberate.

Today the strict-write path opens **HEAD** and fails fast via
`ensure_expected_version(head == pinned)` (`table_store.rs:259` [S]). 3b opens at
the **pinned base version**, so the open-time check is trivially true and is
**replaced by the publish CAS** (`expected_table_versions`, already present).

Why this is correct — and *more* correct for strict ops:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Retry contract for CAS-conflicted strict ops is unspecified

§2.3 moves strict-op conflict detection from open-time fail-fast to commit-time CAS. The spec describes that a stale strict write produces ExpectedVersionMismatch at commit, but never specifies what the caller should do next: re-open a fresh WriteTxn at the new HEAD, or refresh in-place? The handles cache is "commit-invalidated," which implies the txn must be discarded after any commit — successful or conflicted — but this is not stated.

Without this guidance, an implementer could retry by re-staging against the same stale WriteTxn (same pinned base, same cached handles), reproducing the lost-update scenario §4.2 explicitly warns about. The existing PUBLISHER_RETRY_BUDGET = 5 loop in the publisher retries the CAS today, but under 3b the retry loop would also need to refresh the base snapshot. This retry shape should be specified alongside the §2.3 timing-change decision.

Fix in Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant