
StaticColdStore: add backfill mode for skipped-slot writes #79

Open
galadd wants to merge 1 commit into dapplion:static-files-generalization-spec from galadd:monotonic-write-impl

galadd commented Jun 2, 2026

Issue Addressed

Resolves the monotonic-write blocker identified in #75 that prevents ERA phase-2 state reconstruction from working with the static cold backend.

After phase 1 of ERA import writes era-boundary states at slots 8192, 16384, etc., phase 2 (reconstruct_states_parallel) backfills intermediate states at slots behind highest_written_slot, which the static archive rejects:

StaticColdStoreError(Invalid("static cold put_batch out of order vs highest_written_slot"))

This PR enables those skipped slots to be filled in.

Proposed Changes

Backfill mode for StaticColdStore

When allow_backfill = true in StoreConfig, columns accept writes to previously skipped slots (those whose offset table entry is zero). The data is appended to the end of the data file, which stays append-only with no in-place data writes, and the offset table entry for the backfilled slot is updated to point to it. highest_written_slot does not advance.
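
To make the write rules concrete, here is a minimal in-memory sketch of the behavior described above. It is not the code in this PR: `ColumnModel`, `PutError`, and the `Option<(start, len)>` entry (standing in for a zero offset table entry) are hypothetical names chosen for illustration, and callers are assumed to pass slots within the table size.

```rust
/// Errors for the illustrative model below (hypothetical names).
#[derive(Debug, PartialEq)]
pub enum PutError {
    OutOfOrder,
    BackfillDisabled,
    NotBehindHighWaterMark,
    AlreadyPopulated,
}

/// In-memory stand-in for one column: an append-only data buffer plus one
/// table entry per slot. `None` models a zero offset entry (skipped slot).
pub struct ColumnModel {
    allow_backfill: bool,
    data: Vec<u8>,
    entries: Vec<Option<(usize, usize)>>, // (start, len) into `data`
    highest_written_slot: Option<usize>,
}

impl ColumnModel {
    pub fn new(slots: usize, allow_backfill: bool) -> Self {
        Self {
            allow_backfill,
            data: Vec::new(),
            entries: vec![None; slots],
            highest_written_slot: None,
        }
    }

    /// Appends the value to the end of the data buffer and records its entry.
    fn append(&mut self, slot: usize, value: &[u8]) {
        let start = self.data.len();
        self.data.extend_from_slice(value);
        self.entries[slot] = Some((start, value.len()));
    }

    /// Sequential write: the slot must be ahead of the high-water mark.
    pub fn put(&mut self, slot: usize, value: &[u8]) -> Result<(), PutError> {
        if self.highest_written_slot.map_or(false, |h| slot <= h) {
            return Err(PutError::OutOfOrder);
        }
        self.append(slot, value);
        self.highest_written_slot = Some(slot);
        Ok(())
    }

    /// Backfill write: the slot must be behind the high-water mark and still
    /// skipped. The high-water mark is deliberately left unchanged.
    pub fn put_backfill(&mut self, slot: usize, value: &[u8]) -> Result<(), PutError> {
        if !self.allow_backfill {
            return Err(PutError::BackfillDisabled);
        }
        if !self.highest_written_slot.map_or(false, |h| slot < h) {
            return Err(PutError::NotBehindHighWaterMark);
        }
        if self.entries[slot].is_some() {
            return Err(PutError::AlreadyPopulated);
        }
        self.append(slot, value);
        Ok(())
    }

    pub fn get(&self, slot: usize) -> Option<&[u8]> {
        let (start, len) = (*self.entries.get(slot)?)?;
        Some(&self.data[start..start + len])
    }

    pub fn highest_written_slot(&self) -> Option<usize> {
        self.highest_written_slot
    }

    pub fn data_len(&self) -> usize {
        self.data.len()
    }
}
```

In this model a backfilled value lands after all previously written bytes, so the physical order of the data no longer matches slot order; only the table entry ties a slot to its bytes.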

Conf format upgrade (LHSTBLK2 → LHSTBLK3)

New 52-byte conf format adds:

  • flags byte (bit 0 = allow_backfill)
  • backfill_file_id u64 — tracks which non-current file has uncommitted backfill data
  • backfill_data_len u64 — committed length of that file

Old 36-byte LHSTBLK2 confs open in read-only mode (backfill disabled). When allow_backfill = true is configured, the conf is automatically upgraded to LHSTBLK3 on first open. The magic bump prevents old code from opening a backfill-enabled store.
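
As a rough illustration of how the magic bump gates access, the sketch below dispatches on the magic and checks the stated conf lengths. It assumes the magic occupies the first 8 bytes of the conf; the function and type names are hypothetical, and the position of the flags byte is left out on purpose because this description does not specify the layout.

```rust
pub const MAGIC_V2: [u8; 8] = *b"LHSTBLK2";
pub const MAGIC_V3: [u8; 8] = *b"LHSTBLK3";
pub const V2_LEN: usize = 36; // conf length stated for LHSTBLK2
pub const V3_LEN: usize = 52; // conf length stated for LHSTBLK3

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ConfVersion {
    V2,
    V3,
}

#[derive(Debug, PartialEq)]
pub enum ConfError {
    UnknownMagic,
    BadLength,
}

/// New-code path: accepts both magics and validates the matching length.
/// Assumption: the magic is stored in the first 8 bytes.
pub fn detect_version(conf: &[u8]) -> Result<ConfVersion, ConfError> {
    let (version, expected_len) = if conf.starts_with(&MAGIC_V2) {
        (ConfVersion::V2, V2_LEN)
    } else if conf.starts_with(&MAGIC_V3) {
        (ConfVersion::V3, V3_LEN)
    } else {
        return Err(ConfError::UnknownMagic);
    };
    if conf.len() != expected_len {
        return Err(ConfError::BadLength);
    }
    Ok(version)
}

/// Old-code path: only knows LHSTBLK2, so a LHSTBLK3 conf is rejected.
pub fn legacy_open(conf: &[u8]) -> Result<(), ConfError> {
    if !conf.starts_with(&MAGIC_V2) {
        return Err(ConfError::UnknownMagic);
    }
    if conf.len() != V2_LEN {
        return Err(ConfError::BadLength);
    }
    Ok(())
}

/// A V2 conf is upgraded only when backfill is requested in the config.
pub fn needs_upgrade(version: ConfVersion, allow_backfill_configured: bool) -> bool {
    version == ConfVersion::V2 && allow_backfill_configured
}

/// Bit 0 of the flags byte carries allow_backfill.
pub fn flag_allow_backfill(flags: u8) -> bool {
    flags & 0b0000_0001 != 0
}
```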

Crash recovery

heal_on_open handles three crash windows:

  1. Data written, offset not written → data beyond current_data_len is truncated; no offset points to it
  2. Data + offset written, conf not updated → data truncated, then scan_and_zero_dangling_offsets catches any offset pointing past the committed length and zeros it
  3. Conf updated → data is committed, no healing needed

scan_and_zero_dangling_offsets runs in O(SLOTS_PER_FILE) = O(8192) per file, only on open, only for at most two files (current + backfill).
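
The three crash windows can be walked through with a small in-memory model. This is a sketch under assumptions, not the PR's implementation: `FileModel` is hypothetical, and each nonzero table entry is taken to be the exclusive end offset of its record so that 0 can still mean "skipped".

```rust
/// Hypothetical model of one file as seen at open time.
pub struct FileModel {
    /// Bytes physically present in the data file.
    pub data: Vec<u8>,
    /// One entry per slot; 0 = skipped, otherwise the exclusive end offset
    /// of the record (an encoding assumed for this sketch).
    pub offsets: Vec<u64>,
    /// Data length recorded in the conf, i.e. the committed length.
    pub committed_len: u64,
}

/// Truncates uncommitted data, then zeros any offset that points past the
/// committed length. Returns the number of offsets that were zeroed.
pub fn heal_on_open(file: &mut FileModel) -> usize {
    let committed = file.committed_len;

    // Windows 1 and 2: bytes beyond the committed length were never
    // acknowledged by the conf, so they are dropped.
    if file.data.len() as u64 > committed {
        file.data.truncate(committed as usize);
    }

    // Window 2: an offset was written before the conf update. It now points
    // past the end of the data and must be reset to "skipped".
    let mut zeroed = 0;
    for offset in file.offsets.iter_mut() {
        if *offset > committed {
            *offset = 0;
            zeroed += 1;
        }
    }
    zeroed
}
```

In window 3 the committed length already covers all data, so both steps are no-ops and every offset is preserved.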

Constraint: one non-current file per batch

A single put_batch_backfill call may target at most one non-current file. This matches actual usage (ERA reconstruction writes one 8192-slot era at a time) and ensures the conf can track the backfill file for crash recovery.
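
A check of this kind could look roughly like the following. The sketch assumes a slot maps to its file by integer division with SLOTS_PER_FILE; `backfill_target` and `BatchError` are hypothetical names, not the PR's API.

```rust
pub const SLOTS_PER_FILE: u64 = 8192;

/// Assumed slot-to-file mapping for this sketch.
pub fn file_id(slot: u64) -> u64 {
    slot / SLOTS_PER_FILE
}

#[derive(Debug, PartialEq)]
pub enum BatchError {
    MultipleNonCurrentFiles { first: u64, second: u64 },
}

/// Returns the single non-current file the batch targets, if any, and
/// rejects batches that touch more than one non-current file.
pub fn backfill_target(slots: &[u64], current_file: u64) -> Result<Option<u64>, BatchError> {
    let mut target: Option<u64> = None;
    for &slot in slots {
        let id = file_id(slot);
        if id == current_file {
            continue; // writes to the current file are always allowed
        }
        match target {
            None => target = Some(id),
            Some(t) if t == id => {}
            Some(t) => {
                return Err(BatchError::MultipleNonCurrentFiles { first: t, second: id })
            }
        }
    }
    Ok(target)
}
```

The returned file id is the value a conf could record as its tracked backfill file before the batch is written.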

Bug fix: read_offset on missing offset files

Previously, read_offset called File::open directly and would panic when the offset file didn't exist. Backfill into a sealed file whose offset file hasn't been created yet triggers this. Now returns Ok(0) (slot was skipped).
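
The fixed behavior can be sketched as below. The signature and the on-disk layout (one little-endian u64 per slot index) are assumptions for illustration; only the "missing file means skipped slot" handling reflects the change described here.

```rust
use std::fs::File;
use std::io::{self, ErrorKind, Read, Seek, SeekFrom};
use std::path::Path;

/// Reads the offset entry at `index`, treating a missing offset file as
/// "slot was skipped" (offset 0) rather than as a failure.
/// Hypothetical layout: one little-endian u64 per index.
pub fn read_offset(path: &Path, index: u64) -> io::Result<u64> {
    let mut file = match File::open(path) {
        Ok(f) => f,
        // No offset file yet: nothing has been written for any slot in it.
        Err(e) if e.kind() == ErrorKind::NotFound => return Ok(0),
        // Any other I/O error is still reported to the caller.
        Err(e) => return Err(e),
    };
    file.seek(SeekFrom::Start(index * 8))?;
    let mut buf = [0u8; 8];
    file.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}
```

Matching on `ErrorKind::NotFound` keeps permission errors and other I/O failures visible instead of silently reporting them as skipped slots.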

Test suite: 19 tests

  • test static_cold::tests::test_backfill_batch_across_file_boundaries ... ok
  • test static_cold::tests::test_backfill_mixed_sequential_and_backfill_batch_rejected ... ok
  • test static_cold::tests::test_backfill_does_not_advance_highest_written_slot ... ok
  • test static_cold::tests::test_backfill_batch_multiple_slots_same_file ... ok
  • test static_cold::tests::test_backfill_compressed_column_round_trip ... ok
  • test static_cold::tests::test_backfill_idempotent_reput ... ok
  • test static_cold::tests::test_backfill_into_sealed_file ... ok
  • test static_cold::tests::test_backfill_into_current_file ... ok
  • test static_cold::tests::test_backfill_rejected_when_disabled ... ok
  • test static_cold::tests::test_backfill_rejects_already_populated_slot_with_different_data ... ok
  • test static_cold::tests::test_contains_backfilled_slot ... ok
  • test static_cold::tests::test_heal_truncates_multiple_uncommitted_backfills ... ok
  • test static_cold::tests::test_heal_truncates_data_and_zeros_dangling_offset_after_crash ... ok
  • test static_cold::tests::test_heal_truncates_uncommitted_backfill_in_sealed_file ... ok
  • test static_cold::tests::test_heal_preserves_committed_backfill_data ... ok
  • test static_cold::tests::test_backfill_skipped_slot_readable ... ok
  • test static_cold::tests::test_iter_from_includes_backfilled_slots ... ok
  • test static_cold::tests::test_sequential_write_after_backfill_correct_data_len ... ok
  • test static_cold::tests::test_v2_conf_upgrades_to_v3_when_backfill_enabled ... ok

Additional Info

Reconstruction integration

The reconstruct.rs change to allow reconstruction with backfill is not included in this PR — it depends on sigp#9273 (ERA files) being rebased onto this branch. The current PR only adds the storage-layer capability. The caller-side integration lands when the two branches merge.

At-most-two-files invariant

The design tracks at most one non-current backfill file in the conf (backfill_file_id / backfill_data_len). This means at most two files can have uncommitted data at any time: the current file and one backfill file. If a crash leaves orphaned data in a file that isn't tracked by either field, that data is harmless (no offset points to it) but wastes space until overwritten. A future improvement could scan all files on open to reclaim orphaned space, but this isn't necessary for correctness.

ERA reconstruction workload

Phase 2 reconstruction writes states at slots 1–8191, then 8193–16383, etc. Each era's backfill targets exactly one non-current file before moving to the next. The one-non-current-file-per-batch constraint is naturally satisfied. If future workloads need to backfill across multiple sealed files in a single batch, the conf format would need a more general tracking mechanism (e.g., a separate backfill manifest file).
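
The slot ranges quoted above follow from simple arithmetic, sketched here. `intermediate_slots` is a hypothetical helper, and the observation that each range stays inside one file relies on the assumption that a slot's file is `slot / 8192`.

```rust
use std::ops::RangeInclusive;

pub const SLOTS_PER_ERA: u64 = 8192;

/// Intermediate slots for a given era index: everything strictly between
/// two consecutive era-boundary slots.
pub fn intermediate_slots(era: u64) -> RangeInclusive<u64> {
    (era * SLOTS_PER_ERA + 1)..=((era + 1) * SLOTS_PER_ERA - 1)
}

/// True if every slot in the range maps to the same file under the
/// assumed `slot / SLOTS_PER_ERA` mapping.
pub fn stays_in_one_file(range: &RangeInclusive<u64>) -> bool {
    range.start() / SLOTS_PER_ERA == range.end() / SLOTS_PER_ERA
}
```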

Enables previously-skipped slots (offset == 0) to be filled in after
the column's highest_written_slot has advanced past them. This unblocks
ERA phase-2 reconstruction, which writes intermediate states at slots
behind the era-boundary high-water mark.

- New conf format (LHSTBLK3, 52 bytes) with allow_backfill flag,
  backfill_file_id, and backfill_data_len fields
- Backward-compatible: LHSTBLK2 (36-byte) confs open read-only; upgrade
  to LHSTBLK3 happens automatically when allow_backfill is enabled
- Column::put_backfill and Column::put_batch_backfill for single and
  batched backfill writes
- heal_on_open handles crash recovery for both current and backfill
  files, including scan_and_zero_dangling_offsets
- Enforced constraint: at most one non-current file per backfill batch
- read_offset returns Ok(0) for missing offset files (required for
  backfill into sealed files)
- StoreConfig::allow_backfill propagated to ColumnConfig
- Full test suite: 19 tests covering open/get/put/put_batch,
  backfill behavior, crash recovery, conf upgrade, and edge cases