Generalize cold DB: ColdStore trait + slot-keyed static archive by dapplion · Pull Request #75 · dapplion/lighthouse

dapplion · 2026-05-08T17:42:13Z

Summary

EDIT: Notes from Geth on how they achieve crash safety for static files. I believe my solution already implements something similar, but it would be good to double-check: https://gist.github.com/rjl493456442/39ca1b37c4fdaa4e71e6e9c5eba34050

Generalises the cold DB so the same HotColdDB can sit on either the existing
KV cold backend or a new file-based static archive. Lays out the static-cold
spec and the file format; ships the file backend (StaticColdStore) but does
not wire it into HotColdDB yet — that integration lands in a follow-up.

`ColdStore<E>` trait (replaces the byte-keyed `Cold: ItemStore<E>` bound)

- Slot-typed bulk: get, put_batch, contains, iter_from, sync.
- Root-keyed indices via a tight DBColumnColdIndex enum (BlockSlot, ColdStateSummary): get_index, put_index_batch. Owned by the cold backend, not spilled into the hot DB.
- KV backends (BeaconNodeBackend, MemoryStore) implement ColdStore by translating Slot/Hash256 keys into the underlying KeyValueStore byte API.
- Migration helpers (store_cold_state*) take separate slot-keyed and root-index buffers; cold bulk lands first, indices after, so a crash leaves no dangling indices.
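
To make the trait surface above concrete, here is a minimal sketch of what it could look like with an in-memory backend. It is not the PR's code: the `E` type parameter and error handling are dropped, `Slot` and `Root` are plain stand-ins for the real `Slot` and `Hash256`, and `MemoryCold` is a hypothetical name used only for illustration.

```rust
use std::collections::{BTreeMap, HashMap};

// Stand-ins for the real `Slot` and `Hash256` types (assumption for this sketch).
pub type Slot = u64;
pub type Root = [u8; 32];

/// Slot-keyed cold columns, as listed in the PR description.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum DBColumnCold { Block, BlockRoots, StateRoots, StateSnapshot, StateDiff }

/// Root-keyed index columns, as listed in the PR description.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum DBColumnColdIndex { BlockSlot, ColdStateSummary }

/// Hypothetical, simplified trait shape: no `E` parameter, no error type.
pub trait ColdStore {
    fn get(&self, col: DBColumnCold, slot: Slot) -> Option<Vec<u8>>;
    fn put_batch(&mut self, col: DBColumnCold, items: Vec<(Slot, Vec<u8>)>);
    fn contains(&self, col: DBColumnCold, slot: Slot) -> bool;
    fn iter_from(&self, col: DBColumnCold, start: Slot) -> Vec<(Slot, Vec<u8>)>;
    fn sync(&mut self);
    fn get_index(&self, col: DBColumnColdIndex, root: &Root) -> Option<Slot>;
    fn put_index_batch(&mut self, col: DBColumnColdIndex, items: Vec<(Root, Slot)>);
}

/// Illustrative in-memory backend (hypothetical, not the PR's MemoryStore).
#[derive(Default)]
pub struct MemoryCold {
    data: BTreeMap<(DBColumnCold, Slot), Vec<u8>>,
    index: HashMap<(DBColumnColdIndex, Root), Slot>,
}

impl ColdStore for MemoryCold {
    fn get(&self, col: DBColumnCold, slot: Slot) -> Option<Vec<u8>> {
        self.data.get(&(col, slot)).cloned()
    }
    fn put_batch(&mut self, col: DBColumnCold, items: Vec<(Slot, Vec<u8>)>) {
        for (slot, bytes) in items {
            self.data.insert((col, slot), bytes);
        }
    }
    fn contains(&self, col: DBColumnCold, slot: Slot) -> bool {
        self.data.contains_key(&(col, slot))
    }
    fn iter_from(&self, col: DBColumnCold, start: Slot) -> Vec<(Slot, Vec<u8>)> {
        // Keys sort by (column, slot), so a range scan stays in slot order.
        self.data
            .range((col, start)..)
            .take_while(|((c, _), _)| *c == col)
            .map(|((_, slot), bytes)| (*slot, bytes.clone()))
            .collect()
    }
    fn sync(&mut self) {} // nothing to flush in memory
    fn get_index(&self, col: DBColumnColdIndex, root: &Root) -> Option<Slot> {
        self.index.get(&(col, *root)).copied()
    }
    fn put_index_batch(&mut self, col: DBColumnColdIndex, items: Vec<(Root, Slot)>) {
        for (root, slot) in items {
            self.index.insert((col, root), slot);
        }
    }
}
```

A caller following the ordering in the last bullet would call `put_batch`, then `sync`, and only then `put_index_batch`.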

`StaticColdStore`

- One type, all slot-keyed cold columns dispatched by a tight DBColumnCold enum (Block, BlockRoots, StateRoots, StateSnapshot, StateDiff).
- No DBColumn references inside static_cold.rs.
- Per-column subdirectory (<root>/{blk,bbr,bsr,bss,bsd}/), opened eagerly at boot. Frozen HashMap<DBColumnCold, Column> after construction; only per-column writer-state mutex on the hot path.
- Per-column settings (record_type, compression, max_value_bytes) come from a build-time column_config table on first creation, then are persisted in each column's conf and the on-disk values win on re-open. Conf magic bumped to LHSTBLK2. Future builds with different defaults stay backward-compatible with existing data.
- BeaconStateDiff uses compression: false. HDiff is already compressed internally (zstd'd validator and balance chunks; xdelta3 state diff), so snappy on top is wasteful.
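
The "on-disk values win on re-open" rule can be sketched as follows. This is an illustration under assumptions: the field types and the `resolve_config` helper are hypothetical, and only the three setting names come from the description above.

```rust
/// Hypothetical mirror of the per-column settings named in the description.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ColumnConfig {
    pub record_type: u8,      // field type is an assumption
    pub compression: bool,
    pub max_value_bytes: u32, // field type is an assumption
}

/// Pick the settings for a column at open time.
/// `on_disk` is `None` on first creation (no conf file yet), in which case
/// the build-time default is used and would then be persisted by the caller.
/// On re-open the persisted values take precedence over the build default.
pub fn resolve_config(build_default: ColumnConfig, on_disk: Option<ColumnConfig>) -> ColumnConfig {
    on_disk.unwrap_or(build_default)
}
```

With this rule, a newer build that changes a default (for example turning compression on) still reads an existing column with the settings it was written with.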

Removes `lighthouse db prune-states` / `prune_historic_states`

Per specs/static-cold-backend.md, the mode they produce ("cold blocks
present, cold states absent") isn't in the startup-path table and the spec
doesn't support runtime mode transitions in either direction.
full_state_pruning_enabled goes with it.

Specs

- specs/static-cold-backend.md: pluggable cold backend design (modes, ownership, writers, availability rules, removed APIs, backend API surface).
- specs/static-blocks.md: slot-keyed file format for blocks (kept as the baseline format; generalised by column_config for the other cold columns).
- specs/era-storage.md: slot-keyed blob archive design (no implementation in this PR).

Test plan

- cargo check --workspace --tests passes (verified locally).
- cargo nextest run -p store: exercises MemoryStore as a ColdStore via the existing tests.
- cargo nextest run -p beacon_chain --test store_tests: migration paths still pass without the prune_historic_states test.
- Manual: open a fresh StaticColdStore, write blocks/state diffs/snapshots/roots at ascending slots in each column, re-open, read them back. Crash-recovery exercise: kill mid-put, verify heal_current_file truncates uncommitted data.

Add a slot-keyed durable archive (`StaticBlockStore`) for finalized blinded blocks, integrated into `migrate_database` as a second pass that runs alongside the existing cold-state migration. File format and manifest persistence remain `todo!()`; this is the wiring scaffold.

- New `DBColumn::BeaconBlockSlot` reverse index (root → slot).
- `HotColdDB::get_block_with` and `block_exists` fall through to the archive after a hot-KV miss.
- Archival driven inside `migrate_database`: cold ops (BeaconBlockRoots + BeaconBlockSlot) commit atomically, hot deletes after split commit.
- Skip-slot dedup seeded from `BeaconBlockRoots[current_split.slot - 1]`, with `Hash256::ZERO` for the genesis case.
- Spec at `specs/static-blocks.md`.

Companion document describing the static-file backend for `BlobSidecar` archival via `.erb` files. Initialization via genesis sync or imported era files; checkpoint sync and P2P blob backfill rejected at startup.

Replaces the byte-keyed Cold: ItemStore<E> bound on HotColdDB with a slot-typed ColdStore<E> trait: get/put_batch/exists/iter_from for slot-keyed columns plus get_index/put_index_batch over a tight DBColumnColdIndex enum (BlockSlot, ColdStateSummary). KV backends (BeaconNodeBackend, MemoryStore) implement it by translating slot/root keys into the existing KeyValueStore byte API. StaticBlockStore generalised to StaticColdStore: one type, columns dispatched on each call. Per-column subdirectory; per-column settings (record_type, compression, max_decompressed) come from a build-time column_config table on first creation and are persisted in each column's conf so future builds with different defaults stay compatible. Conf magic bumped to LHSTBLK2. Removes prune_historic_states + the lighthouse db prune-states CLI: the mode they produce ("cold blocks present, cold states absent") isn't in the startup-path table in specs/static-cold-backend.md and the spec doesn't support runtime mode transitions. full_state_pruning_enabled goes with it. Other: store_cold_state* helpers take separate slot-keyed and root-index buffers; migration writes slot-keyed cold data first, root indices after, so a crash leaves no dangling indices.

- Move beacon_node/store/src/static_blocks.rs to static_cold.rs (the type is no longer block-specific).
- Add DBColumnCold (slot-keyed cold columns) alongside DBColumnColdIndex. StaticColdStore is keyed by DBColumnCold all the way through; no DBColumn conversion happens inside static_cold.rs. column_config returns a plain ColumnConfig (was Option) and UnsupportedColumn errors go away: the tighter enum makes them unrepresentable.
- Eager-open every cold column at boot, freeze the columns map. No outer Mutex/RwLock; the per-column writer state mutex is the only sync point.
- Rename ColumnConfig::max_decompressed -> max_value_bytes (it bounds the raw payload size on uncompressed reads too, defending against corrupt headers).
- BeaconStateDiff: compression: false. HDiff is already compressed internally (zstd'd validator/balance chunks) so snappy on top is wasteful.

The slot-keyed methods on ColdStore (get/put_batch/contains/iter_from) now take the tight DBColumnCold enum instead of DBColumn, mirroring the existing DBColumnColdIndex shape on the index methods. This drops DBColumn from static_cold.rs entirely. KV backend impls (BeaconNodeBackend, MemoryStore) translate via column.db_column(). FrozenForwardsIterator::new still accepts DBColumn at the public boundary and converts at the call to cold_db.iter_from. Also: delete static_blobs.rs (was a stub returning Unsupported on every call, with no callers). Revert noise renames (io_batch, cold_db_block_ops, cold_db_state_ops, ops, .map_err(|e| e.into())) to keep the diff against unstable focused on real semantic changes.

`BeaconBlockSlot` (and the `DBColumnColdIndex::BlockSlot` variant that wrapped it) was added for a static-archive read-fallback path that was removed earlier in this branch. Nothing writes or reads it now, so drop the variant from the DBColumn enum, the matching DBColumnColdIndex variant, the `MissingFrozenBlockSlot` error, and the corresponding key_size match arm. Rewrite TODO-static-block-storage.md to reflect the current branch state: the static-cold generalization is in, the prune-states removal is in, and the remaining work is cold-backend selection (flag), review of block read/write paths now that BeaconBlockSlot is gone, an invariants review, and tests.

The two explicit impls (BeaconNodeBackend, MemoryStore) were identical boilerplate translating slot/root keys into the underlying byte-keyed KeyValueStore. Replace with a single blanket impl in lib.rs. Forecloses a future ColdStore impl that isn't a KeyValueStore (e.g. wiring StaticColdStore directly as the Cold parameter); reversible if/when that becomes wanted.

The blanket `ColdStore` impl writes `slot.as_ssz_bytes()` for `BeaconColdStateSummary`, where older releases wrote SSZ-encoded `ColdStateSummary { slot }`. The two encodings are byte-identical (an SSZ container of one fixed-size field equals the field), but the equality is load-bearing for read compatibility with existing databases. Add a regression test that pins it.
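
The byte-equality claim follows from two SSZ rules: a `uint64` is encoded as 8 little-endian bytes, and a container whose fields are all fixed-size is the plain concatenation of its field encodings. A hand-rolled sketch (it does not use the real `ssz` crate, and the struct here is a stand-in for the old `ColdStateSummary`):

```rust
/// SSZ encodes a `uint64` as 8 little-endian bytes.
pub fn ssz_encode_u64(value: u64) -> Vec<u8> {
    value.to_le_bytes().to_vec()
}

/// Stand-in for the old single-field container (illustration only).
pub struct ColdStateSummary {
    pub slot: u64,
}

/// A container with only fixed-size fields has no offset table:
/// its encoding is the concatenation of the field encodings.
pub fn ssz_encode_summary(summary: &ColdStateSummary) -> Vec<u8> {
    let mut out = Vec::with_capacity(8);
    out.extend_from_slice(&ssz_encode_u64(summary.slot));
    out
}
```

A regression test along these lines would compare the two encodings for a handful of slots, including 0 and `u64::MAX`.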

The slot-walk rewrite of `check_cold_state_diff_consistency` was forced by not having an index iterator on the trait. Add `iter_index(col)` (yields `(Hash256, Slot)`) and restore the invariant to iterating `BeaconColdStateSummary` directly, matching unstable's structure modulo the slot-typed API.

Replace the two-buffer (slot-keyed data + state-root index) helper signatures with a single `&mut ColdBatch` and add `commit_cold_batch` that flushes data, syncs, then commits the index — encoding the data-before-index ordering at the API. `put_state` and `reconstruct.rs` collapse to "build batch, commit batch." The migration loop keeps a top-level summary index that accumulates across states and is flushed at end-of-migration; per-iteration data still goes through `commit_cold_data` (renamed from `commit_cold_items`).
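
A rough sketch of how a single batch type can encode the data-before-index ordering at the API. Only the names `ColdBatch` and `commit_cold_batch` come from the commit message; the `Backend` trait, the field layout, and the recording backend are assumptions made for illustration.

```rust
/// Hypothetical batch: slot-keyed records plus the root index that points at them.
#[derive(Default)]
pub struct ColdBatch {
    pub data: Vec<(u64, Vec<u8>)>,   // (slot, bytes)
    pub index: Vec<([u8; 32], u64)>, // (state root, slot)
}

/// Minimal backend surface needed by the commit helper (assumption).
pub trait Backend {
    fn put_data(&mut self, items: &[(u64, Vec<u8>)]);
    fn sync(&mut self);
    fn put_index(&mut self, items: &[([u8; 32], u64)]);
}

/// Flush data, sync, then commit the index. Callers cannot reorder the steps,
/// so a crash between them leaves unreferenced data, never a dangling index.
pub fn commit_cold_batch<B: Backend>(backend: &mut B, batch: &mut ColdBatch) {
    backend.put_data(&batch.data);
    backend.sync();
    backend.put_index(&batch.index);
    batch.data.clear();
    batch.index.clear();
}

/// Test double that records the order of calls.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step { PutData, Sync, PutIndex }

#[derive(Default)]
pub struct RecordingBackend {
    pub steps: Vec<Step>,
}

impl Backend for RecordingBackend {
    fn put_data(&mut self, _items: &[(u64, Vec<u8>)]) { self.steps.push(Step::PutData); }
    fn sync(&mut self) { self.steps.push(Step::Sync); }
    fn put_index(&mut self, _items: &[([u8; 32], u64)]) { self.steps.push(Step::PutIndex); }
}
```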

Drops the `KeyValueStore -> ColdStore` blanket and replaces it with an explicit per-backend impl. `BeaconNodeBackend` no longer impls `ColdStore` directly — its byte-translation is inlined inside the `ColdBackend::Kv` arm where it's actually used. `MemoryStore` keeps an explicit impl (still used as the Cold parameter in tests via `EphemeralHarnessType`). `ColdBackend<E>` is a new enum with `Kv(BeaconNodeBackend)` / `Static(StaticColdStore)` variants, picked at startup from `StoreConfig::cold_backend` (default `Kv`). Production type signatures swap the second `BeaconNodeBackend<E>` slot to `ColdBackend<E>` (3 production sites, 6 test sites, 3 database_manager sites). `StaticColdBackend<E>` wrapper from the previous commit collapsed into a direct `impl<E> ColdStore<E> for StaticColdStore`. Index methods stub `Unsupported` for now — wiring the embedded KV is the next piece.

Genesis sync against the static cold backend was failing for two reasons:

1. `BeaconColdStateSummary` and friends are root-keyed indices; the static files are slot-keyed. The previous `Unsupported` stubs blocked the very first migration. Embed a `BeaconNodeBackend<E>` at `<root>/index/` and serve `get_index` / `put_index_batch` / `iter_index` from it. Forwards iteration over slot-keyed columns (`iter_from`) is now also implemented by walking the column's `.off` sidecar.
2. `BeaconChainBuilder::genesis` pre-writes the genesis block_root to cold `BlockRoots` at slot 0, then the first migration writes the same (slot, root) again. KV cold accepts the overwrite; the static backend's strict-ascending check rejected it. `Column::put` now treats a re-put of an identical value at the current highest slot as a no-op, and errors only on a value mismatch (a real bug).

Threads `StoreConfig` into `StaticColdStore::open` so the embedded KV picks up the same backend (`leveldb` / `redb`) and tuning as the hot/blobs DBs.

Adds `genesis_sync_static_cold` covering ~1000 finalized blocks with the static backend and a load of every cold state through the new index.

Drops the bespoke 1000-block static-cold test and instead has get_store read the cold backend from COLD_BACKEND=static|kv. CI / local can now run the existing store_tests suite against either backend without duplicating test bodies. Also trims ColdBackendKind to the derives actually exercised today. Display, EnumString, VariantNames, Copy were forward-looking for the not-yet-wired --cold-backend CLI flag - re-add when that lands.

The static cold backend is append-only in ascending slot order, so checkpoint/weak-subjectivity sync (which backfills slots below the anchor) is fundamentally incompatible. Refuse the combination explicitly in BeaconChainBuilder::weak_subjectivity_state instead of failing later with an opaque 'static cold put out of order' error. The 6 weak_subjectivity_sync_* tests early-return under COLD_BACKEND=static so the test suite passes against either backend. Adds the --cold-backend CLI flag (kv|static, default kv) so operators can opt into the static backend at startup. Re-adds EnumString and VariantNames on ColdBackendKind for clap parsing.

Idempotent put at any committed slot makes `migrate_database` retries safe after a mid-loop crash. The previous put accepted re-puts only at exactly `highest_written_slot`; on retry, a re-put at slot 0 (below the highest slot) triggered the out-of-order error. Now any committed slot accepts an identical-value re-put; mismatched values and skipped-slot fills still error.

New `COLD_BACKEND_KEY` in `BeaconMeta` pins the backend kind on first open and refuses mismatched re-opens (Static and Kv on-disk layouts are incompatible).

`reconstruct_historic_states` refuses to run under static cold, because the slots it would write are below every column's high-water mark.

`max_value_bytes` ratchets upward on open if the build default exceeds the on-disk value, so a newer build can write larger records than an older one persisted, and re-persists immediately for stable re-opens.

Per-column files renamed `static_blocks_*` -> `data_*`, `static_blocks.conf` -> `column.conf`; the literal prefix was misleading after the per-column generalisation.

`kv_cold_store` helper module dropped; `MemoryStore`'s `ColdStore` impl inlined to match `ColdBackend::Kv`. Two impls, no shared helper.

`decompress_record` returns `Result<Vec<u8>>` (was `Result<Option<Vec<u8>>>` with `Some` on every success path).

`TODO(static)` markers added for `iter_from` perf, the migrate-vs-index transient invariant 11 window, invariants 10/11/12 re-review under static cold, and the missing test set.

Spec cleanup: delete `specs/static-blocks.md` (stale, ~60% contradicted the code) and `TODO-static-block-storage.md`. Rewrite the `static_cold.rs` module header as the canonical byte-level format reference (layout, data file, `column.conf`, put contract, recovery).
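
The put contract described in this commit (identical re-put at any committed slot is a no-op; mismatched values and skipped-slot fills error) can be modelled in memory as below. This is a sketch under assumptions: the real `Column::put` works on files and offsets, and the `PutError` variants are hypothetical names.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq, Eq)]
pub enum PutError {
    /// Slot is already committed with different bytes.
    ValueMismatch,
    /// Slot is below the high-water mark but was never written.
    SkippedSlotFill,
}

/// In-memory model of one append-only column.
#[derive(Default)]
pub struct Column {
    committed: BTreeMap<u64, Vec<u8>>,
}

impl Column {
    pub fn highest_written_slot(&self) -> Option<u64> {
        self.committed.keys().next_back().copied()
    }

    pub fn put(&mut self, slot: u64, value: &[u8]) -> Result<(), PutError> {
        match self.highest_written_slot() {
            // At or below the high-water mark: only an identical re-put is accepted.
            Some(highest) if slot <= highest => match self.committed.get(&slot) {
                Some(existing) if existing.as_slice() == value => Ok(()),
                Some(_) => Err(PutError::ValueMismatch),
                None => Err(PutError::SkippedSlotFill),
            },
            // Strictly ascending (or first) write: append.
            _ => {
                self.committed.insert(slot, value.to_vec());
                Ok(())
            }
        }
    }
}
```

Under this model a migration that crashed after writing slots 0..N can simply replay from slot 0: every replayed put is an identical-value no-op until it reaches new slots.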

Adds a sibling job to `beacon-chain-tests` that runs `beacon_chain::store_tests::*` with `COLD_BACKEND=static` (and `FORK_NAME=fulu`) to exercise the static slot-keyed cold-DB backend on every CI run. Mirrors the existing job's runner, toolchain, cache, and feature flags (`fork_from_env,slasher/lmdb,portable`). Added to `test-suite-success` so the merge queue blocks on it.

Adds the missing pieces so the static cold archive can serve block-by-root reads without keeping a duplicate in hot indefinitely.

Schema (re-adds what f671da1 dropped):
- `DBColumn::BeaconBlockSlot` (tag `bbs`, 32-byte key, 8-byte SSZ Slot)
- `DBColumnColdIndex::BlockSlot` variant

Migrate (`migrate_database`):
- alongside the existing block-bulk push to `cold.Block`, push the matching `(block_root, slot)` to `cold_block_slot_index` and the `block_root` to `hot_block_delete_roots`
- end-of-loop: `put_index_batch(BlockSlot, ...)` after `ColdStateSummary`, before split commit
- post split commit: `hot_db.do_atomically(deletes)` reclaims hot space for the just-migrated blocks.

Hot delete only runs after cold bytes + cold index are durable, so a crash here leaves cold canonical and reads fall through. KV mode keeps `move_blocks_to_static_cold` false → all the new buffers stay empty → status quo.

Read fallback (`get_block_with`, `block_exists`):
- hot first; on miss, `cold.get_index(BlockSlot, root)` then `cold.get(Block, slot)`.

Missing bulk for an indexed slot raises `MissingFrozenBlock` (corruption). KV mode's empty BlockSlot index makes the fallback always return None on hot miss, identical to before.

Invariant 10 (`check_cold_block_root_indices`):
- now uses `self.block_exists(&block_root)` (the public read with cold fallback) instead of the bare `hot_db.key_exists(...)`. Required because hot-delete makes the bare hot check fire spuriously for every migrated slot under Static cold.

Init-path coverage:
- Genesis + KV: cold writes gated off, BlockSlot empty, fallback always None on hot miss. Status quo.
- Genesis + Static: migrate writes block + index to cold, deletes from hot. Reads ≥ split.slot hit hot; < split.slot hit cold via fallback.
- Era + Static: hot has only post-anchor blocks. cold has 0..S from era (future era-import path) + post-S from migrate. Fallback is the read path for slot < S.
- Ckpt + KV: BlockSlot empty as in Genesis + KV. Backfill fills hot.
- Ckpt + Static (no era): rejected by the existing WSS guard.

`make cli-local` after `e259a5157b` introduced `--cold-backend` without touching `book/src/help_bn.md`, so `cli-check` failed on every push.

Re-added in `bbc3badfd2` (`BeaconBlockSlot`); the hardcoded snapshot in `check_db_columns` wasn't updated, so the test asserted on a stale list.

dapplion · 2026-05-10T03:34:20Z

Column::put_batch: one fsync per file instead of per slot — branch static-cold-batched-fsync

The current ColdStore::put_batch is a one-line for item in items: self.put(item). Each put does 4 fsyncs (data file sync_all, offset file sync_all, then write_config which is tmp.sync_all + rename + dir.sync_all). For an 8192-item batch that's ~32k fsyncs, ~150 s on /mnt/ssd NVMe.

Replaced put_batch with a real batched implementation: group items by file_id, append all records through a 1 MiB BufWriter, then one data_file.sync_all, write all offsets, one off_file.sync_all, one atomic write_config for the whole batch. Same caller-visible "batch durable on return" contract; spec doc updated.
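
A stripped-down sketch of the batching idea for the data file only: group by file, append through a buffered writer, then sync once per file. It leaves out the offset sidecar and the config commit, and the `data_{file_id}` naming, the `SLOTS_PER_FILE` value, and the return value (number of `sync_all` calls) are assumptions for illustration rather than the branch's actual layout.

```rust
use std::collections::BTreeMap;
use std::fs::OpenOptions;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Slots covered by one data file (assumed value for this sketch).
pub const SLOTS_PER_FILE: u64 = 8192;

/// Append a batch of (slot, bytes) records, issuing one `sync_all` per
/// touched data file instead of one per record. Returns the sync count.
pub fn put_batch(dir: &Path, items: &[(u64, Vec<u8>)]) -> io::Result<usize> {
    // Group records by the file they belong to, keeping arrival order.
    let mut by_file: BTreeMap<u64, Vec<&[u8]>> = BTreeMap::new();
    for (slot, bytes) in items {
        by_file.entry(slot / SLOTS_PER_FILE).or_default().push(bytes.as_slice());
    }

    let mut syncs = 0;
    for (file_id, records) in by_file {
        let path = dir.join(format!("data_{file_id}"));
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        // 1 MiB buffer so small records do not each become a write syscall.
        let mut writer = BufWriter::with_capacity(1 << 20, file);
        for record in records {
            writer.write_all(record)?;
        }
        writer.flush()?;
        // Single durability point for every record appended to this file.
        writer.get_ref().sync_all()?;
        syncs += 1;
    }
    Ok(syncs)
}
```

For an 8192-record batch that falls in one file this issues one data-file sync where a per-record loop would issue 8192.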

Microbench (beacon_node/store/examples/static_cold_bench.rs, 8192 items, /mnt/ssd NVMe):

| column | old `put` loop | new `put_batch` | speedup |
|---|---|---|---|
| Block | 36.1 s | 0.23 s | 155× |
| BlockRoots | 88.3 s | 0.17 s | 519× |
| StateRoots | 123.9 s | 0.16 s | 775× |

Tests + make lint-full clean. PR-ready: https://github.com/dapplion/lighthouse/pull/new/static-cold-batched-fsync

dapplion · 2026-05-10T03:34:47Z

End-to-end mainnet ERA import: 51.27 h → 1.22 h (42× speedup)

Combined this PR (#75) with the ERA-import lcli (sigp#9273 / #69) plus the put_batch fix plus a custom direct-byte SSZ blinder, and ran a full mainnet ERA import (1260 era files = ~10.3 M slots, eras 0..1260). Branch: experiment-era-static-cold-load.

Compared against the tuned KV-cold-backend run (/mnt/ssd/era-test-logs/era-import-timing.csv, 1258 eras in 51.27 h, same hardware, same ERA file source /mnt/ssd/era-mainnet-nimbus/):

| backend | eras | total | mean s/era |
|---|---|---|---|
| KV (tuned) | 1260 | 51.27 h | 146.5 |
| Static (this branch) | 1260 | 1.22 h | 3.49 |

The custom blinder is the second large win on top of the put_batch fix. ERA files store full SignedBeaconBlocks; the importer wants blinded SSZ for the slot-keyed Block column. Doing the typed parse + clone_as_blinded + as_ssz_bytes round-trip allocates ~hundreds of small heap objects per block (every Attestation's AggregationBits, deposits, sync committee bits, …) which then immediately get discarded. The custom blinder walks BeaconBlockBody SSZ container offsets directly and only typed-decodes Transactions + Withdrawals slices for tree_hash_root. Verified byte-identical against clone_as_blinded().as_ssz_bytes() in beacon_node/beacon_chain/examples/blinder_bench.rs:

| sample | parse + blind + encode | custom blinder | speedup |
|---|---|---|---|
| Capella block (128 KB) | 8.4 ms | 2.1 ms | 4.03× |
| Deneb block (86 KB) | 7.7 ms | 1.15 ms | 6.70× |

Pre-Bellatrix: trivial passthrough since FullPayload ≡ BlindedPayload SSZ-encoding. Bellatrix and Electra+ fall back to the typed path for now.

Per-phase tracing breakdown across all 1260 eras:

| phase | mean / era | % of parent |
|---|---|---|
| `import_era_file` (parent) | 3.48 s | 100% |
| `era_import_decompress_blocks` + `era_import_blind_blocks` | ~1.49 s combined | ~43% |
| `era_import_write_blocks` | 0.65 s | 19% |
| `era_import_write_state` | 0.49 s | 14% |
| `era_import_read` | 0.42 s | 12% |
| `era_import_decode_state` | 0.17 s | 5% |
| `era_import_write_state_root_index` | 0.078 s | 2% |
| `era_import_write_block_index` | 0.057 s | 2% |

Per-era CSV in same shape as the KV reference: /mnt/ssd/lh-bench/claude-lh-era-files-static/logs/static-import-timing.csv. Disk footprint ends at ~135 GB vs KV ~681 GB (5× smaller — blinded blocks, no LevelDB compaction overhead, append-only files).

dapplion · 2026-05-10T03:35:38Z

Architectural follow-up: phase-2 reconstruction is incompatible with monotonic-forward writes

After phase 1 of the ERA importer finishes (era-boundary states written to StateSnapshot / StateDiff columns at slots 8192, 16384, …, 1260·8192), phase 2 (reconstruct_states_parallel) tries to backfill intermediate states by replaying blocks slot-by-slot. Those writes target slot ranges behind highest_written_slot for the column, which the static archive rejects:

StaticColdStoreError(Invalid("static cold put_batch out of order vs highest_written_slot"))

Sequentialising the era loop (this branch tries that) doesn't help — the conflict is between phase-1 boundary writes and phase-2 intermediate writes, not between parallel workers.

Two viable directions, both real design changes:

1. Allow random-slot writes within an existing file_id. The .off sidecar is already a fixed-size table indexed by slot % SLOTS_PER_FILE. Replace the strict-monotonic invariant with a per-slot "is this offset already populated?" check, and append data to the data file regardless of arrival order. highest_written_slot becomes "highest committed slot" but doesn't gate writes to lower slots within already-existing files.
2. Reorder the import: do reconstruction interleaved with phase 1, so each era's intermediate states get written immediately after that era's boundary, fully ascending. Avoids the conflict by construction but requires restructuring import_era_file and would lose the "phase-1 finishes fast and you can keep using the chain while phase 2 grinds" property the legacy KV path has.

(1) is the smaller change and preserves the existing API. Just a flag in the .conf to record the new mode. Happy to draft if you want to take that direction.
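
To illustrate direction (1), here is an in-memory sketch of a single file whose offset table is indexed by `slot % SLOTS_PER_FILE` and whose data is appended in arrival order. The entry layout `(start, len)`, the `SLOTS_PER_FILE` value, and the type names are assumptions; the real sidecar format may differ.

```rust
/// Slots covered by one file (assumed value for this sketch).
pub const SLOTS_PER_FILE: u64 = 8192;

#[derive(Debug, PartialEq, Eq)]
pub enum PutError {
    /// The offset entry for this slot is already populated.
    AlreadyPopulated,
}

/// In-memory model of one data file plus its fixed-size offset table.
pub struct FileSketch {
    data: Vec<u8>,
    /// `offsets[slot % SLOTS_PER_FILE]` is `Some((start, len))` once written.
    offsets: Vec<Option<(u64, u32)>>,
}

impl FileSketch {
    pub fn new() -> Self {
        Self { data: Vec::new(), offsets: vec![None; SLOTS_PER_FILE as usize] }
    }

    /// Append `value` regardless of arrival order; reject only a second
    /// write to a slot whose offset entry is already populated.
    pub fn put(&mut self, slot: u64, value: &[u8]) -> Result<(), PutError> {
        let idx = (slot % SLOTS_PER_FILE) as usize;
        if self.offsets[idx].is_some() {
            return Err(PutError::AlreadyPopulated);
        }
        let start = self.data.len() as u64;
        self.data.extend_from_slice(value);
        self.offsets[idx] = Some((start, value.len() as u32));
        Ok(())
    }

    pub fn get(&self, slot: u64) -> Option<&[u8]> {
        let (start, len) = self.offsets[(slot % SLOTS_PER_FILE) as usize]?;
        let start = start as usize;
        Some(&self.data[start..start + len as usize])
    }
}
```

Reads are unaffected by write order because they always go through the offset table; only the physical layout of the data file stops being slot-sorted.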

This blocker doesn't affect the headline 42× phase-1 speedup (block + state-boundary writes work fine), but it's the gating issue for the static backend reaching feature parity with the KV cold backend.

dapplion · 2026-05-11T01:15:19Z

Custom transactions tree-hasher: transactions_tree_hash_root_from_ssz_bytes — branch transactions-tree-hash-from-ssz-bytes

Follow-up to the end-to-end ERA-import experiment above. Inside the custom blinder, the dominant remaining cost is hashing the transactions list. Transactions::from_ssz_bytes(bytes)?.tree_hash_root() allocates one Vec<u8> per transaction (hundreds per mainnet block) just to throw it away after the hash.

Replaced with a direct-byte hasher that walks the SSZ List<Transaction, MAX_TX> offset table, hashes each transaction's bytes in place via tree_hash::merkle_root, and list-merkleizes the per-tx roots. Output is byte-identical to the typed path.
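
The offset-table walk can be sketched on its own, without the hashing step. An SSZ list of variable-size items starts with one 4-byte little-endian offset per item, so the first offset divided by 4 gives the item count. The function and error names below are hypothetical, and the per-item hashing and list merkleization that the real helper performs are deliberately omitted.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    Truncated,
    BadOffset,
}

/// Read the little-endian u32 offset stored at byte position `pos`.
fn read_offset(bytes: &[u8], pos: usize) -> Result<usize, DecodeError> {
    let raw = bytes.get(pos..pos + 4).ok_or(DecodeError::Truncated)?;
    Ok(u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]) as usize)
}

/// Split an SSZ-encoded list of variable-size items into borrowed slices,
/// one per item, without allocating a buffer per item.
pub fn split_variable_list(bytes: &[u8]) -> Result<Vec<&[u8]>, DecodeError> {
    if bytes.is_empty() {
        return Ok(Vec::new());
    }
    // The first offset points just past the offset table.
    let first = read_offset(bytes, 0)?;
    if first == 0 || first % 4 != 0 || first > bytes.len() {
        return Err(DecodeError::BadOffset);
    }
    let count = first / 4;
    let mut items = Vec::with_capacity(count);
    let mut start = first;
    for i in 1..=count {
        // Item i-1 ends where item i starts, or at the end of the buffer.
        let end = if i < count { read_offset(bytes, i * 4)? } else { bytes.len() };
        if end < start || end > bytes.len() {
            return Err(DecodeError::BadOffset);
        }
        items.push(&bytes[start..end]);
        start = end;
    }
    Ok(items)
}
```

Each returned slice could then be hashed in place, which is where the saving over one `Vec<u8>` per transaction comes from.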

Microbench (beacon_chain/examples/hash_bench.rs in experiment-era-static-cold-load, real mainnet Capella + Deneb blocks):

| op | typed | custom | speedup |
|---|---|---|---|
| transactions (Capella, 92.5 KB / ~250 txs) | 1498 µs | 607 µs | 2.47× |
| transactions (Deneb, 46.4 KB / ~150 txs) | 991 µs | 519 µs | 1.91× |
| withdrawals (≤16 × 44 B = 704 B) | 8.2–9.4 µs | 7.9–8.9 µs | 1.04–1.06× |

Withdrawals were also benchmarked — the typed path is already in the noise (~5 ns difference) so a hand-rolled withdrawals hasher isn't worth the maintenance burden. Only the transactions one ships.

End-to-end blinder bench after integrating the new hasher into era::custom_blinder:

| sample | typed pipeline | with custom blinder + tx hasher | speedup |
|---|---|---|---|
| Capella block (128 KB → 35 KB blinded) | 4.83 ms | 1.58 ms | 3.05× |
| Deneb block (86 KB → 39 KB blinded) | 4.45 ms | 0.93 ms | 4.78× |

The PR branch transactions-tree-hash-from-ssz-bytes is based on unstable, independent of this PR — it's a general-purpose types crate helper for any code that has Transactions SSZ bytes and only needs the root. Returns a real Result<Hash256, ssz::DecodeError> (no panicking slicing or unchecked arithmetic), 6 unit tests covering empty / single / many / mixed-size / single-large / chunk-boundary edges. make lint-full clean.

PR-creation URL: https://github.com/dapplion/lighthouse/pull/new/transactions-tree-hash-from-ssz-bytes

Replace the per-slot fsync loop in `put_batch` with one fsync per file: items are grouped by file_id, all records appended through a BufWriter, then a single sync_all for the data file, all offsets written, single sync_all for the offset file, and a single atomic config commit per batch. Same caller-visible "batch durable on return" contract. For an 8192-item batch (one ERA's worth of slot-keyed writes) this drops fsync count from ~32k (4 per slot) to ~3, with measured speedups between 155x and 775x per column on /mnt/ssd NVMe. Spec updated to reflect the batched semantics.

galadd · 2026-05-20T23:00:23Z

Architectural follow-up: phase-2 reconstruction is incompatible with monotonic-forward writes

After phase 1 of the ERA importer finishes (era-boundary states written to StateSnapshot / StateDiff columns at slots 8192, 16384, …, 1260·8192), phase 2 (reconstruct_states_parallel) tries to backfill intermediate states by replaying blocks slot-by-slot. Those writes target slot ranges behind highest_written_slot for the column, which the static archive rejects:
StaticColdStoreError(Invalid("static cold put_batch out of order vs highest_written_slot"))
Sequentialising the era loop (this branch tries that) doesn't help — the conflict is between phase-1 boundary writes and phase-2 intermediate writes, not between parallel workers.

Two viable directions, both real design changes:
1. **Allow random-slot writes within an existing file_id.** The `.off` sidecar is already a fixed-size table indexed by `slot % SLOTS_PER_FILE`. Replace the strict-monotonic invariant with a per-slot "is this offset already populated?" check, and append data to the data file regardless of arrival order. `highest_written_slot` becomes "highest committed slot" but doesn't gate writes to lower slots within already-existing files.

2. **Reorder the import: do reconstruction interleaved with phase 1**, so each era's intermediate states get written immediately after that era's boundary — fully ascending. Avoids the conflict by construction but requires restructuring `import_era_file` and would lose the "phase-1 finishes fast and you can keep using the chain while phase 2 grinds" property the legacy KV path has.
(1) is the smaller change and preserves the existing API. Just a flag in the .conf to record the new mode. Happy to draft if you want to take that direction.

This blocker doesn't affect the headline 42× phase-1 speedup (block + state-boundary writes work fine), but it's the gating issue for the static backend reaching feature parity with the KV cold backend.

I've added unit tests for StaticColdStore covering the open/get/put/put_batch paths, crash recovery via heal_current_file, and the idempotent re-put invariant (PR #78).

I'm looking at the monotonic-write blocker for ERA reconstruction next. Your Option A (allow random-slot writes within existing file_ids, with a per-slot "is this offset already populated?" check and a conf flag for the new mode) makes sense to me as the smaller change. But I want to make sure I understand the crash window before implementing.

Currently heal_current_file truncates the data file to current_data_len from the conf, then clears offsets beyond highest_written_slot. Under backfill mode, backfilled data is appended past current_data_len for slots below highest_written_slot. If a crash happens after the data append and offset write but before the conf update, heal will truncate the data file, but the backfilled slot's offset entry is below highest_written_slot so it won't be cleared, creating a dangling pointer.

Is the intended design to track the backfill extent separately in the conf (e.g. a backfill_data_len field), or to change heal_current_file to scan offsets and determine the true data extent in backfill mode? Or is there another approach I'm not seeing?

galadd · 2026-06-02T13:57:11Z

I've implemented the backfill mode discussed in the architectural follow-up (Option A) as PR #79 against this branch.

The crash window I asked about earlier is handled by tracking the backfill file separately in the conf (backfill_file_id + backfill_data_len), plus a scan_and_zero_dangling_offsets pass on heal that catches any offset pointing past the committed data length. This covers the case where data + offset are written but the conf isn't updated before a crash.
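
A minimal sketch of what such a heal pass could do, assuming each offset entry records a start position and a length. Only the function name comes from the comment above; the signature and entry layout are hypothetical and PR #79 may implement it differently.

```rust
/// One offset-table entry: `(start, len)` into the data file, `None` if empty.
pub type OffsetEntry = Option<(u64, u32)>;

/// Clear every entry whose record would extend past the committed data
/// length recorded in the conf. Returns how many entries were cleared.
pub fn scan_and_zero_dangling_offsets(
    offsets: &mut [OffsetEntry],
    committed_data_len: u64,
) -> usize {
    let mut cleared = 0;
    for entry in offsets.iter_mut() {
        if let Some((start, len)) = *entry {
            // Treat arithmetic overflow as dangling too (corrupt entry).
            let dangling = start
                .checked_add(u64::from(len))
                .map_or(true, |end| end > committed_data_len);
            if dangling {
                *entry = None;
                cleared += 1;
            }
        }
    }
    cleared
}
```

Run after the data file has been truncated to the committed length, this covers the case where data and offset were written but the conf was not updated before the crash.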

Conf format bumped from LHSTBLK2 (36 bytes) to LHSTBLK3 (52 bytes) with backward-compatible open of old-format stores. 19 tests covering backfill behavior and crash recovery.

This is storage-layer only — the reconstruct.rs integration depends on sigp#9273 being rebased onto this.

dapplion added 8 commits May 8, 2026 19:36

Add era blob storage spec

9546bd4

Specify static block file format

58fdb61

Implement static block file store

2c40f0f

Add static blob API

ad2c387

Fix static block lint

af6e99b

dapplion requested a review from michaelsproul as a code owner May 8, 2026 17:42

dapplion added 15 commits May 8, 2026 20:04

Refresh lighthouse beacon_node help snapshot for --cold-backend

a0d8ffb

schema_stability: include bbs in expected DBColumn snapshot

0381575

dapplion mentioned this pull request May 11, 2026

ERA → static cold backend: end-to-end integration (42× faster than KV) #76

Open

dapplion mentioned this pull request May 11, 2026

Fast block/blob archive node via ERA files sigp/lighthouse#9285

Open

galadd mentioned this pull request May 20, 2026

TODO test(static_cold): add storage engine correctness test suite #78

Open

galadd mentioned this pull request Jun 2, 2026

StaticColdStore: add backfill mode for skipped-slot writes #79

Open

dapplion wants to merge 24 commits into unstable from static-files-generalization-spec

2 participants
