State cache: spec-derived byte-size estimation and budget-based eviction #71
Open
dapplion wants to merge 18 commits into
314c195 to 9d2fee8
…viction Add estimated_marginal_bytes() that uses consensus spec knowledge to approximate memory cost per cached state — epoch boundary states (~32MB) vs mid-epoch (~1KB). Track per-state costs and a running cached_bytes sum. New --state-cache-max-mb flag enables byte-budget eviction alongside the existing count-based limit. Exposes estimated bytes via Prometheus metric.
…toricalSummary Required by milhouse's MemoryTracker to measure COW bytes for these types when stored in tree-backed List/Vector fields.
Formula fixes:
- Cap sparse estimate at full-tree cost (fixes 15x overestimate at 50% dirty)
- Account for Zero-node siblings along spine (fixes epoch boundary underestimate)
- Account for Arc<T> overhead in Leaf<T> nodes (fixes Hash256 underestimate)
- Fix internal node count: num_leaves-1 (correct for complete binary tree)

Add missing fields to estimated_marginal_bytes:
- slashings (1 dirty per epoch boundary)
- eth1_data_votes (current list length)
- historical_summaries (1 per epoch, Capella+)

Tests (25):
- estimate_tree_bytes: sparse, scattered, adjacent, full for u64/u8/Hash256
- Per-field: balances at 1/10/50/100%, participation, roots, randao, inactivity
- Integrated: epoch boundary (1.04x), mid-epoch (3.1x), real epoch transition
- Clone chain: shared COW, pruning, same-slot independence
- All tests assert both lower bound (estimate >= actual) and max ratio
Each BeaconState carries a Vec<Arc<ApproxOwnedBytes>> recording the tree memory allocated at each transition. States that share ancestry (via clone) share the same Arc entries. Total cache memory is computed by deduplicating entries across all cached states by Arc pointer identity.

- ApproxOwnedBytesList field on BeaconState (skipped from serde/ssz/tree_hash)
- TreeSnapshot stub in per_slot_processing and per_block_processing (captures pre-state, measures delta; returns 0 until milhouse support)
- All fork upgrades preserve approx_owned_bytes via mem::take
- rebase_on_finalized resets to finalized's list + unique cost
- StateCache::total_approx_owned_bytes() iterates and deduplicates
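The deduplication by Arc pointer identity described above can be illustrated with a minimal sketch. It is not the PR's code: `ApproxOwnedBytes` is modelled as a bare byte count, `CachedState` and `record_transition` are hypothetical stand-ins, and only the segment list of a state is represented.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical stand-in for the PR's `ApproxOwnedBytes`:
/// the bytes allocated by one state transition.
pub struct ApproxOwnedBytes(pub usize);

/// Hypothetical stand-in for a cached state; only the segment list is modelled.
#[derive(Clone, Default)]
pub struct CachedState {
    pub segments: Vec<Arc<ApproxOwnedBytes>>,
}

impl CachedState {
    /// Record the bytes allocated by a transition applied to this state.
    pub fn record_transition(&mut self, bytes: usize) {
        self.segments.push(Arc::new(ApproxOwnedBytes(bytes)));
    }
}

/// Sum every distinct segment once, keyed by the address of its Arc allocation.
/// Cloned states share Arc entries, so shared ancestry is counted a single time.
pub fn total_approx_owned_bytes(states: &[&CachedState]) -> usize {
    let mut seen: HashSet<*const ApproxOwnedBytes> = HashSet::new();
    let mut total = 0;
    for state in states {
        for segment in &state.segments {
            // `insert` returns true only the first time this pointer is seen.
            if seen.insert(Arc::as_ptr(segment)) {
                total += segment.0;
            }
        }
    }
    total
}
```

Under these assumptions, a parent with one 1000-byte segment and two clones that each add a small segment of their own contribute the parent's 1000 bytes once, plus the two small segments.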
…marks

Implement MemorySize for BeaconState (tree fields via macros + caches + sync committees), CommitteeCache, EpochCache, SyncCommittee, and all remaining leaf types (PendingAttestation, PendingDeposit, PendingPartialWithdrawal, PendingConsolidation, Builder, BuilderPendingPayment, BuilderPendingWithdrawal, Withdrawal). Add PtcWindowEntry newtype for FixedVector MemorySize support.

Add state_memory benchmark measuring MemoryTracker::track_item cost:
- Single state walk: ~316µs at 1024 validators (linear scaling)
- Pre+post delta (slot transition): ~350µs at 1024 validators
- Pre+post delta (epoch transition): ~343µs at 1024 validators

Co-authored-by: PoulavBhowmick03 <bpoulav@gmail.com>
Merge TODO-state-cache-size.md and DESIGN-cow-tracking.md into a single plan at .claude/state-cache-memory-tracking.md. Update with current status, the persistent MemoryTracker approach, and the three measurement cases.

Replace MinimalEthSpec benchmarks with mainnet-scale synthetic states (1M and 2M validators). Results at 1M validators:
- Full walk: 459ms
- Pre+post slot transition: 451ms (dominated by pre-state walk)
- Pre+post epoch transition: 566ms
Replace TreeSnapshot stub with real cow_bytes implementation using milhouse's pairwise tree walk (dapplion/milhouse cow-bytes branch).

TreeSnapshot::cow_bytes now calls cow_bytes_between, which iterates all tree-backed fields calling List/Vector::cow_bytes. Also adds total_state_tree_bytes for measuring a full state's tree size.

Benchmarks at 1M validators (mainnet scale):
- cow_bytes slot transition: 541 ns (was 450ms with MemoryTracker)
- cow_bytes epoch transition: 12.8 ms
- total_tree_bytes: 25.1 ms (initial finalized state, once)
- MemoryTracker comparison: 458 ms (850,000x slower for slot)
Replace per-state estimated_marginal_bytes cost tracking with total_approx_owned_bytes(), which deduplicates shared ApproxOwnedBytes segments across all cached states via Arc pointer identity.

- put_state eviction loop now uses total_approx_owned_bytes() instead of incrementally tracked cached_bytes
- Remove per-state cost from LRU tuple (no longer needed)
- Add store_beacon_state_cache_cow_byte_size gauge metric
- Add store_beacon_state_cache_evictions_total counter metric
- Add debug tracing for finalized base size measurement, rebase cow_bytes, and byte budget eviction events
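An eviction loop that recomputes a deduplicated total after each removal, rather than subtracting a per-state cost, might look like the sketch below. Everything in it is an assumption made for illustration: `Segment`, `BudgetCache` and its methods are hypothetical names, insertion order stands in for LRU order, and the real `put_state` does considerably more.

```rust
use std::collections::{HashSet, VecDeque};
use std::sync::Arc;

/// Hypothetical stand-in for one `ApproxOwnedBytes` segment.
pub struct Segment(pub usize);

/// Toy cache: insertion order stands in for LRU order (front = oldest).
pub struct BudgetCache {
    entries: VecDeque<(u64, Vec<Arc<Segment>>)>,
    max_bytes: usize,
    evictions: u64,
}

impl BudgetCache {
    pub fn new(max_bytes: usize) -> Self {
        Self { entries: VecDeque::new(), max_bytes, evictions: 0 }
    }

    /// Deduplicated total: each distinct Arc allocation is counted once.
    pub fn total_bytes(&self) -> usize {
        let mut seen: HashSet<*const Segment> = HashSet::new();
        let mut total = 0;
        for (_, segments) in &self.entries {
            for segment in segments {
                if seen.insert(Arc::as_ptr(segment)) {
                    total += segment.0;
                }
            }
        }
        total
    }

    /// Insert a state, then evict the oldest entries until the total fits
    /// the budget. The newest entry is always kept, even if it alone
    /// exceeds the budget.
    pub fn put(&mut self, key: u64, segments: Vec<Arc<Segment>>) {
        self.entries.push_back((key, segments));
        while self.entries.len() > 1 && self.total_bytes() > self.max_bytes {
            self.entries.pop_front();
            self.evictions += 1;
        }
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    pub fn evictions(&self) -> u64 {
        self.evictions
    }
}
```

Recomputing the total matters because evicting a state frees only the segments no other cached state still references; a stored per-state cost would overstate what an eviction actually reclaims.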
Remove stale plans/state-cache-byte-size.md (original spec-derived estimation design) and update .claude/state-cache-memory-tracking.md to reflect the implemented cow_bytes + ApproxOwnedBytes design.
Delete estimated_marginal_bytes, estimate_tree_bytes, and all 25 tests that validated the old spec-derived estimation formula. These are dead code: eviction now uses total_approx_owned_bytes() via cow_bytes.

Replace with 9 tests covering the actual production code path:
- cow_bytes_between: clone=0, single mutation>0, epoch boundary large
- total_state_tree_bytes: nonzero, scales with validator count
- ApproxOwnedBytesList: deduplication across cloned states
- StateCache: finalized base size populated, put_state increases total, byte budget eviction fires and removes states
Add committee_caches and sync_committees to the COW measurement:
- cow_bytes_between: count cache heap bytes when Arc pointers differ
- total_state_tree_bytes: include cache heap bytes in the total
- Add CommitteeCache::approx_heap_bytes (shuffling + positions vecs)
- Add EpochCache::approx_heap_bytes (effective_balances + base_rewards)

Note: cow_bytes_between manually lists tree fields (must stay in sync with rebase_on, which uses bimap macros). The bimap macros require &mut and a Result return type, which cow_bytes (a read-only usize fn) can't satisfy. A future milhouse change could add an immutable bimap variant.
- Add store_beacon_state_cache_segment_count histogram metric tracking the number of ApproxOwnedBytes segments per cached state
- Compact finalized state's segments to a single entry in update_finalized_state (prevents accumulation across finalizations)
- Record segment counts each time total_approx_owned_bytes is computed
9185ce2 to b00b4f4
PtcWindowEntry was a newtype around FixedVector<u64, N> to satisfy MemorySize bounds that no longer exist. Revert to upstream's plain FixedVector.

TreeSnapshot was a struct wrapping state.clone() + cow_bytes_between. Replace with direct clone + cow_bytes_between calls in per_slot_processing and per_block_processing: simpler, no indirection.
Fast path (every put_state): use ApproxOwnedBytesList segments for an approximate total. This overcounts when repeated mutations touch the same paths, but the error is in the safe direction: eviction triggers early, never late.

Slow path (on finalization): recompute_exact_costs runs cow_bytes_between for each cached state, replacing accumulated segments with a single exact entry. This corrects the overcount. ~2ms for slot-only caches, ~225ms worst case with epoch boundary states.

The slow path runs in update_finalized_state, which already does expensive work (pruning, hdiff management). Adding 225ms there is acceptable.
15ee7a6 to 421b60d
Memory-aware state cache eviction using milhouse's `cow_bytes` pairwise tree walk.

Each BeaconState carries `ApproxOwnedBytesList`: byte counts for tree memory produced by each transition. States sharing ancestry share the same Arc entries. Total cache memory = unique entries across all states (deduplicated by Arc pointer).

`cow_bytes` walks two trees in parallel, skipping shared subtrees via `Arc::ptr_eq`. ~541ns per slot transition at 1M validators (vs 458ms for MemoryTracker). Finalized state base size measured once via `total_tree_bytes` (~25ms).

Eviction triggers when `total_approx_owned_bytes()` exceeds `--state-cache-max-mb`. Caches (committee, sync committee, epoch) included in measurement.

Depends on sigp/milhouse#100.
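To make the pairwise walk concrete, here is a toy sketch on a plain binary tree. It is not milhouse's implementation: the `Node` type, `build`, `set_leaf` and the byte accounting (one `size_of::<Node>()` per node, ignoring Arc headers) are all assumptions chosen for illustration.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Toy persistent binary tree (hypothetical; milhouse's node type differs).
pub enum Node {
    Leaf(u64),
    Branch(Arc<Node>, Arc<Node>),
}

/// Build a complete tree of the given depth with zeroed leaves.
pub fn build(depth: u32) -> Arc<Node> {
    if depth == 0 {
        Arc::new(Node::Leaf(0))
    } else {
        Arc::new(Node::Branch(build(depth - 1), build(depth - 1)))
    }
}

/// Copy-on-write update: rebuild only the path to `index`, share the rest.
/// Assumes `depth` equals the height of `root`.
pub fn set_leaf(root: &Arc<Node>, depth: u32, index: u64, value: u64) -> Arc<Node> {
    match &**root {
        Node::Leaf(_) => Arc::new(Node::Leaf(value)),
        Node::Branch(left, right) => {
            if (index >> (depth - 1)) & 1 == 0 {
                Arc::new(Node::Branch(set_leaf(left, depth - 1, index, value), Arc::clone(right)))
            } else {
                Arc::new(Node::Branch(Arc::clone(left), set_leaf(right, depth - 1, index, value)))
            }
        }
    }
}

/// Approximate size of a whole tree: one `size_of::<Node>()` per node.
pub fn total_bytes(node: &Arc<Node>) -> usize {
    match &**node {
        Node::Leaf(_) => size_of::<Node>(),
        Node::Branch(left, right) => size_of::<Node>() + total_bytes(left) + total_bytes(right),
    }
}

/// Bytes reachable from `new` that are not shared with `old`.
/// Subtrees with identical Arc pointers are skipped without descending.
pub fn cow_bytes(old: &Arc<Node>, new: &Arc<Node>) -> usize {
    if Arc::ptr_eq(old, new) {
        return 0;
    }
    match (&**old, &**new) {
        (Node::Branch(old_l, old_r), Node::Branch(new_l, new_r)) => {
            size_of::<Node>() + cow_bytes(old_l, new_l) + cow_bytes(old_r, new_r)
        }
        // Leaf changed or shapes differ: everything under `new` is unshared.
        _ => total_bytes(new),
    }
}
```

In this toy model, a single leaf update in a depth-3 tree visits only the four rebuilt nodes on the path, while a full walk touches all fifteen. That gap between path length and tree size is the intuition behind the large difference between the slot-transition and full-tree timings quoted above.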