
perf(l1): optimize trie building in snap sync insertion #6410

Open
ElFantasma wants to merge 18 commits into main from perf/trie-building-optimizations

Conversation


@ElFantasma ElFantasma commented Mar 26, 2026

Motivation

Profiling snap sync's insertion phases on mainnet revealed that trie building is 75-91% of insertion time, and it's entirely CPU-bound (0% I/O wait). Account insertion took ~20 minutes and storage insertion ~43 minutes — together 61% of total snap sync time.

Description

Three optimizations targeting the CPU-bound trie construction:

1. Eliminate redundant code hash iteration pass

The original code iterated all accounts twice — once to collect code hashes (async flush), once to build the trie. Merged both into a single pass by collecting code hashes in the trie-building iterator and flushing them incrementally to disk.
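The shape of the merged pass can be sketched as follows; the types, threshold, and function names here are illustrative stand-ins, not ethrex's actual API.

```rust
// Illustrative sketch of the merged pass: names and threshold are
// hypothetical, not ethrex's real types.
const FLUSH_THRESHOLD: usize = 4;

struct Account {
    hash: u64,
    code_hash: u64,
}

/// Iterate accounts once, feeding the trie builder while buffering code
/// hashes and flushing them incrementally instead of in a separate pass.
fn build_with_code_hashes(
    accounts: &[Account],
    insert_into_trie: &mut dyn FnMut(u64),
    flush: &mut dyn FnMut(&mut Vec<u64>),
) {
    let mut code_hashes = Vec::new();
    for acc in accounts {
        code_hashes.push(acc.code_hash);
        if code_hashes.len() >= FLUSH_THRESHOLD {
            flush(&mut code_hashes); // incremental flush keeps memory bounded
        }
        insert_into_trie(acc.hash);
    }
    if !code_hashes.is_empty() {
        flush(&mut code_hashes); // final partial batch
    }
}

/// Small demo: 10 accounts go through one iteration; all 10 code hashes
/// end up flushed without a second pass over the data.
fn demo() -> (usize, usize) {
    let accounts: Vec<Account> = (0..10).map(|i| Account { hash: i, code_hash: i }).collect();
    let (mut inserted, mut flushed) = (0usize, Vec::new());
    build_with_code_hashes(&accounts, &mut |_| inserted += 1, &mut |buf| flushed.append(buf));
    (inserted, flushed.len())
}

fn main() {
    assert_eq!(demo(), (10, 10));
}
```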

2. Reuse nodehash_buffer across calls

add_current_to_parent_and_write_queue allocated a new 512-byte Vec on every call (~700M calls on mainnet). Now receives the buffer as a parameter from the outer function. Also avoids a Nibbles path clone per call by computing the prefix index directly.
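The buffer-reuse pattern looks roughly like this minimal sketch (function names are hypothetical; the real hot function is `add_current_to_parent_and_write_queue`):

```rust
// Hypothetical sketch of the buffer-reuse pattern: the outer function owns
// one scratch Vec and lends it to the hot inner function, instead of the
// inner function allocating a fresh 512-byte Vec on every call.
fn encode_node_into(node: &[u8], buf: &mut Vec<u8>) -> usize {
    buf.clear(); // cleared on entry, so reuse across calls is safe
    buf.extend_from_slice(node);
    buf.len()
}

fn process_all(nodes: &[&[u8]]) -> usize {
    let mut scratch: Vec<u8> = Vec::with_capacity(512); // one allocation for the whole loop
    nodes.iter().map(|n| encode_node_into(n, &mut scratch)).sum()
}

fn main() {
    assert_eq!(process_all(&[b"ab".as_slice(), b"cde".as_slice()]), 5);
}
```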

3. Parallel state trie building across 16 nibble ranges

Split the state trie build into 16 independent sub-tries (one per first nibble of account hash), built concurrently using streaming RocksDB iterators and a shared thread pool for DB flushes. Sub-trie results are assembled into the root BranchNode via assemble_root_from_subtries. Each sub-trie thread also collects its own code hashes (flushed incrementally to disk) and storage accounts, merged after completion.

Code structure

  • trie_from_sorted_subtrie — builds a single sub-trie, returns the child NodeRef
  • assemble_root_from_subtries — assembles 16 sub-trie results into a root node (shared between trie_from_sorted_parallel and insert_accounts)
  • trie_from_sorted_parallel — clean API used by tests; production uses the lower-level functions directly to collect side effects via NibbleIterator
  • NibbleIterator — RocksDB iterator filtered to a single nibble range, collects code hashes + storage accounts as side effects
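The nibble-range filtering can be sketched like this, with a sorted slice standing in for the streaming RocksDB iterator (helper names are made up for illustration):

```rust
// Sketch of the per-nibble partition: a sorted slice stands in for the
// streaming RocksDB iterator; names are illustrative.
fn first_nibble(key: &[u8; 32]) -> u8 {
    key[0] >> 4
}

/// Yield only keys whose first nibble matches `nibble`, mirroring
/// "seek to nibble << 4, stop when the first nibble changes".
fn keys_for_nibble<'a>(
    sorted: &'a [[u8; 32]],
    nibble: u8,
) -> impl Iterator<Item = &'a [u8; 32]> + 'a {
    sorted
        .iter()
        .skip_while(move |k| first_nibble(k) < nibble) // seek to range start
        .take_while(move |k| first_nibble(k) == nibble) // stop at range end
}

fn key(first_byte: u8) -> [u8; 32] {
    let mut k = [0u8; 32];
    k[0] = first_byte;
    k
}

fn demo() -> usize {
    let keys = [key(0x01), key(0x1a), key(0x1b), key(0x20)];
    keys_for_nibble(&keys, 1).count() // 0x1a and 0x1b fall in nibble range 1
}

fn main() {
    assert_eq!(demo(), 2);
}
```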

Benchmarks (mainnet-8, release profile, no validation)

Account Insertion

| Sub-phase | Before | After | Delta |
|---|---|---|---|
| SST ingest | 104s | 112s | |
| Code hash pass | 196s | 0s | eliminated |
| Trie building | 883s | 700s | -21% |
| Total | 1184s (19m 44s) | 818s (13m 40s) | -31% |

Storage Insertion

| Sub-phase | Before | After | Delta |
|---|---|---|---|
| Trie builds | 2357s | 2208s | -6% |
| Total | 2587s (43m 7s) | 2433s (40m 30s) | -6% |

Multisync validation (ethrex-mainnet-9, release-with-debug-assertions)

Ran 4 complete multisync cycles on ethrex-mainnet-9 (12 cores, 64GB RAM) with release-with-debug-assertions profile, which enables full state trie validation (state root, storage roots, bytecodes) after snap sync. All 4 runs passed on all 3 networks.

Phase breakdown (run #2 — representative)

Hoodi (total: 45m 48s including validation)

| Phase | Duration | Items |
|---|---|---|
| Block Headers | 0:30 | 2.5M |
| Account Ranges | 0:30 | 38.7M |
| Account Insertion | 1:10 | 43.4M |
| Storage Ranges | 2:00 + 3:10 | 291M |
| State Healing | 0:10 | 270 |
| Storage Insertion | 8:20 | 268.6M |
| Storage Healing | 13:30 | 10 |

Sepolia (total: 2h 14m including validation)

| Phase | Duration | Items |
|---|---|---|
| Block Headers | 1:50 | 10.2M |
| Account Ranges | 1:00 | 143.9M |
| Account Insertion | 3:40 | 153.2M |
| Storage Ranges | 12:20 | 872M |
| State Healing | 0:20 + 0:30 | 6.4K |
| Storage Insertion | 35:50 | 871.9M |
| Storage Healing | 48:20 | 89.9K |

Mainnet (total: 3h 38m including validation)

| Phase | Duration | Items |
|---|---|---|
| Block Headers | 17:20 | 24.7M |
| Account Ranges | 2:10 | 364M |
| Account Insertion | 17:40 | 377.7M |
| Storage Ranges | 16:33 | 1.55B |
| State Healing | 3:20 + 2:10 | 27.3K |
| Storage Insertion | 42:40 | 1.55B |
| Storage Healing | 1:15:34 | 72.4K |

Consistency across runs

| Network | Run 1 | Run 2 | Run 3 | Run 4 |
|---|---|---|---|---|
| Hoodi | 35m 37s | 45m 48s | 36m 51s | 37m 41s |
| Sepolia | 2h 04m | 2h 14m | 2h 14m | 2h 18m |
| Mainnet | 3h 44m | 3h 38m | 3h 35m | 3h 37m |

All runs include full trie validation (state root + storage roots + bytecodes). Validation passed on every run.

Checklist

  • Tested on mainnet snap sync (3 networks × 4 runs — all successful with validation)
  • Unit tests for parallel trie correctness (5 tests comparing parallel vs sequential hashes)
  • Profiling instrumentation removed
  • Root assembly logic deduplicated into assemble_root_from_subtries
  • Thread join panics handled gracefully (ThreadJoinError instead of unwrap)

- Reuse nodehash_buffer across add_current_to_parent_and_write_queue calls
  instead of allocating a new 512-byte Vec on each of ~700M invocations
- Avoid Nibbles path clone by computing prefix index directly
- Merge code hash collection into the trie-building pass, eliminating a
  redundant full iteration + RLP decode of all accounts (~3 min on mainnet)

Split account insertion's trie build into 16 independent sub-tries
(one per first nibble of account hash), built concurrently using
a shared thread pool. Sub-trie results are assembled into the root
BranchNode after all threads complete.

Also partitions accounts in a single pass that collects code hashes
and storage roots, eliminating the redundant second iteration.

Replace the in-memory partitioning (which loaded all 374M accounts into
Vec buckets causing OOM on mainnet) with per-nibble RocksDB raw iterators
that stream from disk. Each of the 16 sub-trie threads creates its own
iterator, seeks to its nibble boundary, and streams accounts one at a time.

The scope and thread pool are now managed directly in insert_accounts,
matching the pattern used by insert_storages.
@ElFantasma force-pushed the perf/trie-building-optimizations branch from 163c612 to 37d33dc on April 7, 2026 at 17:58

github-actions bot commented Apr 7, 2026

Lines of code report

Total lines added: 363
Total lines removed: 0
Total lines changed: 363

Detailed view
+-----------------------------------------------------+-------+------+
| File                                                | Lines | Diff |
+-----------------------------------------------------+-------+------+
| ethrex/crates/common/trie/trie_sorted.rs            | 707   | +275 |
+-----------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync/code_collector.rs | 79    | +6   |
+-----------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync/snap_sync.rs      | 1101  | +82  |
+-----------------------------------------------------+-------+------+

@ElFantasma ElFantasma requested a review from fedacking April 7, 2026 18:08
@ElFantasma ElFantasma marked this pull request as ready for review April 7, 2026 19:44
@ElFantasma ElFantasma requested a review from a team as a code owner April 7, 2026 19:44

github-actions bot commented Apr 7, 2026

🤖 Kimi Code Review

Here's my review of the PR:

Critical Issues

1. Ignored Flush Errors (Potential Data Loss)

File: crates/common/trie/trie_sorted.rs
Lines: 442, 487, 588

The trie_from_sorted_subtrie function ignores the result of flush_nodes_to_write:

let _ = flush_nodes_to_write(nodes_to_write, db, buffer_sender);

If this fails (DB write error or channel closed), the function returns Ok(child_ref) but the data may not be persisted, leading to a corrupted trie and consensus failure. Change to propagate the error:

flush_nodes_to_write(nodes_to_write, db, buffer_sender)?;

2. Panic on Channel Receive

File: crates/common/trie/trie_sorted.rs
Lines: 371, 398, 412, 429, 583

Multiple .expect("This channel shouldn't close") calls on channel receivers. If a worker thread panics, the sender drops and the main thread panics. Use map_err to convert to TrieGenerationError instead.
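The suggested conversion might look like this minimal sketch (the error enum is a stand-in for the crate's `TrieGenerationError`):

```rust
use std::sync::mpsc;

// Stand-in for the crate's error type; variant name taken from the review.
#[derive(Debug, PartialEq)]
enum TrieGenerationError {
    ThreadJoinError,
}

/// Receive a reusable buffer from the worker pool; a closed channel
/// becomes a structured error instead of a panic.
fn recv_buffer(rx: &mpsc::Receiver<Vec<u8>>) -> Result<Vec<u8>, TrieGenerationError> {
    rx.recv().map_err(|_| TrieGenerationError::ThreadJoinError)
}

fn main() {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    tx.send(vec![1]).unwrap();
    assert_eq!(recv_buffer(&rx), Ok(vec![1]));
    drop(tx); // simulate every pool worker exiting early
    assert_eq!(recv_buffer(&rx), Err(TrieGenerationError::ThreadJoinError));
}
```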

3. Iterator Panic on Decode Failure

File: crates/networking/p2p/sync/snap_sync.rs
Line: 1075

let account_state = AccountState::decode(&value_vec).expect("We should have accounts here");

This panics on malformed data. During snap sync, invalid data from peers should be handled gracefully as a sync error, not a panic.

Security & Correctness

4. Trie Root Consistency

The parallel construction logic appears correct:

  • Sub-tries are built independently with proper path isolation (first nibble)
  • Root assembly correctly handles the 3 cases (empty, single, multiple children)
  • Extension/leaf collapsing in assemble_root_from_subtries (lines 510-550) correctly prepends the nibble prefix

Recommendation: Ensure the test run_test_parallel includes edge cases like:

  • Empty sub-tries (some nibbles with no data)
  • Single account in a sub-trie
  • Single sub-trie with multiple accounts (tests the extension collapsing logic)

5. Atomic Ordering for File Index

File: crates/networking/p2p/sync/snap_sync.rs
Line: 1063

Ordering::Relaxed is technically sufficient for unique ID generation, but since these files are consumed later by the sync process, consider Ordering::SeqCst or document why relaxed is safe (no dependencies on ordering, only uniqueness).
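The uniqueness guarantee can be checked with a small standalone program; this is a generic demonstration of `fetch_add` with `Relaxed`, not ethrex code:

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// fetch_add is an atomic read-modify-write regardless of ordering, so
/// Relaxed still hands out unique indices; Relaxed only drops ordering
/// guarantees for *other* memory accesses around it.
fn count_unique_indices(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                (0..per_thread)
                    .map(|_| c.fetch_add(1, Ordering::Relaxed))
                    .collect::<Vec<u64>>()
            })
        })
        .collect();
    let mut seen = HashSet::new();
    for h in handles {
        seen.extend(h.join().expect("worker panicked"));
    }
    seen.len() // == threads * per_thread iff no index was duplicated
}

fn main() {
    assert_eq!(count_unique_indices(16, 1000), 16_000);
}
```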

Performance & Resource Management

6. Unnecessary Mutex Contention

File: crates/networking/p2p/sync/snap_sync.rs
Lines: 1142, 1180-1182

per_thread_storage_accounts uses a mutex in a tight loop (though only 16 threads). Since results are only needed after all threads join, return storage_accounts from the closure and collect from handles instead:

let handles: Vec<_> = (0u8..16).map(|nibble| {
    s.spawn(move || {
        // ...
        Ok((nibble, child_ref, nibble_iter.storage_accounts))
    })
}).collect();


---
*Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt*


github-actions bot commented Apr 7, 2026

🤖 Codex Code Review

Findings

  1. High: the new RocksDB path can silently lose bytecode hashes and still report success. NibbleIterator::flush_code_hashes() discards the dump_to_file result in snap_sync.rs:1050, unlike CodeHashCollector, which propagates write failures via its task joins in code_collector.rs:67 and code_collector.rs:102. Because the file index is still advanced in snap_sync.rs:1218, a failed write permanently drops some code hashes from the later download phase, and the only post-check is a debug_assert! in snap_sync.rs:573. In release builds that means snap sync can finish with missing contract code. This should return an error out of the worker instead of swallowing it.

  2. Medium: this regresses bytecode-hash deduplication and can blow up I/O/work on mainnet. The old collector stores hashes in a HashSet in code_collector.rs:15 and inserts via code_collector.rs:36, but the new per-nibble path accumulates them in a Vec in snap_sync.rs:1034 and flushes every occurrence in snap_sync.rs:1081. Dedup now happens only much later when rereading every snapshot file in snap_sync.rs:506. For common shared bytecode patterns (proxies, clones, ERC20 copies), this can turn a “unique code hashes” workload back into an “all contract accounts” workload.

I did not find a clear trie-root correctness bug in the parallel root assembly from static inspection.

I couldn’t run the Rust tests in this sandbox because cargo/rustup need writable toolchain/cache dirs and dependency fetch access.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt


github-actions bot commented Apr 7, 2026

🤖 Claude Code Review



Code Review: perf(l1): optimize trie building in snap sync insertion (#6410)

The motivation and approach are sound: profiling-driven, mathematically correct (account hashes are uniformly distributed across nibbles so sub-tries are ~independent), and validated against 4 full sync cycles. The performance improvements are real. Below are the issues worth addressing before merge.


Critical / Correctness

1. Final trie-node flush error is silently discarded

trie_sorted.rs, end of trie_from_sorted_subtrie:

let _ = flush_nodes_to_write(nodes_to_write, db, buffer_sender);

A write failure here loses trie nodes without propagating to the caller. The intermediate flushes inside the loop correctly propagate errors (via the thread pool return channel); only the final one is dropped. Consider returning the error or at minimum using tracing::error! to log it.

2. Code hash disk flush errors are silently discarded

snap_sync.rs, NibbleIterator::flush_code_hashes:

let _ = dump_to_file(&file_name, hashes.encode_to_vec());

A disk-full or permission error here silently drops code hashes. These are required to reconstruct bytecode during the sync. The error should propagate — this likely requires changing flush_code_hashes to return a Result and having next() return an error type, or panicking with a clear message.

3. expect on DB lookup in assemble_root_from_subtries should be a recoverable error

trie_sorted.rs, assemble_root_from_subtries (valid_count == 1 branch):

let child_rlp = db
    .get(child_path.clone())
    ...
    .expect("Sub-trie wrote this node");

If the sub-trie's final flush failed silently (see Point 1), this expect will panic with an unhelpful message during what should be a clean error path. Change to ok_or_else(|| TrieGenerationError::...).

4. NibbleIterator has no Drop impl — code hashes lost on error path

If trie_from_sorted_subtrie returns Err while iterating, the partially-consumed NibbleIterator is dropped. Any accumulated code_hashes that hadn't yet reached CODE_HASH_WRITE_BUFFER_SIZE are silently lost. Implementing Drop to call flush_code_hashes (and log any error) would prevent silent data loss on the retry path.
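A minimal sketch of such a guard, with a `Vec` standing in for the on-disk sink (all types here are hypothetical):

```rust
// Hypothetical sketch of the suggested guard: buffered hashes reach the
// sink even when the holder is dropped on an error path. A Vec stands in
// for the on-disk snapshot files.
struct HashBuffer<'a> {
    pending: Vec<u64>,
    sink: &'a mut Vec<u64>,
}

impl<'a> HashBuffer<'a> {
    fn push(&mut self, h: u64) {
        self.pending.push(h);
    }
    fn flush(&mut self) {
        self.sink.append(&mut self.pending); // real code would log I/O errors here
    }
}

impl<'a> Drop for HashBuffer<'a> {
    fn drop(&mut self) {
        self.flush(); // best-effort final flush on every exit path
    }
}

fn demo() -> Vec<u64> {
    let mut sink = Vec::new();
    {
        let mut buf = HashBuffer { pending: Vec::new(), sink: &mut sink };
        buf.push(1);
        buf.push(2);
        // dropped here without an explicit flush, as on an early return
    }
    sink
}

fn main() {
    assert_eq!(demo(), vec![1, 2]);
}
```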


Maintainability / Design

5. Main trie-building loop is duplicated between trie_from_sorted_accounts and trie_from_sorted_subtrie

The inner while let Some(next_value) = next_value_opt loop body in trie_from_sorted_subtrie is essentially identical to the corresponding loop in the existing trie_from_sorted_accounts. The only difference is that trie_from_sorted_subtrie does not finalize the root. Consider extracting the shared loop into a helper that both functions call, to avoid a future bug requiring two fixes.

6. Parallel spawning logic duplicated between trie_from_sorted_parallel and insert_accounts

The PR description notes that trie_from_sorted_parallel is "used by tests; production uses lower-level functions directly." The consequence is that insert_accounts reimplements the 16-thread spawn, buffer allocation, and assemble_root_from_subtries call. If the algorithm needs a correctness fix, it has to be patched in two places. Consider extending trie_from_sorted_parallel's make_sub_iter closure to also capture the side-effect collection, which would allow production code to use it directly.

7. Misleading method name: next_file_index returns the current index

code_collector.rs:

pub fn next_file_index(&self) -> u64 {
    self.file_index
}

The name implies "returns the next index to use," but it returns the current value. A caller might skip an index by incrementing before calling. Rename to current_file_index or get_file_index.


Minor / Style

8. Thread panic message lost in ThreadJoinError

In both trie_from_sorted_parallel and insert_accounts:

handle.join().map_err(|_| TrieGenerationError::ThreadJoinError())??

The panic payload is discarded. Consider logging the panic message (downcasting to &str/String) before converting, as this would be the primary debugging signal if a sub-trie thread panics.

9. Test helper may be double-encoding account state

trie_sorted.rs, run_test_parallel:

.map(|(hash, state)| (hash, state.encode_to_vec()))

The existing generate_input_* functions return BTreeMap<H256, Vec<u8>>. If these Vec<u8> values are already RLP-encoded account states (as used by the real insert_accounts path), calling encode_to_vec() again wraps them in a second RLP layer. Both the sequential and parallel paths apply the same encoding, so the test would still pass while actually testing with malformed data. Verify that the test inputs are in the same format as the real data (raw encoded account bytes), not raw structs.

10. Thread count is computed twice with slightly different context

trie_sorted.rs, trie_from_sorted_parallel and snap_sync.rs, insert_accounts both call:

std::thread::available_parallelism().map(|n| n.get()).unwrap_or(8);

This is harmless but could be a one-liner helper to avoid the subtle divergence (e.g., someone changes the default in one place).


What looks good

  • The nodehash_buffer reuse is verified safe: all three compute_hash_no_alloc implementations call buf.clear() on entry, so there's no risk of residual data across calls.
  • Ordering::Relaxed for code_hash_file_index is correct: fetch_add is always atomic regardless of ordering, so file indices are unique; the ordering only affects surrounding memory visibility, which isn't relied upon here.
  • drop(snapshot) before drop(db) is correct ordering.
  • The seek/stop logic in NibbleIterator (start_key[0] = nibble << 4, stop when (key[0] >> 4) != self.nibble) is correct.
  • The five parallel tests properly compare against the sequential implementation as a reference oracle.
  • Mutex contention on per_thread_storage_accounts is minimal — one lock acquisition per thread at completion, not per-account.

Automated review by Claude (Anthropic) · sonnet · custom prompt


greptile-apps bot commented Apr 7, 2026

Greptile Summary

This PR optimizes snap sync trie building via three techniques: eliminating a redundant code-hash iteration pass, reusing a 512-byte hash buffer across ~700M calls, and parallelising the 16-nibble state trie build with streaming RocksDB iterators. The parallel correctness is well-validated (4 full mainnet runs with trie-root verification). Two minor robustness gaps remain: NibbleIterator::flush_code_hashes silently drops I/O errors that could cause missing bytecode downloads, and buffer_receiver.recv().expect(...) panics instead of returning a structured error if a pool worker fails.

Confidence Score: 4/5

Safe to merge — parallelism is correct and thoroughly validated; two non-blocking robustness improvements remain

Parallel trie construction verified across 4 complete mainnet multisync runs with full state-root and storage-root validation. The P2 findings (silent I/O error discard and expect-panic instead of structured error) do not affect correctness under normal operation but could make failures harder to diagnose. Score is 4 rather than 5 because the silent code-hash loss path has real user impact if triggered.

crates/networking/p2p/sync/snap_sync.rs (NibbleIterator::flush_code_hashes error handling), crates/common/trie/trie_sorted.rs (buffer channel expect calls)

Important Files Changed

| Filename | Overview |
|---|---|
| crates/common/trie/trie_sorted.rs | Adds parallel sub-trie building across 16 nibble ranges (trie_from_sorted_subtrie + assemble_root_from_subtries), reuses a 512-byte hash buffer, and eliminates a Nibbles clone per node; core algorithm is correct but buffer-channel error paths use panicking expect calls |
| crates/networking/p2p/sync/snap_sync.rs | Replaces sequential two-pass account insertion with 16-thread parallel NibbleIterator-based sub-trie build; code hash I/O errors are silently discarded in flush_code_hashes |
| crates/networking/p2p/sync/code_collector.rs | Adds next_file_index/set_file_index methods to CodeHashCollector to coordinate file-index ownership with external parallel writers |

Sequence Diagram

```mermaid
sequenceDiagram
    participant S as insert_accounts
    participant T as thread_scope
    participant N as NibbleIter[0..15]
    participant P as ThreadPool (DB writers)
    participant DB as TrieDB
    participant A as assemble_root_from_subtries

    S->>T: thread_scope { spawn 16 threads }
    par For each nibble 0..15
        T->>N: trie_from_sorted_subtrie(nibble)
        N->>DB: raw_iter.seek(nibble<<4)
        loop yield accounts
            N->>N: collect code_hashes + storage_accounts
            N->>P: execute_priority(flush_nodes_to_write)
            P->>DB: put_batch_no_alloc(nodes)
        end
        N->>N: flush_code_hashes() → dump_to_file
        N-->>T: Ok(Some(NodeRef[nibble]))
    end
    T->>T: join all 16 handles → choices[16]
    T->>T: drop pool → workers drain & exit
    T-->>S: [NodeRef; 16]
    S->>A: assemble_root_from_subtries(choices)
    A->>DB: get([nibble]) per valid choice
    A->>DB: put_batch(root node at [])
    A-->>S: state_root H256
```

let file_idx = self.code_hash_file_index.fetch_add(1, Ordering::Relaxed);
let file_name = get_code_hashes_snapshot_file(self.code_hashes_dir, file_idx);
let hashes: Vec<H256> = std::mem::take(&mut self.code_hashes);
let _ = dump_to_file(&file_name, hashes.encode_to_vec());

P2 Silent I/O error on code-hash flush

Errors from dump_to_file are silently discarded with let _ = .... If the disk write fails, the code hashes collected by this nibble's thread are permanently lost without any log or error surface — bytecodes for those accounts will never be downloaded.

Suggested change
let _ = dump_to_file(&file_name, hashes.encode_to_vec());
if let Err(e) = dump_to_file(&file_name, hashes.encode_to_vec()) {
tracing::error!("Failed to write code hashes file (nibble {}): {:?}", self.nibble, e);
}

Comment on lines +390 to +392
nodes_to_write = buffer_receiver
.recv()
.expect("This channel shouldn't close");

P2 Panic instead of error on buffer-channel closure

If a pool worker panics before returning a buffer via sender.send(...) in flush_nodes_to_write, all senders can drop and recv().expect("This channel shouldn't close") will panic the sub-trie thread rather than propagating a structured Result. The same pattern exists in trie_from_sorted_accounts around line 208. Using recv().map_err(|_| TrieGenerationError::ThreadJoinError()) would keep error handling consistent with the rest of the function.



ElFantasma commented Apr 7, 2026

Review feedback addressed

Thanks for the thorough reviews. Here's what was addressed and what wasn't (with reasoning).

Addressed

| Finding | Raised by | Fix |
|---|---|---|
| Silent dump_to_file error in NibbleIterator | Greptile, Codex, Claude | Added tracing::error! logging instead of let _ |
| recv().expect() panics on channel close in trie_from_sorted_subtrie | Greptile, Kimi | Converted to .map_err(\|_\| TrieGenerationError::ThreadJoinError())? |
| Final flush_nodes_to_write error silently discarded | Kimi, Claude | Changed let _ = to ? to propagate |
| .expect("Sub-trie wrote this node") in assemble_root_from_subtries | Claude | Changed to .ok_or(TrieGenerationError::FlushToDbError(TrieError::InvalidInput))? |
| No Drop impl on NibbleIterator — code hashes lost on error paths | Claude | Added Drop impl that flushes code hashes and collects storage accounts |
| handle.join().unwrap() panics on thread failure | Fedacking (original review) | Converted to .map_err(\|_\| TrieGenerationError::ThreadJoinError())? |
| Mutex unwrap() rejected by clippy | CI | Changed to .expect("storage accounts mutex poisoned") |
| Parallel spawn duplication between trie_from_sorted_parallel and insert_accounts | Claude, Codex | Refactored insert_accounts to call trie_from_sorted_parallel directly. NibbleIterator now handles all side effects (code hash flushing + storage account collection) via Drop, so the generic API works for production. Tests now exercise the same code path as production. (-57 lines) |

Not addressing (with reasoning)

  1. Loop duplication between trie_from_sorted_accounts and trie_from_sorted_subtrie (Claude) — The inner trie-building loop is identical, but extracting a shared helper would require refactoring the pre-existing trie_from_sorted_accounts function, which is out of scope for this PR. The subtrie variant differs in finalization (returns NodeRef vs root hash).

  2. Code hash dedup regression (Codex) — The old CodeHashCollector uses a HashSet; our NibbleIterator uses a Vec. Adding a per-thread HashSet would increase memory, which is what incremental flushing was designed to avoid. Dedup still happens at read time in download_bytecodes via seen_code_hashes: HashSet, so correctness is unaffected — we just write more files to disk.

  3. Thread panic payload lost in ThreadJoinError (Claude) — The Box<dyn Any> payload is hard to extract usefully and the panic backtrace is already printed to stderr by the runtime.

  4. Ordering::Relaxed on code_hash_file_index (Kimi) — Correct as-is. fetch_add with Relaxed guarantees atomic unique indices. No other data depends on ordering.

  5. AccountState::decode(...).expect(...) in NibbleIterator (Kimi) — Pre-existing pattern. Data comes from our own temp RocksDB (not untrusted peers). A decode failure here would indicate internal corruption.

  6. next_file_index naming (Claude) — Returns the next index to be used, which is the current counter value. Name is accurate from the caller's perspective.

@ElFantasma ElFantasma dismissed fedacking’s stale review April 13, 2026 21:21

Stale — all concerns addressed in the 'Review feedback addressed' comment and subsequent commits.

