Parallel input fetcher + no seek compaction + Flush and compact after IBD by l0rinc · Pull Request #180 · l0rinc/bitcoin

l0rinc · 2026-05-25T10:21:22Z

No description provided.

Introduce CoinsViewOverlay::StartFetching, which maps all input prevouts of a block to a new m_inputs vector of InputToFetch elements. It returns a ResetGuard that is lifetime bound to the block; the InputToFetch elements are lifetime bound to the block as well. Introduce StopFetching to clear the m_inputs vector.

CCoinsViewCache::Reset is made virtual and is overridden in CoinsViewOverlay. StopFetching is called on Reset, so the InputToFetch objects cannot outlive the block.

Introduce ProcessInput to fetch the UTXO of an individual input in m_inputs. Each caller fetches the input at m_input_head and increments it, so each call fetches the next input in the queue.

Fetch coins from the m_inputs vector in FetchCoinFromBase by scanning the inputs until we find the one with the matching outpoint. This design is deliberate: it lets multiple threads call ProcessInput independently.

Co-authored-by: l0rinc <pap.lorinc@gmail.com>
Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
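The queue described above can be pictured with a small single-threaded sketch. Only the names StartFetching, StopFetching, ProcessInput, InputToFetch, m_inputs and m_input_head come from the commit message; the `OutPoint` and `Coin` stand-ins, the `InputQueue` class, the `FetchCoin` helper and the `lookup` callback are hypothetical, and the actual code in this PR may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-ins for COutPoint and Coin.
struct OutPoint {
    uint64_t txid;
    uint32_t n;
    bool operator==(const OutPoint& o) const { return txid == o.txid && n == o.n; }
};
struct Coin { int64_t value; };

struct InputToFetch {
    OutPoint outpoint;
    std::optional<Coin> coin; // empty until ProcessInput has handled this input
};

class InputQueue {
    std::vector<InputToFetch> m_inputs;
    size_t m_input_head{0};

public:
    // Map every prevout of a block to one queue element.
    void StartFetching(const std::vector<OutPoint>& prevouts) {
        assert(m_inputs.empty()); // the previous block must have been stopped first
        m_input_head = 0;
        for (const OutPoint& prevout : prevouts) m_inputs.push_back({prevout, std::nullopt});
    }

    void StopFetching() {
        m_inputs.clear();
        m_input_head = 0;
    }

    // Fetch the input at the head of the queue and advance the head.
    // Returns false once every input has been claimed.
    template <typename Lookup>
    bool ProcessInput(Lookup&& lookup) {
        if (m_input_head >= m_inputs.size()) return false;
        InputToFetch& input{m_inputs[m_input_head++]};
        input.coin = lookup(input.outpoint);
        return true;
    }

    // Scan the queue for the matching outpoint, as the message describes for FetchCoinFromBase.
    std::optional<Coin> FetchCoin(const OutPoint& outpoint) const {
        for (const InputToFetch& input : m_inputs) {
            if (input.outpoint == outpoint) return input.coin;
        }
        return std::nullopt;
    }
};
```

In this sketch the head is a plain index, so it is only safe on one thread; a version that is called from several threads would need an atomic head and a per-input publication flag.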

Inputs spending outputs of an earlier transaction in the same block won't be in the cache or the db. They also won't be requested by FetchCoinFromBase, so we can filter them out instead of wasting time trying to fetch them.

Build an unordered set of seen txids while flattening m_inputs and skip any prevout whose hash is already in the set.

Co-authored-by: l0rinc <pap.lorinc@gmail.com>
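A minimal sketch of that filter, assuming simplified `Tx` and `OutPoint` types with integer txids (both hypothetical, as is the `FlattenInputs` name); coinbase handling is not modeled:

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical simplified types; real txids are 256-bit hashes.
struct OutPoint { uint64_t txid; uint32_t n; };
struct Tx { uint64_t txid; std::vector<OutPoint> vin; };

// Flatten all inputs of a block into one vector, skipping prevouts that
// spend an output created by an earlier transaction in the same block.
std::vector<OutPoint> FlattenInputs(const std::vector<Tx>& block_txs) {
    std::vector<OutPoint> inputs;
    std::unordered_set<uint64_t> seen_txids;
    for (const Tx& tx : block_txs) {
        for (const OutPoint& prevout : tx.vin) {
            // Created earlier in this block: not in the cache or the db, so do not queue it.
            if (seen_txids.count(prevout.txid) > 0) continue;
            inputs.push_back(prevout);
        }
        // Record the txid only after its own inputs, since a transaction cannot spend itself.
        seen_txids.insert(tx.txid);
    }
    return inputs;
}
```

For a block where transaction 11 spends output 0 of transaction 10, only the prevouts that refer to transactions outside the block end up in the result.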

Provides a worst-case upper bound on the number of inputs that can fit in a block, so callers (e.g. parallel input prefetching) can pre-allocate stable storage and rule out reallocation of per-input state.

Cherry-picked from PR bitcoin#9938 (Lock-Free CheckQueue), with MAX_TXINS_PER_BLOCK renamed to MAX_INPUTS_PER_BLOCK to match the call site.

Co-authored-by: Jeremy Rubin <jeremy.l.rubin@gmail.com>
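One way such a bound could be derived is from the smallest possible serialized input. The sketch below is an illustration only: the `_SKETCH` constant, `InputState` and `MakeStableStorage` are hypothetical, and the value is not taken from this PR or from bitcoin#9938. The block weight limit and witness scale factor use the well-known consensus values.

```cpp
#include <cstdint>
#include <vector>

// Consensus values (4,000,000 weight units, non-witness bytes count 4x).
constexpr int64_t MAX_BLOCK_WEIGHT{4'000'000};
constexpr int64_t WITNESS_SCALE_FACTOR{4};

// Smallest serialized input: 32-byte txid + 4-byte output index
// + 1-byte length of an empty scriptSig + 4-byte sequence.
constexpr int64_t MIN_SERIALIZED_INPUT_SIZE{32 + 4 + 1 + 4};

// Assumption: a worst-case bound that ignores all other block and transaction overhead.
constexpr int64_t MAX_INPUTS_PER_BLOCK_SKETCH{
    MAX_BLOCK_WEIGHT / (WITNESS_SCALE_FACTOR * MIN_SERIALIZED_INPUT_SIZE)};
static_assert(MAX_INPUTS_PER_BLOCK_SKETCH == 24'390);

// Hypothetical per-input state.
struct InputState { uint64_t txid{0}; bool ready{false}; };

// Reserve once up front; pushing up to the bound then never reallocates,
// so pointers and references to elements stay valid.
std::vector<InputState> MakeStableStorage() {
    std::vector<InputState> storage;
    storage.reserve(MAX_INPUTS_PER_BLOCK_SKETCH);
    return storage;
}
```

Because the capacity is fixed before any worker sees an element, per-input state can be handed out by reference without worrying about the vector growing.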

Prepares for ProcessInput to be called from multiple threads.

This flag acts as a memory fence around InputToFetch::coin. There is no lock guarding reads and writes of the coin field. Instead, we rely on the flag's release/acquire semantics: a worker thread sets the flag only after it has finished writing the coin, so when the main thread observes the flag as set, its read of the coin is guaranteed to happen after that write.

Co-authored-by: l0rinc <pap.lorinc@gmail.com>
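The publication pattern can be sketched with a `std::atomic<bool>`. The field name `ready`, the helper functions and the spin-with-yield wait are assumptions made for illustration; the PR may use a different flag type or a blocking wait.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

struct Coin { int64_t value{0}; };

struct InputToFetch {
    Coin coin;                      // plain field, no lock around it
    std::atomic<bool> ready{false}; // hypothetical name for the flag
};

// Worker side: write the coin first, then publish with a release store.
void WorkerFetch(InputToFetch& input, int64_t value_from_db) {
    input.coin.value = value_from_db;
    input.ready.store(true, std::memory_order_release);
}

// Main-thread side: the acquire load that observes `true` synchronizes with
// the release store, so the coin write is visible afterwards.
int64_t MainThreadRead(const InputToFetch& input) {
    while (!input.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return input.coin.value;
}

// Run one worker and read its result on the calling thread.
int64_t FetchOnWorkerAndRead(int64_t value_from_db) {
    InputToFetch input;
    std::thread worker{[&] { WorkerFetch(input, value_from_db); }};
    const int64_t seen{MainThreadRead(input)};
    worker.join();
    return seen;
}
```

Reading `coin` before the acquire load returns true would be a data race; the flag is what orders the two accesses.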

Prepares for ProcessInput to be called from multiple threads.

ProcessInput reads from base. For ProcessInput to be safe to call in parallel on separate threads, base must not be mutated while fetching is in progress. Flush, Sync, and SetBackend can modify base, so we override them to call StopFetching before delegating to the base class.

Co-authored-by: l0rinc <pap.lorinc@gmail.com>
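The override pattern looks roughly like the following sketch. `CacheBase`, `Overlay`, `IsFetching` and the counter are hypothetical stand-ins; here StopFetching only clears the queue, whereas a threaded version would presumably also have to wait for in-flight workers, which is not modeled.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-in for the base cache class.
class CacheBase {
public:
    virtual ~CacheBase() = default;
    virtual void Flush() { ++m_flush_count; } // pretend this mutates the backing view
    int m_flush_count{0};
};

class Overlay : public CacheBase {
    std::vector<int> m_inputs; // stand-in for the InputToFetch queue

public:
    void StartFetching(std::vector<int> inputs) { m_inputs = std::move(inputs); }
    void StopFetching() { m_inputs.clear(); }
    bool IsFetching() const { return !m_inputs.empty(); }

    // Stop fetching first, so nothing reads from base while it is being modified.
    void Flush() override {
        StopFetching();
        CacheBase::Flush();
    }
};
```

The same shape would apply to Sync and SetBackend: stop first, then delegate.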

Prepares for ProcessInput to be called from multiple threads.

Add a ThreadPool shared pointer to CoinsViewOverlay. An externally managed pool can be passed in through the constructor. Fuzz harnesses use a global thread pool, since iterations can run faster than the OS can create and tear down thread pools, which can cause a memory leak when fuzzing.

Co-authored-by: l0rinc <pap.lorinc@gmail.com>

Seek compaction is causing a cascade effect in the chainstate DB, causing large parts of the database to be rewritten every ~hour.

Every periodic flush writes around 2 MiB. Since this is roughly the `write_buffer_size`, these writes regularly cause the memtable to rotate into a small L0 file. This file has a small seek budget, and with the random UTXO reads done during validation, it can get scheduled for seek compaction quickly.

That seek compaction pushes the small file down to L1. Since most UTXOs are already lower down in L4/L5, many reads that consult this file do not find the key there and continue downward. The bloom filter makes those misses cheap, but LevelDB still decrements the file's seek budget. The file then gets scheduled for another seek compaction, and the same pattern pushes it down through L2 and L3.

The expensive part happens around L3/L4. L4 has many ~32 MiB files holding the bulk of the UTXO set. When LevelDB compacts into L3, it may split the output into many smaller L3 files to limit how much L4 "grandparent" data any one output overlaps. Each of these small L3 files then gets its own small seek budget. Because chainstate keys are hash-random, each small L3 file can still have a broad key range, so many random reads consult it and quickly drain its budget.

Once seek-compacted into L4, each tiny L3 file can overlap many L4 files, so compacting a few hundred KiB from L3 can require rewriting hundreds of MiB from L4. Repeating that across many small L3 files can rewrite most of the chainstate.

This is a poor fit for chainstate because UTXO keys are hash-random, the DB is large enough to have many levels, writes are relatively small and periodic, and reads are frequent. The result is that read misses trigger compactions much earlier than size pressure would, and those compactions have very high write amplification.

Disabling seek compaction may leave more files in upper levels for longer, so reads could theoretically consult more files. But Bitcoin Core enables bloom filters for all its LevelDB instances, so these misses are usually cheap in-memory filter checks rather than disk reads.

For the other DBs, the risk is much smaller. They also use bloom filters, and most are smaller and less read-heavy. With fewer levels and less random read pressure, disabling seek compaction should have little effect there.
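To put rough numbers on the seek budget and the write amplification, the sketch below reproduces the rule upstream LevelDB uses when it adds a file to a version (one allowed seek per 16 KiB of file size, with a floor of 100). The overlap count of ten L4 files and the 300 KiB file size are made-up illustration values, not measurements from this PR, and the helper names are hypothetical.

```cpp
#include <cstdint>

constexpr int64_t KiB{1024};
constexpr int64_t MiB{1024 * KiB};

// LevelDB's seek budget for a new file: file_size / 16 KiB, but at least 100.
constexpr int64_t AllowedSeeks(int64_t file_size_bytes) {
    const int64_t seeks{file_size_bytes / (16 * KiB)};
    return seeks < 100 ? 100 : seeks;
}

static_assert(AllowedSeeks(2 * MiB) == 128);    // small L0 file from a periodic flush
static_assert(AllowedSeeks(32 * MiB) == 2048);  // full-size L4 file
static_assert(AllowedSeeks(300 * KiB) == 100);  // tiny L3 file hits the floor

// Bytes written when one upper-level file is merged into the lower-level
// files it overlaps: the upper file plus every overlapping lower file.
constexpr int64_t BytesRewritten(int64_t upper_file_size, int64_t overlapping_lower_files,
                                 int64_t lower_file_size) {
    return upper_file_size + overlapping_lower_files * lower_file_size;
}

// Write amplification relative to the data that actually moved down a level.
constexpr int64_t WriteAmplification(int64_t upper_file_size, int64_t overlapping_lower_files,
                                     int64_t lower_file_size) {
    return BytesRewritten(upper_file_size, overlapping_lower_files, lower_file_size) / upper_file_size;
}

// Illustration: a 300 KiB L3 file overlapping ten 32 MiB L4 files.
static_assert(BytesRewritten(300 * KiB, 10, 32 * MiB) == 300 * KiB + 320 * MiB);
static_assert(WriteAmplification(300 * KiB, 10, 32 * MiB) > 1000);
```

Under these assumptions a tiny file needs only 100 missed reads to be scheduled for compaction, and moving it down rewrites more than a thousand times its own size.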

`SaltedOutpointHasher` has been `noexcept` since bitcoin#16957, which makes libstdc++ omit cached hash codes from `std::unordered_map` nodes. That saves one `size_t` per node, but it also makes `CCoinsMap` recompute the outpoint hash in table operations that could otherwise reuse the cached code.

Declare the operator `noexcept(false)` to restore libstdc++ hash-code caching for this fast hash functor. The implementation still does not throw; only the exception specification used by the container policy changes.

This restores a value that bitcoin#16957 deliberately removed, but the pool-backed `CCoinsMap` allocation budget still accounts for implementations storing hash values. The existing `PoolAllocator` comment says that "in some cases the hash value is stored as well" and that "sizeof(void*)*4" overhead "should thus be sufficient so that all implementations can allocate the nodes from the PoolAllocator."

Add a unit test for the exception-specification contract and the libstdc++ fast-hash classification.

Revert "make SaltedOutpointHasher noexcept"

This reverts commit 67d9990.
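The exception-specification contract can be checked at compile time with standard type traits. The sketch below uses hypothetical `OutPoint`, `CachingHasher` and `NoexceptHasher` types with a toy mixing function (not the salted SipHash used by the real hasher). It does not inspect libstdc++ internals; whether a hash code is actually cached remains an implementation detail of the standard library.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <unordered_map>

// Hypothetical simplified outpoint.
struct OutPoint {
    uint64_t txid;
    uint32_t n;
    bool operator==(const OutPoint& o) const { return txid == o.txid && n == o.n; }
};

// Never throws in practice, but is declared noexcept(false) so that a container
// policy keyed on the exception specification treats it as potentially throwing.
struct CachingHasher {
    size_t operator()(const OutPoint& o) const noexcept(false) {
        return static_cast<size_t>(o.txid ^ (uint64_t{o.n} * 0x9E3779B97F4A7C15ULL));
    }
};

// Same hash, declared noexcept, for comparison.
struct NoexceptHasher {
    size_t operator()(const OutPoint& o) const noexcept { return CachingHasher{}(o); }
};

// The contract a unit test could pin down.
static_assert(!std::is_nothrow_invocable_v<const CachingHasher&, const OutPoint&>);
static_assert(std::is_nothrow_invocable_v<const NoexceptHasher&, const OutPoint&>);

// Hypothetical map type; behaviour is identical with either hasher.
using CoinsMapSketch = std::unordered_map<OutPoint, int64_t, CachingHasher>;
```

The two hashers produce the same values, so switching between them changes only node layout and rehash cost, not lookup results.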

andrewtoth and others added 17 commits May 23, 2026 22:31

validation: add -inputfetchthreads configuration option

c7af747

Add a configuration option for the number of worker threads used for parallel UTXO input fetching during block connection. Default is 4 threads, max is 16, 0 disables parallel fetching.
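How the option value might be resolved can be sketched as follows. Only the default (4), the maximum (16) and the meaning of 0 come from the commit message; the constant and function names are hypothetical, and clamping negative values to 0 is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

constexpr int64_t DEFAULT_INPUT_FETCH_THREADS{4}; // hypothetical constant names
constexpr int64_t MAX_INPUT_FETCH_THREADS{16};

// Resolve the value given for -inputfetchthreads (std::nullopt when unset).
// A result of 0 means parallel fetching is disabled.
int64_t ResolveInputFetchThreads(std::optional<int64_t> requested) {
    if (!requested) return DEFAULT_INPUT_FETCH_THREADS;
    // Assumption: out-of-range values are clamped rather than rejected.
    return std::clamp<int64_t>(*requested, 0, MAX_INPUT_FETCH_THREADS);
}
```

With these rules an unset option yields 4 workers, `-inputfetchthreads=64` yields 16, and `-inputfetchthreads=0` turns the feature off.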

coins: fetch inputs in parallel

572f291

Leverages the thread pool to fetch inputs on multiple threads, while the overlay serves inputs on the main thread. This is a performance improvement over blocking the main thread to fetch inputs. Co-authored-by: l0rinc <pap.lorinc@gmail.com>
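The division of labour can be illustrated with plain `std::thread` workers instead of the PR's thread pool. Everything in this sketch is hypothetical: the `LookupInBase` stand-in, the `FetchAndSum` driver, and in particular the choice to let the main thread process queue entries itself while it waits, which may or may not match the actual implementation.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

struct InputToFetch {
    uint32_t key{0};
    int64_t coin_value{0};          // written by exactly one thread before `ready` is set
    std::atomic<bool> ready{false};
};

// Hypothetical stand-in for a database read.
int64_t LookupInBase(uint32_t key) { return int64_t{key} * 10; }

// Workers claim inputs from a shared head index; the main thread consumes
// the inputs in order and returns the sum of the fetched values.
int64_t FetchAndSum(size_t num_inputs, size_t num_workers) {
    std::vector<InputToFetch> inputs(num_inputs);
    for (size_t i = 0; i < num_inputs; ++i) inputs[i].key = static_cast<uint32_t>(i);

    std::atomic<size_t> head{0};
    // Claim the next unclaimed input, fetch it, and publish the result.
    auto process_input = [&]() -> bool {
        const size_t i{head.fetch_add(1, std::memory_order_relaxed)};
        if (i >= inputs.size()) return false;
        inputs[i].coin_value = LookupInBase(inputs[i].key);
        inputs[i].ready.store(true, std::memory_order_release);
        return true;
    };

    std::vector<std::thread> workers;
    for (size_t w = 0; w < num_workers; ++w) {
        workers.emplace_back([&] { while (process_input()) {} });
    }

    int64_t sum{0};
    for (const InputToFetch& input : inputs) {
        while (!input.ready.load(std::memory_order_acquire)) {
            // Help with the queue instead of idling; yield once nothing is left to claim.
            if (!process_input()) std::this_thread::yield();
        }
        sum += input.coin_value;
    }
    for (std::thread& worker : workers) worker.join();
    return sum;
}
```

With zero workers the main thread simply fetches every input itself, which makes the sequential path a special case of the same loop.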

doc: update CoinsViewOverlay docstring to describe parallel fetching

3fdd978

Co-authored-by: l0rinc <pap.lorinc@gmail.com>

test: add unit tests for CoinsViewOverlay::StartFetching

4691620

Co-authored-by: l0rinc <pap.lorinc@gmail.com>

fuzz: update harnesses to cover CoinsViewOverlay::StartFetching

dcfa631

Co-authored-by: l0rinc <pap.lorinc@gmail.com> Co-authored-by: sedited <seb.kung@gmail.com>

fuzz: add coins_view_stacked fuzz harness to test concurrent leveldb …

580f49c

…reads

db: log chainstate LevelDB stats

f5a15c3

Log the chainstate LevelDB per-level file counts and stats table immediately before closing the database. This leaves a final shutdown snapshot in debug.log for reindex-chainstate benchmark comparisons.

crypto: use jumboblock SipHash for outpoints

ac805c2

Optimize input duplicate checks and nullness

a93d678

l0rinc force-pushed the detached546 branch from eab74f2 to a93d678 Compare May 25, 2026 20:09

l0rinc wants to merge 17 commits into master from detached546
