WIP core perf optimizations #5236

Draft
dmkozh wants to merge 103 commits into stellar:master from dmkozh:budget_test2
dmkozh (Contributor) commented Apr 17, 2026

No description provided.

dmkozh and others added 30 commits April 10, 2026 12:54
Instead of the main thread waiting idle while worker threads process
all clusters, have the main thread process cluster 0 directly. This
improves CPU utilization by eliminating idle time on the main thread.
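
A minimal sketch of the idea in this commit message, not the PR's actual code. `Cluster`, `applyCluster`, and `applyClusters` are hypothetical names; the per-cluster work is a placeholder sum. Clusters 1..N-1 go to worker threads while the calling thread handles cluster 0 instead of blocking.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a cluster of transactions.
using Cluster = std::vector<int>;

// Hypothetical per-cluster work: here it just sums the values.
inline long applyCluster(Cluster const& c)
{
    long sum = 0;
    for (int v : c)
    {
        sum += v;
    }
    return sum;
}

// Runs clusters 1..N-1 on worker threads and cluster 0 on the calling
// thread, so the caller does useful work instead of waiting idle.
inline std::vector<long> applyClusters(std::vector<Cluster> const& clusters)
{
    std::vector<long> results(clusters.size(), 0);
    std::vector<std::thread> workers;
    for (std::size_t i = 1; i < clusters.size(); ++i)
    {
        // Each worker writes only to its own slot, so no locking is needed.
        workers.emplace_back([&clusters, &results, i] {
            results[i] = applyCluster(clusters[i]);
        });
    }
    if (!clusters.empty())
    {
        results[0] = applyCluster(clusters[0]); // done by the caller itself
    }
    for (auto& t : workers)
    {
        t.join();
    }
    return results;
}
```

With N clusters this spawns N-1 threads rather than N, and the caller's wait is bounded by the slowest remaining worker.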
Track which keys existed in the LedgerTxn before parallel apply via
mOriginalLedgerTxnKeys. Use this to call createWithoutLoading() or
updateWithoutLoading() instead of expensive load() calls during commit.

Also clone snapshots from GlobalParallelApplyLedgerState instead of
re-acquiring from the snapshot manager, ensuring consistency.

# Conflicts:
#	src/transactions/ParallelApplyUtils.cpp
Replace xdrSha256(success) with streaming SHA256 calculation to avoid
XDR re-serialization of InvokeHostFunctionSuccessPreImage. The return
value and events are already available as XDR-encoded bytes, so we can
hash them directly without round-trip serialization.
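
A rough illustration of why streaming avoids the extra serialization, under stated assumptions. To stay self-contained it uses 64-bit FNV-1a as a stand-in for SHA-256 (the property shown, that feeding pieces incrementally matches hashing their concatenation, holds for any streaming hash). `StreamingHasher` and both helper functions are hypothetical, and the real preimage layout is not reproduced here.

```cpp
#include <cstdint>
#include <string>

// FNV-1a (64-bit), used ONLY as a stand-in for SHA-256 in this sketch.
class StreamingHasher
{
    std::uint64_t mState = 14695981039346656037ULL; // FNV offset basis

  public:
    // Feeds more bytes into the running hash state.
    void add(std::string const& bytes)
    {
        for (unsigned char b : bytes)
        {
            mState ^= b;
            mState *= 1099511628211ULL; // FNV prime
        }
    }

    std::uint64_t finish() const
    {
        return mState;
    }
};

// Baseline: build one combined buffer first (an extra copy), then hash it.
inline std::uint64_t hashByConcatenation(std::string const& returnValue,
                                         std::string const& events)
{
    StreamingHasher h;
    h.add(returnValue + events);
    return h.finish();
}

// Streaming: hash the already-encoded pieces in order, with no combined buffer.
inline std::uint64_t hashByStreaming(std::string const& returnValue,
                                     std::string const& events)
{
    StreamingHasher h;
    h.add(returnValue);
    h.add(events);
    return h.finish();
}
```

Both functions return the same digest for the same inputs; the streaming variant just skips the intermediate buffer.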
…nConfig

Allows callers with a pre-fetched SorobanNetworkConfig to pass it directly,
avoiding redundant config lookups during validation. The original overload
now delegates to the new one after fetching the config.

# Conflicts:
#	src/transactions/TransactionFrame.cpp
# Conflicts:
#	src/ledger/LedgerManagerImpl.cpp
Adds parallel processing to transaction set handling:

1. Parallel TxFrame creation: Creates TxFrames from XDR envelopes in
   parallel during transaction set deserialization. Uses work-stealing
   via std::async with even distribution across available threads.

2. Parallel transaction validation: Validates transactions in parallel
   in txsAreValid() when there are 2+ transactions.

3. Hash precomputation: Precomputes content and full hashes before
   parallel operations to avoid race conditions.

4. Test coverage: Adds StreamingShaTest for InvokeHostFunctionSuccessPreImage
   verification.
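
Item 1 above could look roughly like the following. This is a sketch with static, even chunking over `std::async`; it is not the PR's implementation, and `buildFrame` and `buildFramesInParallel` are hypothetical names with integers standing in for envelopes and frames.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical per-item work standing in for "create a TxFrame from an envelope".
inline int buildFrame(int envelope)
{
    return envelope * 2;
}

// Splits the input into roughly equal contiguous chunks and processes each
// chunk in its own std::async task. Output order matches input order.
inline std::vector<int> buildFramesInParallel(std::vector<int> const& envelopes,
                                              std::size_t numThreads)
{
    std::vector<int> frames(envelopes.size());
    if (envelopes.empty())
    {
        return frames;
    }
    // Clamp the thread count to [1, number of items].
    numThreads =
        std::max<std::size_t>(1, std::min(numThreads, envelopes.size()));
    std::size_t const chunk = (envelopes.size() + numThreads - 1) / numThreads;

    std::vector<std::future<void>> futures;
    for (std::size_t begin = 0; begin < envelopes.size(); begin += chunk)
    {
        std::size_t const end = std::min(begin + chunk, envelopes.size());
        futures.push_back(std::async(
            std::launch::async, [&envelopes, &frames, begin, end] {
                // Each task writes a disjoint index range of `frames`.
                for (std::size_t i = begin; i < end; ++i)
                {
                    frames[i] = buildFrame(envelopes[i]);
                }
            }));
    }
    for (auto& f : futures)
    {
        f.get(); // rethrows any exception raised inside a task
    }
    return frames;
}
```

Anything the tasks read concurrently (such as the precomputed hashes in item 3) has to be fully computed before the tasks start.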

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

# Conflicts:
#	src/herder/TxSetFrame.cpp
Add sizeBytes field to ContractDataMapEntryT to cache the XDR serialized
size of ledger entries. This avoids repeated xdr_size() calls during
state updates, reducing CPU overhead in the hot path.
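
A small sketch of the caching pattern described here, assuming a string payload whose `size()` stands in for the XDR size computation. `CachedEntry` and `StateMap` are hypothetical; only the `sizeBytes` field name comes from the commit message.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical map entry that stores its serialized size next to the payload.
struct CachedEntry
{
    std::string payload;
    std::size_t sizeBytes; // computed once when the entry is stored
};

class StateMap
{
    std::unordered_map<std::string, CachedEntry> mEntries;
    std::size_t mTotalBytes = 0;

  public:
    // Inserts or replaces an entry and keeps the running total up to date.
    void upsert(std::string const& key, std::string payload)
    {
        // Stand-in for measuring the serialized size of the new value.
        std::size_t const newSize = payload.size();
        auto it = mEntries.find(key);
        if (it != mEntries.end())
        {
            // The old size comes from the cache; the old value is not
            // measured again.
            mTotalBytes -= it->second.sizeBytes;
            it->second = CachedEntry{std::move(payload), newSize};
        }
        else
        {
            mEntries.emplace(key, CachedEntry{std::move(payload), newSize});
        }
        mTotalBytes += newSize;
    }

    std::size_t totalBytes() const
    {
        return mTotalBytes;
    }
};
```

The trade-off is a few extra bytes per entry in exchange for one size computation per write instead of two.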

Also adds Tracy zone to updateState() for profiling visibility.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
During ledger close, three independent operations are now parallelized:
- addHotArchiveBatch (modifies mHotArchiveBucketList)
- addLiveBatch (modifies mLiveBucketList) - runs on main thread
- updateInMemorySorobanState (modifies mInMemorySorobanState)

These operations modify completely independent data structures and can
safely run concurrently. Added getInMemorySorobanStateForUpdate() to
allow direct access to mInMemorySorobanState during COMMITTING phase.

This reduces ledger close latency by overlapping CPU-bound operations.
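
The overlap described above can be sketched as follows, assuming three plain vectors in place of the real bucket lists and in-memory state. `LedgerCloseState` and `commitBatches` are hypothetical names; two updates run as `std::async` tasks and the live-batch update stays on the calling thread.

```cpp
#include <future>
#include <vector>

// Hypothetical stand-ins for the three independent structures.
struct LedgerCloseState
{
    std::vector<int> hotArchive;
    std::vector<int> live;
    std::vector<int> inMemory;
};

// Applies the three batches concurrently. This is safe only because each
// task touches a different member and nothing else reads them meanwhile.
inline void commitBatches(LedgerCloseState& s,
                          std::vector<int> const& hotBatch,
                          std::vector<int> const& liveBatch,
                          std::vector<int> const& sorobanBatch)
{
    auto hot = std::async(std::launch::async, [&s, &hotBatch] {
        s.hotArchive.insert(s.hotArchive.end(), hotBatch.begin(),
                            hotBatch.end());
    });
    auto mem = std::async(std::launch::async, [&s, &sorobanBatch] {
        s.inMemory.insert(s.inMemory.end(), sorobanBatch.begin(),
                          sorobanBatch.end());
    });

    // The live batch is applied on the calling (main) thread.
    s.live.insert(s.live.end(), liveBatch.begin(), liveBatch.end());

    // Wait for both background updates before returning.
    hot.get();
    mem.get();
}
```

The total time is then roughly that of the slowest of the three updates rather than their sum.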

# Conflicts:
#	src/ledger/LedgerManagerImpl.cpp
That's because it doesn't properly commit changes and we can't share a snapshot across threads. There must be a better way around this, though preferably we should just fix the tests to not use in-memory mode at all.
dmkozh and others added 30 commits April 17, 2026 12:15
…t rid of virtual dispatch"""

This reverts commit 5f9634b.
…ion + get rid of virtual dispatch""""

This reverts commit 338e585.
resolveBackgroundEvictionScan previously received an UnorderedSet<LedgerKey>
built by getAllKeysWithoutSealing() containing ~128K entries (~20ms to build),
but only performed ~10-100 lookups. Added isModifiedKey() to LedgerTxn for
direct O(1) lookups in the existing EntryMap, eliminating the set construction.

resolveEviction zone: 20ms -> 0.116ms per ledger (99.4% reduction).
TPS: 18,944 -> 19,328 avg (+2.0%).
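
A sketch of the lookup-instead-of-copy idea. `TxnSketch` and `filterEvictionCandidates` are hypothetical, and the assumption that modified keys are dropped from the candidate list is mine; only the `isModifiedKey()` name comes from the commit message.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a ledger transaction holding modified entries.
class TxnSketch
{
    std::unordered_map<std::string, int> mEntries;

  public:
    void put(std::string const& key, int value)
    {
        mEntries[key] = value;
    }

    // Average O(1) membership test against the existing map. No temporary
    // set of all keys is built.
    bool isModifiedKey(std::string const& key) const
    {
        return mEntries.find(key) != mEntries.end();
    }
};

// Keeps only the candidates that were NOT modified in this transaction
// (assumed eviction rule, for illustration only).
inline std::vector<std::string>
filterEvictionCandidates(TxnSketch const& txn,
                         std::vector<std::string> const& candidates)
{
    std::vector<std::string> out;
    for (auto const& key : candidates)
    {
        if (!txn.isModifiedKey(key))
        {
            out.push_back(key);
        }
    }
    return out;
}
```

The cost now scales with the number of candidates checked rather than with the number of entries in the transaction.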

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace single global mutex + RandomEvictionCache with 16 sharded caches,
each with its own mutex. This eliminates contention when 4 parallel threads
verify signatures simultaneously. Also use maybeGet() instead of exists()+get()
double-lookup, fix ZoneText string heap allocations, make counters atomic,
and remove unused liveSnapshot copy in applySorobanStageClustersInParallel.
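
A minimal sketch of mutex sharding with a single-lookup `maybeGet()`, assuming C++17. `ShardedCache` is a hypothetical class; it uses a plain `std::unordered_map` per shard and omits the eviction policy of the real cache.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical cache split into 16 independently locked shards.
class ShardedCache
{
    static constexpr std::size_t kShards = 16;

    struct Shard
    {
        std::mutex mutex;
        std::unordered_map<std::string, bool> map;
    };

    std::array<Shard, kShards> mShards;

    // Picks the shard by key hash, so a given key always maps to one shard.
    Shard& shardFor(std::string const& key)
    {
        return mShards[std::hash<std::string>{}(key) % kShards];
    }

  public:
    void put(std::string const& key, bool value)
    {
        Shard& s = shardFor(key);
        std::lock_guard<std::mutex> lock(s.mutex);
        s.map[key] = value;
    }

    // One lookup under one lock, instead of exists() followed by get().
    std::optional<bool> maybeGet(std::string const& key)
    {
        Shard& s = shardFor(key);
        std::lock_guard<std::mutex> lock(s.mutex);
        auto it = s.map.find(key);
        if (it == s.map.end())
        {
            return std::nullopt;
        }
        return it->second;
    }
};
```

Threads contend only when their keys hash to the same shard, so with well-distributed keys and 4 threads most operations take an uncontended lock.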
Sort lightweight 24-byte EntryRef structs (type tag + pointer) instead of
full BucketEntry objects (200-500 bytes) in convertToBucketEntry. Reduces
sort swap cost by ~12x and materializes final vector in one cache-friendly
sequential pass. Cuts convertToBucketEntry from 31.9ms to 25.4ms per ledger.

Benchmark: 13,760 -> 14,144 TPS (+384 TPS, +2.8%)
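
The sort-by-reference technique can be sketched like this. `BigEntry`, `EntryRef`, and `sortedCopy` are hypothetical; the ref here holds only a pointer (the commit's version also carries a type tag), and entries are ordered by an integer key for simplicity.

```cpp
#include <algorithm>
#include <array>
#include <vector>

// Hypothetical large entry: expensive to swap during a sort.
struct BigEntry
{
    int key = 0;
    std::array<char, 256> payload{};
};

// Lightweight handle that is cheap to move around while sorting.
struct EntryRef
{
    BigEntry const* entry;
};

// Sorts small refs first, then copies each entry once in final order.
inline std::vector<BigEntry> sortedCopy(std::vector<BigEntry> const& input)
{
    std::vector<EntryRef> refs;
    refs.reserve(input.size());
    for (auto const& e : input)
    {
        refs.push_back(EntryRef{&e});
    }

    // Swaps here move pointer-sized structs, not 260-byte entries.
    std::sort(refs.begin(), refs.end(), [](EntryRef a, EntryRef b) {
        return a.entry->key < b.entry->key;
    });

    // One sequential pass materializes the final vector.
    std::vector<BigEntry> out;
    out.reserve(refs.size());
    for (EntryRef r : refs)
    {
        out.push_back(*r.entry);
    }
    return out;
}
```

The refs stay valid only while `input` is neither resized nor destroyed, which holds here because it is taken by const reference for the whole call.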
Skip building LedgerTxnDelta in setEffectsDeltaFromSuccessfulTx when
INVARIANT_CHECKS is empty. The delta is consumed exclusively by
checkOnOperationApply which iterates an empty list when no invariants
are configured. This eliminates ~285ms of shared_ptr allocations and
entry copies across 4 worker threads per ledger.

Benchmark: 12,736 -> 13,760 TPS (+1,024 TPS, +8.0%)
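
A small sketch of the early-out, assuming C++17. `Delta` and `maybeBuildDelta` are hypothetical, and pairs of integers stand in for entry changes; the point is only that the copy is skipped when nothing will consume it.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical delta: a copy of (previous, current) values per changed entry.
struct Delta
{
    std::vector<std::pair<int, int>> changes;
};

// Builds the delta only when at least one invariant check is configured.
// With an empty list, no allocation or copying happens at all.
inline std::optional<Delta>
maybeBuildDelta(std::vector<std::string> const& invariantChecks,
                std::vector<std::pair<int, int>> const& changes)
{
    if (invariantChecks.empty())
    {
        return std::nullopt; // nothing would read the delta
    }
    return Delta{changes}; // the copy is paid only when it is needed
}
```

Callers then need to treat a missing delta as "no invariants to run" rather than as an error.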
This reverts commit e3225f4.

The budget optimization now seems slightly positive, but that wasn't reproduced on an AWS instance; in any case, the impact is pretty low.
…ads to subtle bugs and it's not clear how to fix these cleanly. Probably some redesign is necessary.