MDEV-21423 - lock-free trx_sys get performance regression cause by lf_find and ut_delay#5043
MDEV-21423 - lock-free trx_sys get performance regression cause by lf_find and ut_delay#5043svoj wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a new rw_trx_ids_t class to manage read-write transaction IDs and serialization numbers, moving them from the hash elements into a centralized vector within the transaction system. This involves significant updates to transaction registration, deregistration, and snapshotting logic across InnoDB. Feedback highlights a critical thread-safety issue where the ids vector is accessed without a read lock during potential reallocations, which could lead to memory corruption. Additionally, the use of memset to initialize a synchronization primitive was identified as unsafe and should be removed in favor of the standard initialization call.
0de23ce to
ac20be7
Compare
dr-m
left a comment
There was a problem hiding this comment.
This looks very promising. The reason why I won’t give an approval yet is that I didn’t review all details of this thoroughly, especially around startup and shutdown.
This needs to be tested, both for performance and stability. Please coordinate with the testers on this.
| /** trx_sys.rw_trx_ids index, protected by mutex */ | ||
| uint32_t rw_trx_ids_slot; |
There was a problem hiding this comment.
Is it really protected by trx_t::mutex as the comment claims, or by trx_sys.rw_trx_ids.latch?
|
Test run completed on ac20be7 |
…_find and ut_delay Under high concurrency, MVCC snapshot creation may spend a significant amount of time in lf_hash_iterate()/lfind() while collecting active read-write transaction identifiers. This overhead is particularly visible in sysbench oltp_read_write with transaction-isolation=READ-COMMITTED. Iteration cost becomes high due to significant TLB thrashing and poor memory locality in this hot code path because snapshot creation touches many rw_trx_hash nodes distributed across memory, including dummy nodes that are irrelevant for snapshot construction. In addition, traversing LF_HASH requires issuing heavyweight memory barriers. This is a performance regression after 53cc9aa, which changed MVCC snapshot creation to scan LF_HASH instead of maintaining a global sorted vector protected by the global mutex. Add trx_sys.rw_trx_ids, a compact traversal-friendly vector of active read-write transaction identifiers and serialization numbers optimized for MVCC snapshot creation, while rw_trx_hash remains responsible for transaction lookup. The vector may contain empty slots corresponding to idle or read-only transactions that currently do not own a read-write transaction identifier. Such slots are skipped by snapshot creation. This reduces traversal overhead during MVCC snapshot creation by improving memory locality, reducing TLB pressure, and avoiding repeated memory barriers required for rw_trx_hash traversal.
TBD