Skip to content

MDEV-21423 - lock-free trx_sys get performance regression cause by lf_find and ut_delay#5043

Open
svoj wants to merge 1 commit into
MariaDB:mainfrom
svoj:pr-main-MDEV-21423
Open

MDEV-21423 - lock-free trx_sys get performance regression cause by lf_find and ut_delay#5043
svoj wants to merge 1 commit into
MariaDB:mainfrom
svoj:pr-main-MDEV-21423

Conversation

@svoj
Copy link
Copy Markdown
Contributor

@svoj svoj commented May 5, 2026

TBD

@svoj svoj requested a review from dr-m May 5, 2026 22:10
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new rw_trx_ids_t class to manage read-write transaction IDs and serialization numbers, moving them from the hash elements into a centralized vector within the transaction system. This involves significant updates to transaction registration, deregistration, and snapshotting logic across InnoDB. Feedback highlights a critical thread-safety issue where the ids vector is accessed without a read lock during potential reallocations, which could lead to memory corruption. Additionally, the use of memset to initialize a synchronization primitive was identified as unsafe and should be removed in favor of the standard initialization call.

Comment thread storage/innobase/include/trx0sys.h
Comment thread storage/innobase/include/trx0sys.h Outdated
Comment thread storage/innobase/include/trx0sys.h Outdated
Comment thread storage/innobase/include/trx0sys.h Outdated
Comment thread storage/innobase/include/trx0sys.h Outdated
Comment thread storage/innobase/log/log0log.cc Outdated
@svoj svoj force-pushed the pr-main-MDEV-21423 branch from 3b998f0 to 6d2f280 Compare May 6, 2026 07:11
Comment thread storage/innobase/include/trx0sys.h Outdated
Comment thread storage/innobase/include/trx0sys.h
Comment thread storage/innobase/include/trx0sys.h
Comment thread storage/innobase/include/trx0sys.h
Comment thread storage/innobase/include/trx0sys.h
Comment thread storage/innobase/include/trx0sys.h Outdated
Comment thread storage/innobase/include/trx0sys.h
Comment thread storage/innobase/include/trx0trx.h Outdated
@svoj svoj force-pushed the pr-main-MDEV-21423 branch 2 times, most recently from 0de23ce to ac20be7 Compare May 6, 2026 14:21
Copy link
Copy Markdown
Contributor

@dr-m dr-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very promising. The reason why I won’t give an approval yet is that I didn’t review all details of this thoroughly, especially around startup and shutdown.

This needs to be tested, both for performance and stability. Please coordinate with the testers on this.

Comment thread storage/innobase/include/trx0trx.h Outdated
Comment on lines +632 to +633
/** trx_sys.rw_trx_ids index, protected by mutex */
uint32_t rw_trx_ids_slot;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really protected by trx_t::mutex as the comment claims, or by trx_sys.rw_trx_ids.latch?

@mariadb-SaahilAlam
Copy link
Copy Markdown

Test run completed on  ac20be7
No issues were found during testing

…_find and ut_delay

Under high concurrency, MVCC snapshot creation may spend a significant
amount of time in lf_hash_iterate()/lfind() while collecting active
read-write transaction identifiers. This overhead is particularly
visible in sysbench oltp_read_write with
transaction-isolation=READ-COMMITTED.

Iteration cost becomes high due to significant TLB thrashing and poor
memory locality in this hot code path because snapshot creation touches
many rw_trx_hash nodes distributed across memory, including dummy
nodes that are irrelevant for snapshot construction. In addition,
traversing LF_HASH requires issuing heavyweight memory barriers.

This is a performance regression after 53cc9aa, which changed
MVCC snapshot creation to scan LF_HASH instead of maintaining a global
sorted vector protected by the global mutex.

Add trx_sys.rw_trx_ids, a compact traversal-friendly vector of active
read-write transaction identifiers and serialization numbers optimized
for MVCC snapshot creation, while rw_trx_hash remains responsible for
transaction lookup.

The vector may contain empty slots corresponding to idle or read-only
transactions that currently do not own a read-write transaction
identifier. Such slots are skipped by snapshot creation.

This reduces traversal overhead during MVCC snapshot creation by
improving memory locality, reducing TLB pressure, and avoiding repeated
memory barriers required for rw_trx_hash traversal.
@svoj svoj force-pushed the pr-main-MDEV-21423 branch from ac20be7 to d756a03 Compare May 12, 2026 09:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants