In Boost 1.91 I spotted the new Asio atomic_slim_mutex implementation, and it raised some questions for me about performance.
Are there any performance tests for the slim mutex with the GCC compiler?
I initially thought that std::atomic for 4-byte types was specialized to use the Linux futex directly.
There is a Red Hat article about the atomics internals in GCC:
https://developers.redhat.com/articles/2022/12/06/implementing-c20-atomic-waiting-libstdc#how_can_we_implement_atomic_waiting_
It shows that there is an "optimization" inside the atomic implementation: notify_one does not make the internal futex call if there are no waiters.
The tricky part is that wait on an atomic updates a waiter count, and notify_one also reads that count. The count is kept per bucket, so it can collide with some other thread waiting on a different futex that lands in the same bucket. I really would not expect that in libstdc++, and it seems to affect the performance of atomic_slim_mutex. At the very least it should not be worse than a normal pthread mutex, which does the same thing as atomic_slim_mutex but uses the futex directly instead of calling wait and notify_one on atomics.
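To make the bucket collision concrete, here is a toy model of a shared waiter table. It is not libstdc++ code: the table size, the hash, and every name in it (kBuckets, waiter_counts, bucket_of, begin_wait, end_wait, notify_needs_syscall) are assumptions made up for illustration. It only shows how a waiter on one address can force a notify on an unrelated address to take the slow path.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Toy model only. The table size and hash are illustrative assumptions,
// not the actual libstdc++ implementation.
constexpr std::size_t kBuckets = 16;
std::array<std::atomic<int>, kBuckets> waiter_counts{};

// Map an address to a bucket of the shared table.
std::size_t bucket_of(const void* addr) {
    return (reinterpret_cast<std::uintptr_t>(addr) >> 2) % kBuckets;
}

// A waiter registers itself in the bucket before sleeping...
void begin_wait(const void* addr) { waiter_counts[bucket_of(addr)].fetch_add(1); }
// ...and unregisters after waking up.
void end_wait(const void* addr) { waiter_counts[bucket_of(addr)].fetch_sub(1); }

// notify can skip the futex wake only if the bucket has no waiters,
// regardless of which address those waiters are actually sleeping on.
bool notify_needs_syscall(const void* addr) {
    return waiter_counts[bucket_of(addr)].load() != 0;
}
```

In this model, two 4-byte objects that are 64 bytes apart share a bucket, so a waiter on the first makes notify_needs_syscall return true for the second. Both operations also touch the same shared counter, which is extra cache traffic that a direct futex call on the mutex word does not have.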
I ran some (not so great) tests, just replacing wait and notify_one with direct futex calls, and performance improves a lot. It is on par with a normal pthread mutex, and even better because the fast path inlines better.
What I changed is something like:
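#include <atomic>
#include <cerrno>
#include <system_error>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

std::atomic<int> state_;

// Sleeps while *addr == val. Not noexcept, because it throws on unexpected errors.
static void futex_wait(int* addr, int val) {
    long e = syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, val, nullptr, nullptr, 0);
    // EAGAIN: the value already changed. EINTR: interrupted. The caller re-checks state_.
    if (e == 0 || errno == EAGAIN || errno == EINTR)
        return;
    throw std::system_error(errno, std::generic_category());
}

// Wakes at most one thread waiting on addr.
static void futex_wake(int* addr) noexcept {
    syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, 1, nullptr, nullptr, 0);
}

// Assumes std::atomic<int> has the same size and layout as int.
int* futex_addr() noexcept {
    return reinterpret_cast<int*>(&state_);
}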
replace
state_.wait(2, std::memory_order_relaxed);
with
futex_wait(futex_addr(), 2);
and
state_.notify_one();
with
futex_wake(futex_addr());
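For reference, a minimal self-contained sketch of a three-state mutex built directly on the futex syscall could look like the following. This is not the Asio implementation. It is the classic 0/1/2 scheme (unlocked, locked, locked with possible waiters), and futex_mutex_sketch and run_counter_demo are hypothetical names used only here. It is Linux-only and ignores syscall errors, because the lock loop re-checks the state after every wake-up anyway.

```cpp
#include <atomic>
#include <thread>
#include <vector>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical sketch, not the Asio class.
class futex_mutex_sketch {
    // 0 = unlocked, 1 = locked, 2 = locked and there may be waiters
    std::atomic<int> state_{0};
    static_assert(sizeof(std::atomic<int>) == sizeof(int),
                  "assumes std::atomic<int> is layout-compatible with int");

    int* futex_addr() noexcept { return reinterpret_cast<int*>(&state_); }

public:
    void lock() noexcept {
        int expected = 0;
        // Fast path: uncontended 0 -> 1 transition, no syscall.
        if (state_.compare_exchange_strong(expected, 1,
                                           std::memory_order_acquire,
                                           std::memory_order_relaxed))
            return;
        // Slow path: mark the mutex as contended and sleep until the
        // exchange observes it unlocked. EAGAIN and EINTR just loop again.
        while (state_.exchange(2, std::memory_order_acquire) != 0)
            syscall(SYS_futex, futex_addr(), FUTEX_WAIT_PRIVATE, 2,
                    nullptr, nullptr, 0);
    }

    void unlock() noexcept {
        // Only pay for the wake syscall if someone may be sleeping.
        if (state_.exchange(0, std::memory_order_release) == 2)
            syscall(SYS_futex, futex_addr(), FUTEX_WAKE_PRIVATE, 1,
                    nullptr, nullptr, 0);
    }
};

// Small correctness check: several threads increment a plain counter
// under the mutex and the final value is returned.
long run_counter_demo(int threads, int iters) {
    futex_mutex_sketch m;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                m.lock();
                ++counter;
                m.unlock();
            }
        });
    for (auto& th : pool) th.join();
    return counter;
}
```

The point of the design is that the "are there waiters" information lives in the mutex word itself (state 2), so unlock needs no shared waiter table to decide whether to call into the kernel.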
Also, while investigating this atomic issue, I spotted a bugfix in GCC about the FUTEX_PRIVATE_FLAG flag not being used:
https://gcc.gnu.org/pipermail/libstdc++/2025-November/064598.html
I do not know whether it will be backported to older GCC versions, but it is one more reason (not a big one) to implement a futex version.