Skip to content

Use atomic flag for buffer memory-attribute initialization#2216

Merged
seberg merged 1 commit into
NVIDIA:mainfrom
seberg:ft-fixes-atomic
Jun 17, 2026
Merged

Use atomic flag for buffer memory-attribute initialization#2216
seberg merged 1 commit into
NVIDIA:mainfrom
seberg:ft-fixes-atomic

Conversation

@seberg

@seberg seberg commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Store and read Buffer._mem_attrs_inited via std::atomic using acquire/release semantics to avoid races during lazy pointer memory- attribute initialization under free-threaded execution.

At least strictly speaking, _mem_attrs_inited is a flag that guards access to other memory attributes.
My understanding is that on ARM processors with weak memory ordering this can be a problem even beyond compiler optimizations making use of the fact that order isn't strictly guaranteed (without the light use of atomics).

I.e.:
thread1 -> init _mem_attrs
thread1 -> sets _mem_attrs_inited
thread2 -> reads _mem_attrs_inited
thread2 -> reads _mem_attrs

Does not guarantee that thread2 doesn't read the uninitialized _mem_attrs

This is probably not sufficiently covered by pytest-run-parallel, so if desired I/we should add an explicitly threaded test for it. (Although I have doubts we'll be able to fail the test even without the changes)

Store and read Buffer._mem_attrs_inited via std::atomic using
acquire/release semantics to avoid races during lazy pointer memory-
attribute initialization under free-threaded execution.

Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@seberg seberg added the bug Something isn't working label Jun 15, 2026
@github-actions github-actions Bot added the cuda.core Everything related to the cuda.core module label Jun 15, 2026
@seberg seberg added the P0 High priority - Must do! label Jun 15, 2026
@seberg seberg self-assigned this Jun 15, 2026
@seberg seberg added this to the cuda.core v1.1.0 milestone Jun 15, 2026
@seberg

seberg commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test 7c1c215

@github-actions

This comment has been minimized.

@leofang leofang left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change seems fine to me. Though I wonder if there'd be a significant perf hit. Originally Buffer was meant to be a thin wrapper over the raw pointer, but we've grown the complexity over time.

Might be a good idea checking with @danielfrg and see if the micro benchmarks can tell us the perf difference. I dunno if/where it runs TBH.

In any case, approving. Thanks, Sebastian!

@seberg seberg merged commit e4208ed into NVIDIA:main Jun 17, 2026
206 of 210 checks passed
@seberg seberg deleted the ft-fixes-atomic branch June 17, 2026 07:39
@seberg

seberg commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

if there'd be a significant perf hit

I doubt this'll even register, I think the critical sections may be measurable but still so fast that it shouldn't really matter.
(I don't think atomics have much of an impact on non-threaded code, and even if threaded this should scale well, I believe.)

@github-actions

This comment has been minimized.

2 similar comments
@github-actions

This comment has been minimized.

@github-actions

Copy link
Copy Markdown
Doc Preview CI
Preview removed because the pull request was closed or merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working cuda.core Everything related to the cuda.core module P0 High priority - Must do!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants