[Docs] Add user-guide page for atomics and bit operations #640
hughperkins wants to merge 2 commits into main
Documents qd.atomic_* (add, sub, mul, min, max, and, or, xor) and the bit-counting helpers qd.math.popcnt / qd.math.clz. Covers semantics (returns old value, per-location atomicity, no fence on the rest of memory), backend / dtype support, performance pitfalls (contention, f64 fast path, atomic_mul CAS loop), and worked examples (slot reservation, histogram, popcnt sum, MSB). Adds atomics.md to the SIMT-primitives toctree in index.md.
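The "returns old value" semantics and the slot-reservation example mentioned above can be modelled on the host in plain Python. This is only a sketch: `AtomicCell` and `reserve_slots` are hypothetical names invented for illustration, a lock stands in for hardware atomicity, and nothing here calls the Quadrants API.

```python
import threading


class AtomicCell:
    """Plain-Python stand-in for one memory location (hypothetical, not a Quadrants type)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def atomic_add(self, delta):
        # The read-modify-write happens as one indivisible step,
        # and the caller receives the value from BEFORE the update.
        with self._lock:
            old = self._value
            self._value = old + delta
            return old

    def load(self):
        with self._lock:
            return self._value


def reserve_slots(num_threads=8, per_thread=100):
    """Each worker claims unique output slots by atomically bumping a shared counter."""
    counter = AtomicCell(0)
    claimed = [[] for _ in range(num_threads)]

    def worker(tid):
        for _ in range(per_thread):
            # The old value is the slot index this worker now owns.
            claimed[tid].append(counter.atomic_add(1))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.load(), [slot for per in claimed for slot in per]
```

Because every call returns the pre-increment value, no two workers ever receive the same slot index, which is the property the slot-reservation pattern relies on.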
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 96eaa1904b
> ### `qd.math.popcnt(x)`
>
> Counts set bits in `x` and returns an `i32`. Lowers to `__popc` / `__popcll` on CUDA, `OpBitCount` on SPIR-V, `__builtin_amdgcn_popcnt` on AMDGPU. Defined for all integer dtypes.
Stop advertising AMDGPU popcnt support
When users run on qd.amdgpu, qd.math.popcnt is currently not lowered: tests/python/test_unary_ops.py::test_popcnt explicitly xfails AMDGPU with “codegen does not lower this op,” and quadrants/codegen/amdgpu/codegen_amdgpu.cpp::emit_extra_unary has no UnaryOpType::popcnt branch. This line (and the later “fully cross-backend” note) will send AMDGPU users to an API path that hits QD_NOT_IMPLEMENTED, so the docs should mark AMDGPU as unsupported until that lowering exists.
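For reference, the semantics in the quoted line (count the set bits of an integer) can be written as a host-side model in plain Python. This is a sketch under assumptions: the function name `popcnt_ref` and the `bits` parameter are hypothetical, and negative inputs are interpreted as two's complement at the given width. It does not call Quadrants.

```python
def popcnt_ref(x, bits=32):
    """Count set bits of x, viewed as a `bits`-wide two's-complement integer."""
    mask = (1 << bits) - 1
    # Masking maps negative Python ints onto their fixed-width bit pattern.
    return bin(x & mask).count("1")


print(popcnt_ref(0))        # 0
print(popcnt_ref(0b1011))   # 3
print(popcnt_ref(-1))       # 32 (all bits set at 32-bit width)
```

A reference like this could serve as the expected value when checking a backend's lowering, including one that does not yet exist.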
> ### `qd.math.clz(x)`
>
> Counts leading zero bits in `x` and returns an `i32`. For a 32-bit input, `clz(0) = 32`; otherwise the result is in `[0, 31]`. Lowers to `__nv_clz` / `__nv_clzll` on CUDA, `FindMSB` on SPIR-V (with `bitwidth - 1 - FindMSB` to convert MSB index into leading-zero count), `__builtin_amdgcn_sffbh_i32` on AMDGPU. See the cross-backend caveats in the support table.
Stop advertising AMDGPU clz support
For the same AMDGPU context, qd.math.clz is not implemented today: tests/python/test_unary_ops.py::test_clz xfails AMDGPU because the codegen does not lower it, and the AMDGPU unary lowering override does not handle UnaryOpType::clz. Documenting an AMDGPU builtin here makes the new user guide claim support for kernels that currently fail at codegen time.
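The `bitwidth - 1 - FindMSB` conversion in the quoted line can be checked with a small host-side model. This is a sketch, not the Quadrants implementation: `find_msb_ref` and `clz_ref` are hypothetical names, and the model assumes the MSB query returns -1 for a zero input (as the unsigned GLSL `findMSB` does), which is what makes `clz(0)` come out as the full bit width.

```python
def find_msb_ref(x, bits=32):
    """Index of the highest set bit of x at the given width, or -1 if x is 0 (assumed)."""
    x &= (1 << bits) - 1
    # int.bit_length() is 0 for 0, so this yields -1 for a zero input.
    return x.bit_length() - 1


def clz_ref(x, bits=32):
    """Leading-zero count derived from the MSB index: bitwidth - 1 - FindMSB."""
    return bits - 1 - find_msb_ref(x, bits)


print(clz_ref(0))           # 32
print(clz_ref(1))           # 31
print(clz_ref(0x80000000))  # 0
```

For nonzero 32-bit inputs the result stays within `[0, 31]`, matching the range stated in the quoted text.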
The previous single page covered two unrelated topics: the qd.atomic_* read-modify-write ops, and the qd.math.popcnt / qd.math.clz bit-counting helpers. They share no concerns (memory coordination vs single-thread register ops) and live in different namespaces. Split into:

- atomics.md -- title shortened to "Atomics". Drops the bit-counting table, the popcnt / clz semantics subsections, the bitset-popcount and Morton-MSB examples, and the bit-trick portability bullet. Adds a one-line cross-reference to math.md in the intro and the Related section.
- math.md (new) -- framed as a qd.math.* page that currently documents only the bit operations. Lists the broader qd.math surface (vec / mat / GLSL ops / trig / rotations / float-classification / constants) in the intro per the project's own python/quadrants/math/__init__.py docstring, and notes explicitly that those are exported but not yet documented here. Reuses the bit-counting table, semantics, examples, and portability notes verbatim from the previous atomics.md.

Toctree gains a "math" entry next to "atomics" under the SIMT-primitives caption, preserving the existing placement of atomics rather than re-categorising as part of this split.
Summary
New user-guide page `docs/source/user_guide/atomics.md` covering the per-thread / scalar tier of Quadrants primitives: atomic read-modify-write ops (`qd.atomic_*`) and bit-counting helpers (`qd.math.popcnt`, `qd.math.clz`).

Companion to the in-flight subgroup (#639) and block (#638) docs; this is one of a series of tier-by-tier user-guide pages. Independent of the others: no cross-links to unmerged docs.

Includes:
- ... `f64` fast path, `clz` u32 / u64 + SPV 32-bit hard-cap).
- ... `clz`.
- ... `f64` and `atomic_mul` cost, `clz` portability.

Adds `atomics` to the SIMT-primitives toctree in `index.md`.

Test plan
- `cd docs && make html` builds without warnings (no broken cross-links; all unmerged-doc references are plain prose, not Markdown links).

Made with Cursor