[Docs] Add user-guide page for atomics and bit operations #640

Draft
hughperkins wants to merge 2 commits into main from hp/docs-atomics

@hughperkins
Collaborator

Summary

New user-guide page docs/source/user_guide/atomics.md covering the per-thread / scalar tier of Quadrants primitives — atomic read-modify-write ops (qd.atomic_*) and bit-counting helpers (qd.math.popcnt, qd.math.clz).

Companion to the in-flight subgroup (#639) and block (#638) docs; this is one of a series of tier-by-tier user-guide pages. Independent of the others — no cross-links to unmerged docs.

Includes:

  • Backend / dtype support matrices for all 8 atomics and the two bit-count helpers, with caveats (f64 fast path, clz u32 / u64 + SPV 32-bit hard-cap).
  • Per-op semantics with the universal "returns the old value" / "per-location atomicity, not a fence on the rest of memory" framing.
  • Worked examples: slot reservation (workhorse pattern for select / compact), histogram, popcnt sum, MSB-via-clz.
  • Performance and portability notes — atomic contention, fence-pairing, f64 and atomic_mul cost, clz portability.
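The slot-reservation pattern relies on the "returns the old value" rule: each thread that wants to emit an element bumps a shared cursor by one and writes into the index it got back. The following is a minimal pure-Python model of that idea, not Quadrants code; `AtomicCounter`, `atomic_add`, and `compact` are hypothetical names used only for illustration, and a lock stands in for the hardware atomic.

```python
import threading


class AtomicCounter:
    """Pure-Python stand-in for one memory location that is updated atomically."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def atomic_add(self, delta):
        # The read-modify-write happens as one indivisible step,
        # and the OLD value is returned to the caller.
        with self._lock:
            old = self._value
            self._value = old + delta
            return old

    @property
    def value(self):
        return self._value


def compact(values, keep):
    """Copy the elements that satisfy `keep` into a dense output buffer."""
    out = [None] * len(values)
    cursor = AtomicCounter()

    def worker(v):
        if keep(v):
            # The old value is a slot index that no other thread can receive.
            slot = cursor.atomic_add(1)
            out[slot] = v

    threads = [threading.Thread(target=worker, args=(v,)) for v in values]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out[:cursor.value]


result = compact(list(range(10)), lambda v: v % 2 == 0)
# Slot order depends on thread scheduling, so sort before comparing.
print(sorted(result))
```

A histogram is the same pattern with one counter per bin and the return value ignored. In both cases the atomic only protects the counter itself; the write to `out[slot]` is an ordinary store, which is the "not a fence on the rest of memory" caveat.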

Adds atomics to the SIMT-primitives toctree in index.md.

Test plan

  • cd docs && make html builds without warnings (no broken cross-links — all unmerged-doc references are plain prose, not Markdown links).
  • Visually check the rendered page on the RTD preview.

Made with Cursor

Documents qd.atomic_* (add, sub, mul, min, max, and, or, xor) and the
bit-counting helpers qd.math.popcnt / qd.math.clz. Covers semantics
(returns old value, per-location atomicity, no fence on the rest of
memory), backend / dtype support, performance pitfalls (contention,
f64 fast path, atomic_mul CAS loop), and worked examples (slot
reservation, histogram, popcnt sum, MSB).

Adds atomics.md to the SIMT-primitives toctree in index.md.
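
For readers unfamiliar with the "CAS loop" cost mentioned above: when no native atomic multiply is available, the operation can be built from compare-and-swap by retrying until no other thread has changed the value in between. The sketch below is a pure-Python model under that assumption; `Cell`, `compare_and_swap`, and `atomic_mul_model` are hypothetical names and do not describe the actual Quadrants lowering.

```python
import threading


class Cell:
    """Pure-Python stand-in for one memory location with a CAS primitive."""

    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        # Store `new` only if the cell still holds `expected`.
        # Always return the value that was observed.
        with self._lock:
            seen = self.value
            if seen == expected:
                self.value = new
            return seen


def atomic_mul_model(cell, factor):
    """Multiply the cell by `factor` atomically and return the old value."""
    while True:
        old = cell.value
        # Retry if another thread modified the cell since we read it.
        if cell.compare_and_swap(old, old * factor) == old:
            return old


cell = Cell(1)
threads = [threading.Thread(target=atomic_mul_model, args=(cell, 2)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cell.value)
```

Each failed compare-and-swap costs another round trip, which is why heavy contention makes this kind of loop slower than a single-instruction atomic.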

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 96eaa1904b

Comment thread docs/source/user_guide/atomics.md Outdated

### `qd.math.popcnt(x)`

Counts set bits in `x` and returns an `i32`. Lowers to `__popc` / `__popcll` on CUDA, `OpBitCount` on SPIR-V, `__builtin_amdgcn_popcnt` on AMDGPU. Defined for all integer dtypes.
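
A host-side reference can help when checking kernel results. The sketch below is plain Python, not the Quadrants implementation; `popcnt_reference` and its `bits` parameter are hypothetical, and masking to a fixed width is an assumption about how negative inputs should be viewed (as two's-complement bit patterns).

```python
def popcnt_reference(x, bits=32):
    """Count the set bits in the low `bits` bits of x."""
    # Masking gives the two's-complement bit pattern for negative inputs.
    mask = (1 << bits) - 1
    return bin(x & mask).count("1")


print(popcnt_reference(0b1011))     # three bits set
print(popcnt_reference(-1))         # all 32 bits set
print(popcnt_reference(-1, bits=64))
```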

[P2] Stop advertising AMDGPU popcnt support

When users run on qd.amdgpu, qd.math.popcnt is currently not lowered: tests/python/test_unary_ops.py::test_popcnt explicitly xfails AMDGPU with “codegen does not lower this op,” and quadrants/codegen/amdgpu/codegen_amdgpu.cpp::emit_extra_unary has no UnaryOpType::popcnt branch. This line (and the later “fully cross-backend” note) will send AMDGPU users to an API path that hits QD_NOT_IMPLEMENTED, so the docs should mark AMDGPU as unsupported until that lowering exists.

Comment thread docs/source/user_guide/atomics.md Outdated

### `qd.math.clz(x)`

Counts leading zero bits in `x` and returns an `i32`. For a 32-bit input, `clz(0) = 32`; otherwise the result is in `[0, 31]`. Lowers to `__nv_clz` / `__nv_clzll` on CUDA, `FindMSB` on SPIR-V (with `bitwidth - 1 - FindMSB` to convert MSB index into leading-zero count), `__builtin_amdgcn_sffbh_i32` on AMDGPU. See the cross-backend caveats in the support table.
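
The `bitwidth - 1 - FindMSB` conversion can be checked with a small host-side model. This is plain Python rather than Quadrants code; `find_msb` and `clz_reference` are hypothetical helpers, and the model assumes the unsigned interpretation in which the MSB search returns -1 for a zero input.

```python
def find_msb(x, bits=32):
    """Index of the highest set bit of the unsigned value, or -1 if it is zero."""
    x &= (1 << bits) - 1
    return x.bit_length() - 1


def clz_reference(x, bits=32):
    """Leading-zero count derived from the MSB index."""
    return bits - 1 - find_msb(x, bits)


print(clz_reference(0))           # zero input yields the full bit width
print(clz_reference(1))
print(clz_reference(0x80000000))
```

Run in the other direction, the same identity gives the MSB-via-clz trick: `msb_index = bits - 1 - clz(x)` for any non-zero `x`.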

[P2] Stop advertising AMDGPU clz support

For the same AMDGPU context, qd.math.clz is not implemented today: tests/python/test_unary_ops.py::test_clz xfails AMDGPU because the codegen does not lower it, and the AMDGPU unary lowering override does not handle UnaryOpType::clz. Documenting an AMDGPU builtin here makes the new user guide claim support for kernels that currently fail at codegen time.

@github-actions

github-actions Bot commented May 7, 2026

The previous single page covered two unrelated topics: the
qd.atomic_* read-modify-write ops, and the qd.math.popcnt /
qd.math.clz bit-counting helpers.  They share no concerns
(memory coordination vs single-thread register ops) and live
in different namespaces.

Split into:

- atomics.md  -- title shortened to "Atomics".  Drops the
  bit-counting table, the popcnt / clz semantics
  subsections, the bitset-popcount and Morton-MSB
  examples, and the bit-trick portability bullet.
  Adds a one-line cross-reference to math.md in the intro
  and the Related section.

- math.md  (new)  -- framed as a qd.math.* page that
  currently documents only the bit operations.  Lists the
  broader qd.math surface (vec / mat / GLSL ops / trig /
  rotations / float-classification / constants) in the
  intro per the project's own python/quadrants/math/
  __init__.py docstring, and notes explicitly that those
  are exported but not yet documented here.  Reuses the
  bit-counting table, semantics, examples, and portability
  notes verbatim from the previous atomics.md.

Toctree gains a "math" entry next to "atomics" under the
SIMT-primitives caption, preserving the existing placement
of atomics rather than re-categorising as part of this
split.