Add hierarchical all-to-all design documentation #7222

Closed
iemAnshuman wants to merge 1 commit into TheHPXProject:master from iemAnshuman:docs/hierarchical-all-to-all-design

@iemAnshuman
Contributor

Proposed Changes

  • Update hierarchical all_reduce and all_gather internal generation arithmetic from (2k-1, 2k) to (3k-2, 3k) (all_reduce.hpp:459-460, all_gather.hpp:420-421). This reserves a uniform stride-three internal generation footprint per user-level call across all hierarchical collectives, leaving slot 3k-1 available for the upcoming three-phase hierarchical all_to_all without colliding on the shared communicators' generation namespace.
  • Fix the pre-existing bad_parameter strings in all_reduce.hpp:451 and all_gather.hpp:412 from "the 2k/2k+1 internal mapping" to "the 3k-2/3k internal mapping". The previous strings described arithmetic that did not match the actual code (which used 2k-1, 2k).
  • Add a cross-collective generation regression test that interleaves all_reduce and all_gather calls at consecutive user generations on a single shared hierarchical_communicator and asserts that all calls succeed under the new arithmetic. The four-call version that also exercises all_to_all will be added in a follow-up PR when hierarchical all_to_all lands.

Any background context you want to provide?

This is the first prerequisite PR for hierarchical all_to_all, per the direction in discussion #7200. Hierarchical all_to_all is a three-phase decomposition (intra-subtree gather → inter-representative all_to_all → intra-subtree scatter) that consumes three internal generation slots per user-level call. Two-phase hierarchical collectives sharing the same hierarchical_communicator currently use stride-two arithmetic, which collides on the gate's generation counter once a stride-three call has run on the same communicator. The architectural decision recorded in #7200 is to use a uniform stride of three across all hierarchical collectives; two-phase ones use slots 3k-2 and 3k and skip 3k-1.
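As a rough illustration of the collision described above, the sketch below compares the internal slots each arithmetic would request. The helper names (`three_phase`, `two_phase_stride2`, `two_phase_stride3`, `collides`) are hypothetical and are not part of HPX; `k` is assumed to be a 1-based user-level generation, and "collides" is modelled simply as a later call requesting a slot that an earlier call has already reached.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using slots = std::vector<std::size_t>;

// Hypothetical helpers for illustration only; none of these are HPX functions.
// k is assumed to be the 1-based user-level generation.

// Three-phase collective (e.g. hierarchical all_to_all): 3k-2, 3k-1, 3k.
slots three_phase(std::size_t k)
{
    assert(k >= 1);
    return {3 * k - 2, 3 * k - 1, 3 * k};
}

// Two-phase collective under the previous arithmetic: 2k-1, 2k.
slots two_phase_stride2(std::size_t k)
{
    assert(k >= 1);
    return {2 * k - 1, 2 * k};
}

// Two-phase collective under the new arithmetic: 3k-2, 3k (3k-1 is skipped).
slots two_phase_stride3(std::size_t k)
{
    assert(k >= 1);
    return {3 * k - 2, 3 * k};
}

// A later call collides if it requests an internal generation that is not
// strictly greater than every generation the earlier call consumed.
bool collides(slots const& earlier, slots const& later)
{
    std::size_t const high = *std::max_element(earlier.begin(), earlier.end());
    return std::any_of(later.begin(), later.end(),
        [high](std::size_t g) { return g <= high; });
}
```

Under these assumptions, `collides(three_phase(1), two_phase_stride2(2))` is true because both calls touch slot 3, while `collides(three_phase(1), two_phase_stride3(2))` is false because the stride-three call starts at slot 4.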

The skip carries no extra round trip and requires no internal communicator API changes. next_generation (detail/communicator.hpp:447) already accepts new_generation >= generation_ and post-increments, so issuing the last phase at internal generation 3k directly leaves the gate at 3k+1, ready for the next user-level call. Both two-phase and three-phase shapes leave the gate in the same state, so they can be freely interleaved on the same communicator across user generations.
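The argument above can be checked against a toy model of the gate. The sketch below is written only from the behaviour described in this paragraph (accept `new_generation >= generation_`, then leave the counter one past the accepted value); it is not the code in detail/communicator.hpp, and `toy_gate`, `run_two_phase`, and `run_three_phase` are hypothetical names. The starting counter value of 1 is also an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of the gate's generation counter. Names are hypothetical and the
// initial value of 1 is an assumption, not taken from the HPX sources.
class toy_gate
{
public:
    // Accept any requested generation that is not behind the counter,
    // then leave the counter one past the accepted value.
    bool next_generation(std::size_t new_generation)
    {
        if (new_generation < generation_)
            return false;
        generation_ = new_generation + 1;
        return true;
    }

    std::size_t generation() const { return generation_; }

private:
    std::size_t generation_ = 1;
};

// Two-phase call at user generation k: slots 3k-2 and 3k, skipping 3k-1.
bool run_two_phase(toy_gate& gate, std::size_t k)
{
    assert(k >= 1);
    return gate.next_generation(3 * k - 2) && gate.next_generation(3 * k);
}

// Three-phase call at user generation k: slots 3k-2, 3k-1 and 3k.
bool run_three_phase(toy_gate& gate, std::size_t k)
{
    assert(k >= 1);
    return gate.next_generation(3 * k - 2) &&
        gate.next_generation(3 * k - 1) && gate.next_generation(3 * k);
}
```

In this model, either shape run at user generation k leaves the counter at 3k+1, so a two-phase call at k=1 followed by a three-phase call at k=2 and another two-phase call at k=3 all succeed, and a stale request for an already consumed slot is rejected.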

The bad_parameter string fix is a small pre-existing documentation slip that this PR cleans up while in the same files. The strings already described arithmetic that did not match the actual code, so updating them to "3k-2/3k" leaves the source consistent with both the new code and the new design note.

The full design and the broader prerequisite PR sequence are written up in docs/gsoc-2026/hierarchical_all_to_all_design.md (section 9 covers the stride-three mechanism in detail; section 14, step 0, scopes this PR).

Checklist

Not all points below apply to all pull requests.

  • I have added a new feature and have added tests to go along with it.
  • I have fixed a bug and have added a regression test.
  • I have added a test using random numbers; I have made sure it uses a seed, and that random numbers generated are valid inputs for the tests.

@iemAnshuman iemAnshuman requested a review from hkaiser as a code owner April 25, 2026 17:49
@StellarBot
Collaborator

Can one of the admins verify this patch?

@codacy-production

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

@iemAnshuman
Contributor Author

I updated the design note to reflect the decisions from this thread and to make the prerequisite PR sequence explicit. In particular, it records the stride-three generation plan, the shared top-level partition helper, the scoped gather/scatter helpers needed for top reps, and the padded Phase 2 block layout.
Living copy: https://github.com/iemAnshuman/hpx/blob/docs/hierarchical-all-to-all-design/docs/hierarchical_all_to_all_design.md
I will use this as the working reference for the stride-three and partition-helper PRs unless there are objections.

@hkaiser
Contributor

hkaiser commented Apr 25, 2026

@iemAnshuman May I suggest to move this document to a github discussion item? I don't think we will ever merge this to the main repository after all.

@iemAnshuman
Contributor Author

> @iemAnshuman May I suggest to move this document to a github discussion item? I don't think we will ever merge this to the main repository after all.

Done. Added the write-up to #7200. Closing this PR.

@hkaiser hkaiser added this to the 2.0.0 milestone Apr 25, 2026