mpir/pmi: dynamic world_id for singleton initialisations #7750
Open
nmnobre wants to merge 5 commits into pmodels:main from
hzhou
reviewed
Mar 19, 2026
    if (pmi_kvs_name) {
        HASH_FNV(pmi_kvs_name, strlen(pmi_kvs_name), world_id);
        if (strcmp(pmi_kvs_name, "singinit") == 0) {
            world_id = getpid();
Contributor
I think the solution is acceptable for now. Let's add a comment to explain the reason:
/* world_id is used when creating the shared memory name. We need to make
 * sure it is unique in case of multiple instances of singleton-init */
Contributor
Author
I've expanded on the existing docstring for world_id instead, so all notes are in the same place. :)
I've also fixed a number of typos in the docs.
Contributor
Please consider making this solution compatible with PMIx, i.e., also comparing pmi_kvs_name to "0" to detect the PMIx singleton case here. :)
Contributor
Author
Definitely, thanks!
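The check discussed in this thread could be sketched roughly as follows. This is not the actual patch: `derive_world_id`, `is_singleton_kvs` and `fnv1a_hash` are hypothetical names, the hash is a plain 32-bit FNV-1a standing in for MPICH's `HASH_FNV` macro (whose definition is not shown here), and returning 0 when no KVS name is available is an assumption.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for HASH_FNV: 32-bit FNV-1a, masked to stay non-negative. */
static int fnv1a_hash(const char *s)
{
    unsigned int h = 2166136261u;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char) *s;
        h *= 16777619u;
    }
    return (int) (h & 0x7fffffffu);
}

/* Singleton detection as discussed in the review: "singinit" comes from the
 * diff, "0" is the PMIx singleton name mentioned by the reviewer. */
static int is_singleton_kvs(const char *name)
{
    return strcmp(name, "singinit") == 0 || strcmp(name, "0") == 0;
}

/* Hypothetical helper: pick a world_id that differs between unrelated
 * singleton processes on the same node. */
static int derive_world_id(const char *pmi_kvs_name)
{
    if (pmi_kvs_name == NULL)
        return 0;               /* assumption: fallback when no KVS name exists */
    if (is_singleton_kvs(pmi_kvs_name))
        return (int) getpid();  /* unique per live process on this node */
    return fnv1a_hash(pmi_kvs_name);
}
```

Keeping the singleton test in one small predicate means a further process-manager convention would only need one more comparison there.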
The issue
Refs #7374.
Two unrelated, singleton applications can interfere with each other if MPICH is configured with shared memory support. This is because, in its current state, the world_id for any two singleton apps is the same, which means MPL_initshm_open() will try to open the same shared memory object. This is especially troublesome because a user can influence another user's behaviour (this is how I found this after much digging).
Consider a user who runs a singleton app and then: a) ctrl-c's their app; b) forgets MPI_Finalize(); or c) launches a very long app. Then, that user effectively takes that shared memory location hostage, and a second user is deprived of running their singleton application (in cases a) and b) probably until the system is rebooted), getting only MPIDI_POSIX_comm_bootstrap(288).......: Out of memory.
The suggested solution
At first, my patch was in MPIDI_POSIX_comm_bootstrap() because that's closer to the issue's critical code portion. But, then, I thought maybe it makes more sense to just change the world_id from the get-go, as it more closely mirrors the non-singleton approach with a dynamic id.
This was quite hard to debug, but also quite fun, so I'd like to see this through, even if the current solution isn't satisfactory. :)
Cheers,
-Nuno
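To illustrate the collision described under "The issue", here is a minimal sketch of why a constant world_id leads two singleton processes to the same shared memory object. The name format `/demo_shm_<hex id>` and the helper `make_shm_name` are hypothetical; MPICH's real naming scheme is not shown in this PR text.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a shared memory object name from a world_id.
 * The format is an assumption made for illustration only. */
static void make_shm_name(char *buf, size_t len, int world_id)
{
    snprintf(buf, len, "/demo_shm_%x", (unsigned int) world_id);
}

/* Returns 1 if two processes with the given world_ids would end up
 * opening the same shared memory object, 0 otherwise. */
static int names_collide(int world_id_a, int world_id_b)
{
    char a[64], b[64];
    make_shm_name(a, sizeof a, world_id_a);
    make_shm_name(b, sizeof b, world_id_b);
    return strcmp(a, b) == 0;
}
```

With a constant id shared by every singleton, `names_collide(id, id)` is always 1; with per-process ids such as two distinct pids, it is 0.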