
5.0.1rc1 #7754

Open
raffenet wants to merge 23 commits into pmodels:5.0.x from raffenet:5.0.1rc1


@raffenet
Contributor

@raffenet raffenet commented Mar 27, 2026

Pull Request Description

A batch of bug fixes and improvements intended for a 5.0.1 release.

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your company's PR approval manager.

nmorey and others added 20 commits March 27, 2026 10:03
…her_release

The number of received bytes in release_gather_release is incorrectly cast
between int and MPI_Aint. On most architectures this is not an issue, but on
big-endian 64-bit architectures (s390x) the actual value is lost because only
the 4 most significant bytes are copied.
Fix the issue by writing the whole MPI_Aint into shm_buf instead of just an int.

Signed-off-by: Nicolas Morey <nmorey@suse.com>
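
To illustrate the byte-order problem described above, here is a minimal sketch. It does not reproduce the MPICH code: `demo_aint_t` and all three helpers are hypothetical names, and the big-endian layout is simulated by hand so the example behaves the same on any host.

```c
#include <stdint.h>

/* Hypothetical stand-in for MPI_Aint on a 64-bit platform. */
typedef int64_t demo_aint_t;

/* Write a 64-bit value into buf in big-endian byte order,
 * which is how a machine such as s390x lays it out in memory. */
static void store_be64(unsigned char *buf, demo_aint_t value)
{
    for (int i = 0; i < 8; i++) {
        buf[i] = (unsigned char) (((uint64_t) value >> (56 - 8 * i)) & 0xff);
    }
}

/* Buggy pattern: only the first 4 bytes are used. In a big-endian
 * layout these are the most significant bytes of the 64-bit value. */
static int32_t read_first_4_bytes_be(const unsigned char *buf)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        v = (v << 8) | buf[i];
    }
    return (int32_t) v;
}

/* Fixed pattern: all 8 bytes are used, so the value survives. */
static demo_aint_t read_all_8_bytes_be(const unsigned char *buf)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | buf[i];
    }
    return (demo_aint_t) v;
}
```

For a small byte count such as 4096, the first four bytes of the big-endian layout are all zero, so the truncated read loses the value while the full-width read recovers it.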
In an anysrc receive, we create both an NM request and a SHM request. The
SHM request is the user-visible one. When the NM request matches (or is in
progress), we cancel the SHM partner. But since the SHM request is
user-visible, we can't touch its cancel bits. Instead, we use another
mechanism to mark the SHM request as cancelled (by resetting its
anysrc_partner fields).
Don't assume the complex types are always available in CXX.
When we changed the signature of the launch_procs files, we neglected to
update the header pbs.h.
In the multi-NIC case, the provider may report invalid "null" PCI info,
which causes hwloc to fail to obtain the topology. Rather than
dealing with this invalid case in the topology code, let's guard against
it and handle it in the higher layer. In the case of ofi multi-nic, we
simply treat all NICs as equally close and distribute them equally
among the ranks.
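
One way to picture an equal distribution of NICs among ranks is a round-robin assignment. This is a sketch under that assumption only: `pick_nic_round_robin` is a hypothetical helper, and the commit message does not say which distribution scheme the ofi code actually uses.

```c
/* Hypothetical helper: choose a NIC index for a local rank when no
 * usable PCI/topology information is available, so that all NICs are
 * treated as equally close. Round-robin is an assumption made for
 * illustration. Returns -1 when there is no NIC to assign. */
static int pick_nic_round_robin(int local_rank, int num_nics)
{
    if (num_nics <= 0 || local_rank < 0) {
        return -1;
    }
    return local_rank % num_nics;
}
```

With two NICs, local ranks 0, 1, 2, 3 would map to NICs 0, 1, 0, 1.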
mpipr.h is used to replace MPI calls with PMPI calls inside romio to
prevent spurious hits in the profiling tools. This patch adds the
large count _c APIs that we previously neglected.

Add additional missed functions including MPI_Type_free_keyval,
MPI_Comm_get_info, MPI_Type_create_hindexed etc.

Also add MPIX_Type_iov and MPIX_Type_iov_len.

Co-Authored-by: Lisandro Dalcin <dalcinl@gmail.com>
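
The name-replacement technique described above can be sketched with the preprocessor. The two `Demo` functions below are hypothetical stand-ins, not real MPI routines; they only show how a `#define` redirects later uses of an `MPI_` name to its `PMPI_` counterpart.

```c
/* Hypothetical stand-ins for the two entry points of one routine. */
int PMPI_Demo_get_count_c(void) { return 2; }   /* internal entry point */
int MPI_Demo_get_count_c(void)  { return 1; }   /* public entry point, may be intercepted by tools */

/* mpipr.h-style mapping: every later use of the MPI_ name in this
 * translation unit is rewritten to the PMPI_ name. */
#define MPI_Demo_get_count_c PMPI_Demo_get_count_c

/* Library-internal code keeps writing the MPI_ name, but after the
 * mapping it actually calls the PMPI_ function. */
int library_internal_call(void)
{
    return MPI_Demo_get_count_c();
}
```

Any routine missing from such a mapping would still resolve to the public `MPI_` symbol, which is why the omitted large count `_c` names mattered.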
Some MPI File APIs are implemented outside romio.
Avoid calling MPI_ functions internally; replace them with internal
impl functions.
The device_state->subdevices buffer allocated during initialization
was not being freed in finalize_hook, causing a memory leak detected
by AddressSanitizer. Fixes pmodels#7724.
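
The allocate-in-init, free-in-finalize pairing behind this fix might look roughly as follows. This is a sketch only: `demo_device_state_t`, `demo_init_hook`, and `demo_finalize_hook` are hypothetical names, and the real structure holds more than an integer array.

```c
#include <stdlib.h>

/* Hypothetical, simplified device state. */
typedef struct {
    int num_subdevices;
    int *subdevices;
} demo_device_state_t;

/* Allocate the subdevices buffer during initialization.
 * Returns 0 on success and -1 on failure. */
static int demo_init_hook(demo_device_state_t *state, int n)
{
    state->subdevices = NULL;
    state->num_subdevices = 0;
    if (n <= 0) {
        return -1;
    }
    state->subdevices = calloc((size_t) n, sizeof(int));
    if (state->subdevices == NULL) {
        return -1;
    }
    state->num_subdevices = n;
    return 0;
}

/* Release the buffer at finalize time. Omitting the free() is the
 * kind of leak that AddressSanitizer reports. */
static void demo_finalize_hook(demo_device_state_t *state)
{
    free(state->subdevices);
    state->subdevices = NULL;
    state->num_subdevices = 0;
}
```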
When we successfully get topology information from PMIx, we should
destroy it at finalize time to avoid memory leaks.
Since now we generate bindings for MPI-IO functions directly, we should
no longer need the hack in all_romio_symbols to force pulling romio
functions (such as MPI_File_open) into libmpi.

The inclusion of all_romio_symbols.c ends up with libpmpi.so referring
to MPI_File_xxx symbols in the case when profiling libraries are built.
Do not neglect the error returns from RNDV callbacks. If it is not
appropriate to return an error during progress, the RNDV callbacks should
take measures to return MPI_SUCCESS instead.
A missing error check in MPIDI_Reduce_intra_composition_alpha caused
errors in MPIC_Recv to go undetected.

The issue is triggered by a source to device intranode send going to the
CMA path.
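
The difference between ignoring and checking an error return can be sketched as below. All names and error codes here are hypothetical; the sketch only shows the general pattern, not the MPICH reduce composition itself.

```c
#define DEMO_SUCCESS   0
#define DEMO_ERR_OTHER 15   /* arbitrary non-zero code chosen for the sketch */

/* Hypothetical receive step that can fail. */
static int demo_recv(int should_fail)
{
    return should_fail ? DEMO_ERR_OTHER : DEMO_SUCCESS;
}

/* Buggy pattern: the return value is discarded, so the caller
 * reports success even when the receive failed. */
static int demo_reduce_unchecked(int should_fail)
{
    (void) demo_recv(should_fail);
    return DEMO_SUCCESS;
}

/* Fixed pattern: the error is checked and propagated to the caller. */
static int demo_reduce_checked(int should_fail)
{
    int rc = demo_recv(should_fail);
    if (rc != DEMO_SUCCESS) {
        return rc;
    }
    /* ... remaining reduce steps would go here ... */
    return DEMO_SUCCESS;
}
```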
MPIDI_IPCI_TYPE__SKIP refers to the case where the buffer resides on a
device but GPU IPC is unavailable. Since it is a device buffer, XPMEM
and CMA cannot be used either. This logic applies to both the send side
and the receive side when the receiver must decide whether to use IPC
write. Return MPIDI_IPCI_TYPE__SKIP rather than MPIDI_IPCI_TYPE__NONE
so the receive side can make the correct decision.
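
The distinction between "skip" and "none" can be sketched as a small decision function. The enum values and `demo_choose_ipc` are hypothetical and much simpler than the real MPIDI_IPCI types; the sketch assumes only the rule stated above, that a device buffer without GPU IPC must not fall through to host mechanisms such as XPMEM or CMA.

```c
#include <stdbool.h>

/* Hypothetical, simplified IPC categories. */
typedef enum {
    DEMO_IPC_NONE,   /* host buffer, no IPC mechanism available */
    DEMO_IPC_SKIP,   /* device buffer, GPU IPC unavailable: host IPC must not be tried */
    DEMO_IPC_GPU,    /* device buffer, GPU IPC available */
    DEMO_IPC_HOST    /* host buffer, host IPC (e.g. XPMEM or CMA) available */
} demo_ipc_type_t;

static demo_ipc_type_t demo_choose_ipc(bool buffer_on_device,
                                       bool gpu_ipc_available,
                                       bool host_ipc_available)
{
    if (buffer_on_device) {
        /* Host mechanisms cannot read or write device memory, so
         * report SKIP instead of NONE when GPU IPC is missing. */
        return gpu_ipc_available ? DEMO_IPC_GPU : DEMO_IPC_SKIP;
    }
    return host_ipc_available ? DEMO_IPC_HOST : DEMO_IPC_NONE;
}
```

A caller that sees the skip value knows the buffer is on a device and can avoid the host copy paths, which a plain "none" result would not tell it.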
The CMA path calls the kernel for the remote copy, but the kernel cannot
handle device buffers. This is tricky if the recv buffer is on a device but
we cannot fall back to GPU IPC write, for example due to non-contiguous
messages. Allocate a host bounce buffer to make it work.
In commit 214c6bc we switched to using the dev.anysrc_partner field to
flag whether a partner request has been cancelled. However, there is a
gap in anysource_irecv: during MPIDI_NM_mpi_irecv, it only sets
anysrc_partner for the netmod request. MPIDI_NM_mpi_irecv may match and
complete immediately, but it may miss cancelling the shm partner
because the shm request's anysrc_partner is not set yet.

To fix this, we need to make sure anysrc_partner is set for both the shm
and nm requests at the same time. Move the macro
MPIDI_REQUEST_SET_LOCAL to mpidpost.h and convert it into an inline
function.
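
A rough sketch of linking both partner fields in one step, and of using a cleared partner field as the cancellation marker, is shown below. `demo_request_t` and the three helpers are hypothetical; they are not the MPICH request structure or the MPIDI_REQUEST_SET_LOCAL function.

```c
#include <stddef.h>

/* Hypothetical, simplified request with only the partner link. */
typedef struct demo_request {
    struct demo_request *anysrc_partner;
} demo_request_t;

/* Set both partner fields together, before either request is handed
 * to code that may match and complete it immediately. */
static inline void demo_link_partners(demo_request_t *shm_req,
                                      demo_request_t *nm_req)
{
    shm_req->anysrc_partner = nm_req;
    nm_req->anysrc_partner = shm_req;
}

/* Mark the partner as cancelled by resetting the partner fields on
 * both sides, leaving any user-visible cancel state untouched. */
static inline void demo_cancel_partner(demo_request_t *req)
{
    demo_request_t *partner = req->anysrc_partner;
    if (partner != NULL) {
        partner->anysrc_partner = NULL;
    }
    req->anysrc_partner = NULL;
}

/* A request with no partner is treated as unlinked or cancelled. */
static inline int demo_partner_cleared(const demo_request_t *req)
{
    return req->anysrc_partner == NULL;
}
```

If only one side were linked when the match happens, the reset on the other side could later be overwritten by the delayed assignment, which is the gap the commit message describes.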
Fortunately this is caught by the ubsan test large_type_sendrecv:
    ofi_rndv_read.c:348:18: runtime error: load of value 182, which is not a valid value for type '_Bool'
@raffenet
Contributor Author

test:mpich/ch4/most
test:mpich/ch3/most

hzhou and others added 3 commits March 27, 2026 13:54
The dynamic_sendrecv is used in MPI_Intercomm_create. Mismatches
between threads are prevented by the user-provided tag, so it is okay
to yield during the blocking progress. Without the yield,
MPI_Intercomm_create may block another thread's progress when the remote
processes are not present (blocked by other communications).

In the dynamic process accept/connect path, we force peer_comm's context
id to 0.  This is okay because the leader exchange is established with a
specific pair of addresses and there are no other communications yet
during leader_exchange.
The context_id is not reflected in get_dynamic_match_bits because
MPIDI_UCX_DYNPROC_MASK masks it off. Stepping back, we can't safely rely on
MPIDI_UCX_DYNPROC_MASK since we didn't set a protocol bit for dynamic
exchanges. This commit defines MPIDI_UCX_DYNPROC and uses it to separate
dynamic exchanges from other messages.
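
The idea of a dedicated protocol bit in the match bits can be sketched as follows. The bit layout, field widths, and all `demo_` names are assumptions made for illustration; they do not reflect the actual UCX netmod tag layout.

```c
#include <stdint.h>

/* Hypothetical layout: [63] dynamic-process bit | [47:32] context id | [31:0] tag */
#define DEMO_TAG_BITS    32
#define DEMO_CTX_MASK    UINT64_C(0xffff)
#define DEMO_DYNPROC_BIT (UINT64_C(1) << 63)

/* Build match bits that keep the context id and, for dynamic
 * exchanges, also set a dedicated protocol bit. */
static uint64_t demo_match_bits(uint16_t context_id, uint32_t tag, int is_dynamic)
{
    uint64_t bits = ((uint64_t) context_id << DEMO_TAG_BITS) | (uint64_t) tag;
    if (is_dynamic) {
        bits |= DEMO_DYNPROC_BIT;
    }
    return bits;
}

/* The protocol bit alone separates dynamic exchanges from other messages. */
static int demo_is_dynamic(uint64_t bits)
{
    return (bits & DEMO_DYNPROC_BIT) != 0;
}

/* The context id stays recoverable because no mask removes it. */
static uint16_t demo_context_id(uint64_t bits)
{
    return (uint16_t) ((bits >> DEMO_TAG_BITS) & DEMO_CTX_MASK);
}
```

With a layout like this, a dynamic exchange and a regular message that share a context id and tag still produce different match bits.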
@raffenet
Contributor Author

test:mpich/ch4/most
test:mpich/ch3/most


6 participants