
GTEST/UCP/FT: test FT in sequence of PUT/AM/FLUSH #11352

Open
evgeny-leksikov wants to merge 4 commits into openucx:master from evgeny-leksikov:gtest_ft_put_am_flush_seq

@evgeny-leksikov evgeny-leksikov commented Apr 16, 2026

What?

Test UCP/EP/FT in a sequence of PUT/AM/FLUSH operations.

Why?

Increase test coverage.

Note:

Depends on #11351

@evgeny-leksikov evgeny-leksikov force-pushed the gtest_ft_put_am_flush_seq branch from 4a25cdd to 2dff170 Compare April 17, 2026 08:53
@evgeny-leksikov evgeny-leksikov force-pushed the gtest_ft_put_am_flush_seq branch 2 times, most recently from bf0b4c8 to 4b99e51 Compare April 27, 2026 13:50
@evgeny-leksikov evgeny-leksikov marked this pull request as ready for review April 27, 2026 14:32
@openucx openucx deleted a comment from svc-nixl May 4, 2026
evgeny-leksikov and others added 4 commits May 5, 2026 21:35
The lane-change adjustment block in ucp_ep_flush_progress, introduced
when num_lanes was replaced with the all_lanes bitmap, had two issues:

1. req->send.flush.all_lanes was updated only via OR with new lanes, so
   destroyed-lane bits stayed in all_lanes. On a subsequent EP
   reconfiguration (e.g. wireup CM swap rdmacm->tcp) the same destroyed
   lane was detected again and uct_comp.count was decremented a second
   time, eventually tripping the count >= 0 assertion when remaining
   lane flushes completed.

2. ep_destroyed_lanes was not masked with ~started_lanes, so a started
   lane that subsequently disappeared would also be subtracted from
   uct_comp.count even though it was already accounted for - either by
   the synchronous-OK decrement, by ucp_ep_flush_error, or by the
   pending uct completion delivered via the discard flow.

Mask ep_destroyed_lanes with ~started_lanes and resync all_lanes to
ep_live_lanes after each adjustment, restoring the invariant the
previous num_lanes-based code maintained.

Co-authored-by: Cursor <cursoragent@cursor.com>
(cherry picked from commit 975e15b)
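
The accounting rule described in the commit message above can be illustrated with a small standalone model. This is only a sketch, not the UCX code: `flush_state_t`, `count_bits`, and `flush_adjust_lanes` are hypothetical names, `comp_count` stands in for `uct_comp.count`, and the way new lanes are derived (`ep_live_lanes & ~all_lanes`) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the flush request state. */
typedef struct {
    uint64_t all_lanes;     /* lanes the flush currently accounts for */
    uint64_t started_lanes; /* lanes on which a flush was already issued */
    int      comp_count;    /* completions still expected, must stay >= 0 */
} flush_state_t;

/* Portable population count (number of set bits). */
static int count_bits(uint64_t bitmap)
{
    int count = 0;

    while (bitmap != 0) {
        bitmap &= bitmap - 1; /* clear the lowest set bit */
        ++count;
    }
    return count;
}

/* Re-account lanes after the endpoint configuration changed. */
static void flush_adjust_lanes(flush_state_t *st, uint64_t ep_live_lanes)
{
    /* Assumption: a lane is "new" if it is live but not yet accounted for. */
    uint64_t new_lanes       = ep_live_lanes & ~st->all_lanes;
    /* Started lanes are excluded: they are accounted for elsewhere. */
    uint64_t destroyed_lanes = st->all_lanes & ~ep_live_lanes &
                               ~st->started_lanes;

    st->comp_count += count_bits(new_lanes) - count_bits(destroyed_lanes);
    assert(st->comp_count >= 0);

    /* Resync, so a destroyed lane cannot be subtracted a second time. */
    st->all_lanes = ep_live_lanes;
}
```

In this model, calling `flush_adjust_lanes` twice with the same `ep_live_lanes` changes `comp_count` only on the first call, because the destroyed lane's bit no longer remains in `all_lanes`.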
… superset

The previous loop condition `started_lanes != ep_live_lanes` is true
whenever the two bitmaps differ, including the case where started_lanes
is a strict superset of ep_live_lanes (a lane was started and then
destroyed). In that case `ep_live_lanes & ~started_lanes` is 0, and the
ucs_ffs64(0) call inside the loop body invokes UB on x86_64 (bsfq with
a zero input), producing a garbage lane index that subsequently feeds
ucp_ep_get_lane() out of bounds.

Cache the unstarted-live-lane bitmap into next_lanes and use
`(next_lanes != 0)` as the loop condition. This safely handles the
superset case (loop exits cleanly) and also avoids computing
`ep_live_lanes & ~started_lanes` twice per iteration.

Co-authored-by: Cursor <cursoragent@cursor.com>
(cherry picked from commit cf91dde)
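
The loop shape described in the commit message above can be sketched in isolation as follows. This is a simplified model rather than the actual `ucp_ep_flush_progress` code: `ffs64_nonzero` is a hypothetical portable stand-in for `ucs_ffs64`, `start_unstarted_lanes` is an invented name, and "starting" a lane is reduced to setting its bit.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the lowest set bit; the caller must pass a non-zero bitmap. */
static unsigned ffs64_nonzero(uint64_t bitmap)
{
    unsigned index = 0;

    assert(bitmap != 0);
    while ((bitmap & 1) == 0) {
        bitmap >>= 1;
        ++index;
    }
    return index;
}

/*
 * Start every live lane that has not been started yet.
 * Returns the updated started_lanes bitmap and stores the number of lanes
 * started by this call in *num_started.
 */
static uint64_t start_unstarted_lanes(uint64_t started_lanes,
                                      uint64_t ep_live_lanes,
                                      unsigned *num_started)
{
    /* Cache the unstarted-live-lane bitmap; it is 0 in the superset case. */
    uint64_t next_lanes = ep_live_lanes & ~started_lanes;

    *num_started = 0;
    while (next_lanes != 0) {
        unsigned lane = ffs64_nonzero(next_lanes);

        started_lanes |= UINT64_C(1) << lane; /* model: mark lane started */
        ++*num_started;
        next_lanes     = ep_live_lanes & ~started_lanes;
    }
    return started_lanes;
}
```

When `started_lanes` is a strict superset of `ep_live_lanes`, `next_lanes` is zero on entry, so the loop body and the find-first-set helper never run.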
@evgeny-leksikov evgeny-leksikov force-pushed the gtest_ft_put_am_flush_seq branch from d6b2106 to 51b9b60 Compare May 5, 2026 19:16