Fix stream race in dispatch/combine with deterministic algorithms by ccs668899 · Pull Request #600 · deepseek-ai/DeepEP

ccs668899 · 2026-04-06T04:11:26Z

The cause of #589 so far as I can tell is as follows in deep_ep.cpp:

// The comm stream waits for everything on the compute stream.
stream_wait(comm_stream, compute_stream)

// The output tensor is allocated, but in deterministic mode a NaN-fill is enqueued on the compute stream.
torch::empty(recv_x)

// The communication kernel writes recv_x on to the comm stream.
intranode::combine(..., recv_x, ..., comm_stream)

Now this could be fixed by putting the stream wait until after the empty allocation, but we have the comment "NOTES: do not allocate tensors upfront!". I don't want to mess up the stream ownership so I just add a second sync stream wait right before the intranode combine:

// The output tensor is allocated, but in deterministic mode a NaN-fill is enqueued on the compute stream.
torch::empty(recv_x)

// Ensure comm_stream observes all compute_stream work above this point.
if (!allocate_on_comm_stream)
    stream_wait(comm_stream, compute_stream);

// The communication kernel writes recv_x on to the comm stream.
intranode::combine(..., recv_x, ..., comm_stream)

The tests should deterministically catch the bug once the fix is in. I don't have a multi-GPU setup on hand to validate myself, any recommendations?

Add stream_wait(comm_stream, compute_stream) after torch::empty() allocations and before kernel launches in all four dispatch/combine functions. When fill_uninitialized_memory=True, torch::empty() enqueues a NaN-fill kernel on compute_stream after the initial stream_wait snapshot, causing a write-write race with the communication kernel on comm_stream. The second sync captures these fill kernels. Add regression test with 4-config control matrix.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix stream race in dispatch/combine with deterministic algorithms#600

Fix stream race in dispatch/combine with deterministic algorithms#600
ccs668899 wants to merge 1 commit intodeepseek-ai:mainfrom
ccs668899:fix/stream-race-nan-deterministic

ccs668899 commented Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ccs668899 commented Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant