Phambinh/implement emptynode by phambinhfin · Pull Request #680 · ROCm/xla

phambinhfin · 2026-03-18T12:46:07Z

The CUDA implementation of XLA's command buffer infrastructure has two safeguards for empty graph scenarios that the HIP (ROCm) implementation was missing:

Empty traced graph detection -- When stream capture produces a graph with zero operations (e.g., a custom call that launches nothing), CUDA detects this and returns a clear error. Without this guard on HIP, the runtime would crash during graph instantiation with an opaque driver error.

Empty body graph handling in PrepareFinalization -- CUDA injects a NoOp kernel into empty conditional body graphs (for CUDA < 12.8) to prevent crashes. HIP's PrepareFinalization was a no-op, leaving it unprotected for when graph conditionals are supported on HIP in the future.

Additionally, CreateEmptyNode (via hipGraphAddEmptyNode) was recently implemented in commit b561aaf to fix collective crashes, but had no test coverage on the HIP/ROCm backend -- all existing empty node tests were gated behind CUDA 12.3+ version checks.

This PR adds the missing safeguards and platform-agnostic test coverage.

- Add hipGraphGetRootNodes to the ROCm driver wrapper - Add empty traced graph detection in RocmCommandBuffer::Trace() to prevent crashes when a captured stream produces no operations, matching the existing CUDA guard - Implement PrepareFinalization to insert an empty node into empty body graphs, preventing potential issues when graph conditionals are supported on HIP in the future - Add four new platform-agnostic test cases for CreateEmptyNode: EmptyNodeWithKernel, EmptyNodeOnly, EmptyNodeChain, and EmptyNodeAsDependencyBarrier

claude · 2026-03-18T12:53:28Z

xla/stream_executor/gpu/gpu_command_buffer_test.cc

+TEST(GpuCommandBufferTest, EmptyNodeOnly) {
+  Platform* platform = GpuPlatform();
+  StreamExecutor* executor = platform->ExecutorForDevice(0).value();
+
+  TF_ASSERT_OK_AND_ASSIGN(auto stream, executor->CreateStream());
+
+  TF_ASSERT_OK_AND_ASSIGN(auto cmd_buffer,
+                          executor->CreateCommandBuffer(primary));
+  TF_ASSERT_OK_AND_ASSIGN(auto* empty_cmd, cmd_buffer->CreateEmptyCmd({}));
+  (void)empty_cmd;
+  TF_ASSERT_OK(cmd_buffer->Finalize());
+
+  TF_ASSERT_OK(cmd_buffer->Submit(stream.get()));
+  TF_ASSERT_OK(stream->BlockHostUntilDone());
+}


Misleading test comment: The comment says this test "exercises PrepareFinalization on HIP," but it doesn't actually hit the empty-graph insertion path. CreateEmptyCmd({}) adds one node, so by the time Finalize() → PrepareFinalization() runs, GetNodeCount() returns 1 and the method early-returns at the node_count > 0 check.

To truly exercise the PrepareFinalization empty-graph path, you'd need a command buffer with zero nodes at finalization time (no CreateEmptyCmd call). Consider either:

Updating the comment to reflect what this test actually covers (that a single empty node can be finalized and executed), or

Adding a separate test that calls Finalize() on a command buffer with zero user-added nodes.

claude · 2026-03-18T12:53:34Z

xla/stream_executor/rocm/rocm_command_buffer.cc

 absl::Status RocmCommandBuffer::PrepareFinalization() {
+  TF_ASSIGN_OR_RETURN(auto node_count, GetNodeCount());
+  if (node_count > 0) {
+    return absl::OkStatus();
+  }
+
+  // HIP graph conditional nodes (once supported) may not handle empty body
+  // graphs. Insert an empty node so the graph is non-empty, analogous to
+  // CUDA's NoOp kernel insertion for the same case.
+  hipGraphNode_t node_handle = nullptr;
+  TF_RETURN_IF_ERROR(
+      ToStatus(wrap::hipGraphAddEmptyNode(&node_handle, graph_,
+                                          /*pDependencies=*/nullptr,
+                                          /*numDependencies=*/0),
+               "Failed to add empty node in PrepareFinalization"));
  return absl::OkStatus();


Untracked graph node: The node_handle created here is added directly to the HIP graph via the raw API but is never returned to or tracked by the GpuCommandBuffer base class. Other node-creation methods (CreateEmptyNode, CreateKernelNode, etc.) return a GraphNodeHandle that the base class stores in its internal command structure.

This mirrors the CUDA behavior (which also adds a NoOp kernel directly), so it's consistent today. However, if any future code relies on the command list being a complete representation of the graph nodes, this node will be invisible. Worth a brief comment noting this is intentional.

claude · 2026-03-18T12:53:39Z

xla/stream_executor/rocm/rocm_command_buffer.h

  absl::Status SetPriority(StreamPriority priority) override {
-    return absl::UnimplementedError("Not implemented.");
+    // HIP does not expose an API to set per-node priority on graph kernel nodes
+    // (no equivalent of cuGraphKernelNodeSetAttribute with
+    // CU_LAUNCH_ATTRIBUTE_PRIORITY). Silently ignore priority requests so that
+    // collectives (which use StreamPriority::Highest) can be captured into HIP
+    // graphs without crashing.
+    return absl::OkStatus();
  }


Nit / observability: The silent no-op is well-documented in the comment, and this is the right approach given HIP's API limitations. Consider adding a VLOG(2) when a non-default priority is silently ignored, so it's visible in debug logs when someone is troubleshooting priority-related behavior:

Suggested change

absl::Status SetPriority(StreamPriority priority) override {

return absl::UnimplementedError("Not implemented.");

// HIP does not expose an API to set per-node priority on graph kernel nodes

// (no equivalent of cuGraphKernelNodeSetAttribute with

// CU_LAUNCH_ATTRIBUTE_PRIORITY). Silently ignore priority requests so that

// collectives (which use StreamPriority::Highest) can be captured into HIP

// graphs without crashing.

return absl::OkStatus();

}

// HIP does not expose an API to set per-node priority on graph kernel nodes

// (no equivalent of cuGraphKernelNodeSetAttribute with

// CU_LAUNCH_ATTRIBUTE_PRIORITY). Silently ignore priority requests so that

// collectives (which use StreamPriority::Highest) can be captured into HIP

// graphs without crashing.

if (priority != StreamPriority::Default) {

VLOG(2) << "Ignoring StreamPriority on HIP (no equivalent API)";

}

return absl::OkStatus();

claude · 2026-03-18T12:53:44Z

xla/stream_executor/gpu/gpu_command_buffer_test.cc

+  TF_ASSERT_OK(cmd_buffer->CreateLaunch(add, ThreadDim(), BlockDim(4),
+                                         {empty_cmd}, a, b, c)
+                   .status());


Minor style inconsistency: CreateLaunch is called with .status() to extract only the status here, while in EmptyNodeChain (line 854) and EmptyNodeAsDependencyBarrier (line 898), TF_ASSERT_OK is called directly on the StatusOr return. Both work, but for consistency within the new tests, consider using the same pattern throughout.

claude · 2026-03-18T12:53:54Z

Review Summary

Good PR — correctly implements the missing ROCm safeguards to match CUDA behavior for HIP command buffers. The CreateEmptyNode, Trace() empty-graph check, PrepareFinalization() empty-node insertion, and SetPriority silent-OK are all implemented consistently with the CUDA side. Tests cover the key scenarios well.

4 inline comments posted covering: a misleading test comment about PrepareFinalization coverage, an untracked graph node in PrepareFinalization, a suggestion to add VLOG for silently-ignored priority, and a minor style inconsistency in test assertions.

🤖 Generated with Claude Code

ScXfjiang and others added 2 commits March 18, 2026 12:18

fix collective crash

1259ff0

claude bot reviewed Mar 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Phambinh/implement emptynode#680

Phambinh/implement emptynode#680
phambinhfin wants to merge 2 commits intorocm-jaxlib-v0.8.2from
phambinh/implement-emptynode

phambinhfin commented Mar 18, 2026 •

edited

Loading

Uh oh!

claude bot Mar 18, 2026

Uh oh!

claude bot Mar 18, 2026

Uh oh!

claude bot Mar 18, 2026

Uh oh!

claude bot Mar 18, 2026

Uh oh!

claude bot commented Mar 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

phambinhfin commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot commented Mar 18, 2026

Review Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

phambinhfin commented Mar 18, 2026 •

edited

Loading