Add callbacks for driver cuGraphLaunch by gnurizen · Pull Request #21 · parca-dev/parcagpu

gnurizen · 2026-06-26T13:17:29Z

Problem

parcagpu subscribed to CUPTI callbacks for eager kernel launches (cuLaunchKernel) and runtime graph launches (cudaGraphLaunch), but not the driver-API graph launch (cuGraphLaunch / cuGraphLaunch_ptsz). setLaunchCallbacks covers only kernel-launch cbids and setRuntimeCallbacks only the runtime domain, so the driver graph-launch cbid was never enabled.

C++ runtimes such as TensorRT-LLM replay CUDA graphs through the driver API. On those workloads the cuda_correlation USDT never fired for graph launches, no CUDA_KERNEL frame was pushed, and the GPU sample pipeline produced nothing in graph mode — while eager launches kept working. This was confirmed on a 4×B200 TRT-LLM node: a trace_pipe capture in graph mode contained zero FRAME_MARKER_CUDA_KERNEL frames.

Fix

Enable only the two driver graph-launch cbids (cuGraphLaunch and cuGraphLaunch_ptsz) in CuptiProfiler::initialize, with symmetric teardown. setGraphCallbacks() is deliberately not used: it also subscribes the capture (cuStreamBeginCapture / cuStreamEndCapture) and graph-resource cbids, which would make callbackHandler emit correlation events for non-executing capture calls that never receive kernel timing. The existing isGraphLaunch handler already had all the logic to process these callbacks; it just never received them.
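As a rough illustration of the narrow subscription described above (not the actual parcagpu code), the sketch below toggles exactly two driver cbids through a hook that mirrors the argument order of cuptiEnableCallback. The helper name set_driver_graph_launch_callbacks, the hook type, and the demo stub are hypothetical. The numeric values 514/515 are taken from the cbids quoted in the test description, and the driver-domain value 1 is an assumption; real code would use the CUPTI enums from the CUPTI headers instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in constants for this sketch; real code would use the CUPTI enums. */
enum { DOMAIN_DRIVER_API = 1 };           /* assumed driver-API domain value */
enum {
    CBID_CU_GRAPH_LAUNCH      = 514,      /* cuGraphLaunch (value from the test notes) */
    CBID_CU_GRAPH_LAUNCH_PTSZ = 515       /* cuGraphLaunch_ptsz */
};

/* Hypothetical hook with the same argument order as cuptiEnableCallback. */
typedef int (*enable_cb_fn)(uint32_t enable, void *subscriber,
                            int domain, uint32_t cbid);

const uint32_t driver_graph_launch_cbids[] = {
    CBID_CU_GRAPH_LAUNCH, CBID_CU_GRAPH_LAUNCH_PTSZ
};

/* Pass enable=1 during initialize and enable=0 during teardown, so both
 * paths touch the same cbids. Returns the first nonzero status, else 0. */
int set_driver_graph_launch_callbacks(enable_cb_fn enable_cb,
                                      void *subscriber, uint32_t enable)
{
    size_t n = sizeof driver_graph_launch_cbids /
               sizeof driver_graph_launch_cbids[0];
    for (size_t i = 0; i < n; i++) {
        int status = enable_cb(enable, subscriber, DOMAIN_DRIVER_API,
                               driver_graph_launch_cbids[i]);
        if (status != 0)
            return status;
    }
    return 0;
}

/* Demo stub for trying the helper without a GPU: tracks the enabled state
 * of the two launch cbids and rejects anything else as over-subscription. */
int demo_enabled[2];

int demo_enable_cb(uint32_t enable, void *subscriber, int domain, uint32_t cbid)
{
    (void)subscriber;
    if (domain != DOMAIN_DRIVER_API)
        return 1;
    if (cbid == CBID_CU_GRAPH_LAUNCH)
        demo_enabled[0] = (enable != 0);
    else if (cbid == CBID_CU_GRAPH_LAUNCH_PTSZ)
        demo_enabled[1] = (enable != 0);
    else
        return 1;
    return 0;
}
```

Driving enable and disable from one cbid list is one way to keep teardown symmetric with initialization; capture and graph-resource cbids never appear in the list, so they cannot be subscribed by accident.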

Test coverage

The mock harness previously dispatched the shim's callback unconditionally, so a missing subscription was invisible to tests. Changes:

mock_cupti.c: cuptiEnableCallback now records the subscribed (domain, cbid) set.
test_cupti_prof.c: callbacks are dispatched only if subscribed, mirroring real CUPTI gating.
test-pc-mock-graph.sh: asserts driver cuGraphLaunch correlation events fire (signed cbid -514/-515). The workload splits graph launches 50/50 runtime/driver, so the runtime half emits regardless — only this check catches the regression. Wired to a new make test-pc-mock-graph target.
graph_repro.cu / graph-repro-real.sh: real-GPU reproducer that replays a captured graph via driver cuGraphLaunch and verifies correlation events fire.

Verified the mock guard goes red without the fix and green with it.
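The gating idea behind the harness change can be sketched as follows. This is a simplified stand-in, not the contents of mock_cupti.c or test_cupti_prof.c: the function names, the fixed-size table, the MAX_CBID bound, and the domain values are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed domain numbering and table bound, for illustration only. */
enum { DOMAIN_DRIVER_API = 1, DOMAIN_RUNTIME_API = 2, DOMAIN_COUNT = 3 };
enum { MAX_CBID = 1024 };

/* Subscribed (domain, cbid) set, zero-initialized: nothing is enabled. */
bool subscribed[DOMAIN_COUNT][MAX_CBID];

static bool in_range(int domain, uint32_t cbid)
{
    return domain > 0 && domain < DOMAIN_COUNT && cbid < MAX_CBID;
}

/* Mock side: record every enable/disable request made by the shim. */
int mock_enable_callback(uint32_t enable, int domain, uint32_t cbid)
{
    if (!in_range(domain, cbid))
        return 1;
    subscribed[domain][cbid] = (enable != 0);
    return 0;
}

typedef void (*callback_fn)(int domain, uint32_t cbid, void *userdata);

/* Harness side: deliver the callback only when the pair was subscribed.
 * Returns true if the callback was invoked. */
bool dispatch_if_subscribed(callback_fn cb, int domain, uint32_t cbid,
                            void *userdata)
{
    if (!in_range(domain, cbid) || !subscribed[domain][cbid])
        return false;
    cb(domain, cbid, userdata);
    return true;
}

/* Example callback: counts deliveries through an int passed as userdata. */
void count_delivery(int domain, uint32_t cbid, void *userdata)
{
    (void)domain;
    (void)cbid;
    ++*(int *)userdata;
}
```

With this shape, a launch on a (domain, cbid) pair that was never enabled produces no delivery, so a test that expects a correlation event for that pair fails instead of passing by accident.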

parcagpu subscribed CUPTI callbacks for eager launches (cuLaunchKernel) and runtime graph launches (cudaGraphLaunch), but not the driver-API cuGraphLaunch / cuGraphLaunch_ptsz. C++ runtimes like TensorRT-LLM replay CUDA graphs through the driver API, so the cuda_correlation USDT never fired in graph mode and no GPU samples were produced, while eager launches kept working (confirmed on a 4xB200 TRT-LLM node).

Fix: enable the two driver graph-launch cbids in CuptiProfiler::initialize (with symmetric teardown). Not setGraphCallbacks(), which also subscribes capture cbids that would emit correlation events for non-executing capture calls.

Tests: the mock harness dispatched callbacks unconditionally, hiding the missing subscription. mock_cupti now records the subscribed (domain,cbid) set and the harness only dispatches subscribed callbacks. test-pc-mock-graph (new make target) asserts driver cuGraphLaunch correlation events fire; graph_repro.cu / graph-repro-real.sh add a real-GPU reproducer. Verified the guard goes red without the fix and green with it.

gnurizen force-pushed the driver-cugraphlaunch-callbacks branch from d698a08 to 633dc6b June 26, 2026 13:24

gnurizen requested review from brancz and umanwizard June 26, 2026 13:30

brancz approved these changes Jun 26, 2026

More fixes/tests from spark testing

f0fbd49

gnurizen merged commit 474be16 into main Jun 26, 2026
3 checks passed

gnurizen merged 2 commits into main from driver-cugraphlaunch-callbacks
