Add CUDA graph kernel annotations tutorial by yushangdi · Pull Request #3915 · pytorch/tutorials

yushangdi · 2026-06-02T20:25:47Z

This tutorial demonstrates how to use CUDA graph kernel annotations for semantic profiling traces with custom visualization lanes.

Features:

- End-to-end workflow from graph capture to visualization
- Transformer block example with annotated regions
- Post-processing to merge annotations into profiler traces
- Custom stream assignments for semantic organization
- Version checking for cuda-bindings compatibility
- Clear error messages with upgrade instructions

The tutorial includes:

- mark_kernels() context manager usage
- Graph capture with enable_annotations=True
- Profiling and trace post-processing
- Before/after comparison
- Troubleshooting guide

Fixes #ISSUE_NUMBER

Description

Checklist

- The issue that is being fixed is referred to in the description (see above "Fixes #ISSUE_NUMBER")
- Only one issue is addressed in this pull request
- Labels from the issue that this PR is fixing are added to this pull request
- No unnecessary issues are included in this pull request

pytorch-bot · 2026-06-02T20:25:51Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3915

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 72d4e71 with merge base cdc645a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

This tutorial demonstrates how to use CUDA graph kernel annotations for semantic profiling traces with custom visualization lanes.

Features:
- End-to-end workflow from graph capture to visualization
- Transformer block example with annotated regions
- Post-processing to merge annotations into profiler traces
- Custom stream assignments for semantic organization
- Version checking for cuda-bindings compatibility
- Clear error messages with upgrade instructions

The tutorial includes:
- mark_kernels() context manager usage
- Graph capture with enable_annotations=True
- Profiling and trace post-processing
- Before/after comparison
- Troubleshooting guide

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

cuda-bindings>=13.3.0 is required for the CUDA graph annotations tutorial to work with full annotation support. The cudaGraphNodeGetToolsId API was added in cuda-bindings 13.3.0.
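A version gate of the kind this requirement implies could be sketched as follows. This is a minimal sketch, not the tutorial's code: `parse_version`, `meets_minimum`, and `cuda_bindings_status` are hypothetical helper names, and only the package name and the 13.3.0 threshold come from this PR. It uses the standard library only and treats pre-release suffixes such as `rc1` as equal to the release.

```python
from importlib import metadata

# Minimum version named in this PR's requirements change.
MIN_CUDA_BINDINGS = (13, 3, 0)


def parse_version(text):
    """Turn '13.3.0' or '13.3.0.post1' into a 3-tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'post1'
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)  # pad so that '13.3' compares like '13.3.0'
    return tuple(parts[:3])


def meets_minimum(installed, minimum=MIN_CUDA_BINDINGS):
    """Return True if the installed version string is at least `minimum`."""
    return parse_version(installed) >= minimum


def cuda_bindings_status():
    """Describe whether the installed cuda-bindings is new enough (hypothetical helper)."""
    try:
        installed = metadata.version("cuda-bindings")
    except metadata.PackageNotFoundError:
        return "cuda-bindings is not installed"
    if meets_minimum(installed):
        return f"cuda-bindings {installed} is new enough"
    return f"cuda-bindings {installed} is too old; upgrade to >=13.3.0"


if __name__ == "__main__":
    print(cuda_bindings_status())
```

Comparing integer tuples rather than raw strings matters here: a plain string comparison would rank "13.10.0" below "13.3.0".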

- Removed the check_cuda_bindings_version() function, since PyTorch core now provides the warning via _probe_tools_id()
- Updated the PyTorch requirement from 2.0+ to 2.12+ (required for the annotation APIs used in this tutorial)
- Simplified error messaging to reference PyTorch's built-in warnings

Changed the overview to emphasize:
- Ability to add semantic labels to kernels
- Understanding what each kernel does during profiling
- Labeling and organizing kernels by function

Rather than focusing on splitting kernels across streams, the overview now centers on the annotation feature itself.

Updated the prerequisites card at the top to show PyTorch 2.12+ (was still showing 2.0). Also updated cuda-python to cuda-bindings for consistency.

Added chrome://tracing screenshots showing:
- Before: All 65 kernels on a single stream with auto-generated names
- After: Kernels organized into semantic lanes (streams 61, 62) with meaningful labels (attention, mlp)

The screenshots demonstrate the value of kernel annotations for understanding execution structure and identifying components.
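The before/after difference can be illustrated on a hand-built trace in the Chrome trace event format (a `traceEvents` list whose entries carry `name`, `cat`, `ph`, `pid`, `tid`, `ts`, and `dur`). The sketch below is an assumption-laden stand-in, not the tutorial's post-processing code: `merge_annotations` and the `ANNOTATIONS` table keyed by kernel launch order are hypothetical, and only the labels (attention, mlp) and lane ids (61, 62) come from the description above.

```python
import json

# Hypothetical annotation table: kernel launch order -> (label, lane id).
# The labels and lane ids mirror the screenshots described in this PR;
# how the tutorial actually stores annotations is not shown here.
ANNOTATIONS = {
    0: ("attention", 61),
    1: ("attention", 61),
    2: ("mlp", 62),
}


def merge_annotations(trace, annotations):
    """Return a copy of `trace` with annotated kernel events relabeled and re-laned."""
    merged = {**trace, "traceEvents": []}
    kernel_index = 0
    for event in trace["traceEvents"]:
        event = dict(event)  # shallow copy so the input trace stays untouched
        if event.get("cat") == "kernel":
            if kernel_index in annotations:
                label, lane = annotations[kernel_index]
                event["name"] = f"{label}: {event['name']}"
                event["tid"] = lane  # chrome://tracing draws one lane per tid
            kernel_index += 1
        merged["traceEvents"].append(event)
    return merged


# "Before": every kernel sits on one stream (tid 7) with an opaque name.
trace = {
    "traceEvents": [
        {"name": "cudaGraphLaunch", "cat": "cuda_runtime", "ph": "X", "pid": 0, "tid": 1, "ts": 0, "dur": 5},
        {"name": "kernel_a", "cat": "kernel", "ph": "X", "pid": 0, "tid": 7, "ts": 10, "dur": 4},
        {"name": "kernel_b", "cat": "kernel", "ph": "X", "pid": 0, "tid": 7, "ts": 15, "dur": 3},
        {"name": "kernel_c", "cat": "kernel", "ph": "X", "pid": 0, "tid": 7, "ts": 19, "dur": 6},
        {"name": "kernel_d", "cat": "kernel", "ph": "X", "pid": 0, "tid": 7, "ts": 26, "dur": 2},
    ]
}

merged = merge_annotations(trace, ANNOTATIONS)

if __name__ == "__main__":
    print(json.dumps(merged, indent=2))
```

Kernels without an entry in the table (here `kernel_d`) keep their original name and lane, so a partially annotated graph still produces a valid trace.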

Move `if __name__ == "__main__": main()` to immediately after the main() function definition (line ~404) so it executes during the Sphinx Gallery build process. Sphinx Gallery requires the execution guard to be positioned right after the function definition, not at the end of the file, to properly capture and execute the tutorial code during documentation generation. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
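The file layout this commit describes might look roughly like the sketch below. The function bodies are placeholders, and `summarize` is a hypothetical name; only the position of the guard directly after `main()` reflects the commit message.

```python
def main():
    """Placeholder for the tutorial's entry point."""
    steps = ["capture graph", "profile", "post-process trace"]
    for step in steps:
        print(step)
    return steps


# Guard placed immediately after main(), as the commit message describes,
# instead of at the very end of the file.
if __name__ == "__main__":
    main()


def summarize(steps):
    """Hypothetical helper defined after the guard.

    When the file runs as a script, the guard above fires before this
    definition is reached, so main() must not depend on it.
    """
    return f"{len(steps)} steps"
```

One consequence of this placement is that everything `main()` needs must be defined above the guard; a helper defined below it would raise `NameError` if `main()` called it during a script run.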

ngimel · 2026-06-02T21:10:37Z

 matplotlib
 librosa
 torch==2.12
+cuda-bindings>=13.3.0  # Required for CUDA graph annotations tutorial


you should be able to use earlier version?

These files were accidentally included in the previous commit:
- traces/ directory (both root and advanced_source/)
- Screenshot PNG files
- CUDA_GRAPH_TUTORIAL_README.md

These are build artifacts and temporary files that should not be committed to the repository.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

meta-cla Bot added the cla signed label Jun 2, 2026

yushangdi force-pushed the cudagraph_annotation branch from 4a6f9d9 to c39bac5 Compare June 2, 2026 20:32

yushangdi and others added 8 commits June 2, 2026 20:33

Add cuda-bindings>=13.3.0 to requirements

d9b296c

Update PyTorch requirement to 2.12+

3ef9d30

Fix PyTorch version in prerequisites card

6a79dc3

Remove Advanced Usage: Multiple Graphs section

b7cc171

ngimel reviewed Jun 2, 2026

yushangdi wants to merge 10 commits into main from cudagraph_annotation

2 participants
