Add CUDA graph kernel annotations tutorial #3915
Draft
yushangdi wants to merge 10 commits into
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3915
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 72d4e71 with merge base cdc645a. This comment was automatically generated by Dr. CI and updates every 15 minutes.
This tutorial demonstrates how to use CUDA graph kernel annotations for semantic profiling traces with custom visualization lanes.

Features:
- End-to-end workflow from graph capture to visualization
- Transformer block example with annotated regions
- Post-processing to merge annotations into profiler traces
- Custom stream assignments for semantic organization
- Version checking for cuda-bindings compatibility
- Clear error messages with upgrade instructions

The tutorial includes:
- mark_kernels() context manager usage
- Graph capture with enable_annotations=True
- Profiling and trace post-processing
- Before/after comparison
- Troubleshooting guide

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
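To make the post-processing step listed above concrete, here is a minimal sketch of merging labels into a Chrome-format trace. It is not the tutorial's implementation: `merge_annotations`, the `labels` mapping (kernel name to label) and the `lane_ids` mapping (label to `tid`) are hypothetical, and matching kernels by name is an assumption made only to keep the example self-contained. The only thing taken as given is the Chrome Trace Event layout (`traceEvents` entries with `name`, `cat`, `tid`, `args`).

```python
import json


def merge_annotations(trace, labels, lane_ids):
    """Return a relabeled copy of a Chrome-format trace dict.

    trace:    dict with a "traceEvents" list (Chrome Trace Event format).
    labels:   hypothetical mapping from kernel event name to a semantic label.
    lane_ids: hypothetical mapping from label to the tid (lane) to display it on.
    """
    merged = json.loads(json.dumps(trace))  # deep copy; leaves the input untouched
    for event in merged.get("traceEvents", []):
        if event.get("cat") != "kernel":
            continue  # only GPU kernel events are moved
        label = labels.get(event.get("name"))
        if label is None:
            continue  # unannotated kernels stay on their original lane
        event.setdefault("args", {})["annotation"] = label
        event["name"] = f"{label}: {event['name']}"
        event["tid"] = lane_ids[label]
    return merged


# Tiny hand-written trace standing in for a profiler export.
example_trace = {
    "traceEvents": [
        {"ph": "X", "cat": "kernel", "name": "gemm_kernel", "pid": 0, "tid": 7, "ts": 0, "dur": 5},
        {"ph": "X", "cat": "cpu_op", "name": "aten::mm", "pid": 1, "tid": 1, "ts": 0, "dur": 9},
    ]
}
example_merged = merge_annotations(
    example_trace, {"gemm_kernel": "attention"}, {"attention": 61}
)
```

Returning a copy rather than editing in place keeps the original trace available for a before/after comparison.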
4a6f9d9 to c39bac5
This is required for the CUDA graph annotations tutorial to work with full annotation support. The cudaGraphNodeGetToolsId API was added in cuda-bindings 13.3.0.
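For readers who want to check their environment by hand, a minimal sketch of such a version gate follows. The helper names (`parse_version`, `cuda_bindings_supports_annotations`) are hypothetical and not part of PyTorch or cuda-bindings; the 13.3.0 minimum is simply the value stated in this commit message. Only the standard-library `importlib.metadata` is used.

```python
from importlib import metadata

# Minimum taken from the commit message above; treat it as an assumption.
MIN_CUDA_BINDINGS = (13, 3, 0)


def parse_version(text):
    """Parse the leading numeric release, e.g. '12.9.1.post1' -> (12, 9, 1).

    Pre-release and post-release suffixes are ignored, and short versions
    such as '13.3' are padded to three components.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    parts += [0] * max(0, 3 - len(parts))
    return tuple(parts)


def cuda_bindings_supports_annotations(minimum=MIN_CUDA_BINDINGS):
    """Return True if an installed cuda-bindings meets the assumed minimum."""
    try:
        installed = metadata.version("cuda-bindings")
    except metadata.PackageNotFoundError:
        return False  # package not installed at all
    return parse_version(installed) >= minimum
```

A caller could print an upgrade hint such as `pip install -U "cuda-bindings>=13.3.0"` when the function returns False.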
- Removed check_cuda_bindings_version() function since PyTorch core now provides the warning via _probe_tools_id()
- Updated PyTorch requirement from 2.0+ to 2.13+ (required for the annotation APIs used in this tutorial)
- Simplified error messaging to reference PyTorch's built-in warnings
Changed the overview to emphasize:
- Ability to add semantic labels to kernels
- Understanding what each kernel does during profiling
- Labeling and organizing kernels by function

Rather than focusing on splitting kernels across streams, the overview now centers on the annotation feature itself.
Updated the prerequisites card at the top to show PyTorch 2.12+ (was still showing 2.0). Also updated cuda-python to cuda-bindings for consistency.
Added chrome://tracing screenshots showing:
- Before: All 65 kernels on single stream with auto-generated names
- After: Kernels organized into semantic lanes (streams 61, 62) with meaningful labels (attention, mlp)

Screenshots demonstrate the value of kernel annotations for understanding execution structure and identifying components.
Move `if __name__ == "__main__": main()` to immediately after the main() function definition (line ~404) so it executes during the Sphinx Gallery build process.

Sphinx Gallery requires the execution guard to be positioned right after the function definition, not at the end of the file, to properly capture and execute the tutorial code during documentation generation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
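The layout this commit describes can be sketched as follows. The body of `main()` here is a stand-in, not the tutorial's code, and the step names are illustrative only.

```python
def main():
    """Stand-in for the tutorial's main(); the real one captures and profiles a CUDA graph."""
    steps = ["capture graph", "profile", "post-process trace"]
    for step in steps:
        print(step)
    return steps


# Guard placed immediately after main(), as the commit message describes.
if __name__ == "__main__":
    main()

# Any remaining sections of the file follow the guard.
```

One consequence of this placement: everything main() calls must be defined above the guard, because the call runs before any later definitions in the file are executed.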
ngimel reviewed Jun 2, 2026
matplotlib
librosa
torch==2.12
cuda-bindings>=13.3.0 # Required for CUDA graph annotations tutorial
you should be able to use earlier version?
These files were accidentally included in the previous commit:
- traces/ directory (both root and advanced_source/)
- Screenshot PNG files
- CUDA_GRAPH_TUTORIAL_README.md

These are build artifacts and temporary files that should not be committed to the repository.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>