Add LRC microbenchmarks for L2 request coalescing analysis #82

Open
William-An wants to merge 2 commits into dev from lrc_ubench

@William-An
Contributor

Summary

  • Add lrc_max_merged microbenchmark to measure the maximum number of L2 sector requests that can be merged per LRC (L2 Request Coalescer) entry on NVIDIA GPUs
  • Add lrc_queue_size stub with documented TODO for future work on discovering LRC queue depth per L2 sub-partition
  • Update .gitignore to exclude IDE/AI tool directories

Details

lrc_max_merged

Measures how many concurrent sector requests from multiple warps/threadblocks can be coalesced into a single L2 lookup by the LRC. Supports three launch modes:

  • Normal: Standard kernel launch
  • Cluster: Uses threadblock clusters with mbarrier-based synchronization to co-locate blocks in the same GPC
  • Cooperative: Uses cooperative kernel launch for grid-wide synchronization
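As a rough illustration of the three launch modes, the sketch below launches one hypothetical kernel in each of them. The kernel name `same_sector_load`, the helper names, and the parameter choices are assumptions made for this example and are not taken from the PR. The sketch also omits the mbarrier and grid-wide synchronization that the description mentions, so it shows only the host-side launch paths, not how the benchmark aligns requests in time.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: lane 0 of every warp loads the same 4-byte word, so all
// warps request the same 32-byte L2 sector. __ldcg caches at L2 only, which
// keeps L1 from absorbing the requests.
__global__ void same_sector_load(const unsigned int *buf, unsigned int *sink) {
    unsigned int v = 0;
    if ((threadIdx.x & 31) == 0) v = __ldcg(buf);
    // Not taken when buf holds zeros; keeps the compiler from dropping the load.
    if (v == 0xFFFFFFFFu) *sink = v;
}

// Cluster launches need a grid size that is a multiple of the cluster size.
inline unsigned int round_up_to_multiple(unsigned int n, unsigned int m) {
    return ((n + m - 1) / m) * m;
}

// Normal mode: plain triple-chevron launch.
inline cudaError_t launch_normal(unsigned int blocks, unsigned int threads,
                                 const unsigned int *buf, unsigned int *sink) {
    same_sector_load<<<blocks, threads>>>(buf, sink);
    return cudaGetLastError();
}

// Cluster mode: request a 1-D cluster through a launch attribute.
// Returns an error on devices without thread block cluster support.
inline cudaError_t launch_cluster(unsigned int blocks, unsigned int threads,
                                  unsigned int cluster_size,
                                  const unsigned int *buf, unsigned int *sink) {
    cudaLaunchConfig_t cfg = {};
    cfg.gridDim = dim3(round_up_to_multiple(blocks, cluster_size));
    cfg.blockDim = dim3(threads);
    cudaLaunchAttribute attr[1];
    attr[0].id = cudaLaunchAttributeClusterDimension;
    attr[0].val.clusterDim.x = cluster_size;
    attr[0].val.clusterDim.y = 1;
    attr[0].val.clusterDim.z = 1;
    cfg.attrs = attr;
    cfg.numAttrs = 1;
    return cudaLaunchKernelEx(&cfg, same_sector_load, buf, sink);
}

// Cooperative mode: the runtime rejects the launch unless every block of the
// grid can be resident at the same time.
inline cudaError_t launch_cooperative(unsigned int blocks, unsigned int threads,
                                      const unsigned int *buf, unsigned int *sink) {
    void *args[] = {(void *)&buf, (void *)&sink};
    return cudaLaunchCooperativeKernel((const void *)same_sector_load,
                                       dim3(blocks), dim3(threads), args, 0, nullptr);
}
```

A caller would allocate `buf` and `sink` with `cudaMalloc`, zero `buf` with `cudaMemset`, invoke one of the helpers, and then call `cudaDeviceSynchronize()` before reading the profiler counters.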

Designed to be profiled with ncu; a run script (run_lrc_merged.sh) is included with the relevant L2 sector metrics.

lrc_queue_size (stub)

Documents the open challenges for measuring LRC queue depth, including the need to reverse-engineer L2 address-to-sub-partition mapping.

Corresponds to the Accel-Sim config parameter -gpgpu_lrc_max_entries.

Test plan

  • Build lrc_max_merged with make on SM_90+ target
  • Run run_lrc_merged.sh with ncu and verify L2 sector counts
  • Verify lrc_queue_size builds and prints stub message
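Before running the plan above, it can help to confirm that the device supports the launch modes being exercised. The following is a minimal sketch, assuming device 0 is the target; the helper names `supports_clusters` and `report_launch_mode_support` are hypothetical and not part of the PR.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Thread block clusters require compute capability 9.0 or newer.
inline bool supports_clusters(int cc_major) { return cc_major >= 9; }

// Prints which launch modes device 0 supports.
// Returns false if either runtime query fails.
inline bool report_launch_mode_support() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;

    int coop = 0;
    if (cudaDeviceGetAttribute(&coop, cudaDevAttrCooperativeLaunch, 0) != cudaSuccess)
        return false;

    std::printf("%s (sm_%d%d): cluster=%s cooperative=%s\n",
                prop.name, prop.major, prop.minor,
                supports_clusters(prop.major) ? "yes" : "no",
                coop ? "yes" : "no");
    return true;
}
```

Calling `report_launch_mode_support()` at the start of a benchmark's `main` would make an unsupported mode visible up front instead of surfacing later as a launch error.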

🤖 Generated with Claude Code
