
[Draft] [plugin][profiler] refine OOT profiler with record_function #348

Draft

zejunchen-zejun wants to merge 9 commits into main from zejun/refine_oot_profiler

Conversation

@zejunchen-zejun
Contributor

We use the label below to present step info in the profiler:
d[req128, tok128, dec128, tok128, pre0, tok0, ext0, tok0], which means a decode step with 128 requests and 128 tokens.
pre means prefill, and ext means the extend path for attention.
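For illustration only (this is not code from the PR), the shorthand label can be decoded back into its counters. The dict field names below are assumptions based on the description above:

```python
import re

def parse_step_label(label: str) -> dict:
    """Decode a step label like
    'd[req128, tok128, dec128, tok128, pre0, tok0, ext0, tok0]'
    into named counters. Pairs appear in the order: total, decode,
    prefill, extend -- each as (request count, token count)."""
    step_kind = "decode" if label[0] == "d" else "prefill/extend"
    # Every lowercase-name/number pair inside the brackets.
    pairs = re.findall(r"([a-z]+)(\d+)", label[label.index("[") + 1:-1])
    names = ["total_reqs", "total_toks", "dec_reqs", "dec_toks",
             "pre_reqs", "pre_toks", "ext_reqs", "ext_toks"]
    return {"step": step_kind, **{n: int(v) for n, (_, v) in zip(names, pairs)}}

print(parse_step_label("d[req128, tok128, dec128, tok128, pre0, tok0, ext0, tok0]"))
```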

record function

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>
Copilot AI review requested due to automatic review settings March 17, 2026 14:29
@zejunchen-zejun zejunchen-zejun marked this pull request as draft March 17, 2026 14:29

Copilot AI left a comment


Pull request overview

Refines ATOM’s vLLM OOT plugin profiling by adding torch.profiler.record_function spans around the model forward pass, with labels derived from vLLM forward-context attention metadata.

Changes:

  • Adds helpers to extract step-level attention/plugin metadata from vLLM forward context.
  • Builds a compact per-step profiler label (decode vs prefill/extend) from plugin metadata counters.
  • Wraps self.model(...) in a conditional record_function(...) span when torch profiling is enabled via vLLM config.
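The conditional span in the last bullet follows a common pattern: use a real profiler span when profiling is on, and a no-op context otherwise. Below is a minimal pure-Python sketch of that pattern; the name forward_with_span is made up, and the record_function type is injected as a parameter (torch.profiler.record_function in the PR's setting) so the sketch carries no torch dependency:

```python
from contextlib import nullcontext

def forward_with_span(model, inputs, label, profiling_enabled, record_function):
    """Run model(inputs), wrapped in a named profiler span only when
    profiling is enabled; otherwise fall back to a no-op context.

    `record_function` is injected (e.g. torch.profiler.record_function,
    or a stub in tests), so this sketch does not import torch.
    """
    span = record_function(label) if profiling_enabled else nullcontext()
    with span:
        return model(inputs)
```

In the PR itself, `label` would come from the step-label helper and `profiling_enabled` from the vLLM config.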


Comment on lines +116 to +125

    # Shorthand label format:
    # d = decode-only step, p = step containing prefill/extend work.
    # req/tok = total requests/tokens in this step.
    # dec/pre/ext each carry request count followed by token count.
    step = "p" if (num_prefills > 0 or num_extends > 0) else "d"
    return (
        f"{step}[req{total_reqs}, tok{num_actual_tokens}, "
        f"dec{num_decodes}, tok{num_decode_tokens}, "
        f"pre{num_prefills}, tok{num_prefill_tokens}, "
        f"ext{num_extends}, tok{num_extend_tokens}]"
    )
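Read on its own, the fragment above can be completed into a self-contained function. The name build_step_label and the parameter order are inferred from the snippet, not taken from the PR:

```python
def build_step_label(total_reqs, num_actual_tokens,
                     num_decodes, num_decode_tokens,
                     num_prefills, num_prefill_tokens,
                     num_extends, num_extend_tokens):
    """Build the compact per-step profiler label described in the PR:
    d = decode-only step, p = step with prefill/extend work."""
    step = "p" if (num_prefills > 0 or num_extends > 0) else "d"
    return (
        f"{step}[req{total_reqs}, tok{num_actual_tokens}, "
        f"dec{num_decodes}, tok{num_decode_tokens}, "
        f"pre{num_prefills}, tok{num_prefill_tokens}, "
        f"ext{num_extends}, tok{num_extend_tokens}]"
    )

# A decode-only step with 128 requests and 128 tokens:
print(build_step_label(128, 128, 128, 128, 0, 0, 0, 0))
# -> d[req128, tok128, dec128, tok128, pre0, tok0, ext0, tok0]
```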
Comment on lines +72 to +80

    if isinstance(attn_metadata, list):
        # In ubatch mode, vLLM stores one metadata dict per microbatch. We need
        # the first actual per-layer metadata object, not the outer list itself.
        # Keep the empty-dict guard for robustness if a placeholder slips through.
        for ubatch_attn_metadata in attn_metadata:
            if not ubatch_attn_metadata:
                continue
            return next(iter(ubatch_attn_metadata.values()), None)
        return None

    if isinstance(attn_metadata, dict):
        return next(iter(attn_metadata.values()), None)
    return None
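The list/dict handling above can be exercised with plain dicts standing in for vLLM's per-layer attention metadata objects. first_layer_metadata is a made-up name for this sketch, and the string values are stand-ins for real metadata objects:

```python
def first_layer_metadata(attn_metadata):
    """Return the first per-layer metadata object from the forward
    context. attn_metadata may be a dict (normal mode) or a list of
    dicts, one per microbatch (ubatch mode); empty placeholders are
    skipped."""
    if isinstance(attn_metadata, list):
        for ubatch_attn_metadata in attn_metadata:
            if not ubatch_attn_metadata:
                continue
            return next(iter(ubatch_attn_metadata.values()), None)
        return None
    if isinstance(attn_metadata, dict):
        return next(iter(attn_metadata.values()), None)
    return None

# Stand-in metadata: layer name -> metadata object (strings here).
print(first_layer_metadata({"layer.0": "meta0"}))        # meta0
print(first_layer_metadata([{}, {"layer.0": "meta1"}]))  # meta1
print(first_layer_metadata([]))                          # None
```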
@valarLip
Collaborator

Please make sure the style is aligned with ATOM main.

@zejunchen-zejun
Contributor Author

> please make sure have aligned style with ATOM main

🆗 Sure. This PR still has a bug for now: no label info shows up in the profiler JSON. Will fix soon.

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>
@zejunchen-zejun
Contributor Author

zejunchen-zejun commented Mar 18, 2026

Putting this PR on hold, because it is not easy to add a customized label and have all of the kernels from a step attributed to it. Here is the issue we found: we build the customized label info and record it before the model run, but the kernels are not attributed to the step for any step except the last one. As a result, the label is treated as a normal user annotation instead of a GPU user annotation, and the profiler JSON looks as shown below. Only the last step has the correct kernels and associated labels; for the other steps it does not work for now.
[screenshot: profiler JSON trace, with labels attached only to the last step's kernels]



3 participants