[Draft] [plugin][profiler] refine OOT profiler with record_function #348
Draft
zejunchen-zejun wants to merge 9 commits into main from
Conversation
record function (Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>)
Contributor
Pull request overview
Refines ATOM’s vLLM OOT plugin profiling by adding torch.profiler.record_function spans around the model forward pass, with labels derived from vLLM forward-context attention metadata.
Changes:
- Adds helpers to extract step-level attention/plugin metadata from vLLM forward context.
- Builds a compact per-step profiler label (decode vs prefill/extend) from plugin metadata counters.
- Wraps `self.model(...)` in a conditional `record_function(...)` span when torch profiling is enabled via vLLM config.
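The conditional wrap can be sketched roughly as below. This is not the PR's actual code: `run_forward` and the injected `record_function` parameter are illustrative stand-ins (in the real plugin the span factory would be `torch.profiler.record_function`, and the flag would come from the vLLM config):

```python
import contextlib

def run_forward(model, inputs, label, profiling_enabled, record_function=None):
    # In the real plugin, record_function would be torch.profiler.record_function;
    # it is injected here so the sketch stays self-contained and torch-free.
    if profiling_enabled and record_function is not None:
        span = record_function(label)  # named span that shows up in the trace
    else:
        span = contextlib.nullcontext()  # no-op when profiling is off
    with span:
        return model(*inputs)
```

With this shape, a disabled profiler adds only a `nullcontext` enter/exit to the hot path.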
atom/plugin/vllm/model_wrapper.py (comment on lines +116 to +125, outdated)
```python
# Shorthand label format:
# d = decode-only step, p = step containing prefill/extend work.
# req/tok = total requests/tokens in this step.
# dec/pre/ext each carry request count followed by token count.
step = "p" if (num_prefills > 0 or num_extends > 0) else "d"
return (
    f"{step}[req{total_reqs}, tok{num_actual_tokens}, "
    f"dec{num_decodes}, tok{num_decode_tokens}, "
    f"pre{num_prefills}, tok{num_prefill_tokens}, "
    f"ext{num_extends}, tok{num_extend_tokens}]"
)
```
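Reassembled as a standalone helper (the function name and signature are my own; the body mirrors the hunk above), the label builder behaves like this:

```python
def make_step_label(total_reqs, num_actual_tokens,
                    num_decodes, num_decode_tokens,
                    num_prefills, num_prefill_tokens,
                    num_extends, num_extend_tokens):
    # "p" if the step does any prefill/extend work, else "d" (decode-only).
    step = "p" if (num_prefills > 0 or num_extends > 0) else "d"
    return (
        f"{step}[req{total_reqs}, tok{num_actual_tokens}, "
        f"dec{num_decodes}, tok{num_decode_tokens}, "
        f"pre{num_prefills}, tok{num_prefill_tokens}, "
        f"ext{num_extends}, tok{num_extend_tokens}]"
    )

# A pure-decode step with 128 requests, one token each:
# make_step_label(128, 128, 128, 128, 0, 0, 0, 0)
# → "d[req128, tok128, dec128, tok128, pre0, tok0, ext0, tok0]"
```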
atom/plugin/vllm/model_wrapper.py (comment on lines +72 to +80, outdated)
```python
if isinstance(attn_metadata, list):
    # In ubatch mode, vLLM stores one metadata dict per microbatch. We need
    # the first actual per-layer metadata object, not the outer list itself.
    # Keep the empty-dict guard for robustness if a placeholder slips through.
    for ubatch_attn_metadata in attn_metadata:
        if not ubatch_attn_metadata:
            continue
        return next(iter(ubatch_attn_metadata.values()), None)
    return None
```
atom/plugin/vllm/model_wrapper.py (outdated)

```python
    return None

if isinstance(attn_metadata, dict):
    return next(iter(attn_metadata.values()), None)
```
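Putting the two hunks together, the extraction helper likely looks something like the sketch below. The function name is hypothetical; the branches follow the diff: vLLM's forward context maps layer name to per-layer attention metadata, and in ubatch mode it holds a list of such dicts, one per microbatch:

```python
def first_layer_metadata(attn_metadata):
    # ubatch mode: list of per-microbatch {layer_name: metadata} dicts.
    if isinstance(attn_metadata, list):
        for ubatch_attn_metadata in attn_metadata:
            if not ubatch_attn_metadata:
                continue  # skip empty placeholder dicts
            return next(iter(ubatch_attn_metadata.values()), None)
        return None
    # normal mode: a single {layer_name: metadata} dict.
    if isinstance(attn_metadata, dict):
        return next(iter(attn_metadata.values()), None)
    return None
```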
Collaborator
Please make sure the style is aligned with ATOM main.
Contributor
Author
🆗 Sure. This PR still has a bug for now: no label info appears in the profiler JSON. Will fix soon.
Contributor
Author
We use the label below to present step info in the profiler:
`d[req128, tok128, dec128, tok128, pre0, tok0, ext0, tok0]` means a decode step with 128 requests and 128 tokens; `pre` means prefill and `ext` means the extend path for attention.