fix(ir): fix BasicMemoryReuse aliasing of simultaneously live tiles #598
Little-oil wants to merge 1 commit into hw-native-sys:main
Conversation
ComputeLifetimesFromDependencies missed two categories of variable uses, causing independent accumulators to be incorrectly aliased onto the same MemRef buffer:

1. YieldStmt/ReturnStmt: collected into basic blocks but silently skipped by the if-else chain, so yielded/returned tiles appeared dead at the yield/return point.
2. ForStmt/WhileStmt init_values: ForStmt nodes are not flattened into basic blocks, making init_value variable references completely invisible to the lifetime analysis.

Fix by handling YieldStmt/ReturnStmt in the statement loop, and adding LoopInitValueCollector to walk the function body and anchor init_value uses to the first leaf statement of the loop body.

Fixes hw-native-sys#585
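To make the failure mode concrete, here is a minimal, hypothetical sketch (plain Python, not the project's C++ pass) of interval-based buffer reuse. A variable's lifetime is `[first_def, last_use]`; if a use such as the one at a yield point is never recorded, the interval ends too early and a later variable can be aliased onto the same buffer while both are actually live. All names here are illustrative.

```python
def assign_buffers(lifetimes):
    """lifetimes: {name: (start, end)} -> {name: buffer_id}.

    Reuses a buffer whenever its previous occupant's lifetime has ended
    before the next variable's lifetime begins (linear-scan style).
    """
    free, assignment, active = [], {}, []  # active: list of (end, buffer_id)
    next_id = 0
    for name, (start, end) in sorted(lifetimes.items(), key=lambda kv: kv[1][0]):
        # Release buffers whose occupants died before `start`.
        still_active = []
        for e, b in active:
            if e < start:
                free.append(b)
            else:
                still_active.append((e, b))
        active = still_active
        buf = free.pop() if free else next_id
        if buf == next_id:
            next_id += 1
        assignment[name] = buf
        active.append((end, buf))
    return assignment

# Correct lifetimes: both accumulators stay live until the yield at t=3,
# so they get distinct buffers.
correct = assign_buffers({"gate_new": (1, 3), "up_new": (2, 3)})
# Buggy lifetimes: the yield use was missed, so gate_new appears dead at
# t=1 and up_new reuses its buffer -- the incorrect aliasing from the bug.
buggy = assign_buffers({"gate_new": (1, 1), "up_new": (2, 3)})
```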
📝 Walkthrough

This PR enhances lifetime analysis in the basic memory reuse pass by introducing utilities to find first leaf statements and collect loop init_value uses via a new visitor.
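As a rough illustration of the "first leaf statement" utility mentioned above, here is a hypothetical sketch over a toy dict-based IR (not the project's real node types): descend into nested bodies until reaching a statement with no child block, which can then serve as the anchor point for loop init_value uses.

```python
def first_leaf(stmt):
    """Return the first statement with no nested body, descending
    depth-first into the first child of each block."""
    body = stmt.get("body")
    if body:
        return first_leaf(body[0])
    return stmt

# Toy loop body: a for-statement whose first child is an if-statement
# wrapping an assignment; the assignment is the first leaf.
loop_body = {"kind": "for", "body": [
    {"kind": "if", "body": [{"kind": "assign", "target": "chunk"}]},
]}
leaf = first_leaf(loop_body)
```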
Code Review
This pull request correctly fixes a bug in the BasicMemoryReuse pass where variable lifetimes were being calculated incorrectly for YieldStmt, ReturnStmt, and loop init_values, leading to incorrect memory aliasing. The changes introduce handlers for these statements and a new visitor to collect init_value uses, which is a solid approach. The accompanying tests are well-written and effectively validate the fix. I've included a few suggestions to improve code maintainability by reducing duplication in both the C++ implementation and the new Python tests, aligning with established repository rules.
```cpp
class VarUseCollector : public IRVisitor {
 public:
  std::set<VarPtr> used_vars;
  void VisitExpr_(const VarPtr& var) override {
    used_vars.insert(var);
    IRVisitor::VisitExpr_(var);
  }
};
```
This VarUseCollector class is identical to the one defined in ComputeLifetimesFromDependencies (starting at line 191). To avoid code duplication, consider defining this helper class once in a shared scope (e.g., the anonymous namespace at the top of the file) and reusing it in both LoopInitValueCollector and ComputeLifetimesFromDependencies.
References
- Extract duplicate logic into a private helper method to improve maintainability and avoid future inconsistencies.
```cpp
} else if (auto yield_stmt = As<YieldStmt>(stmt)) {
  // Tiles yielded across loop iterations must remain live until the yield point
  VarUseCollector collector;
  for (const auto& val : yield_stmt->value_) {
    collector.VisitExpr(val);
  }
  for (const auto& used_var : collector.used_vars) {
    var_use_stmts[used_var].push_back(stmt);
  }
} else if (auto return_stmt = As<ReturnStmt>(stmt)) {
  // Tiles returned from the function must remain live until the return point
  VarUseCollector collector;
  for (const auto& val : return_stmt->value_) {
    collector.VisitExpr(val);
  }
  for (const auto& used_var : collector.used_vars) {
    var_use_stmts[used_var].push_back(stmt);
  }
}
```
The logic for handling YieldStmt and ReturnStmt is identical. This logic is also very similar to the handlers for AssignStmt and EvalStmt. To improve maintainability and reduce code duplication, consider refactoring this entire if-else if chain. For example, you could extract the use-collection logic into a helper function or lambda.
References
- Extract duplicate logic into a private helper method to improve maintainability and avoid future inconsistencies.
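The refactor the reviewer suggests can be sketched in Python (a hypothetical analogue, not the project's C++ code): collect the variables used by a statement's value expressions in one shared helper, instead of repeating the collection loop per statement kind. The toy expression format (nested tuples with the operator tag first, bare strings as variables) is an assumption for illustration.

```python
def collect_var_uses(exprs):
    """Walk a list of expression trees (nested tuples like
    ('add', 'x', 'y'), or bare variable-name strings) and return the
    set of variable names used."""
    used = set()

    def visit(e):
        if isinstance(e, str):
            used.add(e)
        elif isinstance(e, tuple):
            for child in e[1:]:  # e[0] is the operator tag
                visit(child)

    for e in exprs:
        visit(e)
    return used

def record_uses(stmt_values, stmt, var_use_stmts):
    # One shared helper replaces the duplicated YieldStmt/ReturnStmt bodies.
    for var in collect_var_uses(stmt_values):
        var_use_stmts.setdefault(var, []).append(stmt)

uses = {}
record_uses([("add", "gate_acc", "chunk")], "yield@3", uses)
```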
```python
def test_yield_prevents_aliasing_of_simultaneously_live_tiles(self):
    """Two tile accumulators inside a loop, both yielded, must NOT share MemRef.

    gate_new and up_new are both live at the yield point, so their lifetimes
    overlap. Without the YieldStmt fix, the yield was silently skipped and
    both tiles appeared dead, causing incorrect aliasing.
    """

    @pl.program
    class Before:
        @pl.function
        def main(
            self,
            input_a: pl.Tensor[[64, 64], pl.FP32],
            input_b: pl.Tensor[[64, 64], pl.FP32],
            output: pl.Tensor[[64, 64], pl.FP32],
        ) -> pl.Tensor[[64, 64], pl.FP32]:
            gate_init: pl.Tile[[64, 64], pl.FP32] = pl.load(input_a, [0, 0], [64, 64])
            up_init: pl.Tile[[64, 64], pl.FP32] = pl.load(input_b, [0, 0], [64, 64])
            for _i, (gate_acc, up_acc) in pl.range(4, init_values=(gate_init, up_init)):
                chunk: pl.Tile[[64, 64], pl.FP32] = pl.load(input_a, [0, 0], [64, 64])
                gate_new: pl.Tile[[64, 64], pl.FP32] = pl.add(gate_acc, chunk)
                up_new: pl.Tile[[64, 64], pl.FP32] = pl.add(up_acc, chunk)
                gate_out, up_out = pl.yield_(gate_new, up_new)
            result: pl.Tensor[[64, 64], pl.FP32] = pl.store(gate_out, [0, 0], output)
            return result

    func = _prepare_and_run_memory_reuse(Before)

    # gate_new and up_new are both live at the yield → must NOT share MemRef
    _assert_not_shares_memref_recursive(func, "gate_new", "up_new")

def test_init_values_prevent_aliasing_of_loop_inputs(self):
    """Two tiles used as init_values must NOT share MemRef.

    gate_init and up_init are both consumed at the loop entry point as
    init_values. Without the ForStmt init_value fix, these variables
    appeared dead and got incorrectly aliased.
    """

    @pl.program
    class Before:
        @pl.function
        def main(
            self,
            input_a: pl.Tensor[[64, 64], pl.FP32],
            input_b: pl.Tensor[[64, 64], pl.FP32],
            output: pl.Tensor[[64, 64], pl.FP32],
        ) -> pl.Tensor[[64, 64], pl.FP32]:
            gate_init: pl.Tile[[64, 64], pl.FP32] = pl.load(input_a, [0, 0], [64, 64])
            up_init: pl.Tile[[64, 64], pl.FP32] = pl.load(input_b, [0, 0], [64, 64])
            for _i, (gate_acc, up_acc) in pl.range(4, init_values=(gate_init, up_init)):
                chunk: pl.Tile[[64, 64], pl.FP32] = pl.load(input_a, [0, 0], [64, 64])
                gate_new: pl.Tile[[64, 64], pl.FP32] = pl.add(gate_acc, chunk)
                up_new: pl.Tile[[64, 64], pl.FP32] = pl.add(up_acc, chunk)
                gate_out, up_out = pl.yield_(gate_new, up_new)
            result: pl.Tensor[[64, 64], pl.FP32] = pl.store(gate_out, [0, 0], output)
            return result

    func = _prepare_and_run_memory_reuse(Before)
```
The @pl.program class Before is duplicated in both test methods. To improve maintainability and reduce duplication, you can define this program once as a class attribute of TestYieldAndInitValueAliasing and reuse it in both tests.
```python
@pl.program
class _TestProgram:
    @pl.function
    def main(
        self,
        input_a: pl.Tensor[[64, 64], pl.FP32],
        input_b: pl.Tensor[[64, 64], pl.FP32],
        output: pl.Tensor[[64, 64], pl.FP32],
    ) -> pl.Tensor[[64, 64], pl.FP32]:
        gate_init: pl.Tile[[64, 64], pl.FP32] = pl.load(input_a, [0, 0], [64, 64])
        up_init: pl.Tile[[64, 64], pl.FP32] = pl.load(input_b, [0, 0], [64, 64])
        for _i, (gate_acc, up_acc) in pl.range(4, init_values=(gate_init, up_init)):
            chunk: pl.Tile[[64, 64], pl.FP32] = pl.load(input_a, [0, 0], [64, 64])
            gate_new: pl.Tile[[64, 64], pl.FP32] = pl.add(gate_acc, chunk)
            up_new: pl.Tile[[64, 64], pl.FP32] = pl.add(up_acc, chunk)
            gate_out, up_out = pl.yield_(gate_new, up_new)
        result: pl.Tensor[[64, 64], pl.FP32] = pl.store(gate_out, [0, 0], output)
        return result

def test_yield_prevents_aliasing_of_simultaneously_live_tiles(self):
    """Two tile accumulators inside a loop, both yielded, must NOT share MemRef.

    gate_new and up_new are both live at the yield point, so their lifetimes
    overlap. Without the YieldStmt fix, the yield was silently skipped and
    both tiles appeared dead, causing incorrect aliasing.
    """
    func = _prepare_and_run_memory_reuse(self._TestProgram)
    # gate_new and up_new are both live at the yield → must NOT share MemRef
    _assert_not_shares_memref_recursive(func, "gate_new", "up_new")

def test_init_values_prevent_aliasing_of_loop_inputs(self):
    """Two tiles used as init_values must NOT share MemRef.

    gate_init and up_init are both consumed at the loop entry point as
    init_values. Without the ForStmt init_value fix, these variables
    appeared dead and got incorrectly aliased.
    """
    func = _prepare_and_run_memory_reuse(self._TestProgram)
```
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/ut/ir/transforms/test_basic_memory_reuse.py (1)
1054-1119: Add regression coverage for ReturnStmt and WhileStmt too. These tests lock down YieldStmt plus ForStmt::init_values, but the implementation change also adds explicit ReturnStmt handling and WhileStmt::initValue_ collection. A small direct-tile-return case and a while-loop case would keep those new branches from regressing silently.
📒 Files selected for processing (2)

- src/ir/transforms/basic_memory_reuse_pass.cpp
- tests/ut/ir/transforms/test_basic_memory_reuse.py
```python
chunk: pl.Tile[[64, 64], pl.FP32] = pl.load(input_a, [0, 0], [64, 64])
gate_new: pl.Tile[[64, 64], pl.FP32] = pl.add(gate_acc, chunk)
up_new: pl.Tile[[64, 64], pl.FP32] = pl.add(up_acc, chunk)
gate_out, up_out = pl.yield_(gate_new, up_new)
```
Rename the unused yielded value to satisfy Ruff.
up_out is never read in either test, so Ruff will keep flagging Lines 1080 and 1112. Prefix it with _ to preserve the test shape without suppressing the rule.
Suggested cleanup

```diff
- gate_out, up_out = pl.yield_(gate_new, up_new)
+ gate_out, _up_out = pl.yield_(gate_new, up_new)
...
- gate_out, up_out = pl.yield_(gate_new, up_new)
+ gate_out, _up_out = pl.yield_(gate_new, up_new)
```

Also applies to: 1112-1112
🧰 Tools
🪛 Ruff (0.15.6)

[warning] 1080-1080: Unpacked variable up_out is never used. Prefix it with an underscore or any other dummy variable pattern (RUF059)
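For reference, a minimal standalone example of the pattern RUF059 asks for (hypothetical names, unrelated to the test file): prefixing an unpacked-but-unused name with an underscore marks it as intentionally discarded without suppressing the lint rule.

```python
def pair():
    # Stand-in for any call that returns multiple values, like pl.yield_.
    return ("kept", "ignored")

# Only the first value is read; the underscore prefix tells Ruff the
# second is intentionally unused.
gate_out, _up_out = pair()
```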
Summary

BasicMemoryReuse pass's lifetime analysis (ComputeLifetimesFromDependencies) missed two categories of variable uses, causing simultaneously live tile accumulators to be incorrectly aliased onto the same MemRef buffer:

1. YieldStmt/ReturnStmt uses were silently skipped, so yielded/returned tiles appeared dead at the yield/return point.
2. ForStmt/WhileStmt init_value uses were invisible to the analysis, because loop nodes are not flattened into basic blocks.

Fix: handle YieldStmt/ReturnStmt in the statement loop, and add LoopInitValueCollector to anchor init_value uses to the first leaf statement of the loop body.

Fixes #585
Testing

Added TestYieldAndInitValueAliasing test class with two regression tests:

- test_yield_prevents_aliasing_of_simultaneously_live_tiles: constructs a loop with two independent tile accumulators (gate_new, up_new) both yielded at the same point, verifying they are NOT assigned to the same MemRef.
- test_init_values_prevent_aliasing_of_loop_inputs: constructs two tiles (gate_init, up_init) both used as ForStmt init_values, verifying they are NOT incorrectly aliased.

Also added recursive helper utilities (_iter_all_assign_stmts, _get_var_type_recursive, _assert_not_shares_memref_recursive) to support MemRef verification inside nested control flow (ForStmt/IfStmt/WhileStmt bodies).
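The recursive traversal those helpers perform can be sketched as follows (a hypothetical dict-based IR, not the project's real node classes): yield every assignment in a statement list, descending into nested loop and branch bodies so that statements inside ForStmt/IfStmt/WhileStmt are not missed.

```python
def iter_all_assigns(stmts):
    """Yield every 'assign' statement, recursing into nested bodies.

    Statements are modeled as dicts with a 'kind' key and optional
    'body'/'then'/'else' child lists (names are illustrative).
    """
    for s in stmts:
        if s["kind"] == "assign":
            yield s
        for key in ("body", "then", "else"):
            if key in s:
                yield from iter_all_assigns(s[key])

# Toy program: one top-level assign, plus assigns nested in a for-loop
# and an if-branch -- all three should be visited.
prog = [
    {"kind": "assign", "target": "gate_init"},
    {"kind": "for", "body": [
        {"kind": "assign", "target": "gate_new"},
        {"kind": "if", "then": [{"kind": "assign", "target": "t"}], "else": []},
    ]},
]
targets = [s["target"] for s in iter_all_assigns(prog)]
```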