Skip to content

fix(ir): fix BasicMemoryReuse aliasing of simultaneously live tiles#598

Open
Little-oil wants to merge 1 commit intohw-native-sys:mainfrom
Little-oil:issue-585-fix-memory-reuse-aliasing
Open

fix(ir): fix BasicMemoryReuse aliasing of simultaneously live tiles#598
Little-oil wants to merge 1 commit intohw-native-sys:mainfrom
Little-oil:issue-585-fix-memory-reuse-aliasing

Conversation

@Little-oil
Copy link
Contributor

Summary

BasicMemoryReuse pass's lifetime analysis (ComputeLifetimesFromDependencies) missed two categories of variable uses, causing simultaneously live tile accumulators to be incorrectly aliased onto the same MemRef buffer:

  1. YieldStmt/ReturnStmt: These statements were collected into basic blocks but silently skipped by the if-else type dispatch chain, so yielded/returned tiles appeared dead at the yield/return point — their lifetimes were prematurely truncated.
  2. ForStmt/WhileStmt init_values: ForStmt nodes are not flattened into basic blocks, so variables referenced in IterArg::initValue_ were completely invisible to the block-based lifetime analysis.

Fix:

  • Handle YieldStmt and ReturnStmt in the statement traversal loop, collecting variable uses to extend their lifetimes to the yield/return point.
  • Add LoopInitValueCollector visitor that walks the function body, collects ForStmt/WhileStmt init_value variable uses, and anchors them to the first leaf statement of the loop body.

Fixes #585

Testing

Added TestYieldAndInitValueAliasing test class with two regression tests:

  • test_yield_prevents_aliasing_of_simultaneously_live_tiles: Constructs a loop with two independent tile accumulators (gate_new, up_new) both yielded at the same point, verifying they are NOT assigned to the same MemRef.
  • test_init_values_prevent_aliasing_of_loop_inputs: Constructs two tiles (gate_init, up_init) both used as ForStmt init_values, verifying they are NOT incorrectly aliased.

Also added recursive helper utilities (_iter_all_assign_stmts, _get_var_type_recursive, _assert_not_shares_memref_recursive) to support MemRef verification inside nested control flow (ForStmt/IfStmt/WhileStmt bodies).

ComputeLifetimesFromDependencies missed two categories of variable uses,
causing independent accumulators to be incorrectly aliased onto the same
MemRef buffer:

1. YieldStmt/ReturnStmt: collected into basic blocks but silently skipped
   by the if-else chain, so yielded/returned tiles appeared dead at the
   yield/return point.

2. ForStmt/WhileStmt init_values: ForStmt nodes are not flattened into
   basic blocks, making init_value variable references completely
   invisible to the lifetime analysis.

Fix by handling YieldStmt/ReturnStmt in the statement loop, and adding
LoopInitValueCollector to walk the function body and anchor init_value
uses to the first leaf statement of the loop body.

Fixes hw-native-sys#585
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical bug in the BasicMemoryReuse pass's lifetime analysis, which previously failed to correctly identify the live ranges of variables involved in YieldStmt, ReturnStmt, and loop init_values. By introducing specific handling for these IR constructs, the pass now accurately prevents incorrect memory aliasing, thereby enhancing the correctness and robustness of memory optimization for intermediate representations.

Highlights

  • Fix for Incorrect Memory Aliasing: The BasicMemoryReuse pass previously failed to correctly identify the lifetimes of variables, leading to incorrect aliasing of simultaneously live tile accumulators.
  • Enhanced Lifetime Analysis for Yield/Return Statements: The lifetime analysis now correctly processes YieldStmt and ReturnStmt, ensuring that variables used in these statements remain live until their respective yield or return points, preventing premature truncation of their lifetimes.
  • Correct Handling of Loop Initial Values: A new LoopInitValueCollector was introduced to accurately track variables used as init_values in ForStmt and WhileStmt iter_args, anchoring their lifetimes to the first leaf statement of the loop body.
  • New Regression Tests: Two new regression tests were added in TestYieldAndInitValueAliasing to specifically verify that yield statements and loop init_values correctly prevent aliasing of simultaneously live tiles and loop inputs.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Copy link

coderabbitai bot commented Mar 18, 2026

📝 Walkthrough

Walkthrough

This PR enhances lifetime analysis in the basic memory reuse pass by introducing utilities to find first leaf statements and collect loop init_value uses via a new visitor. The signature of ComputeLifetimesFromDependencies is updated to accept the function body, enabling analysis of loop init_values and yield/return statements for improved memory reuse accuracy.

Changes

Cohort / File(s) Summary
Core Lifetime Analysis Refactoring
src/ir/transforms/basic_memory_reuse_pass.cpp
Adds FindFirstLeafStmt utility to locate first non-flow leaf statements; introduces LoopInitValueCollector visitor to map init_value uses to loop body anchors; extends ComputeLifetimesFromDependencies to accept func_body parameter for analyzing ForStmt/WhileStmt init_values; adds YieldStmt and ReturnStmt variable use collection; updates call site to pass func->body_ into the analysis function.
Test Coverage for Yield and Init Value Aliasing
tests/ut/ir/transforms/test_basic_memory_reuse.py
Adds recursive tree-walking helpers (_gather_assign_stmts, _extract_type_recursive) to locate variables within nested statement structures; introduces _assert_not_shares_memref_recursive for deeper memref sharing checks; adds TestYieldAndInitValueAliasing suite with two tests verifying that simultaneously-live yields and loop init_values do not share buffers.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related issues

Possibly related PRs

Suggested reviewers

  • lyfne123
  • Hzfengsy

Poem

🐰 A loop's init value, once lost in the dark,
Now leaves a bright lifetime and memory mark!
With yields and returns brought into the light,
Our buffers stay separate—no more aliasing blight! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main fix: correcting BasicMemoryReuse aliasing of simultaneously live tiles, which is the core issue addressed in the PR.
Description check ✅ Passed The description comprehensively explains the bug, the root causes (YieldStmt/ReturnStmt handling and ForStmt/WhileStmt init_value collection), the fixes, and the tests added—all directly related to the changeset.
Linked Issues check ✅ Passed The PR fully addresses issue #585 by fixing lifetime analysis for YieldStmt/ReturnStmt uses and ForStmt/WhileStmt init_values to prevent incorrect buffer aliasing of simultaneously live accumulators.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing the BasicMemoryReuse lifetime analysis issue: new utility functions, LoopInitValueCollector visitor, extended statement handling, and regression tests for the identified bug.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

You can make CodeRabbit's review stricter and more nitpicky using the `assertive` profile, if that's what you prefer.

Change the reviews.profile setting to assertive to make CodeRabbit's nitpick more issues in your PRs.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly fixes a bug in the BasicMemoryReuse pass where variable lifetimes were being calculated incorrectly for YieldStmt, ReturnStmt, and loop init_values, leading to incorrect memory aliasing. The changes introduce handlers for these statements and a new visitor to collect init_value uses, which is a solid approach. The accompanying tests are well-written and effectively validate the fix. I've included a few suggestions to improve code maintainability by reducing duplication in both the C++ implementation and the new Python tests, aligning with established repository rules.

Comment on lines +146 to +153
class VarUseCollector : public IRVisitor {
public:
std::set<VarPtr> used_vars;
void VisitExpr_(const VarPtr& var) override {
used_vars.insert(var);
IRVisitor::VisitExpr_(var);
}
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This VarUseCollector class is identical to the one defined in ComputeLifetimesFromDependencies (starting at line 191). To avoid code duplication, consider defining this helper class once in a shared scope (e.g., the anonymous namespace at the top of the file) and reusing it in both LoopInitValueCollector and ComputeLifetimesFromDependencies.

References
  1. Extract duplicate logic into a private helper method to improve maintainability and avoid future inconsistencies.

Comment on lines +227 to +245
} else if (auto yield_stmt = As<YieldStmt>(stmt)) {
// Tiles yielded across loop iterations must remain live until the yield point
VarUseCollector collector;
for (const auto& val : yield_stmt->value_) {
collector.VisitExpr(val);
}
for (const auto& used_var : collector.used_vars) {
var_use_stmts[used_var].push_back(stmt);
}
} else if (auto return_stmt = As<ReturnStmt>(stmt)) {
// Tiles returned from the function must remain live until the return point
VarUseCollector collector;
for (const auto& val : return_stmt->value_) {
collector.VisitExpr(val);
}
for (const auto& used_var : collector.used_vars) {
var_use_stmts[used_var].push_back(stmt);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic for handling YieldStmt and ReturnStmt is identical. This logic is also very similar to the handlers for AssignStmt and EvalStmt. To improve maintainability and reduce code duplication, consider refactoring this entire if-else if chain. For example, you could extract the use-collection logic into a helper function or lambda.

References
  1. Extract duplicate logic into a private helper method to improve maintainability and avoid future inconsistencies.

Comment on lines +1057 to +1116
def test_yield_prevents_aliasing_of_simultaneously_live_tiles(self):
"""Two tile accumulators inside a loop, both yielded, must NOT share MemRef.

gate_new and up_new are both live at the yield point, so their lifetimes
overlap. Without the YieldStmt fix, the yield was silently skipped and
both tiles appeared dead, causing incorrect aliasing.
"""

@pl.program
class Before:
@pl.function
def main(
self,
input_a: pl.Tensor[[64, 64], pl.FP32],
input_b: pl.Tensor[[64, 64], pl.FP32],
output: pl.Tensor[[64, 64], pl.FP32],
) -> pl.Tensor[[64, 64], pl.FP32]:
gate_init: pl.Tile[[64, 64], pl.FP32] = pl.load(input_a, [0, 0], [64, 64])
up_init: pl.Tile[[64, 64], pl.FP32] = pl.load(input_b, [0, 0], [64, 64])
for _i, (gate_acc, up_acc) in pl.range(4, init_values=(gate_init, up_init)):
chunk: pl.Tile[[64, 64], pl.FP32] = pl.load(input_a, [0, 0], [64, 64])
gate_new: pl.Tile[[64, 64], pl.FP32] = pl.add(gate_acc, chunk)
up_new: pl.Tile[[64, 64], pl.FP32] = pl.add(up_acc, chunk)
gate_out, up_out = pl.yield_(gate_new, up_new)
result: pl.Tensor[[64, 64], pl.FP32] = pl.store(gate_out, [0, 0], output)
return result

func = _prepare_and_run_memory_reuse(Before)

# gate_new and up_new are both live at the yield → must NOT share MemRef
_assert_not_shares_memref_recursive(func, "gate_new", "up_new")

def test_init_values_prevent_aliasing_of_loop_inputs(self):
"""Two tiles used as init_values must NOT share MemRef.

gate_init and up_init are both consumed at the loop entry point as
init_values. Without the ForStmt init_value fix, these variables
appeared dead and got incorrectly aliased.
"""

@pl.program
class Before:
@pl.function
def main(
self,
input_a: pl.Tensor[[64, 64], pl.FP32],
input_b: pl.Tensor[[64, 64], pl.FP32],
output: pl.Tensor[[64, 64], pl.FP32],
) -> pl.Tensor[[64, 64], pl.FP32]:
gate_init: pl.Tile[[64, 64], pl.FP32] = pl.load(input_a, [0, 0], [64, 64])
up_init: pl.Tile[[64, 64], pl.FP32] = pl.load(input_b, [0, 0], [64, 64])
for _i, (gate_acc, up_acc) in pl.range(4, init_values=(gate_init, up_init)):
chunk: pl.Tile[[64, 64], pl.FP32] = pl.load(input_a, [0, 0], [64, 64])
gate_new: pl.Tile[[64, 64], pl.FP32] = pl.add(gate_acc, chunk)
up_new: pl.Tile[[64, 64], pl.FP32] = pl.add(up_acc, chunk)
gate_out, up_out = pl.yield_(gate_new, up_new)
result: pl.Tensor[[64, 64], pl.FP32] = pl.store(gate_out, [0, 0], output)
return result

func = _prepare_and_run_memory_reuse(Before)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The @pl.program class Before is duplicated in both test methods. To improve maintainability and reduce duplication, you can define this program once as a class attribute of TestYieldAndInitValueAliasing and reuse it in both tests.

    @pl.program
    class _TestProgram:
        @pl.function
        def main(
            self,
            input_a: pl.Tensor[[64, 64], pl.FP32],
            input_b: pl.Tensor[[64, 64], pl.FP32],
            output: pl.Tensor[[64, 64], pl.FP32]
        ) -> pl.Tensor[[64, 64], pl.FP32]:
            gate_init: pl.Tile[[64, 64], pl.FP32] = pl.load(input_a, [0, 0], [64, 64])
            up_init: pl.Tile[[64, 64], pl.FP32] = pl.load(input_b, [0, 0], [64, 64])
            for _i, (gate_acc, up_acc) in pl.range(4, init_values=(gate_init, up_init)):
                chunk: pl.Tile[[64, 64], pl.FP32] = pl.load(input_a, [0, 0], [64, 64])
                gate_new: pl.Tile[[64, 64], pl.FP32] = pl.add(gate_acc, chunk)
                up_new: pl.Tile[[64, 64], pl.FP32] = pl.add(up_acc, chunk)
                gate_out, up_out = pl.yield_(gate_new, up_new)
            result: pl.Tensor[[64, 64], pl.FP32] = pl.store(gate_out, [0, 0], output)
            return result

    def test_yield_prevents_aliasing_of_simultaneously_live_tiles(self):
        """Two tile accumulators inside a loop, both yielded, must NOT share MemRef.

        gate_new and up_new are both live at the yield point, so their lifetimes
        overlap. Without the YieldStmt fix, the yield was silently skipped and
        both tiles appeared dead, causing incorrect aliasing.
        """

        func = _prepare_and_run_memory_reuse(self._TestProgram)

        # gate_new and up_new are both live at the yield → must NOT share MemRef
        _assert_not_shares_memref_recursive(func, "gate_new", "up_new")

    def test_init_values_prevent_aliasing_of_loop_inputs(self):
        """Two tiles used as init_values must NOT share MemRef.

        gate_init and up_init are both consumed at the loop entry point as
        init_values. Without the ForStmt init_value fix, these variables
        appeared dead and got incorrectly aliased.
        """

        func = _prepare_and_run_memory_reuse(self._TestProgram)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/ut/ir/transforms/test_basic_memory_reuse.py (1)

1054-1119: Add regression coverage for ReturnStmt and WhileStmt too.

These tests lock down YieldStmt plus ForStmt::init_values, but the implementation change also adds explicit ReturnStmt handling and WhileStmt::initValue_ collection. A small direct-tile-return case and a while-loop case would keep those new branches from regressing silently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ut/ir/transforms/test_basic_memory_reuse.py` around lines 1054 - 1119,
Add two regression tests mirroring the ForStmt/YieldStmt cases to cover the new
ReturnStmt and WhileStmt handling: (1) a test that returns a direct tile from
the function (e.g., create a Before program whose main loads a tile and
immediately returns it) and asserts via _prepare_and_run_memory_reuse and
_assert_not_shares_memref_recursive that a returned tile does not share MemRef
with another live tile; and (2) a test that uses a while-loop with init_value_
semantics (construct a Before program using pl.while or similar with init values
consumed at loop entry, perform operations producing two tiles that are live at
entry/yield, run through _prepare_and_run_memory_reuse, and assert the two init
tiles do not share MemRef). Use the same helper symbols from the file
(_prepare_and_run_memory_reuse, _assert_not_shares_memref_recursive,
pl.range/pl.yield_ style for guidance) and follow the existing test structure
and naming pattern so ReturnStmt and WhileStmt branches are exercised.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/ut/ir/transforms/test_basic_memory_reuse.py`:
- Line 1080: Rename the unused yielded variable up_out to _up_out in the test so
Ruff stops flagging it; specifically update the assignment "gate_out, up_out =
pl.yield_(gate_new, up_new)" (and the other occurrence at the second instance)
to "gate_out, _up_out = pl.yield_(gate_new, up_new)" to keep test behavior
identical while marking the value as intentionally unused.

---

Nitpick comments:
In `@tests/ut/ir/transforms/test_basic_memory_reuse.py`:
- Around line 1054-1119: Add two regression tests mirroring the
ForStmt/YieldStmt cases to cover the new ReturnStmt and WhileStmt handling: (1)
a test that returns a direct tile from the function (e.g., create a Before
program whose main loads a tile and immediately returns it) and asserts via
_prepare_and_run_memory_reuse and _assert_not_shares_memref_recursive that a
returned tile does not share MemRef with another live tile; and (2) a test that
uses a while-loop with init_value_ semantics (construct a Before program using
pl.while or similar with init values consumed at loop entry, perform operations
producing two tiles that are live at entry/yield, run through
_prepare_and_run_memory_reuse, and assert the two init tiles do not share
MemRef). Use the same helper symbols from the file
(_prepare_and_run_memory_reuse, _assert_not_shares_memref_recursive,
pl.range/pl.yield_ style for guidance) and follow the existing test structure
and naming pattern so ReturnStmt and WhileStmt branches are exercised.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 99dc5f83-a477-43ce-b837-784225def566

📥 Commits

Reviewing files that changed from the base of the PR and between 0905ede and 735ea07.

📒 Files selected for processing (2)
  • src/ir/transforms/basic_memory_reuse_pass.cpp
  • tests/ut/ir/transforms/test_basic_memory_reuse.py

chunk: pl.Tile[[64, 64], pl.FP32] = pl.load(input_a, [0, 0], [64, 64])
gate_new: pl.Tile[[64, 64], pl.FP32] = pl.add(gate_acc, chunk)
up_new: pl.Tile[[64, 64], pl.FP32] = pl.add(up_acc, chunk)
gate_out, up_out = pl.yield_(gate_new, up_new)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Rename the unused yielded value to satisfy Ruff.

up_out is never read in either test, so Ruff will keep flagging Lines 1080 and 1112. Prefix it with _ to preserve the test shape without suppressing the rule.

Suggested cleanup
-                    gate_out, up_out = pl.yield_(gate_new, up_new)
+                    gate_out, _up_out = pl.yield_(gate_new, up_new)
...
-                    gate_out, up_out = pl.yield_(gate_new, up_new)
+                    gate_out, _up_out = pl.yield_(gate_new, up_new)

Also applies to: 1112-1112

🧰 Tools
🪛 Ruff (0.15.6)

[warning] 1080-1080: Unpacked variable up_out is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ut/ir/transforms/test_basic_memory_reuse.py` at line 1080, Rename the
unused yielded variable up_out to _up_out in the test so Ruff stops flagging it;
specifically update the assignment "gate_out, up_out = pl.yield_(gate_new,
up_new)" (and the other occurrence at the second instance) to "gate_out, _up_out
= pl.yield_(gate_new, up_new)" to keep test behavior identical while marking the
value as intentionally unused.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] MemoryReuse aliases gate_acc and up_acc in qwen3_decode_layer_incore_8_aiv

1 participant