test(st): Add dynamic-shape paged attention and emit layout attrs #604
Crystal-wzy wants to merge 2 commits into hw-native-sys:main from
Conversation
Note: Reviews paused
It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough
Adds a new dynamic-shape paged-attention example and integration test, updates an existing paged-attention example's kernel type annotations and orchestration signature, and modifies PTO codegen tensor-view/stride emission and type/layout printing.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the PyPTO framework by introducing support for dynamic-shape paged attention. It adds a new example demonstrating how InCore kernels can use dynamic variables for tensor shapes, enabling greater flexibility in handling varying input dimensions at runtime. It also fixes a code generation issue by ensuring that the PTO backend emits layout attributes for tensor views, which is needed to correctly represent data layouts such as DN layout in the generated IR. Together, these changes improve the framework's ability to handle more complex and dynamic deep learning models, particularly attention mechanisms.
Code Review
This pull request successfully introduces dynamic-shape paged attention, enhancing the flexibility of the PyPTO framework. The new dynamic_paged_attention_example.py demonstrates the usage of dynamic shapes in InCore kernels, and the accompanying test_dynamic_paged_attention.py provides comprehensive test coverage, including configurations with variable context lengths. The fix in src/codegen/pto/pto_codegen.cpp to correctly emit layout attributes for pto.make_tensor_view is crucial for accurate representation of DN-layout tensors in the generated IR. Overall, the changes are well-implemented and align with the stated objectives of the pull request. The recurring type-checking issues highlighted in the example file should be addressed by refining type annotations or the DSL's API, aligning with best practices for DSL argument validation.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/codegen/pto/pto_codegen.cpp (1)
351-357: ⚠️ Potential issue | 🟠 Major — Restore the generic row-major stride path for rank > 2 tensors.
After this split, any tensor parameter with rank greater than 2 now falls through to `strides = []`. `pto.make_tensor_view` used to receive a full N-D row-major stride vector here, so this is a silent regression for every 3D+ entrypoint.
Also applies to: 440-457
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/codegen/pto/pto_codegen.cpp` around lines 351 - 357, The change removed the generic N-D row-major stride emission for tensors with rank > 2, causing strides to be left as an empty vector for 3D+ tensors; restore the original generic path by computing and emitting row-major strides for all ranks > 1 (not just 2), using tensor_type->shape_ to compute cumulative product of trailing dimensions (use GetConstIntValue and GetOrEmitIndexConstant to emit each stride value) so that pto.make_tensor_view receives a full N-D stride vector; apply the same fix in the corresponding mirrored block around the later copy (the block that currently handles shape sizes 2 and 1 at the other location).
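For reference, the row-major rule this comment describes (stride[i] is the product of all trailing dimensions) can be written out as a small sketch. It is plain Python rather than the C++ in `PTOCodegen::EmitMakeTensorViews`, and the real fix should go through the repo's own helpers (`GetConstIntValue`, `GetOrEmitIndexConstant`); this only illustrates the stride math being restored.

```python
# Illustrative only: the generic N-D row-major stride rule described above.
def row_major_strides(shape):
    """stride[i] = product of shape[i+1:]; the innermost stride is 1."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

# A rank-3 tensor such as [num_blocks, block_size, head_dim] gets a full
# 3-entry stride vector instead of falling through to strides = [].
assert row_major_strides([4, 128, 128]) == [16384, 128, 1]
```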
🧹 Nitpick comments (1)
tests/st/codegen/test_dynamic_paged_attention.py (1)
89-112: Consider pinning the emitted `#pto.layout<...>` text in a codegen-only check.
These STs validate numerics on hardware, but they do not lock down the exact `EmitMakeTensorViews` regression this PR fixes. A small codegen-only assertion on the generated PTO IR would catch attribute-loss regressions much earlier.
Verify each finding against the current code and only fix it if needed. In `@tests/st/codegen/test_dynamic_paged_attention.py` around lines 89 - 112, Add a codegen-only assertion in test_dynamic_paged_attention to pin the emitted PTO IR layout text (the "#pto.layout<...>" emitted by the EmitMakeTensorViews regression) so attribute-loss regressions are caught; after calling test_runner.run(test_case) (and before numeric asserts), retrieve the generated PTO/IR string from the test result or test_runner API (e.g., a field on result or a helper that returns the generated IR), and assert that it contains the expected "#pto.layout<...>" fragment (or a small canonical layout pattern) for the tensors involved in DynamicPagedAttentionTestCase, failing the test if the fragment is missing. Ensure you reference the EmitMakeTensorViews-related layout pattern and keep this check confined to codegen-only (skip on hardware numeric-only runs if necessary).
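A minimal sketch of what such a codegen-only check could look like. The way the generated PTO IR is retrieved here (a `generated_ir` field on the run result) and the test-case constructor arguments are assumptions for illustration; only the `#pto.layout<dn>` fragment comes from this PR.

```python
# Sketch only: the IR accessor and constructor arguments are hypothetical.
def test_dn_layout_attr_emitted(test_runner):
    test_case = DynamicPagedAttentionTestCase(
        batch=4, q_heads=16, head_dim=128, block_size=128,
        context_len=512, max_context_len=32768,
    )
    result = test_runner.run(test_case)
    ir_text = result.generated_ir  # hypothetical: however the harness exposes the PTO IR
    # The transposed key_cache view should carry an explicit DN layout attribute.
    assert "#pto.layout<dn>" in ir_text
```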
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/ir_parser/dynamic_paged_attention_example.py`:
- Around line 83-90: The dynamic path currently types the mi/li accumulators as
plain ND tensors in dyn_kernel_init_inplace (and the similar signatures later),
which mismatches the static paged-attention builder that marks these as pl.DN to
indicate DN-shaped layout; update the function signatures (e.g.,
dyn_kernel_init_inplace and the other dynamic kernel init/online-update
signatures referenced) to annotate mi and li as pl.DN[pl.Tensor[[Q_HEADS, 1],
pl.FP32]] (or the equivalent pl.DN wrapper used in the repo) instead of plain
pl.Tensor so the downstream dyn_kernel_online_update consumers see the same
DN-shaped layout semantics. Ensure all other dynamic-kernel declarations noted
in the comment are changed consistently.
In `@tests/st/codegen/test_paged_attention.py`:
- Around line 419-422: The code builds context_lens from self.context_len but
doesn't validate when self.context_len is a list: ensure that in the branch
handling list-valued self.context_len you check that len(self.context_len) == B
(the batch size) and raise a clear ValueError if it differs (or alternatively
broadcast/trim only if that behavior is intended); update the logic around the
context_lens creation (the branch that sets context_lens =
torch.tensor(self.context_len, dtype=torch.int32)) to perform this length check
so the created Tensor matches the declared TensorSpec("context_lens", [B], ...).
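To make the first prompt above concrete, here is a rough before/after of the accumulator annotations it asks for. The `pl.DN[pl.Tensor[...]]` form is taken from the review text; the surrounding signature is a simplified stand-in for the real `dyn_kernel_init_inplace` and assumes the repo's `pl` DSL is imported as in the example file.

```python
# Simplified sketch; not the repo's actual kernel signature.
Q_HEADS = pl.dynamic()

# Before: mi / li typed as plain ND tensors.
# def dyn_kernel_init_inplace(
#     mi: pl.Tensor[[Q_HEADS, 1], pl.FP32],
#     li: pl.Tensor[[Q_HEADS, 1], pl.FP32],
# ): ...

# After: mark both accumulators as DN so dyn_kernel_online_update and the
# other dynamic kernels see the same DN-shaped layout as the static builder.
def dyn_kernel_init_inplace(
    mi: pl.DN[pl.Tensor[[Q_HEADS, 1], pl.FP32]],
    li: pl.DN[pl.Tensor[[Q_HEADS, 1], pl.FP32]],
): ...
```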
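And for the second prompt, a minimal sketch of the length check it describes, written as a standalone helper rather than the in-method code in `PagedAttentionTestCase` (the helper name is hypothetical):

```python
import torch

def build_context_lens(context_len, batch_size: int) -> torch.Tensor:
    """Build the [B]-shaped context_lens tensor, validating list inputs."""
    if isinstance(context_len, list):
        if len(context_len) != batch_size:
            raise ValueError(
                f"context_len has {len(context_len)} entries, "
                f"expected {batch_size} (one per request)"
            )
        return torch.tensor(context_len, dtype=torch.int32)
    return torch.full((batch_size,), context_len, dtype=torch.int32)
```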
📒 Files selected for processing (5)
- examples/ir_parser/dynamic_paged_attention_example.py
- examples/ir_parser/paged_attention_example.py
- src/codegen/pto/pto_codegen.cpp
- tests/st/codegen/test_dynamic_paged_attention.py
- tests/st/codegen/test_paged_attention.py
🧹 Nitpick comments (1)
tests/st/codegen/test_dynamic_paged_attention.py (1)
89-97: Add at least one non-block-aligned heterogeneous context length.
The current variable-length case uses `[512, 4096, 8192, 768]`, all exact multiples of `block_size=128`, so it never exercises the partial-tail block path (valid_len < block_size). A non-multiple here would give this new coverage real teeth.
Suggested tweak:
- # Variable context lengths: each of 4 requests has a different length
- (4, 16, 128, 128, [512, 4096, 8192, 768], 32768),
+ # Variable context lengths, including partial last blocks
+ (4, 16, 128, 128, [513, 4097, 8192, 770], 32768),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/st/codegen/test_dynamic_paged_attention.py` around lines 89 - 97, The variable-length param case in the pytest.mark.parametrize tuple uses context_len = [512, 4096, 8192, 768], which are all multiples of block_size (128) and thus never exercises the partial-tail block path; update the heterogeneous context length list in the parameter tuple (the case passed to pytest.mark.parametrize) so at least one entry is not a multiple of block_size (e.g., change 768 to 770 or another non-multiple) so the test for dynamic_paged_attention covers the partial-tail scenario.
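To spell out why the swapped-in values matter: with `block_size = 128`, every original length divides evenly, so each block is completely full; a length like 770 leaves a short tail.

```python
block_size = 128
for n in (512, 4096, 8192, 768):
    assert n % block_size == 0   # every block full: partial-tail path never runs
assert 770 % block_size == 2     # last block holds only 2 valid rows (valid_len < block_size)
```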
📒 Files selected for processing (5)
- examples/ir_parser/dynamic_paged_attention_example.py
- examples/ir_parser/paged_attention_example.py
- src/codegen/pto/pto_codegen.cpp
- tests/st/codegen/test_dynamic_paged_attention.py
- tests/st/codegen/test_paged_attention.py
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/st/codegen/test_paged_attention.py
- examples/ir_parser/dynamic_paged_attention_example.py
- src/codegen/pto/pto_codegen.cpp
## Summary
- Add `examples/ir_parser/dynamic_paged_attention_example.py` with
`build_dynamic_paged_attention_program()` builder that defines five InCore
kernel closures (`dyn_kernel_init_inplace`, `dyn_kernel_qk_matmul`,
`dyn_kernel_softmax_prepare`, `dyn_kernel_pv_matmul`, `dyn_kernel_online_update`);
type annotations use module-level `pl.dynamic()` variables
(`Q_HEADS`, `HEAD_DIM_DYN`, `BLOCK_SIZE_DYN`), load ops use concrete
closure variables (`_Q_TILE`, `_HEAD_DIM`, `_BLOCK_SIZE`); see the sketch after this list
- Add `tests/st/codegen/test_dynamic_paged_attention.py` with
`DynamicPagedAttentionTestCase` inheriting golden reference and tensor
definitions from `PagedAttentionTestCase`, targeting Ascend910B PTO backend;
4 parametrized configurations (including one with per-request variable context lengths)
- Extend `tests/st/codegen/test_paged_attention.py`: `PagedAttentionTestCase`
accepts `context_len: int | list[int]` and constructs `context_lens` tensor
from a list when heterogeneous per-request lengths are needed (required by
`DynamicPagedAttentionTestCase`)
- Fix `PTOCodegen::EmitMakeTensorViews` to emit `{layout = #pto.layout<nd|dn|nz>}`
attribute on tensor view assignments so DN-layout tensors (e.g. transposed
key_cache in paged attention) are correctly represented in generated PTO IR
- Fix `examples/ir_parser/paged_attention_example.py`: annotate `[16, 1]`
output tensors in `kernel_init_inplace` and `kernel_softmax_prepare` with `pl.DN`;
remove `pl.DN` from `kj` parameter in `kernel_qk_matmul` (key_cache is ND, not DN)
- Remove unused `size_query`, `size_key_cache`, `size_value_cache` parameters from the
orchestration function signatures in `paged_attention_example.py`, along with their construction and their `TensorSpec` entries in `build_tensor_specs`
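The split between dynamic annotation variables and concrete closure variables mentioned in the first bullet can be sketched roughly as follows. This is assembled only from the names in this description (`pl.dynamic()`, `Q_HEADS`, `_Q_TILE`, etc.); the actual `build_dynamic_paged_attention_program()` in the example file is more involved and may differ in detail.

```python
# Loose sketch, not the example's real code; assumes the repo's `pl` DSL.
Q_HEADS = pl.dynamic()        # module-level dynamic shape variables,
HEAD_DIM_DYN = pl.dynamic()   # used only in kernel type annotations

def build_dynamic_paged_attention_program(q_tile: int, head_dim: int):
    _Q_TILE, _HEAD_DIM = q_tile, head_dim   # concrete closure values for load ops

    def dyn_kernel_init_inplace(
        mi: pl.DN[pl.Tensor[[Q_HEADS, 1], pl.FP32]],   # running max accumulator
        li: pl.DN[pl.Tensor[[Q_HEADS, 1], pl.FP32]],   # running sum accumulator
    ):
        ...  # loads inside the body use the concrete _Q_TILE / _HEAD_DIM values

    # The real builder also defines dyn_kernel_qk_matmul, dyn_kernel_softmax_prepare,
    # dyn_kernel_pv_matmul and dyn_kernel_online_update.
    return dyn_kernel_init_inplace
```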
## Testing
- [x] DynamicPagedAttentionTestCase passes on 910B PTO hardware
- [x] Pre-commit hooks pass (ruff, pyright, clang-format, cpplint)
6b3b68a to 427912a