restore ds back by Inspiron-st · Pull Request #23 · hw-native-sys/pypto-lib

Inspiron-st · 2026-03-19T09:35:33Z

use correct backend_type
change tile size

Summary by CodeRabbit

Release Notes

Performance Optimization
- Modified tiling parameters for attention and MLP operations to improve computation efficiency.
- Refined buffer data handling in attention output combination logic.
Platform Support
- Updated compute backend to Ascend950 for enhanced hardware compatibility.

gemini-code-assist · 2026-03-19T09:35:49Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

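This pull request aims to fix several key issues in deepseek_v3_2_decode_back.py to improve the correctness and performance of the model's decoding process. It does so mainly by updating the compile backend type, adjusting the chunk sizes used for internal computation, and correcting the tensor slicing and reshaping logic.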

Highlights

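Backend type correction: Changed the backend type used for compile-and-run from BackendType.CCE to BackendType.Ascend950 so that the correct hardware backend is used.
Chunk size adjustment: Reduced Q_OUT_CHUNK from 128 to 64 and MLP_OUT_CHUNK from 512 to 64.
Tensor operation fix: Corrected the slice dimensions of combine_buf in the deepseek_v3_2_decode_back_layer function and added a reshape so that the tensor shape is correct.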

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, and Terms of Service, and learn how to configure Gemini Code Assist in GitHub. Gemini can make mistakes, so double-check it and use code with caution.

coderabbitai · 2026-03-19T09:35:55Z

📝 Walkthrough

Walkthrough

Configuration and parameter optimization for DeepSeek V3.2 decoding: reduced tiling chunk sizes for O-projection and MLP outputs, adjusted tensor slicing logic in the decode backward layer, and migrated backend from CCE to Ascend950.

Changes

Cohort / File(s)	Summary
DeepSeek V3.2 Decode Configuration `examples/deepseek_v3_2/deepseek_v3_2_decode_back.py`	Reduced `Q_OUT_CHUNK` (128→64) and `MLP_OUT_CHUNK` (512→64) affecting tiling granularity; modified tensor reading in `deepseek_v3_2_decode_back_layer` with adjusted slice shape `[node_id, 1, ATTN_OUT_CFG]` and reshape to `[1, ATTN_OUT_CFG]`; switched `RunConfig` backend from `BackendType.CCE` to `BackendType.Ascend950`.
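
The slice-then-reshape read described in the table can be sketched with NumPy as a stand-in for the project's tensor API. This is one possible reading of the summary, not the code from the PR: the rank-3 layout of `combine_buf`, the sizes `NUM_NODES` and `ATTN_OUT_CFG`, and the helper `read_node_rows` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical sizes; the real constants live in deepseek_v3_2_decode_back.py.
NUM_NODES = 4
ATTN_OUT_CFG = 8

# Stand-in for combine_buf, assumed here to hold one [1, ATTN_OUT_CFG] slab per node.
combine_buf = np.arange(NUM_NODES * ATTN_OUT_CFG, dtype=np.float32).reshape(
    NUM_NODES, 1, ATTN_OUT_CFG
)


def read_node_rows(buf: np.ndarray, node_id: int) -> np.ndarray:
    """Read one node's slab as a rank-3 slice, then drop the leading axis."""
    slab = buf[node_id:node_id + 1, :, :]   # shape [1, 1, ATTN_OUT_CFG]
    return slab.reshape(1, buf.shape[2])     # shape [1, ATTN_OUT_CFG]


row = read_node_rows(combine_buf, node_id=2)
print(row.shape)
```

The reshape only changes how the same elements are indexed, so downstream code that expects a rank-2 `[1, ATTN_OUT_CFG]` tile sees the values of the selected node unchanged.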

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 A rabbit hops through silicon streams,
Adjusting chunks to optimize dreams,
From CCE to Ascend, the backend now gleams,
Tiling refined with precision and schemes,
DeepSeek's decoder dreams, faster it seems! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The title 'restore ds back' is vague and does not clearly convey the specific changes made (backend type correction, tile size adjustment, and tensor reshaping modifications).	Use a more descriptive title that specifies the main changes, such as 'Fix deepseek_v3_2 backend and tile size configuration' or 'Update deepseek_v3_2 backend type and optimize tiling granularity'.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.


gemini-code-assist

Code Review

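This PR fixes several issues in deepseek_v3_2_decode_back.py, including updating backend_type and adjusting the tile size. The review found two things to improve. First, the comment about tile sizes no longer matches the code, which may cause confusion. Second, after work_dir was removed from RunConfig, it is not passed to the run function; this may be a bug that puts output files in the wrong path. See the individual review comments for details.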

gemini-code-assist · 2026-03-19T09:37:18Z

examples/deepseek_v3_2/deepseek_v3_2_decode_back.py

            strategy=OptimizationStrategy.Default,
            dump_passes=dump_passes,
-            backend_type=BackendType.CCE,
-            work_dir=work_dir,


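The work_dir parameter has been removed from RunConfig here, but it is not passed to the run function on line 224. However, the work_dir variable is computed on lines 221-222 and used in later print statements (such as line 240), which suggests the intent is still to specify the output directory. If it is not passed to run, output files will most likely be placed in a default location, making the printed messages misleading. This looks like a defect that needs fixing. If the API has changed, work_dir may need to be passed directly to run() as a keyword argument.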
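
The failure mode described in this comment can be illustrated with a small stand-alone sketch. `FakeRunConfig`, `DEFAULT_DIR`, `resolve_dump_dir`, and `compile_and_report` are hypothetical names invented here; they are not the pypto API, and the real default location is not stated in this thread.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FakeRunConfig:
    """Hypothetical stand-in for the project's RunConfig; fields are assumptions."""
    backend_type: str = "Ascend950"
    work_dir: Optional[str] = None


DEFAULT_DIR = "build_output"  # assumed default dump location, for illustration only


def resolve_dump_dir(config: FakeRunConfig) -> str:
    """Return the directory that dumps would actually be written to."""
    return config.work_dir if config.work_dir is not None else DEFAULT_DIR


def compile_and_report(work_dir: str, forward_work_dir: bool) -> Tuple[str, str]:
    """Return (path the caller logs, path the config actually uses)."""
    if forward_work_dir:
        config = FakeRunConfig(work_dir=work_dir)
    else:
        config = FakeRunConfig()  # work_dir silently dropped
    return work_dir, resolve_dump_dir(config)


logged, actual = compile_and_report("dumps/run1", forward_work_dir=False)
print(f"logged={logged} actual={actual}")
```

When `forward_work_dir` is false, the logged path and the path in use diverge, which is the misleading-log situation the review describes. Forwarding the value makes the two agree.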

gemini-code-assist · 2026-03-19T09:37:18Z

examples/deepseek_v3_2/deepseek_v3_2_decode_back.py

+Q_OUT_CHUNK = 64
+MLP_OUT_CHUNK = 64


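The values of Q_OUT_CHUNK and MLP_OUT_CHUNK have been reduced here, but the comment on lines 37-38 says 'Increase tile sizes...'. This contradicts what the code actually does and may confuse later developers. Consider updating the comment to reflect that these values were tuned for performance, or removing the misleading part of the comment.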
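
To make the direction of the change concrete, the sketch below counts how many tiles cover an output dimension for the old and new chunk sizes. The chunk values come from the diff; `HYPOTHETICAL_OUT_DIM` and `num_tiles` are assumptions for illustration, since the real output dimensions are not given in this thread.

```python
def num_tiles(dim: int, chunk: int) -> int:
    """Number of chunk-sized tiles needed to cover dim (ceiling division)."""
    return -(-dim // chunk)


HYPOTHETICAL_OUT_DIM = 4096  # assumed size, not taken from the PR

# (name, old chunk, new chunk) as reported in the diff.
changes = [("Q_OUT_CHUNK", 128, 64), ("MLP_OUT_CHUNK", 512, 64)]

tile_counts = {
    name: (num_tiles(HYPOTHETICAL_OUT_DIM, old), num_tiles(HYPOTHETICAL_OUT_DIM, new))
    for name, old, new in changes
}

for name, (before, after) in tile_counts.items():
    print(f"{name}: {before} tiles before, {after} tiles after")
```

Smaller chunks mean more, smaller tiles over the same dimension, which is the opposite of what the "Increase tile sizes" comment says.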

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

examples/deepseek_v3_2/deepseek_v3_2_decode_back.py (1)
221-236: ⚠️ Potential issue | 🟠 Major

Pass work_dir to RunConfig to honor caller-provided or computed dump paths.

work_dir is accepted in the function signature and computed (lines 221–222), and later logged in print statements (lines 240, 244, 246), but it is never passed to RunConfig(). Other files in the codebase using BackendType.CCE (e.g., deepseek_v3_2_prefill_back.py, qwen3_32b_prefill.py) correctly pass work_dir=work_dir to RunConfig. Without passing it here, the dump location specified by callers is silently discarded, making the logs misleading.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/deepseek_v3_2/deepseek_v3_2_decode_back.py` around lines 221 - 236,
The function computes or accepts work_dir but fails to pass it into RunConfig,
so caller-provided or computed dump paths are ignored; update the call to
run(...) so the RunConfig constructed for this invocation (the RunConfig(...)
passed into run with BackendType.Ascend950) includes work_dir=work_dir,
mirroring other files (e.g., deepseek_v3_2_prefill_back.py) so dumps and printed
log paths match the real dump directory.



ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 59b8bdbe-d617-4628-8936-c345d876b879

📥 Commits

Reviewing files that changed from the base of the PR and between 6b865d2 and 504243a.

📒 Files selected for processing (1)

examples/deepseek_v3_2/deepseek_v3_2_decode_back.py

coderabbitai · 2026-03-19T09:42:09Z

examples/deepseek_v3_2/deepseek_v3_2_decode_back.py

 # Increase tile sizes to encourage larger mixed-kernel fusion regions
 # (notably for decode_back_layer_incore_0/1).
 K_CHUNK = 512
-Q_OUT_CHUNK = 128
-MLP_OUT_CHUNK = 512
+Q_OUT_CHUNK = 64
+MLP_OUT_CHUNK = 64


⚠️ Potential issue | 🟡 Minor

Update the tile-size comment to match the new constants.

Line 37 says tile sizes were increased, but Line 40 and Line 41 reduce them to 64. This is misleading for future kernel tuning/debugging.

Suggested fix

-# Increase tile sizes to encourage larger mixed-kernel fusion regions
-# (notably for decode_back_layer_incore_0/1).
+# Reduce output tile sizes to improve decode kernel scheduling/fusion behavior
+# (notably for decode_back_layer_incore_0/1).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@examples/deepseek_v3_2/deepseek_v3_2_decode_back.py` around lines 37 - 41, The comment above the constants K_CHUNK, Q_OUT_CHUNK, and MLP_OUT_CHUNK is misleading: update the comment to accurately describe the new values (K_CHUNK = 512 while Q_OUT_CHUNK and MLP_OUT_CHUNK are 64) and why they were chosen (e.g., increased K chunk for larger fusion regions while Q/MLP outputs were reduced for performance/tiling trade-offs), so future kernel tuning/debugging reflects the actual constants K_CHUNK, Q_OUT_CHUNK, and MLP_OUT_CHUNK.

restore ds back

504243a

Inspiron-st force-pushed the dev branch from 654ddc0 to 504243a on March 19, 2026 09:36

gemini-code-assist bot reviewed Mar 19, 2026


coderabbitai bot reviewed Mar 19, 2026


Inspiron-st changed the title from ~~Fix the ds back file~~ to restore ds back on Mar 19, 2026

Inspiron-st merged commit d2c6e07 into hw-native-sys:main Mar 19, 2026
4 checks passed

Inspiron-st merged 1 commit into hw-native-sys:main from Inspiron-st:dev
