
restore ds back #23

Merged
Inspiron-st merged 1 commit into hw-native-sys:main from Inspiron-st:dev
Mar 19, 2026


Inspiron-st (Collaborator) commented Mar 19, 2026

use correct backend_type
change tile size

Summary by CodeRabbit

Release Notes

  • Performance Optimization

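    • Reduced the tiling chunk sizes for attention output and MLP operations (Q_OUT_CHUNK 128→64, MLP_OUT_CHUNK 512→64), intended to improve computation efficiency.
    • Adjusted how combine_buf is sliced in the attention output combination logic and added a reshape to [1, ATTN_OUT_CFG].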
  • Platform Support

    • Switched the compute backend from BackendType.CCE to BackendType.Ascend950.

@gemini-code-assist

Summary of Changes


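This pull request fixes several key issues in deepseek_v3_2_decode_back.py to improve the correctness and performance of the model's decoding process. It does this mainly by:

  • updating the compilation backend type;
  • adjusting the tile sizes used by internal computations;
  • correcting the tensor slicing and reshape logic.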

Highlights

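  • Backend type fix: changed the backend used for compiling and running from BackendType.CCE to BackendType.Ascend950 so that the correct hardware backend is used.
  • Tile-size adjustment: reduced Q_OUT_CHUNK from 128 to 64 and MLP_OUT_CHUNK from 512 to 64.
  • Tensor operation fix: corrected the slice dimensions of combine_buf in the deepseek_v3_2_decode_back_layer function and added a reshape so that the tensor has the correct shape.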
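The sketch below illustrates what the smaller chunk sizes mean for tiling granularity. It counts how many chunk-sized tiles cover an output dimension before and after this change.

It assumes the following:

  • each chunk covers a contiguous slice of the output width;
  • a partial last tile still counts as one tile;
  • OUT_DIM is a hypothetical width chosen only for illustration, not a value taken from this PR.

```python
import math

# Chunk sizes before and after this PR (values from the diff).
OLD_CHUNKS = {"Q_OUT_CHUNK": 128, "MLP_OUT_CHUNK": 512}
NEW_CHUNKS = {"Q_OUT_CHUNK": 64, "MLP_OUT_CHUNK": 64}


def num_tiles(dim: int, chunk: int) -> int:
    """Number of chunk-sized tiles needed to cover `dim` output columns."""
    return math.ceil(dim / chunk)


# Hypothetical output width, for illustration only.
OUT_DIM = 7168

for name in OLD_CHUNKS:
    before = num_tiles(OUT_DIM, OLD_CHUNKS[name])
    after = num_tiles(OUT_DIM, NEW_CHUNKS[name])
    print(f"{name}: {before} tiles -> {after} tiles")
```

Smaller chunks produce more, smaller tiles per output row. Whether that is faster depends on how the backend schedules and fuses the resulting kernels.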


coderabbitai bot commented Mar 19, 2026

Walkthrough

Configuration and parameter optimization for DeepSeek V3.2 decoding: reduced tiling chunk sizes for O-projection and MLP outputs, adjusted tensor slicing logic in the decode_back layer, and migrated the backend from CCE to Ascend950.

Changes

Cohort / File(s) Summary
DeepSeek V3.2 Decode Configuration
examples/deepseek_v3_2/deepseek_v3_2_decode_back.py
Reduced Q_OUT_CHUNK (128→64) and MLP_OUT_CHUNK (512→64) affecting tiling granularity; modified tensor reading in deepseek_v3_2_decode_back_layer with adjusted slice shape [node_id, 1, ATTN_OUT_CFG] and reshape to [1, ATTN_OUT_CFG]; switched RunConfig backend from BackendType.CCE to BackendType.Ascend950.
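The slice-then-reshape step in the walkthrough can be pictured with plain Python lists. This page does not show the exact slice call, so the sketch makes some assumptions.

Assumptions:

  • combine_buf is a 3-D buffer of shape [NUM_NODES, 1, ATTN_OUT_CFG];
  • the read takes one node's block and then collapses the leading axes to [1, ATTN_OUT_CFG];
  • NUM_NODES, the ATTN_OUT_CFG value and the helper names are hypothetical.

```python
from typing import List

# Hypothetical sizes; in the real file ATTN_OUT_CFG is a model constant.
NUM_NODES = 4
ATTN_OUT_CFG = 8

# combine_buf modelled as a nested list of shape [NUM_NODES, 1, ATTN_OUT_CFG].
combine_buf = [
    [[float(n * ATTN_OUT_CFG + j) for j in range(ATTN_OUT_CFG)]]
    for n in range(NUM_NODES)
]


def shape(x) -> List[int]:
    """Shape of a regular nested list, read along the first element."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return dims


def read_node_row(buf, node_id: int):
    # Take one node along axis 0: result has shape [1, 1, ATTN_OUT_CFG].
    sliced = buf[node_id:node_id + 1]
    # Merge axes 0 and 1 (1 * 1 = 1): result has shape [1, ATTN_OUT_CFG].
    return [row for block in sliced for row in block]


print(shape(combine_buf), "->", shape(read_node_row(combine_buf, 2)))  # [4, 1, 8] -> [1, 8]
```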

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 A rabbit hops through silicon streams,
Adjusting chunks to optimize dreams,
From CCE to Ascend, the backend now gleams,
Tiling refined with precision and schemes,
DeepSeek's decoder dreams, faster it seems! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
Title check: ❓ Inconclusive. The title 'restore ds back' is vague and does not clearly convey the specific changes made (backend type correction, tile size adjustment, and tensor reshaping modifications). Resolution: use a more descriptive title that specifies the main changes, such as 'Fix deepseek_v3_2 backend and tile size configuration' or 'Update deepseek_v3_2 backend type and optimize tiling granularity'.
✅ Passed checks (1 passed)
Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.



gemini-code-assist bot left a comment


Code Review

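This PR fixes several issues in deepseek_v3_2_decode_back.py, including updating backend_type and adjusting the tile sizes. The review found two areas for improvement:

  • The comment describing the tile sizes contradicts the code and may cause confusion.
  • work_dir was removed from RunConfig but is not passed to the run function instead. This may be a bug that causes output files to land in the wrong location.

See the individual review comments for details.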

strategy=OptimizationStrategy.Default,
dump_passes=dump_passes,
backend_type=BackendType.CCE,
work_dir=work_dir,


high

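The work_dir argument has been removed from RunConfig here, but it is not passed to the run function on line 224 either. The work_dir variable is still computed on lines 221-222 and used in later print statements (such as line 240), which indicates that it is still meant to specify the output directory. If it is not passed to run, the output files will likely be written to a default location, and the printed messages will be misleading. This looks like a defect that needs fixing. If the API has changed, work_dir may need to be passed directly to run() as a keyword argument.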

Comment on lines +40 to +41
Q_OUT_CHUNK = 64
MLP_OUT_CHUNK = 64


medium

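The values of Q_OUT_CHUNK and MLP_OUT_CHUNK are reduced here, but the comment on lines 37-38 says 'Increase tile sizes...'. This contradicts what the code actually does and may confuse future developers. Consider updating the comment to reflect that these values were tuned for performance, or removing the misleading part of the comment.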


coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/deepseek_v3_2/deepseek_v3_2_decode_back.py (1)

221-236: ⚠️ Potential issue | 🟠 Major

Pass work_dir to RunConfig to honor caller-provided or computed dump paths.

work_dir is accepted in the function signature and computed (lines 221–222), and later logged in print statements (lines 240, 244, 246), but it is never passed to RunConfig(). Other files in the codebase using BackendType.CCE (e.g., deepseek_v3_2_prefill_back.py, qwen3_32b_prefill.py) correctly pass work_dir=work_dir to RunConfig. Without passing it here, the dump location specified by callers is silently discarded, making the logs misleading.
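The fix both reviewers describe amounts to forwarding the computed directory into the run configuration. The sketch below shows that shape using stand-ins.

Assumptions:

  • RunConfig, BackendType and run here are hypothetical stand-ins, not the real library's API;
  • the real RunConfig also takes strategy, dump_passes and other fields that are not modelled;
  • compile_and_run, DEFAULT_DIR and the default directory name are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Optional


class BackendType(Enum):
    # Hypothetical stand-in mirroring the two backend names in this PR.
    CCE = "cce"
    Ascend950 = "ascend950"


@dataclass
class RunConfig:
    # Hypothetical stand-in; only the fields relevant here are modelled.
    backend_type: BackendType
    work_dir: Optional[str] = None


DEFAULT_DIR = "build_output"  # hypothetical default used when work_dir is None


def run(config: RunConfig) -> Path:
    """Hypothetical runner: returns the directory dumps would be written to."""
    return Path(config.work_dir or DEFAULT_DIR)


def compile_and_run(work_dir: Optional[str] = None) -> Path:
    # Compute a dump directory when the caller does not supply one
    # (hypothetical naming scheme).
    if work_dir is None:
        work_dir = "deepseek_v3_2_decode_back_dump"
    # Forward work_dir so the printed path matches where dumps actually land.
    dump_dir = run(RunConfig(backend_type=BackendType.Ascend950, work_dir=work_dir))
    print(f"Dumps written to: {dump_dir}")
    return dump_dir
```

If work_dir is dropped from RunConfig, run() falls back to DEFAULT_DIR while the caller still prints its own path. That mismatch is exactly what the review flags.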

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/deepseek_v3_2/deepseek_v3_2_decode_back.py` around lines 221 - 236,
The function computes or accepts work_dir but fails to pass it into RunConfig,
so caller-provided or computed dump paths are ignored; update the call to
run(...) so the RunConfig constructed for this invocation (the RunConfig(...)
passed into run with BackendType.Ascend950) includes work_dir=work_dir,
mirroring other files (e.g., deepseek_v3_2_prefill_back.py) so dumps and printed
log paths match the real dump directory.

📥 Commits

Reviewing files that changed from the base of the PR and between 6b865d2 and 504243a.

📒 Files selected for processing (1)
  • examples/deepseek_v3_2/deepseek_v3_2_decode_back.py

Comment on lines 37 to +41
# Increase tile sizes to encourage larger mixed-kernel fusion regions
# (notably for decode_back_layer_incore_0/1).
K_CHUNK = 512
-Q_OUT_CHUNK = 128
-MLP_OUT_CHUNK = 512
+Q_OUT_CHUNK = 64
+MLP_OUT_CHUNK = 64


⚠️ Potential issue | 🟡 Minor

Update the tile-size comment to match the new constants.

Line 37 says tile sizes were increased, but Line 40 and Line 41 reduce them to 64. This is misleading for future kernel tuning/debugging.

Suggested fix
-# Increase tile sizes to encourage larger mixed-kernel fusion regions
-# (notably for decode_back_layer_incore_0/1).
+# Reduce output tile sizes to improve decode kernel scheduling/fusion behavior
+# (notably for decode_back_layer_incore_0/1).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/deepseek_v3_2/deepseek_v3_2_decode_back.py` around lines 37 - 41,
The comment above the constants K_CHUNK, Q_OUT_CHUNK, and MLP_OUT_CHUNK is
misleading: update the comment to accurately describe the new values (K_CHUNK =
512 while Q_OUT_CHUNK and MLP_OUT_CHUNK are 64) and why they were chosen (e.g.,
increased K chunk for larger fusion regions while Q/MLP outputs were reduced for
performance/tiling trade-offs), so future kernel tuning/debugging reflects the
actual constants K_CHUNK, Q_OUT_CHUNK, and MLP_OUT_CHUNK.
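One possible version of the constants block has a comment that describes all three values without claiming the tiles grew. It also includes a divisibility guard that you might add if the kernels assume exact tiling.

Both are assumptions rather than something the PR states:

  • the comment wording is illustrative;
  • check_divides and the dimensions used with it are hypothetical.

```python
# Tile sizes for decode_back_layer (values from this PR).
# K_CHUNK stays at 512 to encourage larger mixed-kernel fusion regions
# (notably decode_back_layer_incore_0/1); Q_OUT_CHUNK and MLP_OUT_CHUNK
# were reduced from 128 and 512 to 64.
K_CHUNK = 512
Q_OUT_CHUNK = 64
MLP_OUT_CHUNK = 64


def check_divides(dim: int, chunk: int, name: str) -> None:
    """Hypothetical guard: fail fast if `dim` is not a multiple of `chunk`."""
    if dim % chunk != 0:
        raise ValueError(f"{name}={chunk} does not evenly divide {dim}")
```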

Inspiron-st changed the title from "Fix ds back file" to "restore ds back" on Mar 19, 2026
Inspiron-st merged commit d2c6e07 into hw-native-sys:main on Mar 19, 2026
4 checks passed

2 participants