
use reshape in deepseek#14

Closed
Inspiron-st wants to merge 1 commit into hw-native-sys:main from Inspiron-st:reshape_issue

Conversation


@Inspiron-st commented Mar 17, 2026

Summary by CodeRabbit

  • Refactor

    • Moved and consolidated internal tensor/buffer allocations and adjusted tensor shapes to streamline memory usage and forward-pass layout in the DeepSeek V3.2 decoder example.
  • Documentation

    • Added and clarified inline comments around normalization, projection, top-k and sparse-attention steps to improve readability and maintainability.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors tensor operations within the deepseek_v3_2_decode_front_layer function by strategically applying pl.reshape to weight tensors during matrix multiplications. This change likely aims to optimize or clarify the shape handling of query and value projection weights. Additionally, the scope of several tensor allocations was adjusted, and explanatory comments were added to improve code understanding.

Highlights

  • Tensor Reshaping for Weight Tensors: Introduced pl.reshape operations when slicing weight tensors (w_q_nope_to_latent, w_latent_to_v) before matrix multiplications, ensuring correct shape handling for improved tensor processing.
  • Tensor Allocation Scope Adjustment: Moved the initialization of several intermediate tensors (qr, q_proj, kv_a, attn_front) outside pl.auto_incore() blocks, potentially affecting memory management or scope of these tensors.
  • Code Clarity and Comments: Added new comments to explain specific tensor operations, the purpose of certain variables, and the logic behind parts of the attention mechanism, enhancing code readability.
Changelog
  • examples/deepseek_v3_2_decode_front.py
    • Moved tensor initializations for qr, q_proj, and kv_a outside the pl.auto_incore context.
    • Added explanatory comments for inv_rms, qr, q_proj, kv_a calculations, and top-k initialization.
    • Moved attn_front tensor initialization outside the pl.auto_incore context.
    • Applied pl.reshape to w_q_nope_to_latent and w_latent_to_v slices before matrix multiplications.
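For illustration, the slice-then-reshape pattern described above can be sketched with NumPy as a stand-in for the project's pl kernel DSL. The shapes and constants below are hypothetical, not taken from the example file:

```python
# Hypothetical NumPy stand-in for the pl.slice + pl.reshape pattern.
# HEADS/D_NOPE/LATENT are invented for illustration; the real code
# operates on pl tensors, not NumPy arrays.
import numpy as np

HEADS, D_NOPE, LATENT = 4, 8, 16
w_q_nope_to_latent = np.ones((HEADS, D_NOPE, LATENT), dtype=np.float32)

# Slicing one head keeps an explicit leading dimension: shape (1, D_NOPE, LATENT).
w_tile = w_q_nope_to_latent[0:1, :, :]

# Reshape to 2D before the matmul so the operand shapes line up.
w_2d = w_tile.reshape(D_NOPE, LATENT)

x = np.ones((2, D_NOPE), dtype=np.float32)  # (batch, D_NOPE) activations
y = x @ w_2d                                 # (batch, LATENT)
assert y.shape == (2, LATENT)
```

The reshape drops the leading axis of size 1 that slicing a 3D weight tensor leaves behind, which is what lets the subsequent matrix multiplication see a 2D operand.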
Activity
  • No human activity (comments, reviews) has been recorded on this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai

coderabbitai bot commented Mar 17, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This change relocates several pre-allocated buffers (qr, q_proj, kv_a) outside the inner auto_incore block, adjusts leading-dimension handling and reshapes for q/k/v and wv tiles, and adds extensive inline comments around RMSNorm, projection, top-k buffers, and sparse-attention merge logic in the DeepSeek V3.2 decode front example.
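The buffer-hoisting half of this change can be illustrated with a small NumPy sketch. pl.create_tensor and pl.auto_incore belong to the project's DSL; the shapes and fill logic below are invented for illustration only:

```python
import numpy as np

BATCH_CFG, ATTN_OUT_CFG = 4, 8

# Hoisted allocation: the buffer is created once, before the inner compute
# scope, so it persists across (and stays visible outside) that scope.
attn_front = np.empty((BATCH_CFG, ATTN_OUT_CFG), dtype=np.float32)

# Stand-in for the body of the pl.auto_incore() block.
for b in range(BATCH_CFG):
    attn_front[b, :] = b * np.arange(ATTN_OUT_CFG, dtype=np.float32)

assert attn_front.shape == (BATCH_CFG, ATTN_OUT_CFG)
```

Allocating inside the inner scope would instead create a fresh buffer per entry into that scope and limit its lifetime to the block, which is why the PR moves qr, q_proj, kv_a, and attn_front out.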

Changes

Cohort / File(s): DeepSeek V3.2 Decode Front Updates (examples/deepseek_v3_2_decode_front.py)
Summary: Moved buffer allocations (qr, q_proj, kv_a) out of the inner auto_incore block; created attn_front prior to the block; added/updated comments for the RMSNorm and projection paths (qr, q_proj, kv_a); adjusted slices to include an explicit leading dimension [1, ...] followed by reshapes for several q/k/v and wv_tile usages; added top-k buffer scaffolding comments and sparse-attention merge annotations.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 Buffers hop outside the loop today,
Shapes straighten out and find their way,
Comments hum like carrot tunes,
Reshapes dancing under moons,
A tidy burrow, neat and bright. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled.
Title check | ✅ Passed | The title 'use reshape in deepseek' directly describes the main technical change: introducing reshape operations in deepseek_v3_2_decode_front.py, which aligns with the summarized leading-dimension adjustments and reshape operations.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@gemini-code-assist bot left a comment


Code Review

This pull request correctly introduces pl.reshape after pl.slice to handle 2D slices from 3D weight tensors, which is a good fix. The changes to move tensor allocations out of auto_incore scopes also seem correct for tensors that need to persist.

My main feedback is to clean up several leftover comments, some of which are questions or temporary notes (e.g., marked with "issue"). These should be removed before merging to improve code clarity and maintainability.

Comment on lines +179 to +181
# inv_rms = 1 / rms



medium

Please remove the commented-out code and the extra blank lines. They add clutter and are not needed for the final version of the code.
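For context, the inv_rms being discussed is the inverse scale of RMSNorm, 1 / sqrt(mean(x^2) + eps). A minimal NumPy sketch of that step (illustrative only; the real code uses pl.rsqrt, pl.add, and pl.mul on pl tensors):

```python
import numpy as np

EPS = 1e-6
x = np.array([3.0, 4.0])           # one row of hidden activations
HIDDEN_INV = 1.0 / x.size          # plays the role of the HIDDEN_INV constant

sq_sum = float(np.sum(x * x))      # 25.0
inv_rms = 1.0 / np.sqrt(sq_sum * HIDDEN_INV + EPS)  # 1 / rms, about 0.2828
normed = x * inv_rms               # row scaled by the inverse RMS
```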

kv_row = pl.cast(pl.slice(kv_a, [1, KV_LORA_RANK_CFG], [b, 0]), target_type=pl.FP32)
kv_gamma = pl.slice(kv_norm_weight, [1, KV_LORA_RANK_CFG], [0, 0])
kv_normed = pl.col_expand_mul(kv_row, kv_gamma)
# issue: don't we need to apply rmsnorm here?


medium

This comment seems to be a question or a note to self, marked with "issue". Such comments should be resolved and removed from the code before merging to avoid confusion and keep the code clean.

kv_cache = pl.assemble(kv_cache, pl.cast(kv_normed, target_type=pl.BF16), [cache_row, 0])
pe_cache = pl.assemble(pe_cache, pl.cast(pe_rot, target_type=pl.BF16), [cache_row, 0])
# kv_nope split from kv_a is written to kv_cache after norm; kv_rope is written to pe_cache after rope


medium

Please remove this extra blank line to improve code readability.

Comment on lines +318 to +319
# q0_nope_latent = q_nope0 @ w_q_nope_to_latent
# issue: the N axis is not reflected here. This design assumes attention distributions are similar across heads, so one head suffices for index selection. Does that hold?


medium

These comments should be removed. The first one is redundant as it just describes the matmul operation on the line above. The second one is a design question marked with "issue" that should be resolved and not remain in the code.


@coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
examples/deepseek_v3_2_decode_front.py (2)

236-239: Remove commented-out code.

If the allocation has been intentionally moved outside the auto_incore block, the commented-out line should be removed rather than left as dead code.

Suggested fix
             attn_front = pl.create_tensor([BATCH_CFG, ATTN_OUT_CFG], dtype=pl.FP32)
             with pl.auto_incore():
                 layer_id = pl.tensor.read(layer_id_t, [0])
-                # attn_front = pl.create_tensor([BATCH_CFG, ATTN_OUT_CFG], dtype=pl.FP32)
                 for b in pl.parallel(0, BATCH_CFG, 1, chunk=4):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/deepseek_v3_2_decode_front.py` around lines 236 - 239, Remove the
dead commented-out allocation inside the auto_incore block: delete the commented
line "# attn_front = pl.create_tensor([BATCH_CFG, ATTN_OUT_CFG], dtype=pl.FP32)"
so only the active allocation (attn_front = pl.create_tensor(...)) remains
outside the pl.auto_incore() scope; keep the existing pl.auto_incore() and
layer_id = pl.tensor.read(layer_id_t, [0]) lines unchanged.

179-181: Minor formatting nit: extra blank lines.

There are two consecutive blank lines after the comment. Consider removing one for consistency with the rest of the file.

Suggested fix
                 inv_rms = pl.rsqrt(pl.add(pl.mul(sq_sum, HIDDEN_INV), EPS))
                 # inv_rms = 1 / rms
-                
-                
+
                 for b0 in pl.range(0, BATCH_CFG, BATCH_TILE):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/deepseek_v3_2_decode_front.py` around lines 179 - 181, Remove the
extra blank line following the commented line "# inv_rms = 1 / rms" so there is
only a single blank line (consistent with surrounding formatting); locate the
commented expression in the decode routine (the block containing "# inv_rms = 1
/ rms") and delete one of the two consecutive empty lines directly below it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/deepseek_v3_2_decode_front.py`:
- Around line 318-319: Remove or clarify the unresolved design question in the
comment next to q0_nope_latent = q_nope0 @ w_q_nope_to_latent: if the
single-head-for-top-k assumption is validated, replace the "issue" line with a
clear documentation comment stating the design rationale and expected invariants
for q_nope0 and w_q_nope_to_latent (use ASCII punctuation); if it is still open,
convert the text into a TODO or link to a tracked issue rather than merging
uncertain text.
- Line 252: The inline comment "issue 不用做rmsnorm么?" contains a fullwidth
question mark and an unresolved implementation question; replace it with either
a resolved note or implement/confirm RMSNorm usage: if RMSNorm is required,
update the surrounding decoding/normalization code to apply RMSNorm and change
the comment to a definitive statement (e.g., "Applied RMSNorm here"), otherwise
remove the "issue" prefix and the question mark and reword to a clear
explanation (e.g., "RMSNorm not required because <reason>"). Locate the specific
comment text "issue 不用做rmsnorm么?" in the decoding/normalization block to make
the change and ensure all comments use ASCII punctuation.

---

Nitpick comments:
In `@examples/deepseek_v3_2_decode_front.py`:
- Around line 236-239: Remove the dead commented-out allocation inside the
auto_incore block: delete the commented line "# attn_front =
pl.create_tensor([BATCH_CFG, ATTN_OUT_CFG], dtype=pl.FP32)" so only the active
allocation (attn_front = pl.create_tensor(...)) remains outside the
pl.auto_incore() scope; keep the existing pl.auto_incore() and layer_id =
pl.tensor.read(layer_id_t, [0]) lines unchanged.
- Around line 179-181: Remove the extra blank line following the commented line
"# inv_rms = 1 / rms" so there is only a single blank line (consistent with
surrounding formatting); locate the commented expression in the decode routine
(the block containing "# inv_rms = 1 / rms") and delete one of the two
consecutive empty lines directly below it.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e1ab6ca3-0689-4d0f-a362-b1255e5f4c58

📥 Commits

Reviewing files that changed from the base of the PR and between 3edb010 and 93cb34a.

📒 Files selected for processing (1)
  • examples/deepseek_v3_2_decode_front.py

Comment on lines +318 to +319
# q0_nope_latent = q_nope0 @ w_q_nope_to_latent
# issue: the N axis is not reflected here. This design assumes attention distributions are similar across heads, so one head suffices for index selection. Does that hold?


⚠️ Potential issue | 🟡 Minor

Unresolved design question should be documented or removed.

This comment raises an architectural concern about whether using a single head for top-k indexing (instead of reflecting the N-head axis) is valid. This is a significant design decision.

If this assumption has been validated, consider:

  1. Removing the "issue" prefix and converting to a documentation comment explaining the design rationale
  2. Using ASCII punctuation (, instead of ，, and ? instead of ？) per Ruff RUF003

If this is still an open question, it should be tracked as an issue or TODO rather than merged with uncertainty.
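The assumption under discussion, that one head's scores can drive index selection for all heads, can be made concrete with a small NumPy sketch. The names and shapes below are invented for illustration and do not come from the example file:

```python
import numpy as np

HEADS, SEQ, K = 4, 10, 3
# Deterministic stand-in attention scores: row h holds h*10 .. h*10+9.
scores = np.arange(HEADS * SEQ, dtype=np.float32).reshape(HEADS, SEQ)

# Top-k token indices chosen from head 0 only (the questioned assumption).
topk_idx = np.argsort(scores[0])[-K:]

# Every head then gathers the same token positions.
gathered = scores[:, topk_idx]
assert gathered.shape == (HEADS, K)
```

If head 0's score distribution is representative, the shared indices are a cheap approximation; if heads attend very differently, per-head top-k would be needed, which is exactly what the review asks to document or verify.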

🧰 Tools
🪛 Ruff (0.15.6)

[warning] 319-319: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


[warning] 319-319: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


[warning] 319-319: Comment contains ambiguous ？ (FULLWIDTH QUESTION MARK). Did you mean ? (QUESTION MARK)?

(RUF003)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/deepseek_v3_2_decode_front.py` around lines 318 - 319, Remove or
clarify the unresolved design question in the comment next to q0_nope_latent =
q_nope0 @ w_q_nope_to_latent: if the single-head-for-top-k assumption is
validated, replace the "issue" line with a clear documentation comment stating
the design rationale and expected invariants for q_nope0 and w_q_nope_to_latent
(use ASCII punctuation); if it is still open, convert the text into a TODO or
link to a tracked issue rather than merging uncertain text.


@coderabbitai bot left a comment


♻️ Duplicate comments (2)
examples/deepseek_v3_2_decode_front.py (2)

318-319: ⚠️ Potential issue | 🟡 Minor

Resolve the open design assumption at Line 318-319 before merge.

The “single-head for index selection” note is still phrased as an unresolved question in a critical attention-indexing path. Please replace with a validated invariant (or track explicitly as TODO/issue).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/deepseek_v3_2_decode_front.py` around lines 318 - 319, Replace the
unresolved question/comment about using a “single head for index selection” in
deepseek_v3_2_decode_front.py with a clear invariant or tracked TODO: either
state the validated assumption (e.g., “All attention heads are expected to have
similar distributions for index selection; we therefore use head 0 for
choose_index()”) and add an assertion or runtime check in the index-selection
path to verify N-axis/head consistency, or replace the comment with a TODO
referencing a ticket/issue number and include a short plan to validate (and
where to add the check). Update the comment adjacent to the index-selection
logic (the single-head/index selection note) and add a short assert/check in
that function to reflect the chosen invariant.

252-252: ⚠️ Potential issue | 🟡 Minor

Resolve the open RMSNorm question at Line 252 before merge.

The comment "# issue: don't need to do rmsnorm?" leaves core normalization behavior ambiguous in the KV path. Either convert it to a definitive rationale or track it as a TODO/issue with a reference.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/deepseek_v3_2_decode_front.py` at line 252, Replace the ambiguous
inline comment "# issue: don't need to do rmsnorm?" in the KV path of
examples/deepseek_v3_2_decode_front.py with a definitive explanation or a
tracked TODO: either remove the question and state why RMSNorm is intentionally
omitted (e.g., "RMSNorm omitted because KV tensors are already normalized
upstream/handled in attention layers") or replace it with a TODO referencing an
issue ID (e.g., "TODO(`#1234`): verify whether RMSNorm required for KV path —
tracked in issue `#1234`") so the normalization behavior is unambiguous; update
the comment where the KV/attention key-value tensors are prepared (the KV path
code block) and ensure it references the exact rationale or issue number.
🧹 Nitpick comments (1)
examples/deepseek_v3_2_decode_front.py (1)

236-239: Remove stale commented-out allocation at Line 239.

Keeping the old attn_front creation as a comment can drift from actual behavior and confuse future edits.

Suggested cleanup
-                # attn_front = pl.create_tensor([BATCH_CFG, ATTN_OUT_CFG], dtype=pl.FP32)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/deepseek_v3_2_decode_front.py` around lines 236 - 239, The file
contains a stale commented-out allocation of attn_front (the line with "#
attn_front = pl.create_tensor([BATCH_CFG, ATTN_OUT_CFG], dtype=pl.FP32)") that
should be removed to avoid confusion; delete that commented line so only the
active allocation (attn_front = pl.create_tensor([BATCH_CFG, ATTN_OUT_CFG],
dtype=pl.FP32)) and its surrounding with pl.auto_incore() block remain intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@examples/deepseek_v3_2_decode_front.py`:
- Around line 318-319: Replace the unresolved question/comment about using a
“single head for index selection” in deepseek_v3_2_decode_front.py with a clear
invariant or tracked TODO: either state the validated assumption (e.g., “All
attention heads are expected to have similar distributions for index selection;
we therefore use head 0 for choose_index()”) and add an assertion or runtime
check in the index-selection path to verify N-axis/head consistency, or replace
the comment with a TODO referencing a ticket/issue number and include a short
plan to validate (and where to add the check). Update the comment adjacent to
the index-selection logic (the single-head/index selection note) and add a short
assert/check in that function to reflect the chosen invariant.
- Line 252: Replace the ambiguous inline comment "# issue: don't need to do
rmsnorm?" in the KV path of examples/deepseek_v3_2_decode_front.py with a
definitive explanation or a tracked TODO: either remove the question and state
why RMSNorm is intentionally omitted (e.g., "RMSNorm omitted because KV tensors
are already normalized upstream/handled in attention layers") or replace it with
a TODO referencing an issue ID (e.g., "TODO(`#1234`): verify whether RMSNorm
required for KV path — tracked in issue `#1234`") so the normalization behavior is
unambiguous; update the comment where the KV/attention key-value tensors are
prepared (the KV path code block) and ensure it references the exact rationale
or issue number.

---

Nitpick comments:
In `@examples/deepseek_v3_2_decode_front.py`:
- Around line 236-239: The file contains a stale commented-out allocation of
attn_front (the line with "# attn_front = pl.create_tensor([BATCH_CFG,
ATTN_OUT_CFG], dtype=pl.FP32)") that should be removed to avoid confusion;
delete that commented line so only the active allocation (attn_front =
pl.create_tensor([BATCH_CFG, ATTN_OUT_CFG], dtype=pl.FP32)) and its surrounding
with pl.auto_incore() block remain intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d86fae12-aac0-4c5e-ad5a-8b31ce115a8b

📥 Commits

Reviewing files that changed from the base of the PR and between 93cb34a and a0a2d84.

📒 Files selected for processing (1)
  • examples/deepseek_v3_2_decode_front.py




3 participants