Add matmul example and organize examples into subdirectories#20
Add matmul example and organize examples into subdirectories#20zhangqi-chen merged 1 commit intohw-native-sys:mainfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the project's example suite by introducing a new, performance-oriented matrix multiplication example and improving the overall structure of the examples directory. The changes aim to provide clearer demonstrations of core functionalities and better organize existing model-specific implementations, facilitating easier navigation and future expansion. Highlights
Ignored Files
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new matmul example script implementing tiled matrix multiplication (FP16 inputs, FP32 outputs) and extends the CI workflow with two new test steps to run the matmul example in both sim and a2a3 modes. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. 📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
- Add tiled matmul example with M/N blocking (FP16 input, FP32 output) - Move DeepSeek V3.2 examples into examples/deepseek_v3_2/ - Move Qwen3 examples into examples/qwen3/ - Add matmul to CI pipeline (sim + a2a3 device tests)
There was a problem hiding this comment.
Code Review
This pull request introduces a new matmul.py example and reorganizes existing examples into subdirectories for better structure. The new matmul example is well-written and demonstrates a tiled matrix multiplication. My review includes a suggestion to enhance the example's flexibility by adding command-line arguments for matrix dimensions and tiling parameters, which would improve its usability for experimentation.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
examples/matmul.py (1)
84-95: Add explicit tiling/chunk precondition checks
compile_and_runaccepts arbitrarym/n/tile/chunkvalues, but the kernel assumes full tiles. A fail-fast validation will prevent confusing runtime errors when non-divisible shapes are passed.Proposed refactor
def compile_and_run( @@ ): + if m_tile <= 0 or n_tile <= 0 or m_chunk <= 0 or n_chunk <= 0: + raise ValueError("m_tile, n_tile, m_chunk, and n_chunk must be positive") + if m % m_tile != 0 or n % n_tile != 0: + raise ValueError("m and n must be divisible by m_tile and n_tile") + from pypto.backend import BackendType🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/matmul.py` around lines 84 - 95, Add fail-fast precondition checks at the start of compile_and_run: validate that m_tile, n_tile, m_chunk, n_chunk are positive integers and that m and n are divisible by the tile sizes (m % m_tile == 0 and n % n_tile == 0) and that the resulting tile counts are divisible by the chunk sizes ((m // m_tile) % m_chunk == 0 and (n // n_tile) % n_chunk == 0). If any check fails, raise a ValueError with a clear message referencing the offending variables (m, n, m_tile, n_tile, m_chunk, n_chunk) so callers get an immediate, informative error instead of obscure runtime/kernel failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/matmul.py`:
- Around line 121-126: The "COMPILE OK — device run skipped (code_runner not
found)" branch still leaves result.passed false so the CLI exits non‑zero; in
the branch that checks if not result.passed and "code_runner" in result.error,
mark the run as successful by setting result.passed = True (and optionally clear
or annotate result.error) before printing and returning so the fallback path
yields a zero exit; locate the conditional that tests result.error and
"code_runner" to apply this change.
---
Nitpick comments:
In `@examples/matmul.py`:
- Around line 84-95: Add fail-fast precondition checks at the start of
compile_and_run: validate that m_tile, n_tile, m_chunk, n_chunk are positive
integers and that m and n are divisible by the tile sizes (m % m_tile == 0 and n
% n_tile == 0) and that the resulting tile counts are divisible by the chunk
sizes ((m // m_tile) % m_chunk == 0 and (n // n_tile) % n_chunk == 0). If any
check fails, raise a ValueError with a clear message referencing the offending
variables (m, n, m_tile, n_tile, m_chunk, n_chunk) so callers get an immediate,
informative error instead of obscure runtime/kernel failures.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c15645e8-2d6e-4f41-8ba0-28bc39c588ff
📒 Files selected for processing (10)
.github/workflows/ci.ymlexamples/deepseek_v3_2/deepseek_v3_2_decode_back.pyexamples/deepseek_v3_2/deepseek_v3_2_decode_front.pyexamples/deepseek_v3_2/deepseek_v3_2_prefill_back.pyexamples/deepseek_v3_2/deepseek_v3_2_prefill_front.pyexamples/matmul.pyexamples/qwen3/qwen3-32b.pyexamples/qwen3/qwen3_32b_decode.pyexamples/qwen3/qwen3_32b_prefill.pyexamples/qwen3/qwen3_32b_training_forward_and_backward.py
There was a problem hiding this comment.
♻️ Duplicate comments (1)
examples/matmul.py (1)
121-126:⚠️ Potential issue | 🟠 MajorCompile-only fallback still exits non-zero.
At Line 122 the script reports compile success when
code_runneris missing, butresult.passedstaysFalse, so Line 141 still exits with code 1. This makes the fallback path unusable from CLI/CI.Proposed fix
- if not result.passed and result.error and "code_runner" in result.error: + if not result.passed and result.error and "code_runner" in result.error: + result.passed = True print("Result: COMPILE OK — device run skipped (code_runner not found).\n") print(result.error)Also applies to: 141-142
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/matmul.py` around lines 121 - 126, The compile-only fallback prints a success message when "code_runner" is missing but leaves result.passed False so callers still treat it as failure; update the fallback handling in the block checking result.error for "code_runner" to mark the run as successful (e.g., set result.passed = True) or otherwise ensure the final exit logic treats this case as success (adjust the code path that uses result.passed/return result and the later process exit logic) so that the CLI/CI returns zero when the runner is intentionally skipped.
🧹 Nitpick comments (1)
examples/matmul.py (1)
34-42: Add explicit tile-shape guards for non-default arguments.
pl.sliceuses full tile sizes; if callers passm/nnot divisible bym_tile/n_tile, behavior can become invalid or partial. Add early validation for clearer failures.Proposed refactor
def build_matmul_program( m: int = M, n: int = N, k: int = K, m_tile: int = M_TILE, n_tile: int = N_TILE, m_chunk: int = M_CHUNK, n_chunk: int = N_CHUNK, ): + if min(m, n, k, m_tile, n_tile, m_chunk, n_chunk) <= 0: + raise ValueError("m, n, k, m_tile, n_tile, m_chunk, and n_chunk must be positive") + if m % m_tile != 0 or n % n_tile != 0: + raise ValueError("m and n must be divisible by m_tile and n_tile") + `@pl.program` class MatmulProgram:Also applies to: 53-57
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/matmul.py` around lines 34 - 42, In build_matmul_program (and the analogous function around lines 53-57) add explicit validation that when callers pass non-default m_tile/n_tile the matrix dimensions m and n are divisible by the provided m_tile and n_tile respectively; if not, raise a clear ValueError with a descriptive message. Perform these guards at the top of build_matmul_program (before any pl.slice/tiling logic) and check m % m_tile == 0 and n % n_tile == 0 so pl.slice is never given partial tiles; reference the function name build_matmul_program and the corresponding helper function/block at lines ~53-57 when adding the checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@examples/matmul.py`:
- Around line 121-126: The compile-only fallback prints a success message when
"code_runner" is missing but leaves result.passed False so callers still treat
it as failure; update the fallback handling in the block checking result.error
for "code_runner" to mark the run as successful (e.g., set result.passed = True)
or otherwise ensure the final exit logic treats this case as success (adjust the
code path that uses result.passed/return result and the later process exit
logic) so that the CLI/CI returns zero when the runner is intentionally skipped.
---
Nitpick comments:
In `@examples/matmul.py`:
- Around line 34-42: In build_matmul_program (and the analogous function around
lines 53-57) add explicit validation that when callers pass non-default
m_tile/n_tile the matrix dimensions m and n are divisible by the provided m_tile
and n_tile respectively; if not, raise a clear ValueError with a descriptive
message. Perform these guards at the top of build_matmul_program (before any
pl.slice/tiling logic) and check m % m_tile == 0 and n % n_tile == 0 so pl.slice
is never given partial tiles; reference the function name build_matmul_program
and the corresponding helper function/block at lines ~53-57 when adding the
checks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a6fd7528-537b-4001-9c17-59b73eba784b
📒 Files selected for processing (10)
.github/workflows/ci.ymlexamples/deepseek_v3_2/deepseek_v3_2_decode_back.pyexamples/deepseek_v3_2/deepseek_v3_2_decode_front.pyexamples/deepseek_v3_2/deepseek_v3_2_prefill_back.pyexamples/deepseek_v3_2/deepseek_v3_2_prefill_front.pyexamples/matmul.pyexamples/qwen3/qwen3-32b.pyexamples/qwen3/qwen3_32b_decode.pyexamples/qwen3/qwen3_32b_prefill.pyexamples/qwen3/qwen3_32b_training_forward_and_backward.py
Summary
examples/matmul.py) with M/N blocking, FP16 inputs, FP32 outputdeepseek_v3_2/,qwen3/)Testing
python examples/matmul.py --simpassespython examples/matmul.py --device=<id>passes on a2a3Summary by CodeRabbit
New Features
Tests