Add matmul example and organize examples into subdirectories by zhangqi-chen · Pull Request #20 · hw-native-sys/pypto-lib

zhangqi-chen · 2026-03-18T00:46:22Z

Summary

Add tiled matmul example (examples/matmul.py) with M/N blocking, FP16 inputs, FP32 output
Organize examples into model-specific subdirectories (deepseek_v3_2/, qwen3/)
Add matmul to CI pipeline for both sim and a2a3 on-device testing

Testing

python examples/matmul.py --sim passes
python examples/matmul.py --device=<id> passes on a2a3
Existing examples still run from new paths
Code follows pypto frontend coding style

Summary by CodeRabbit

New Features
- Added a matrix-multiplication example script that supports configurable dimensions, tiled execution, FP16 inputs with FP32 outputs, and can run in simulation or on hardware. The script prints diagnostic results and exits with a nonzero status on failure.
Tests
- CI now runs matrix-multiplication tests in both simulation and hardware workflows, mirroring existing example test coverage.

gemini-code-assist · 2026-03-18T00:46:42Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the project's example suite by introducing a new, performance-oriented matrix multiplication example and improving the overall structure of the examples directory. The changes aim to provide clearer demonstrations of core functionalities and better organize existing model-specific implementations, facilitating easier navigation and future expansion.

Highlights

New Matmul Example: A tiled matrix multiplication example (examples/matmul.py) has been added, featuring M/N blocking, FP16 inputs, and FP32 output for mixed-precision computation.
Example Organization: Existing examples have been restructured into model-specific subdirectories, such as deepseek_v3_2/ and qwen3/, to improve clarity and maintainability.
CI Integration: The new matmul example has been integrated into the CI pipeline for both simulation and on-device testing on a2a3 platforms.

Ignored Files

Ignored by pattern: .github/workflows/** (1)
- .github/workflows/ci.yml

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

coderabbitai · 2026-03-18T00:47:01Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a new matmul example script implementing tiled matrix multiplication (FP16 inputs, FP32 outputs) and extends the CI workflow with two new test steps to run the matmul example in both sim and a2a3 modes.

Changes

Cohort / File(s)	Summary
CI Workflow `.github/workflows/ci.yml`	Adds two test steps: `Run matmul sim test` (runs `python examples/matmul.py --sim`) and `Run matmul a2a3 test` (runs `python examples/matmul.py --device=$DEVICE_ID`), placed after existing hello_world tests.
New Matmul Example `examples/matmul.py`	Adds a complete matmul example: `build_matmul_program()` (tiled MatmulProgram), `build_tensor_specs()` (A/B/C tensor specs), `golden_matmul()` (FP32 reference), `compile_and_run()` (build, run, validate, diagnostics), and a CLI entrypoint supporting `--sim` and `--device` flags.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Add hello_world example and switch CI to sim #13: Adds analogous example scripts and CI steps with matching build_tensor_specs / compile_and_run / golden_* patterns; likely code-level similarity.

Poem

🐰 I hopped through tiles of numbers bright,
FP16 leaves, FP32 light,
CI hums, sim and board aligned,
A tiny matmul—neatly timed! 🎶

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately captures the main changes: adding a matmul example and organizing examples into subdirectories, which aligns with the raw_summary and pr_objectives.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

📝 Coding Plan

Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

- Add tiled matmul example with M/N blocking (FP16 input, FP32 output) - Move DeepSeek V3.2 examples into examples/deepseek_v3_2/ - Move Qwen3 examples into examples/qwen3/ - Add matmul to CI pipeline (sim + a2a3 device tests)

gemini-code-assist

Code Review

This pull request introduces a new matmul.py example and reorganizes existing examples into subdirectories for better structure. The new matmul example is well-written and demonstrates a tiled matrix multiplication. My review includes a suggestion to enhance the example's flexibility by adding command-line arguments for matrix dimensions and tiling parameters, which would improve its usability for experimentation.

examples/matmul.py

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

examples/matmul.py (1)

84-95: Add explicit tiling/chunk precondition checks

compile_and_run accepts arbitrary m/n/tile/chunk values, but the kernel assumes full tiles. A fail-fast validation will prevent confusing runtime errors when non-divisible shapes are passed.

Proposed refactor

 def compile_and_run(
@@
 ):
+    if m_tile <= 0 or n_tile <= 0 or m_chunk <= 0 or n_chunk <= 0:
+        raise ValueError("m_tile, n_tile, m_chunk, and n_chunk must be positive")
+    if m % m_tile != 0 or n % n_tile != 0:
+        raise ValueError("m and n must be divisible by m_tile and n_tile")
+
     from pypto.backend import BackendType

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@examples/matmul.py` around lines 84 - 95, Add fail-fast precondition checks
at the start of compile_and_run: validate that m_tile, n_tile, m_chunk, n_chunk
are positive integers and that m and n are divisible by the tile sizes (m %
m_tile == 0 and n % n_tile == 0) and that the resulting tile counts are
divisible by the chunk sizes ((m // m_tile) % m_chunk == 0 and (n // n_tile) %
n_chunk == 0). If any check fails, raise a ValueError with a clear message
referencing the offending variables (m, n, m_tile, n_tile, m_chunk, n_chunk) so
callers get an immediate, informative error instead of obscure runtime/kernel
failures.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/matmul.py`:
- Around line 121-126: The "COMPILE OK — device run skipped (code_runner not
found)" branch still leaves result.passed false so the CLI exits non‑zero; in
the branch that checks if not result.passed and "code_runner" in result.error,
mark the run as successful by setting result.passed = True (and optionally clear
or annotate result.error) before printing and returning so the fallback path
yields a zero exit; locate the conditional that tests result.error and
"code_runner" to apply this change.

---

Nitpick comments:
In `@examples/matmul.py`:
- Around line 84-95: Add fail-fast precondition checks at the start of
compile_and_run: validate that m_tile, n_tile, m_chunk, n_chunk are positive
integers and that m and n are divisible by the tile sizes (m % m_tile == 0 and n
% n_tile == 0) and that the resulting tile counts are divisible by the chunk
sizes ((m // m_tile) % m_chunk == 0 and (n // n_tile) % n_chunk == 0). If any
check fails, raise a ValueError with a clear message referencing the offending
variables (m, n, m_tile, n_tile, m_chunk, n_chunk) so callers get an immediate,
informative error instead of obscure runtime/kernel failures.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c15645e8-2d6e-4f41-8ba0-28bc39c588ff

📥 Commits

Reviewing files that changed from the base of the PR and between 9bdec9e and 80b44e0.

📒 Files selected for processing (10)

.github/workflows/ci.yml
examples/deepseek_v3_2/deepseek_v3_2_decode_back.py
examples/deepseek_v3_2/deepseek_v3_2_decode_front.py
examples/deepseek_v3_2/deepseek_v3_2_prefill_back.py
examples/deepseek_v3_2/deepseek_v3_2_prefill_front.py
examples/matmul.py
examples/qwen3/qwen3-32b.py
examples/qwen3/qwen3_32b_decode.py
examples/qwen3/qwen3_32b_prefill.py
examples/qwen3/qwen3_32b_training_forward_and_backward.py

examples/matmul.py

coderabbitai

♻️ Duplicate comments (1)

examples/matmul.py (1)

121-126: ⚠️ Potential issue | 🟠 Major

Compile-only fallback still exits non-zero.

At Line 122 the script reports compile success when code_runner is missing, but result.passed stays False, so Line 141 still exits with code 1. This makes the fallback path unusable from CLI/CI.

Proposed fix

-    if not result.passed and result.error and "code_runner" in result.error:
+    if not result.passed and result.error and "code_runner" in result.error:
+        result.passed = True
         print("Result: COMPILE OK — device run skipped (code_runner not found).\n")
         print(result.error)

Also applies to: 141-142

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@examples/matmul.py` around lines 121 - 126, The compile-only fallback prints
a success message when "code_runner" is missing but leaves result.passed False
so callers still treat it as failure; update the fallback handling in the block
checking result.error for "code_runner" to mark the run as successful (e.g., set
result.passed = True) or otherwise ensure the final exit logic treats this case
as success (adjust the code path that uses result.passed/return result and the
later process exit logic) so that the CLI/CI returns zero when the runner is
intentionally skipped.

🧹 Nitpick comments (1)

examples/matmul.py (1)

34-42: Add explicit tile-shape guards for non-default arguments.

pl.slice uses full tile sizes; if callers pass m/n not divisible by m_tile/n_tile, behavior can become invalid or partial. Add early validation for clearer failures.

Proposed refactor

 def build_matmul_program(
     m: int = M,
     n: int = N,
     k: int = K,
     m_tile: int = M_TILE,
     n_tile: int = N_TILE,
     m_chunk: int = M_CHUNK,
     n_chunk: int = N_CHUNK,
 ):
+    if min(m, n, k, m_tile, n_tile, m_chunk, n_chunk) <= 0:
+        raise ValueError("m, n, k, m_tile, n_tile, m_chunk, and n_chunk must be positive")
+    if m % m_tile != 0 or n % n_tile != 0:
+        raise ValueError("m and n must be divisible by m_tile and n_tile")
+
     `@pl.program`
     class MatmulProgram:

Also applies to: 53-57

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@examples/matmul.py` around lines 34 - 42, In build_matmul_program (and the
analogous function around lines 53-57) add explicit validation that when callers
pass non-default m_tile/n_tile the matrix dimensions m and n are divisible by
the provided m_tile and n_tile respectively; if not, raise a clear ValueError
with a descriptive message. Perform these guards at the top of
build_matmul_program (before any pl.slice/tiling logic) and check m % m_tile ==
0 and n % n_tile == 0 so pl.slice is never given partial tiles; reference the
function name build_matmul_program and the corresponding helper function/block
at lines ~53-57 when adding the checks.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@examples/matmul.py`:
- Around line 121-126: The compile-only fallback prints a success message when
"code_runner" is missing but leaves result.passed False so callers still treat
it as failure; update the fallback handling in the block checking result.error
for "code_runner" to mark the run as successful (e.g., set result.passed = True)
or otherwise ensure the final exit logic treats this case as success (adjust the
code path that uses result.passed/return result and the later process exit
logic) so that the CLI/CI returns zero when the runner is intentionally skipped.

---

Nitpick comments:
In `@examples/matmul.py`:
- Around line 34-42: In build_matmul_program (and the analogous function around
lines 53-57) add explicit validation that when callers pass non-default
m_tile/n_tile the matrix dimensions m and n are divisible by the provided m_tile
and n_tile respectively; if not, raise a clear ValueError with a descriptive
message. Perform these guards at the top of build_matmul_program (before any
pl.slice/tiling logic) and check m % m_tile == 0 and n % n_tile == 0 so pl.slice
is never given partial tiles; reference the function name build_matmul_program
and the corresponding helper function/block at lines ~53-57 when adding the
checks.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a6fd7528-537b-4001-9c17-59b73eba784b

📥 Commits

Reviewing files that changed from the base of the PR and between 80b44e0 and 70c0653.

📒 Files selected for processing (10)

.github/workflows/ci.yml
examples/deepseek_v3_2/deepseek_v3_2_decode_back.py
examples/deepseek_v3_2/deepseek_v3_2_decode_front.py
examples/deepseek_v3_2/deepseek_v3_2_prefill_back.py
examples/deepseek_v3_2/deepseek_v3_2_prefill_front.py
examples/matmul.py
examples/qwen3/qwen3-32b.py
examples/qwen3/qwen3_32b_decode.py
examples/qwen3/qwen3_32b_prefill.py
examples/qwen3/qwen3_32b_training_forward_and_backward.py

zhangqi-chen force-pushed the ci branch from 80b44e0 to 70c0653 Compare March 18, 2026 00:48

gemini-code-assist bot reviewed Mar 18, 2026

View reviewed changes

examples/matmul.py Show resolved Hide resolved

coderabbitai bot reviewed Mar 18, 2026

View reviewed changes

examples/matmul.py Show resolved Hide resolved

coderabbitai bot reviewed Mar 18, 2026

View reviewed changes

zhangqi-chen merged commit e5fdf4a into hw-native-sys:main Mar 18, 2026
4 checks passed

zhangqi-chen deleted the ci branch March 18, 2026 01:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add matmul example and organize examples into subdirectories#20

Add matmul example and organize examples into subdirectories#20
zhangqi-chen merged 1 commit intohw-native-sys:mainfrom
zhangqi-chen:ci

zhangqi-chen commented Mar 18, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

gemini-code-assist bot commented Mar 18, 2026

Uh oh!

coderabbitai bot commented Mar 18, 2026 •

edited

Loading

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

zhangqi-chen commented Mar 18, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Testing

Summary by CodeRabbit

Uh oh!

gemini-code-assist bot commented Mar 18, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

coderabbitai bot commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

zhangqi-chen commented Mar 18, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 18, 2026 •

edited

Loading