
Add matmul example and organize examples into subdirectories #20

Merged
zhangqi-chen merged 1 commit into hw-native-sys:main from zhangqi-chen:ci
Mar 18, 2026

@zhangqi-chen (Collaborator) commented Mar 18, 2026

Summary

  • Add tiled matmul example (examples/matmul.py) with M/N blocking, FP16 inputs, FP32 output
  • Organize examples into model-specific subdirectories (deepseek_v3_2/, qwen3/)
  • Add matmul to CI pipeline for both sim and a2a3 on-device testing
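For readers unfamiliar with M/N blocking, here is a minimal pure-Python sketch of the idea. It is not the pypto code in examples/matmul.py, and the names `tiled_matmul`, `to_fp16`, and `to_fp32` are hypothetical. Inputs are rounded to IEEE half precision and each output element accumulates in single precision (both emulated with the `struct` module's `e` and `f` formats), mirroring the FP16-in/FP32-out scheme in the summary. Each (M, N) output tile reduces over the full K dimension, and the sketch assumes full tiles only.

```python
import struct

Matrix = list[list[float]]


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def tiled_matmul(a: Matrix, b: Matrix, m_tile: int, n_tile: int) -> Matrix:
    """Hypothetical blocked C = A @ B with FP16 inputs and FP32 accumulation."""
    m, k, n = len(a), len(b), len(b[0])
    if len(a[0]) != k:
        raise ValueError("inner dimensions of a and b must match")
    if m % m_tile or n % n_tile:
        raise ValueError("sketch assumes m and n are multiples of the tile sizes")
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, m_tile):          # block over M
        for j0 in range(0, n, n_tile):      # block over N
            for i in range(i0, i0 + m_tile):
                for j in range(j0, j0 + n_tile):
                    acc = 0.0
                    for p in range(k):      # full K reduction for this tile
                        acc = to_fp32(acc + a16[i][p] * b16[p][j])
                    c[i][j] = acc
    return c
```

On real hardware the blocking exists so that each tile fits in on-chip buffers; the Python loops only show which output region each tile owns.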

Testing

  • python examples/matmul.py --sim passes
  • python examples/matmul.py --device=<id> passes on a2a3
  • Existing examples still run from new paths
  • Code follows pypto frontend coding style

Summary by CodeRabbit

  • New Features

    • Added a matrix-multiplication example script that supports configurable dimensions, tiled execution, FP16 inputs with FP32 outputs, and can run in simulation or on hardware. The script prints diagnostic results and exits with a nonzero status on failure.
  • Tests

    • CI now runs matrix-multiplication tests in both simulation and hardware workflows, mirroring existing example test coverage.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds a tiled matrix multiplication example and reorganizes the examples directory so that existing model-specific implementations live in their own subdirectories. The goal is a clear, self-contained demonstration of core functionality and an examples tree that is easier to navigate and extend.

Highlights

  • New Matmul Example: A tiled matrix multiplication example (examples/matmul.py) has been added, featuring M/N blocking, FP16 inputs, and FP32 output for mixed-precision computation.
  • Example Organization: Existing examples have been restructured into model-specific subdirectories, such as deepseek_v3_2/ and qwen3/, to improve clarity and maintainability.
  • CI Integration: The new matmul example has been integrated into the CI pipeline for both simulation and on-device testing on a2a3 platforms.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/ci.yml

@coderabbitai

coderabbitai bot commented Mar 18, 2026

📝 Walkthrough

Adds a new matmul example script implementing tiled matrix multiplication (FP16 inputs, FP32 outputs) and extends the CI workflow with two new test steps to run the matmul example in both sim and a2a3 modes.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CI Workflow (.github/workflows/ci.yml) | Adds two test steps, Run matmul sim test (runs python examples/matmul.py --sim) and Run matmul a2a3 test (runs python examples/matmul.py --device=$DEVICE_ID), placed after the existing hello_world tests. |
| New Matmul Example (examples/matmul.py) | Adds a complete matmul example: build_matmul_program() (tiled MatmulProgram), build_tensor_specs() (A/B/C tensor specs), golden_matmul() (FP32 reference), compile_and_run() (build, run, validate, diagnostics), and a CLI entrypoint supporting the --sim and --device flags. |
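The walkthrough mentions a golden FP32 reference, a validation step, and a nonzero exit status on failure. A minimal sketch of that pattern follows; `golden_matmul` here is a hypothetical pure-Python stand-in (it computes in double precision, which is stricter than FP32), and `count_mismatches`, `exit_status`, and the 1e-3 tolerances are placeholders rather than the values used in examples/matmul.py.

```python
import math


def golden_matmul(a, b):
    """Reference C = A @ B in Python floats (a double-precision stand-in for FP32)."""
    k = len(b)
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(len(b[0]))] for row in a]


def count_mismatches(actual, expected, rtol=1e-3, atol=1e-3, max_report=5):
    """Compare element-wise with a mixed tolerance; print the first few mismatches."""
    if len(actual) != len(expected) or any(
        len(r_a) != len(r_e) for r_a, r_e in zip(actual, expected)
    ):
        raise ValueError("actual and expected shapes differ")
    mismatches = 0
    for i, (row_act, row_exp) in enumerate(zip(actual, expected)):
        for j, (x, y) in enumerate(zip(row_act, row_exp)):
            if not math.isclose(x, y, rel_tol=rtol, abs_tol=atol):
                mismatches += 1
                if mismatches <= max_report:
                    print(f"mismatch at ({i}, {j}): got {x}, expected {y}")
    return mismatches


def exit_status(actual, expected) -> int:
    """Return 0 when every element matches and 1 otherwise, suitable for sys.exit()."""
    bad = count_mismatches(actual, expected)
    print("Result: PASS" if bad == 0 else f"Result: FAIL ({bad} mismatches)")
    return 0 if bad == 0 else 1
```

A script would end with something like `sys.exit(exit_status(c, golden_matmul(a, b)))`, which is what lets CI treat a numerical mismatch as a failed step.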

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through tiles of numbers bright,
FP16 leaves, FP32 light,
CI hums, sim and board aligned,
A tiny matmul—neatly timed! 🎶

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately captures the main changes: adding a matmul example and organizing examples into subdirectories, which aligns with the raw_summary and pr_objectives. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


- Add tiled matmul example with M/N blocking (FP16 input, FP32 output)
- Move DeepSeek V3.2 examples into examples/deepseek_v3_2/
- Move Qwen3 examples into examples/qwen3/
- Add matmul to CI pipeline (sim + a2a3 device tests)

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new matmul.py example and reorganizes existing examples into subdirectories for better structure. The new matmul example is well-written and demonstrates a tiled matrix multiplication. My review includes a suggestion to enhance the example's flexibility by adding command-line arguments for matrix dimensions and tiling parameters, which would improve its usability for experimentation.
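Gemini's suggestion could look roughly like the following `argparse` sketch. Only `--sim` and `--device=<id>` are confirmed by the PR; the shape and tiling flag names, their defaults, and treating `--sim` and `--device` as mutually exclusive are all assumptions, and `parse_args` is a hypothetical helper.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Tiled matmul example (sketch)")
    target = parser.add_mutually_exclusive_group()
    target.add_argument("--sim", action="store_true", help="run in the simulator")
    target.add_argument("--device", type=int, default=None, help="device id for on-board runs")
    # Hypothetical shape and tiling flags; defaults are placeholders, not the
    # example's real M/N/K/M_TILE/N_TILE/M_CHUNK/N_CHUNK constants.
    shape_defaults = {
        "m": 128, "n": 128, "k": 128,
        "m-tile": 64, "n-tile": 64,
        "m-chunk": 1, "n-chunk": 1,
    }
    for flag, default in shape_defaults.items():
        parser.add_argument(f"--{flag}", type=int, default=default)
    return parser.parse_args(argv)
```

argparse maps `--m-tile` to `args.m_tile`, so the parsed values can be forwarded directly as keyword arguments to functions such as build_matmul_program() and compile_and_run().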


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
examples/matmul.py (1)

84-95: Add explicit tiling/chunk precondition checks

compile_and_run accepts arbitrary m/n/tile/chunk values, but the kernel assumes full tiles. A fail-fast validation will prevent confusing runtime errors when non-divisible shapes are passed.

Proposed refactor
 def compile_and_run(
@@
 ):
+    if m_tile <= 0 or n_tile <= 0 or m_chunk <= 0 or n_chunk <= 0:
+        raise ValueError("m_tile, n_tile, m_chunk, and n_chunk must be positive")
+    if m % m_tile != 0 or n % n_tile != 0:
+        raise ValueError("m and n must be divisible by m_tile and n_tile")
+
     from pypto.backend import BackendType
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/matmul.py` around lines 84 - 95, Add fail-fast precondition checks
at the start of compile_and_run: validate that m_tile, n_tile, m_chunk, n_chunk
are positive integers and that m and n are divisible by the tile sizes (m %
m_tile == 0 and n % n_tile == 0) and that the resulting tile counts are
divisible by the chunk sizes ((m // m_tile) % m_chunk == 0 and (n // n_tile) %
n_chunk == 0). If any check fails, raise a ValueError with a clear message
referencing the offending variables (m, n, m_tile, n_tile, m_chunk, n_chunk) so
callers get an immediate, informative error instead of obscure runtime/kernel
failures.
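The AI-agent prompt goes slightly further than the proposed diff: it also asks that the tile counts be divisible by the chunk sizes. A standalone version of all three checks might look like this. `validate_tiling` is a hypothetical helper, and whether the kernel really requires chunk divisibility is taken from the review text, not verified against pypto.

```python
def validate_tiling(m: int, n: int, m_tile: int, n_tile: int, m_chunk: int, n_chunk: int) -> None:
    """Fail fast on shapes the full-tile kernel cannot handle (hypothetical helper)."""
    params = {"m": m, "n": n, "m_tile": m_tile, "n_tile": n_tile,
              "m_chunk": m_chunk, "n_chunk": n_chunk}
    # Check positivity first so the modulo checks below never divide by zero.
    non_positive = [name for name, value in params.items() if value <= 0]
    if non_positive:
        raise ValueError(f"must be positive: {', '.join(non_positive)}")
    if m % m_tile or n % n_tile:
        raise ValueError(f"m={m}, n={n} must be divisible by m_tile={m_tile}, n_tile={n_tile}")
    m_tiles, n_tiles = m // m_tile, n // n_tile
    if m_tiles % m_chunk or n_tiles % n_chunk:
        raise ValueError(
            f"tile counts ({m_tiles}, {n_tiles}) must be divisible by "
            f"m_chunk={m_chunk}, n_chunk={n_chunk}"
        )
```

Calling it as the first statement of compile_and_run() (or build_matmul_program()) turns a confusing kernel-level failure into a one-line error naming the offending parameters.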
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/matmul.py`:
- Around line 121-126: The "COMPILE OK — device run skipped (code_runner not
found)" branch still leaves result.passed false so the CLI exits non‑zero; in
the branch that checks if not result.passed and "code_runner" in result.error,
mark the run as successful by setting result.passed = True (and optionally clear
or annotate result.error) before printing and returning so the fallback path
yields a zero exit; locate the conditional that tests result.error and
"code_runner" to apply this change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c15645e8-2d6e-4f41-8ba0-28bc39c588ff

📥 Commits

Reviewing files that changed from the base of the PR and between 9bdec9e and 80b44e0.

📒 Files selected for processing (10)
  • .github/workflows/ci.yml
  • examples/deepseek_v3_2/deepseek_v3_2_decode_back.py
  • examples/deepseek_v3_2/deepseek_v3_2_decode_front.py
  • examples/deepseek_v3_2/deepseek_v3_2_prefill_back.py
  • examples/deepseek_v3_2/deepseek_v3_2_prefill_front.py
  • examples/matmul.py
  • examples/qwen3/qwen3-32b.py
  • examples/qwen3/qwen3_32b_decode.py
  • examples/qwen3/qwen3_32b_prefill.py
  • examples/qwen3/qwen3_32b_training_forward_and_backward.py


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
examples/matmul.py (1)

121-126: ⚠️ Potential issue | 🟠 Major

Compile-only fallback still exits non-zero.

At Line 122 the script reports compile success when code_runner is missing, but result.passed stays False, so Line 141 still exits with code 1. This makes the fallback path unusable from CLI/CI.

Proposed fix
     if not result.passed and result.error and "code_runner" in result.error:
+        result.passed = True
         print("Result: COMPILE OK — device run skipped (code_runner not found).\n")
         print(result.error)

Also applies to: 141-142

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/matmul.py` around lines 121 - 126, The compile-only fallback prints
a success message when "code_runner" is missing but leaves result.passed False
so callers still treat it as failure; update the fallback handling in the block
checking result.error for "code_runner" to mark the run as successful (e.g., set
result.passed = True) or otherwise ensure the final exit logic treats this case
as success (adjust the code path that uses result.passed/return result and the
later process exit logic) so that the CLI/CI returns zero when the runner is
intentionally skipped.
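The effect of the proposed fix on the exit code can be checked in isolation. In this sketch `RunResult` is a hypothetical stand-in for the runner's result object (only the `passed` and `error` attributes mentioned in the review are modeled), and `finalize` is a hypothetical helper that returns the process exit code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:  # hypothetical stand-in for the real result object
    passed: bool
    error: Optional[str] = None


def finalize(result: RunResult) -> int:
    """Return the exit code, treating a missing code_runner as compile-only success."""
    if not result.passed and result.error and "code_runner" in result.error:
        result.passed = True  # the fix proposed in the review
        print("Result: COMPILE OK - device run skipped (code_runner not found).")
        print(result.error)
    return 0 if result.passed else 1
```

One trade-off of matching on a substring: any genuine failure whose message happens to mention code_runner would also be reported as success, so a dedicated flag or exception type would be a more robust signal if the runner exposes one.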
🧹 Nitpick comments (1)
examples/matmul.py (1)

34-42: Add explicit tile-shape guards for non-default arguments.

pl.slice uses full tile sizes; if callers pass m/n not divisible by m_tile/n_tile, behavior can become invalid or partial. Add early validation for clearer failures.

Proposed refactor
 def build_matmul_program(
     m: int = M,
     n: int = N,
     k: int = K,
     m_tile: int = M_TILE,
     n_tile: int = N_TILE,
     m_chunk: int = M_CHUNK,
     n_chunk: int = N_CHUNK,
 ):
+    if min(m, n, k, m_tile, n_tile, m_chunk, n_chunk) <= 0:
+        raise ValueError("m, n, k, m_tile, n_tile, m_chunk, and n_chunk must be positive")
+    if m % m_tile != 0 or n % n_tile != 0:
+        raise ValueError("m and n must be divisible by m_tile and n_tile")
+
     @pl.program
     class MatmulProgram:

Also applies to: 53-57

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/matmul.py` around lines 34 - 42, In build_matmul_program (and the
analogous function around lines 53-57) add explicit validation that when callers
pass non-default m_tile/n_tile the matrix dimensions m and n are divisible by
the provided m_tile and n_tile respectively; if not, raise a clear ValueError
with a descriptive message. Perform these guards at the top of
build_matmul_program (before any pl.slice/tiling logic) and check m % m_tile ==
0 and n % n_tile == 0 so pl.slice is never given partial tiles; reference the
function name build_matmul_program and the corresponding helper function/block
at lines ~53-57 when adding the checks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a6fd7528-537b-4001-9c17-59b73eba784b

📥 Commits

Reviewing files that changed from the base of the PR and between 80b44e0 and 70c0653.

📒 Files selected for processing (10)
  • .github/workflows/ci.yml
  • examples/deepseek_v3_2/deepseek_v3_2_decode_back.py
  • examples/deepseek_v3_2/deepseek_v3_2_decode_front.py
  • examples/deepseek_v3_2/deepseek_v3_2_prefill_back.py
  • examples/deepseek_v3_2/deepseek_v3_2_prefill_front.py
  • examples/matmul.py
  • examples/qwen3/qwen3-32b.py
  • examples/qwen3/qwen3_32b_decode.py
  • examples/qwen3/qwen3_32b_prefill.py
  • examples/qwen3/qwen3_32b_training_forward_and_backward.py

@zhangqi-chen zhangqi-chen merged commit e5fdf4a into hw-native-sys:main Mar 18, 2026
4 checks passed
@zhangqi-chen zhangqi-chen deleted the ci branch March 18, 2026 01:03