Add softmax and rms_norm examples with CI integration, upgrade PTOAS to v0.9 by zhangqi-chen · Pull Request #24 · hw-native-sys/pypto-lib

zhangqi-chen · 2026-03-19T11:12:14Z

Summary

- Add row-wise softmax example with numerical stability (max-shift)
- Add two-pass RMSNorm example with column chunking and gamma weight
- Add both examples to CI pipeline (sim + a2a3 device tests)
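
For readers unfamiliar with the max-shift trick named in the first item, the following is a minimal pure-Python sketch of row-wise softmax. It is not the code in `examples/softmax.py` (which uses the pypto tile API); the function name `stable_softmax_rows` and the sample inputs are assumptions made for illustration.

```python
import math


def stable_softmax_rows(matrix):
    """Row-wise softmax using the max-shift trick (illustrative only)."""
    out = []
    for row in matrix:
        # Subtracting the row maximum makes the largest exponent exactly 0,
        # so math.exp never overflows; the shift cancels in the ratio.
        row_max = max(row)
        exps = [math.exp(v - row_max) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


# Without the shift, math.exp(1000.0) would raise OverflowError.
probs = stable_softmax_rows([[1000.0, 1001.0, 1002.0], [0.0, 0.0, 0.0]])
```

Each output row sums to 1, and a row of equal inputs yields a uniform distribution.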

Testing

- `python examples/softmax.py --sim` passes
- `python examples/rms_norm.py --sim` passes
- CI sim and a2a3 jobs pass

Summary by CodeRabbit

New Features
- Added softmax and RMSNorm example programs showcasing row-wise softmax and RMS normalization workflows and reference checks.
Chores
- Updated CI to use PTOAS v0.9 and extended automated validation to run the new examples on simulator and target device; added download integrity verification in simulation job.

gemini-code-assist · 2026-03-19T11:12:27Z

Summary of Changes

This pull request enhances the examples library by introducing robust implementations of two fundamental neural network operations: softmax and RMSNorm. The new examples showcase advanced techniques like numerical stability for softmax and efficient two-pass column chunking for RMSNorm, which are crucial for large-scale models. Their integration into the CI pipeline ensures ongoing validation and correctness across different execution environments.

Highlights

- New Softmax Example: Added a row-wise softmax example that incorporates numerical stability through max-shift.
- New RMSNorm Example: Introduced a two-pass RMSNorm example featuring column chunking and gamma weight application, designed for handling large hidden dimensions.
- CI Integration: Integrated both the new softmax and RMSNorm examples into the CI pipeline, including both simulation and a2a3 device tests.
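
As a rough illustration of the two-pass, column-chunked scheme described in the RMSNorm highlight, here is a pure-Python sketch. It does not reproduce `examples/rms_norm.py`; the function name `rms_norm_two_pass`, the `eps` value, and the placement of `eps` inside the square root are assumptions.

```python
import math


def rms_norm_two_pass(x, gamma, hidden_chunk, eps=1e-6):
    """Chunked RMSNorm sketch: y = x / sqrt(mean(x^2) + eps) * gamma."""
    hidden = len(gamma)
    if hidden % hidden_chunk != 0:
        raise ValueError("hidden must be divisible by hidden_chunk")
    out = []
    for row in x:
        # Pass 1: accumulate the squared sum one column chunk at a time.
        sq_sum = 0.0
        for c in range(0, hidden, hidden_chunk):
            sq_sum += sum(v * v for v in row[c:c + hidden_chunk])
        inv_rms = 1.0 / math.sqrt(sq_sum / hidden + eps)
        # Pass 2: revisit each chunk, scaling by inv_rms and the gamma weight.
        normed = []
        for c in range(0, hidden, hidden_chunk):
            chunk = row[c:c + hidden_chunk]
            weights = gamma[c:c + hidden_chunk]
            normed.extend(v * inv_rms * g for v, g in zip(chunk, weights))
        out.append(normed)
    return out


y = rms_norm_two_pass([[3.0, 4.0, 0.0, 0.0]], [1.0, 1.0, 1.0, 1.0], hidden_chunk=2)
```

The second pass is needed because the scale factor depends on the whole row, which is only known once every chunk has been reduced.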

Ignored Files

Ignored by pattern: .github/workflows/** (1)
- .github/workflows/ci.yml

coderabbitai · 2026-03-19T11:12:28Z

Walkthrough

Adds two new example programs (softmax and RMSNorm) with build/compile/run/golden-check logic, and updates CI to install PTOAS v0.9 and run those examples in both sim (x86_64) and a2a3 (aarch64) jobs, including sha256 verification in the sim job.

Changes

Cohort / File(s)	Summary
CI Pipeline `.github/workflows/ci.yml`	Bump PTOAS to `v0.9` (update SHA256s), add `sha256sum -c -` verification in `sim`, and add `examples/softmax.py` and `examples/rms_norm.py` execution steps to both `sim` and `a2a3` jobs alongside existing tests.
Softmax Example `examples/softmax.py`	New example implementing numerically stable, tiled row-wise softmax with program builder, tensor specs, PyTorch golden reference, compile/run entrypoint, CLI flags for `--sim`/device and pass dumping.
RMSNorm Example `examples/rms_norm.py`	New example implementing two-pass tiled RMSNorm (squared-sum reduction and normalization with gamma), with program builder, tensor specs, PyTorch golden reference, compile/run entrypoint, CLI flags for `--sim`/device and pass dumping.
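
The CI row above mentions `sha256sum -c -` verification of the downloaded PTOAS archive. The same check can be sketched in Python with `hashlib`; this is only an illustration of the idea, not part of the workflow, and `verify_sha256` and the demo payload are hypothetical.

```python
import hashlib
import os
import tempfile


def verify_sha256(path, expected_hex):
    """Raise ValueError unless the file's SHA-256 digest matches expected_hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB blocks so large archives are not loaded into memory at once.
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    actual = digest.hexdigest()
    if actual != expected_hex.strip().lower():
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
    return actual


# Demo on a throwaway file; a real job would compare against a pinned digest.
payload = b"placeholder archive bytes"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
demo_path = tmp.name
expected = hashlib.sha256(payload).hexdigest()
checked = verify_sha256(demo_path, expected)
os.remove(demo_path)
```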

Sequence Diagram(s)

sequenceDiagram
    participant CLI as User/CLI
    participant Builder as ProgramBuilder
    participant Runtime as pypto.runtime
    participant Device as Simulator/Device
    participant Golden as PyTorch (golden)

    CLI->>Builder: build_*_program + build_tensor_specs
    Builder->>Runtime: provide program + specs + RunConfig
    Runtime->>Device: compile & execute (sim or device)
    Device-->>Runtime: execution outputs
    Runtime->>Golden: run golden reference
    Golden-->>Runtime: golden outputs
    Runtime->>CLI: compare results -> pass/fail

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Add CI skills, issue templates, and bump PTOAS to v0.8 #11: Also touches CI PTOAS pinning and checksum updates (closely related to the PTOAS bump).
Add a2a3 CI job for on-device hello_world test #15: Related CI changes that extend a2a3 job steps and example executions.
Add hello_world example and switch CI to sim #13: Adds simulator example tests and example programs to CI, overlapping with these example/CI additions.

Poem

🐰
Soft tiles hop, numbers align,
Rows and chunks in neat design.
From build to run, I dance and cheer,
Sim or device—results appear! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately reflects the main changes: adding softmax and rms_norm examples and integrating them with CI, plus upgrading PTOAS to v0.9.

gemini-code-assist

Code Review

This pull request introduces two new examples, rms_norm.py and softmax.py, demonstrating row-wise and chunked implementations of RMSNorm and Softmax operations using the pypto library. The code is well-structured, clearly commented, and includes golden reference implementations and runnable test scripts. My review found one minor issue in rms_norm.py related to maintainability, where a hardcoded value should be replaced with a defined constant. Otherwise, the implementations appear correct and follow the existing patterns in the repository.

examples/rms_norm.py

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/rms_norm.py`:
- Around line 146-150: The skip message for missing code_runner in
examples/rms_norm.py prints correctly but leaves result.passed false so the
script still exits as a failure; modify the handling in the branches that check
result.passed and "code_runner" in result.error (both around the shown block and
the similar 162-167 block) to treat the skip as a success—after printing the
skip message set result.passed = True or otherwise return/exit with success so
the missing code_runner path does not produce a nonzero exit.
- Around line 34-42: The code computes hidden_blocks = hidden // hidden_chunk
and assumes both rows and hidden are exact multiples of row_chunk and
hidden_chunk, which can silently drop remainders and produce invalid slices;
update build_rms_norm_program to validate divisibility up front by checking
hidden % hidden_chunk == 0 and rows % row_chunk == 0 (and any other place you
compute blocks or slice tiles in the same module, e.g., the second pass that
slices [row_chunk, hidden_chunk]) and raise a clear ValueError (or assert) with
a descriptive message if not divisible so the function fails fast instead of
producing incorrect/invalid slices.
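
The fail-fast divisibility check suggested in the second comment could look roughly like the sketch below. The helper name `check_tiling` is hypothetical, and how it would be wired into `build_rms_norm_program` is not shown in this thread.

```python
def check_tiling(rows, hidden, row_chunk, hidden_chunk):
    """Validate tile sizes up front and return (row_blocks, hidden_blocks)."""
    if row_chunk <= 0 or hidden_chunk <= 0:
        raise ValueError("row_chunk and hidden_chunk must be positive")
    if rows % row_chunk != 0:
        raise ValueError(
            f"rows ({rows}) must be divisible by row_chunk ({row_chunk})"
        )
    if hidden % hidden_chunk != 0:
        raise ValueError(
            f"hidden ({hidden}) must be divisible by hidden_chunk ({hidden_chunk})"
        )
    # Floor division is now exact, so no remainder is silently dropped.
    return rows // row_chunk, hidden // hidden_chunk


blocks = check_tiling(rows=8, hidden=1024, row_chunk=4, hidden_chunk=256)
```

Calling this at the top of the builder turns a silent truncation into an immediate, descriptive error.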

In `@examples/softmax.py`:
- Around line 28-32: The slice call in build_softmax_program assumes every tile
has exactly row_chunk rows (pl.slice(x, [row_chunk, cols], [r, 0])) which breaks
when rows % row_chunk != 0; change the tiling to compute tile_rows =
min(row_chunk, rows - r) for each tile (or add an upfront validation) and use
tile_rows instead of row_chunk in all pl.slice and related operations (and add a
tail-path if tile_rows < row_chunk) so the final partial tile is handled without
out-of-bounds slices; update every occurrence (including the similar slices
around the max/exp/reduction steps) to use tile_rows.
- Around line 122-127: The branch that detects a missing "code_runner" prints
"device run skipped" but leaves result.passed false, causing the CLI to treat
skipped runs as failures; update the branch that checks if not result.passed and
result.error and "code_runner" in result.error to mark the run as skipped
instead of failed by setting result.passed = True (or otherwise set a skipped
flag that the caller treats as success) and preserve the informative printout;
apply the same change to the analogous branch handling the other occurrence so
skipped runs don't exit with failure.
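
The `tile_rows = min(row_chunk, rows - r)` suggestion in the first softmax comment amounts to the loop sketched here. The generator name `row_tiles` is hypothetical, and whether `pl.slice` accepts a shorter final tile is not confirmed in this thread, so this only shows the index arithmetic.

```python
def row_tiles(rows, row_chunk):
    """Yield (start_row, tile_rows) pairs that cover all rows, including a short tail."""
    if row_chunk <= 0:
        raise ValueError("row_chunk must be positive")
    for r in range(0, rows, row_chunk):
        # The last tile is clipped so it never reaches past the final row.
        yield r, min(row_chunk, rows - r)


tiles = list(row_tiles(rows=10, row_chunk=4))
```

With `rows=10` and `row_chunk=4`, the final tile covers only the two remaining rows instead of slicing out of bounds.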

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c990c2ee-9094-449b-a29f-c113428a2382

📥 Commits

Reviewing files that changed from the base of the PR and between d2c6e07 and c9f567a.

📒 Files selected for processing (3)

.github/workflows/ci.yml
examples/rms_norm.py
examples/softmax.py

- Add row-wise softmax example with numerical stability (max-shift)
- Add two-pass RMSNorm example with column chunking and gamma weight
- Add both examples to CI pipeline (sim + a2a3 device tests)
- Upgrade PTOAS from v0.8 to v0.9 with updated checksums
- Refactor setup_env skill to reference ci.yml for version and checksums

gemini-code-assist bot reviewed Mar 19, 2026

coderabbitai bot reviewed Mar 19, 2026

zhangqi-chen force-pushed the ci branch 2 times, most recently from 7cefa70 to 27fd87d Compare March 19, 2026 11:25

zhangqi-chen changed the title ~~Add softmax and rms_norm examples with CI integration~~ Add softmax and rms_norm examples with CI integration, upgrade PTOAS to v0.9 Mar 19, 2026

zhangqi-chen force-pushed the ci branch 2 times, most recently from 19c033e to f7e5d45 Compare March 19, 2026 11:32

zhangqi-chen force-pushed the ci branch from f7e5d45 to ee04b5a Compare March 20, 2026 02:08

zhangqi-chen merged commit e7d7724 into hw-native-sys:main Mar 20, 2026
4 checks passed

zhangqi-chen deleted the ci branch March 20, 2026 02:58

zhangqi-chen merged 1 commit into hw-native-sys:main from zhangqi-chen:ci
