Add softmax and rms_norm examples with CI integration, upgrade PTOAS to v0.9 #24

Merged
zhangqi-chen merged 1 commit into hw-native-sys:main from zhangqi-chen:ci on Mar 20, 2026

zhangqi-chen commented Mar 19, 2026

Summary

  • Add row-wise softmax example with numerical stability (max-shift)
  • Add two-pass RMSNorm example with column chunking and gamma weight
  • Add both examples to CI pipeline (sim + a2a3 device tests)
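
For readers unfamiliar with the max-shift trick named in the first bullet, here is a minimal pure-Python sketch of row-wise softmax. It is not the code in examples/softmax.py (which uses the pypto builder and tiling); the function names `softmax_row` and `softmax_rows` are hypothetical and chosen only for illustration.

```python
import math


def softmax_row(row):
    """Softmax of one row, shifted by the row maximum for numerical stability."""
    row_max = max(row)  # after the shift, the largest exponent is exactly 0
    exps = [math.exp(v - row_max) for v in row]
    total = sum(exps)  # total >= 1 because the max element contributes exp(0)
    return [e / total for e in exps]


def softmax_rows(matrix):
    """Apply softmax independently to each row of a list-of-lists matrix."""
    return [softmax_row(row) for row in matrix]


if __name__ == "__main__":
    # math.exp(1000.0) alone would raise OverflowError; the shifted form does not.
    print(softmax_rows([[1000.0, 1001.0, 1002.0], [0.0, 0.0, 0.0]]))
```

Subtracting the row maximum does not change the result mathematically, because the common factor exp(-max) cancels between numerator and denominator.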

Testing

  • python examples/softmax.py --sim passes
  • python examples/rms_norm.py --sim passes
  • CI sim and a2a3 jobs pass

Summary by CodeRabbit

  • New Features

    • Added softmax and RMSNorm example programs showcasing row-wise softmax and RMS normalization workflows and reference checks.
  • Chores

    • Updated CI to use PTOAS v0.9 and extended automated validation to run the new examples on simulator and target device; added download integrity verification in simulation job.

@gemini-code-assist
Summary of Changes

This pull request adds two examples to the examples directory: a row-wise softmax and an RMSNorm. The softmax example uses a max-shift for numerical stability, and the RMSNorm example uses a two-pass scheme with column chunking so that large hidden dimensions can be processed chunk by chunk. Both examples are integrated into the CI pipeline, so they are validated in the simulator and on a2a3 devices.

Highlights

  • New Softmax Example: Added a row-wise softmax example that incorporates numerical stability through max-shift.
  • New RMSNorm Example: Introduced a two-pass RMSNorm example featuring column chunking and gamma weight application, designed for handling large hidden dimensions.
  • CI Integration: Integrated both the new softmax and RMSNorm examples into the CI pipeline, including both simulation and a2a3 device tests.
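
The two-pass, column-chunked RMSNorm described in the highlights can be sketched in plain Python as follows. This is an illustrative reference under stated assumptions, not the contents of examples/rms_norm.py: the function name `rms_norm_two_pass`, the `eps` default, and the list-of-lists layout are all hypothetical.

```python
import math


def rms_norm_two_pass(x, gamma, hidden_chunk, eps=1e-6):
    """RMSNorm over each row of x, visiting the hidden dimension in chunks.

    x: list of rows, each of length len(gamma); gamma: per-column weight.
    eps is an assumed stabilizer value, not taken from the PR.
    """
    hidden = len(gamma)
    if hidden_chunk <= 0 or hidden % hidden_chunk != 0:
        raise ValueError(f"hidden={hidden} is not a multiple of hidden_chunk={hidden_chunk}")
    out = []
    for row in x:
        # Pass 1: accumulate the sum of squares one column chunk at a time.
        sq_sum = 0.0
        for c in range(0, hidden, hidden_chunk):
            sq_sum += sum(v * v for v in row[c:c + hidden_chunk])
        inv_rms = 1.0 / math.sqrt(sq_sum / hidden + eps)
        # Pass 2: revisit each chunk, scale by 1/rms and apply gamma.
        normed = []
        for c in range(0, hidden, hidden_chunk):
            chunk = row[c:c + hidden_chunk]
            weights = gamma[c:c + hidden_chunk]
            normed.extend(v * inv_rms * g for v, g in zip(chunk, weights))
        out.append(normed)
    return out
```

Two passes are needed because the normalization factor depends on the whole row, so no chunk can be scaled until every chunk has contributed to the squared sum.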

Ignored Files

  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/ci.yml

coderabbitai bot commented Mar 19, 2026

Walkthrough

Adds two new example programs (softmax and RMSNorm) with build/compile/run/golden-check logic, and updates CI to install PTOAS v0.9 and run those examples in both sim (x86_64) and a2a3 (aarch64) jobs, including sha256 verification in the sim job.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CI Pipeline (.github/workflows/ci.yml) | Bump PTOAS to v0.9 (update SHA256s), add sha256sum -c - verification in sim, and add examples/softmax.py and examples/rms_norm.py execution steps to both sim and a2a3 jobs alongside existing tests. |
| Softmax Example (examples/softmax.py) | New example implementing numerically stable, tiled row-wise softmax with program builder, tensor specs, PyTorch golden reference, compile/run entrypoint, CLI flags for --sim/device and pass dumping. |
| RMSNorm Example (examples/rms_norm.py) | New example implementing two-pass tiled RMSNorm (squared-sum reduction and normalization with gamma), with program builder, tensor specs, PyTorch golden reference, compile/run entrypoint, CLI flags for --sim/device and pass dumping. |
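
The download integrity check mentioned for the sim job uses `sha256sum -c -` in the workflow. As a rough Python equivalent of that idea, the sketch below hashes a file and compares it with an expected digest. The helper names `sha256_of_file` and `verify_download` are hypothetical, and the actual PTOAS checksums live in ci.yml and are not reproduced here.

```python
import hashlib


def sha256_of_file(path, block_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in fixed-size blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Raise ValueError if the file's digest differs from the expected one."""
    actual = sha256_of_file(path)
    if actual != expected_hex.strip().lower():
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
    return True
```

Failing loudly on a mismatch mirrors the nonzero exit status that `sha256sum -c` returns, which is what stops a CI job from continuing with a corrupted archive.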

Sequence Diagram(s)

sequenceDiagram
    participant CLI as User/CLI
    participant Builder as ProgramBuilder
    participant Runtime as pypto.runtime
    participant Device as Simulator/Device
    participant Golden as PyTorch (golden)

    CLI->>Builder: build_*_program + build_tensor_specs
    Builder->>Runtime: provide program + specs + RunConfig
    Runtime->>Device: compile & execute (sim or device)
    Device-->>Runtime: execution outputs
    Runtime->>Golden: run golden reference
    Golden-->>Runtime: golden outputs
    Runtime->>CLI: compare results -> pass/fail
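
The final step of the diagram, comparing device outputs against the golden reference, can be illustrated with a small stand-alone harness. This is only a sketch of the pattern: `run_and_check` and `allclose` are hypothetical helpers, the tolerances are assumed values, and none of this is the pypto.runtime API.

```python
import math


def allclose(actual, expected, rtol=1e-5, atol=1e-8):
    """Element-wise closeness check for two flat lists of floats."""
    if len(actual) != len(expected):
        return False
    return all(
        math.isclose(a, e, rel_tol=rtol, abs_tol=atol)
        for a, e in zip(actual, expected)
    )


def run_and_check(run_fn, golden_fn, inputs, rtol=1e-5, atol=1e-8):
    """Run the implementation and the golden reference on the same inputs, then compare."""
    actual = run_fn(inputs)       # stands in for the simulator or device run
    expected = golden_fn(inputs)  # stands in for the PyTorch golden reference
    return allclose(actual, expected, rtol, atol)
```

Note that `math.isclose` combines the relative and absolute tolerances by taking their maximum, which differs slightly from the additive rule used by NumPy and PyTorch `allclose`.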

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰
Soft tiles hop, numbers align,
Rows and chunks in neat design.
From build to run, I dance and cheer,
Sim or device—results appear! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main changes: adding softmax and rms_norm examples and integrating them with CI, plus upgrading PTOAS to v0.9. |

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces two new examples, rms_norm.py and softmax.py, demonstrating row-wise and chunked implementations of RMSNorm and Softmax operations using the pypto library. The code is well-structured, clearly commented, and includes golden reference implementations and runnable test scripts. My review found one minor issue in rms_norm.py related to maintainability, where a hardcoded value should be replaced with a defined constant. Otherwise, the implementations appear correct and follow the existing patterns in the repository.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/rms_norm.py`:
- Around line 146-150: The skip message for missing code_runner in
examples/rms_norm.py prints correctly but leaves result.passed false so the
script still exits as a failure; modify the handling in the branches that check
result.passed and "code_runner" in result.error (both around the shown block and
the similar 162-167 block) to treat the skip as a success—after printing the
skip message set result.passed = True or otherwise return/exit with success so
the missing code_runner path does not produce a nonzero exit.
- Around line 34-42: The code computes hidden_blocks = hidden // hidden_chunk
and assumes both rows and hidden are exact multiples of row_chunk and
hidden_chunk, which can silently drop remainders and produce invalid slices;
update build_rms_norm_program to validate divisibility up front by checking
hidden % hidden_chunk == 0 and rows % row_chunk == 0 (and any other place you
compute blocks or slice tiles in the same module, e.g., the second pass that
slices [row_chunk, hidden_chunk]) and raise a clear ValueError (or assert) with
a descriptive message if not divisible so the function fails fast instead of
producing incorrect/invalid slices.

In `@examples/softmax.py`:
- Around line 28-32: The slice call in build_softmax_program assumes every tile
has exactly row_chunk rows (pl.slice(x, [row_chunk, cols], [r, 0])) which breaks
when rows % row_chunk != 0; change the tiling to compute tile_rows =
min(row_chunk, rows - r) for each tile (or add an upfront validation) and use
tile_rows instead of row_chunk in all pl.slice and related operations (and add a
tail-path if tile_rows < row_chunk) so the final partial tile is handled without
out-of-bounds slices; update every occurrence (including the similar slices
around the max/exp/reduction steps) to use tile_rows.
- Around line 122-127: The branch that detects a missing "code_runner" prints
"device run skipped" but leaves result.passed false, causing the CLI to treat
skipped runs as failures; update the branch that checks if not result.passed and
result.error and "code_runner" in result.error to mark the run as skipped
instead of failed by setting result.passed = True (or otherwise set a skipped
flag that the caller treats as success) and preserve the informative printout;
apply the same change to the analogous branch handling the other occurrence so
skipped runs don't exit with failure.
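
The two tiling remedies suggested in the comments above (fail fast when a dimension is not a multiple of its chunk, or iterate with a shorter final tile) can be sketched independently of pypto. The helper names `require_divisible` and `row_tiles` are hypothetical, and whether `pl.slice` accepts a per-tile row count is an assumption this sketch does not verify.

```python
def require_divisible(name, size, chunk):
    """Fail fast if size is not an exact multiple of chunk; return the block count."""
    if chunk <= 0:
        raise ValueError(f"chunk size for {name} must be positive, got {chunk}")
    if size % chunk != 0:
        raise ValueError(f"{name}={size} is not a multiple of chunk size {chunk}")
    return size // chunk


def row_tiles(rows, row_chunk):
    """Yield (start_row, tile_rows); the last tile is shorter when rows % row_chunk != 0."""
    for r in range(0, rows, row_chunk):
        yield r, min(row_chunk, rows - r)


if __name__ == "__main__":
    print(require_divisible("hidden", 1024, 256))
    print(list(row_tiles(10, 4)))
```

The validation route is the simpler of the two, since it keeps every tile the same shape; the tail-tile route accepts arbitrary sizes at the cost of a second code path for the final partial tile.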

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c990c2ee-9094-449b-a29f-c113428a2382

📥 Commits

Reviewing files that changed from the base of the PR and between d2c6e07 and c9f567a.

📒 Files selected for processing (3)
  • .github/workflows/ci.yml
  • examples/rms_norm.py
  • examples/softmax.py

zhangqi-chen force-pushed the ci branch 2 times, most recently from 7cefa70 to 27fd87d (Mar 19, 2026 11:25)
zhangqi-chen changed the title from "Add softmax and rms_norm examples with CI integration" to "Add softmax and rms_norm examples with CI integration, upgrade PTOAS to v0.9" (Mar 19, 2026)
zhangqi-chen force-pushed the ci branch 2 times, most recently from 19c033e to f7e5d45 (Mar 19, 2026 11:32)
- Add row-wise softmax example with numerical stability (max-shift)
- Add two-pass RMSNorm example with column chunking and gamma weight
- Add both examples to CI pipeline (sim + a2a3 device tests)
- Upgrade PTOAS from v0.8 to v0.9 with updated checksums
- Refactor setup_env skill to reference ci.yml for version and checksums
zhangqi-chen merged commit e7d7724 into hw-native-sys:main Mar 20, 2026
4 checks passed
zhangqi-chen deleted the ci branch March 20, 2026 02:58