Add softmax and rms_norm examples with CI integration, upgrade PTOAS to v0.9 #24
Summary of Changes (Gemini Code Assist)
This pull request adds implementations of two fundamental neural network operations, softmax and RMSNorm, to the examples library. The softmax example uses a numerically stable formulation, and the RMSNorm example uses two-pass column chunking, a technique that matters for large-scale models. Both examples are integrated into the CI pipeline so that they are validated on every run across the different execution environments.
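As a point of reference for the numerical-stability claim, here is a minimal plain-Python sketch of the max-shift formulation of row-wise softmax. It is not the code from examples/softmax.py and does not use pypto; the function names `softmax_row` and `softmax_rows` are hypothetical and chosen only for illustration.

```python
import math


def softmax_row(row):
    """Softmax of one row, shifted by the row maximum for stability."""
    m = max(row)
    # After the shift the largest exponent is exp(0) == 1, so exp cannot overflow.
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_rows(x):
    """Apply softmax independently to every row of a list-of-lists matrix."""
    return [softmax_row(row) for row in x]


if __name__ == "__main__":
    # Without the shift, math.exp(1000.0) raises OverflowError in CPython.
    print(softmax_row([1000.0, 1001.0, 1002.0]))
```

Subtracting the row maximum leaves the result mathematically unchanged, because the common factor exp(-m) cancels between numerator and denominator.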
📝 Walkthrough
Adds two new example programs (softmax and RMSNorm) with build/compile/run/golden-check logic, and updates CI to install PTOAS v0.9 and run those examples in both the sim (x86_64) and a2a3 (aarch64) jobs, including sha256 verification in the sim job.
Sequence Diagram(s)
sequenceDiagram
participant CLI as User/CLI
participant Builder as ProgramBuilder
participant Runtime as pypto.runtime
participant Device as Simulator/Device
participant Golden as PyTorch (golden)
CLI->>Builder: build_*_program + build_tensor_specs
Builder->>Runtime: provide program + specs + RunConfig
Runtime->>Device: compile & execute (sim or device)
Device-->>Runtime: execution outputs
Runtime->>Golden: run golden reference
Golden-->>Runtime: golden outputs
Runtime->>CLI: compare results -> pass/fail
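The last steps of the diagram (run the kernel, run the golden reference, compare, report pass/fail) could look roughly like the sketch below. It is plain Python rather than the pypto.runtime API, which this page does not show; `compare_to_golden`, `run_and_check`, and the tolerance defaults are all assumptions made for illustration.

```python
import math


def compare_to_golden(actual, golden, rtol=1e-5, atol=1e-5):
    """Compare two flat sequences elementwise; return (passed, error)."""
    if len(actual) != len(golden):
        return False, f"length mismatch: {len(actual)} vs {len(golden)}"
    for i, (a, g) in enumerate(zip(actual, golden)):
        if not math.isclose(a, g, rel_tol=rtol, abs_tol=atol):
            return False, f"mismatch at index {i}: got {a}, expected {g}"
    return True, None


def run_and_check(run_kernel, run_golden, inputs):
    """Run the kernel and the golden reference on the same inputs, then compare."""
    actual = run_kernel(inputs)
    golden = run_golden(inputs)
    return compare_to_golden(actual, golden)
```

Returning the error string alongside the boolean lets the caller distinguish a numerical mismatch from an environment problem.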
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed (1 warning)
Code Review
This pull request introduces two new examples, rms_norm.py and softmax.py, demonstrating row-wise and chunked implementations of RMSNorm and Softmax operations using the pypto library. The code is well-structured, clearly commented, and includes golden reference implementations and runnable test scripts. My review found one minor issue in rms_norm.py related to maintainability, where a hardcoded value should be replaced with a defined constant. Otherwise, the implementations appear correct and follow the existing patterns in the repository.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/rms_norm.py`:
- Around line 146-150: The skip message for missing code_runner in
examples/rms_norm.py prints correctly but leaves result.passed false so the
script still exits as a failure; modify the handling in the branches that check
result.passed and "code_runner" in result.error (both around the shown block and
the similar 162-167 block) to treat the skip as a success—after printing the
skip message set result.passed = True or otherwise return/exit with success so
the missing code_runner path does not produce a nonzero exit.
- Around line 34-42: The code computes hidden_blocks = hidden // hidden_chunk
and assumes both rows and hidden are exact multiples of row_chunk and
hidden_chunk, which can silently drop remainders and produce invalid slices;
update build_rms_norm_program to validate divisibility up front by checking
hidden % hidden_chunk == 0 and rows % row_chunk == 0 (and any other place you
compute blocks or slice tiles in the same module, e.g., the second pass that
slices [row_chunk, hidden_chunk]) and raise a clear ValueError (or assert) with
a descriptive message if not divisible so the function fails fast instead of
producing incorrect/invalid slices.
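The fail-fast validation suggested above can be sketched in plain Python as follows, in the context of a two-pass, column-chunked RMSNorm. This is not the pypto code from examples/rms_norm.py: the function name `rms_norm_chunked`, the list-of-lists layout, and the `eps` default are assumptions made for illustration.

```python
import math


def rms_norm_chunked(x, gamma, row_chunk, hidden_chunk, eps=1e-6):
    """Two-pass RMSNorm over column chunks, validating chunk sizes up front."""
    rows, hidden = len(x), len(gamma)
    if row_chunk <= 0 or hidden_chunk <= 0:
        raise ValueError("row_chunk and hidden_chunk must be positive")
    # Fail fast instead of silently dropping a remainder tile.
    if rows % row_chunk != 0:
        raise ValueError(f"rows ({rows}) must be divisible by row_chunk ({row_chunk})")
    if hidden % hidden_chunk != 0:
        raise ValueError(
            f"hidden ({hidden}) must be divisible by hidden_chunk ({hidden_chunk})"
        )
    hidden_blocks = hidden // hidden_chunk
    out = [[0.0] * hidden for _ in range(rows)]
    for r0 in range(0, rows, row_chunk):
        for r in range(r0, r0 + row_chunk):
            # Pass 1: accumulate the sum of squares one column chunk at a time.
            sq_sum = 0.0
            for b in range(hidden_blocks):
                c0 = b * hidden_chunk
                sq_sum += sum(v * v for v in x[r][c0:c0 + hidden_chunk])
            inv_rms = 1.0 / math.sqrt(sq_sum / hidden + eps)
            # Pass 2: scale each chunk by 1/rms and the gamma weight.
            for b in range(hidden_blocks):
                c0 = b * hidden_chunk
                for c in range(c0, c0 + hidden_chunk):
                    out[r][c] = x[r][c] * inv_rms * gamma[c]
    return out
```

Because the checks run before any tile is touched, a bad chunk size surfaces as a clear `ValueError` rather than as a wrong result.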
In `@examples/softmax.py`:
- Around line 28-32: The slice call in build_softmax_program assumes every tile
has exactly row_chunk rows (pl.slice(x, [row_chunk, cols], [r, 0])) which breaks
when rows % row_chunk != 0; change the tiling to compute tile_rows =
min(row_chunk, rows - r) for each tile (or add an upfront validation) and use
tile_rows instead of row_chunk in all pl.slice and related operations (and add a
tail-path if tile_rows < row_chunk) so the final partial tile is handled without
out-of-bounds slices; update every occurrence (including the similar slices
around the max/exp/reduction steps) to use tile_rows.
- Around line 122-127: The branch that detects a missing "code_runner" prints
"device run skipped" but leaves result.passed false, causing the CLI to treat
skipped runs as failures; update the branch that checks if not result.passed and
result.error and "code_runner" in result.error to mark the run as skipped
instead of failed by setting result.passed = True (or otherwise set a skipped
flag that the caller treats as success) and preserve the informative printout;
apply the same change to the analogous branch handling the other occurrence so
skipped runs don't exit with failure.
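The two softmax.py findings above can be sketched in plain Python. `iter_row_tiles` shows the `tile_rows = min(row_chunk, rows - r)` tail handling, and `exit_code` shows one way to treat a missing code_runner as a skip rather than a failure; the same pattern would apply to the rms_norm.py skip finding. `RunResult` is a hypothetical stand-in for the real result object, of which only the `passed` and `error` attributes are known from the comments.

```python
def iter_row_tiles(rows, row_chunk):
    """Yield (start_row, tile_rows); the final tile may be shorter than row_chunk."""
    if row_chunk <= 0:
        raise ValueError("row_chunk must be positive")
    for r in range(0, rows, row_chunk):
        yield r, min(row_chunk, rows - r)


class RunResult:
    """Hypothetical stand-in exposing only the attributes named in the review."""

    def __init__(self, passed, error=None):
        self.passed = passed
        self.error = error


def exit_code(result):
    """Map a run result to a process exit code, treating a skip as success."""
    if result.passed:
        return 0
    if result.error and "code_runner" in result.error:
        print("device run skipped: code_runner is not available")
        return 0
    print(f"FAILED: {result.error}")
    return 1
```

Returning an exit code instead of mutating `result.passed` keeps the skip visible to callers; setting `result.passed = True`, as the comment proposes, is the smaller change.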
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c990c2ee-9094-449b-a29f-c113428a2382
📒 Files selected for processing (3)
- .github/workflows/ci.yml
- examples/rms_norm.py
- examples/softmax.py
7cefa70 to
27fd87d
19c033e to
f7e5d45
- Add row-wise softmax example with numerical stability (max-shift)
- Add two-pass RMSNorm example with column chunking and gamma weight
- Add both examples to CI pipeline (sim + a2a3 device tests)
- Upgrade PTOAS from v0.8 to v0.9 with updated checksums
- Refactor setup_env skill to reference ci.yml for version and checksums
Testing
- `python examples/softmax.py --sim` passes
- `python examples/rms_norm.py --sim` passes