
refactor(codegen): use Out/InOut params for orchestration output tensors#620

Merged
lyfne123 merged 4 commits into hw-native-sys:main from YunjiQin:orch
Mar 19, 2026

Conversation


@YunjiQin (Contributor) commented Mar 19, 2026

Summary

  • Refactor orchestration codegen to derive output tensors from Out/InOut function parameters instead of inferring them from return statements
  • Support incore call return tensors whose names differ from the corresponding output arguments
  • Update all examples and tests to use pl.Out[...] parameter syntax for output tensors
  • Remove dead fields from OrchestrationInfoCollector that were no longer read after the refactor
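The core change — outputs declared in the signature rather than allocated and returned — can be pictured with a plain-Python sketch (the function names here are illustrative, not the real `pl` API):

```python
# Before: the callee allocates and returns its output, so the codegen had to
# infer output tensors from return statements.
def scale_return(x, factor):
    return [v * factor for v in x]

# After: the output is an explicit parameter the caller provides, which is
# what pl.Out[...] expresses at the signature level.
def scale_out(x, factor, out):
    for i, v in enumerate(x):
        out[i] = v * factor

out = [0.0] * 3
scale_out([1.0, 2.0, 3.0], 2.0, out)
# out now holds [2.0, 4.0, 6.0]
```

With the output in the signature, the codegen no longer needs to guess which tensors escape the function; it reads them directly from the parameter annotations.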

Testing

  • All 2558 tests pass
  • Code review completed
  • Pre-commit hooks pass (clang-format, cpplint, ruff, pyright)

Related Issues

fix #583


coderabbitai bot commented Mar 19, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Refactors orchestration codegen to stop creating local return tensors and instead emit reference aliases for kernel outputs; updates many examples and tests to accept external output handles (pl.Out / pl.InOut) and adds comprehensive orchestration-codegen documentation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Documentation (EN & ZH)**: `docs/en/dev/codegen/02-orchestration_codegen.md`, `docs/zh-cn/dev/codegen/02-orchestration_codegen.md` | New comprehensive docs describing orchestration codegen architecture, phases, internals (collector, stmt codegen, op registry), examples, and Python API notes. |
| **Core Orchestration Codegen**: `src/codegen/orchestration/orchestration_codegen.cpp` | Removed helpers for counting/inferring return tensors and expected args; simplified `OrchestrationInfoCollector` to tuple metadata only; changed `AssignStmt` emission to create C++ `Tensor&` aliases from callee Out/InOut params; removed orchestration-local return-tensor creation logic. |
| **IR-parser examples**: `examples/ir_parser/batch_paged_attention_example.py`, `.../orchestration_example.py`, `.../paged_attention_example.py`, `.../vector_example_dag.py` | Switched orchestration signatures to accept `pl.Out[...]` outputs and removed internal `pl.create_tensor` allocations; removed unused `size_*` scalar params in paged attention. |
| **Language examples (beginner/intermediate/llm)**: `examples/language/beginner/...`, `examples/language/intermediate/...`, `examples/language/llm_models/llama_7b_mini.py` | Many orchestrator functions updated to take `pl.Out[...]` outputs (or `pl.InOut[...]` where in-place), removing internal output allocations and forwarding caller-provided buffers into kernels. |
| **System tests (codegen & runtime)**: `tests/st/codegen/...`, `tests/st/runtime/...` | Test programs updated to reflect external output parameters (`pl.Out`/`pl.InOut`) and removed local `pl.create_tensor` allocations; paged-attention test tensor specs for `size_*` removed. |
| **Unit tests (codegen)**: `tests/ut/codegen/test_orchestration_codegen.py` | Multiple orchestration unit tests updated to expect external `pl.Out`/`pl.InOut` parameters; local `create_tensor` calls removed; tuple/in-place test signatures adjusted accordingly. |


Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Suggested reviewers

  • lyfne123
  • Hzfengsy

Poem

"🐰 I hopped through code and found a way,
To let outputs leap out into the day.
No more creating inside the nest,
Kernels write where callers request.
Aliases bind, the orchestration sings—hooray!"

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.16%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and accurately summarizes the main refactoring: changing orchestration codegen to use Out/InOut parameters for output tensors instead of inferring from returns. |
| Description check | ✅ Passed | The PR description accurately describes the changeset: refactoring orchestration codegen to use Out/InOut parameters for output tensors, updating examples/tests, and removing dead code. |



@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the orchestration code generation process by shifting from implicit inference of output tensors to explicit declaration using pl.Out and pl.InOut type annotations in Python function parameters. This change improves the robustness and readability of the orchestration IR, making the intent of output tensors clear at the function signature level. The underlying C++ code generation logic has been simplified, and internal metadata collection mechanisms have been streamlined to align with this more explicit approach.

Highlights

  • Refactored Orchestration Codegen: The orchestration code generation now explicitly derives output tensors from Out/InOut function parameters, enhancing clarity and simplifying the codegen logic.
  • Updated Python Syntax: All relevant examples and tests have been updated to utilize the new pl.Out[...] and pl.InOut[...] parameter syntax for declaring output tensors in Python orchestration functions.
  • Streamlined Metadata Collection: The OrchestrationInfoCollector has been optimized by removing fields previously used for inferring output tensors from return statements, as this information is now explicitly provided via parameter annotations.
  • C++ Alias Generation for InCore Calls: The codegen now emits C++ reference aliases for InCore call return values when their names differ from the corresponding Out/InOut arguments, ensuring correct variable mapping.
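The alias-generation point above can be illustrated with a plain-Python analogy (a sketch of the idea, not the generated C++; `out_c` and `y` are hypothetical names):

```python
# When an incore call's result is bound to a name ("y") that differs from the
# Out/InOut argument name ("out_c"), the refactored codegen emits a C++
# reference alias (roughly: Tensor& y = out_c;). Python name binding behaves
# analogously: both names refer to one underlying buffer, so no copy is made.
out_c = bytearray(4)   # stands in for the caller-provided output tensor
y = out_c              # the "alias": a second name for the same buffer
y[0] = 7               # the kernel writes through the alias...
assert out_c[0] == 7   # ...and the caller observes the result
```

The alias keeps later statements that refer to the local result name valid without materializing a separate tensor.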
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/st/runtime/test_ctrl_flow.py (1)

524-532: ⚠️ Potential issue | 🟡 Minor

Update orchestrator functions for consistency or add explanatory comment.

The three test cases (TestForLoopBreak, TestForLoopContinue, and TestForLoopBreakContinue) use pl.create_tensor() to allocate output tensors in their orchestrator functions, while all other test cases in this file use pl.Out[...] parameters. Additionally, the InCore kernel functions within these same test cases use pl.Out[...], creating an inconsistency within each test class.

Either update these orchestrator functions to use pl.Out[...] for consistency, or add a brief comment explaining why this pattern differs.
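A minimal plain-Python analogue of the suggested fix (hypothetical helper names; the real tests use the `pl` DSL with `pl.Out[[256, 64], pl.FP32]` annotations):

```python
# The orchestrator accepts the output buffer as a parameter and forwards it to
# the kernel, instead of allocating it internally (the pl.create_tensor
# pattern the comment flags as inconsistent).
def kernel_break(a, out):
    # toy stand-in for the incore kernel: copy values until a sentinel, then break
    for i, v in enumerate(a):
        if v < 0:
            break
        out[i] = v

def orchestrator(a, out):
    # forward the caller-provided buffer rather than allocating a local one
    kernel_break(a, out)

out = [0, 0, 0, 0]
orchestrator([1, 2, -1, 4], out)
# out holds [1, 2, 0, 0]: copying stopped at the sentinel
```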

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/st/runtime/test_ctrl_flow.py` around lines 524 - 532, The orchestrator
functions (e.g., orchestrator in
TestForLoopBreak/TestForLoopContinue/TestForLoopBreakContinue) are inconsistent
with the rest of the file by allocating outputs with pl.create_tensor rather
than taking pl.Out[...] parameters while their kernels (kernel_break,
kernel_continue, etc.) use pl.Out; update each orchestrator signature to accept
the output tensor as a pl.Out[[256, 64], pl.FP32] parameter and wire that
through when calling the corresponding kernel (e.g., pass the pl.Out c into
kernel_break), or if create_tensor is intentional, add a short explanatory
comment above each orchestrator explaining why create_tensor is required for
these tests to justify the deviation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5a2ac031-bf8d-4d67-a831-dc874c631271

📥 Commits

Reviewing files that changed from the base of the PR and between 95925ed and 1f94b46.

📒 Files selected for processing (25)
  • docs/en/dev/codegen/02-orchestration_codegen.md
  • docs/zh-cn/dev/codegen/02-orchestration_codegen.md
  • examples/ir_parser/batch_paged_attention_example.py
  • examples/ir_parser/orchestration_example.py
  • examples/ir_parser/paged_attention_example.py
  • examples/ir_parser/vector_example_dag.py
  • examples/language/beginner/basic_ops.py
  • examples/language/beginner/elementwise.py
  • examples/language/beginner/hello_world.py
  • examples/language/beginner/matmul.py
  • examples/language/intermediate/activation.py
  • examples/language/intermediate/ffn_activations.py
  • examples/language/intermediate/layer_norm.py
  • examples/language/intermediate/rms_norm.py
  • examples/language/intermediate/softmax.py
  • examples/language/intermediate/vector_dag.py
  • examples/language/llm_models/llama_7b_mini.py
  • src/codegen/orchestration/orchestration_codegen.cpp
  • tests/st/codegen/test_batch_paged_attention.py
  • tests/st/codegen/test_paged_attention.py
  • tests/st/runtime/test_ctrl_flow.py
  • tests/st/runtime/test_dynamic_shape.py
  • tests/st/runtime/test_fillpad.py
  • tests/st/runtime/test_matmul.py
  • tests/ut/codegen/test_orchestration_codegen.py

@gemini-code-assist

Warning

Gemini is experiencing higher than usual traffic and was unable to create the review. Please try again in a few hours by commenting /gemini review.

Replace return-var inference with explicit pl.Out/pl.InOut parameter
annotations in orchestration functions. Output tensors are now passed
as params rather than allocated via pl.create_tensor() in the body.

C++ codegen removes dead code: CountReturnTensors, CountExpectedArgs,
GetIntermediateTensorType, return_vars tracking, and return_names_
member. Adds alias generation for InCore call return values that map
to Out/InOut args.
…t params

Migrate output tensor declarations from local pl.create_tensor() to
explicit pl.Out[...] parameters across all examples and tests, aligning
with the Out/InOut orchestration codegen refactor.
Remove output_tensors, output_tensor_assigns, tuple_element_map, and
call_to_result_var which were populated but never read after the
Out/InOut param refactor. Also remove the no-op SetTupleElementMap
method and its call site.
@YunjiQin
Contributor Author

@lyfne123

@lyfne123 lyfne123 merged commit e7271b2 into hw-native-sys:main Mar 19, 2026
7 checks passed
@YunjiQin YunjiQin deleted the orch branch March 19, 2026 12:14


Development

Successfully merging this pull request may close these issues.

[Bug] Generated orchestration uses undeclared source tensor for view on incore result

2 participants