
Refactor TensorMap to per-ring buckets and pools#310

Open
jvjhfhg wants to merge 1 commit into hw-native-sys:main from jvjhfhg:refactor-tensormap

Conversation

@jvjhfhg
Collaborator

@jvjhfhg jvjhfhg commented Mar 17, 2026

  • refactor owner TensorMap storage into per-ring buckets, pools, and cleanup tracking
  • add a fallback tensormap for external tensors and cross-ring INOUT modifiers while keeping owner-ring history ring-local
  • route lookup and removal across owner and fallback sources and bind make_tensor() to the current scope ring
  • update paged attention to treat oi_batch as INOUT in the example and matching device test

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the TensorMap implementation to improve performance and isolation by introducing a per-ring architecture. Instead of a single global TensorMap, each processing ring now maintains its own independent set of TensorMap data structures. This change enables more efficient cleanup and lookup operations, particularly through a "break-on-stale" optimization, and better manages tensor dependencies across different rings. The orchestrator has been updated to correctly assign ring IDs to tensors, ensuring proper routing within the new TensorMap structure.

Highlights

  • Per-Ring TensorMap Structure: Introduced PTO2TensorMapRing to manage independent buckets, entry pools, free lists, and task entry heads for each processing ring, enhancing isolation and performance.
  • TensorMap Facade: Refactored PTO2TensorMap to act as a facade, routing all operations (lookup, insert, cleanup) to the appropriate PTO2TensorMapRing instance based on the tensor's ring_id.
  • Tensor Ring ID: Added a uint8_t ring_id field to the Tensor struct, allowing tensors to be explicitly associated with a specific ring, with TENSOR_RING_ID_NONE indicating an unassigned or external tensor.
  • Orchestrator Ring ID Assignment: Modified the orchestrator to automatically assign the current ring's ring_id to OUTPUT tensors and unassigned INOUT tensors during task submission.
  • Break-on-Stale Optimization: Enabled a 'break-on-stale' optimization within PTO2TensorMapRing lookup chains, which leverages the newest-first ordering of entries to stop searching a chain as soon as a stale entry is encountered.
  • Decentralized Cleanup Tracking: Moved the tensormap_last_cleanup tracking variable from the orchestrator's global state into each individual PTO2TensorMapRing instance, further decentralizing and isolating ring-specific state.
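The facade routing described in the highlights above can be sketched in a few lines. This is a hypothetical simplification, not the PR's implementation: the type names `TensorMapFacade` and `TensorMapRing`, the ring count `kNumRings`, the sentinel value of `TENSOR_RING_ID_NONE`, and the `std::unordered_map` storage are all stand-ins for the real bucket/pool/free-list structures in `PTO2TensorMap` and `PTO2TensorMapRing`.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Illustrative stand-ins for PTO2TensorMap / PTO2TensorMapRing.
// The real structures use buckets, entry pools, and free lists;
// an unordered_map keeps the sketch short.
constexpr uint8_t TENSOR_RING_ID_NONE = 0xFF; // assumed sentinel value
constexpr int kNumRings = 4;                  // assumed ring count

struct Tensor {
    uint64_t id;
    uint8_t ring_id; // TENSOR_RING_ID_NONE marks an external tensor
};

struct TensorMapRing {
    std::unordered_map<uint64_t, int> entries; // tensor id -> payload

    void insert(uint64_t id, int payload) { entries[id] = payload; }
    bool lookup(uint64_t id, int* out) const {
        auto it = entries.find(id);
        if (it == entries.end()) return false;
        *out = it->second;
        return true;
    }
};

struct TensorMapFacade {
    TensorMapRing rings[kNumRings];
    TensorMapRing fallback; // external / cross-ring tensors

    // Route by the tensor's ring_id; unassigned tensors use the fallback.
    TensorMapRing& route(const Tensor& t) {
        if (t.ring_id == TENSOR_RING_ID_NONE || t.ring_id >= kNumRings)
            return fallback;
        return rings[t.ring_id];
    }

    void insert(const Tensor& t, int payload) { route(t).insert(t.id, payload); }

    // Check the owner ring first, then the fallback source, mirroring
    // "route lookup and removal across owner and fallback sources".
    bool lookup(const Tensor& t, int* out) {
        if (route(t).lookup(t.id, out)) return true;
        return fallback.lookup(t.id, out);
    }
};
```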


Changelog
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp
    • Removed initialization of orch->tensormap_last_cleanup.
    • Added logic to assign ring_id to INOUT tensors if unassigned, and to OUTPUT tensors.
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.h
    • Removed tensormap_last_cleanup member from PTO2OrchestratorState.
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_tensormap.cpp
    • Renamed PTO2TensorMap to PTO2TensorMapRing and updated its init, destroy, print_stats, valid_count, lookup, insert, and cleanup_retired methods to operate on a single ring.
    • Created a new PTO2TensorMap struct as a facade, containing an array of PTO2TensorMapRing instances and routing calls based on tensor.ring_id.
    • Modified sync_tensormap to iterate through all PTO2TensorMapRing instances and perform per-ring validity synchronization and cleanup.
    • Implemented the "break-on-stale" optimization in PTO2TensorMapRing::lookup.
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_tensormap.h
    • Introduced PTO2TensorMapRing struct with per-ring specific members like buckets, entry_pool, free_entry_list, task_entry_heads, task_window_size, last_task_alive, and last_cleanup.
    • Updated PTO2TensorMap to be a facade, holding an array of PTO2TensorMapRing objects.
    • Adjusted comments to reflect the new per-ring design and the "break-on-stale" optimization.
    • Removed ring_id parameter from sync_validity and cleanup_retired in PTO2TensorMapRing as it now operates on its own ring.
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/tensor.h
    • Added uint8_t ring_id field to the Tensor struct.
    • Defined TENSOR_RING_ID_NONE constant.
    • Updated Tensor constructors and print method to include the new ring_id field.
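The "break-on-stale" lookup mentioned in the changelog can be illustrated with a minimal chain walk. The entry layout and names here (`Entry`, `lookup_break_on_stale`, `last_task_alive` as the staleness threshold) are assumptions for the sketch; the real `PTO2TensorMapEntry` and validity test carry more state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified entry: chains are head-inserted, so they are
// ordered newest-first by producing task.
struct Entry {
    uint64_t key;
    int64_t task_id; // task that produced this entry
    int value;
    Entry* next;     // older entries follow
};

// Entries produced by tasks older than last_task_alive are stale.
// Because the chain is newest-first, the first stale entry implies every
// entry after it is also stale, so the search can break early instead of
// walking the rest of the chain.
bool lookup_break_on_stale(Entry* head, uint64_t key,
                           int64_t last_task_alive, int* out) {
    for (Entry* e = head; e != nullptr; e = e->next) {
        if (e->task_id < last_task_alive) break; // stale: rest is older
        if (e->key == key) {
            *out = e->value;
            return true;
        }
    }
    return false;
}
```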
Activity
  • No specific activity (comments, reviews, progress updates) was provided in the context for this pull request.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant and well-executed refactoring of TensorMap to a per-ring architecture. By creating PTO2TensorMapRing for per-ring resource management and making PTO2TensorMap a facade, the design improves isolation and enables a key performance optimization in lookup with break-on-stale logic. The changes are consistent and robust across all modified files. My review includes a couple of minor suggestions to simplify memory allocation and initialization by using calloc.

Comment on lines +40 to 44
buckets = (PTO2TensorMapEntry**)malloc(new_num_buckets * sizeof(PTO2TensorMapEntry*));
if (!buckets) return false;
for (int32_t i = 0; i < new_num_buckets; i++) {
    buckets[i] = nullptr;
}


Severity: medium

For simplicity and to ensure zero-initialization, you can replace malloc and the subsequent loop with a single call to calloc.

Suggested change
-buckets = (PTO2TensorMapEntry**)malloc(new_num_buckets * sizeof(PTO2TensorMapEntry*));
-if (!buckets) return false;
-for (int32_t i = 0; i < new_num_buckets; i++) {
-    buckets[i] = nullptr;
-}
+buckets = (PTO2TensorMapEntry**)calloc(new_num_buckets, sizeof(PTO2TensorMapEntry*));
+if (!buckets) return false;

Comment on lines +74 to 83
task_entry_heads = (PTO2TensorMapEntry**)malloc(new_task_window_size * sizeof(PTO2TensorMapEntry*));
if (!task_entry_heads) {
    free(buckets); buckets = nullptr;
    free(entry_pool); entry_pool = nullptr;
    free(free_entry_list); free_entry_list = nullptr;
    return false;
}

for (int r = 0; r < PTO2_MAX_RING_DEPTH; r++) {
    last_task_alives[r] = 0;
    for (int32_t i = 0; i < new_task_window_size; i++) {
        task_entry_heads[i] = nullptr;
    }


Severity: medium

Similar to the buckets allocation, you can use calloc here to both allocate and zero-initialize the task_entry_heads array, which simplifies the code.

Suggested change
-task_entry_heads = (PTO2TensorMapEntry**)malloc(new_task_window_size * sizeof(PTO2TensorMapEntry*));
-if (!task_entry_heads) {
-    free(buckets); buckets = nullptr;
-    free(entry_pool); entry_pool = nullptr;
-    free(free_entry_list); free_entry_list = nullptr;
-    return false;
-}
-for (int r = 0; r < PTO2_MAX_RING_DEPTH; r++) {
-    last_task_alives[r] = 0;
-    for (int32_t i = 0; i < new_task_window_size; i++) {
-        task_entry_heads[i] = nullptr;
-    }
+task_entry_heads = (PTO2TensorMapEntry**)calloc(new_task_window_size, sizeof(PTO2TensorMapEntry*));
+if (!task_entry_heads) {
+    free(buckets); buckets = nullptr;
+    free(entry_pool); entry_pool = nullptr;
+    free(free_entry_list); free_entry_list = nullptr;
+    return false;
+}

 bool is_raw_eq_shapes = false,
-bool manual_dep = false) {
+bool manual_dep = false,
+uint8_t in_ring_id = TENSOR_RING_ID_NONE) {
Collaborator

The tensor's ring_id must be set. A tensor cannot be defined in scope A and then have a task inside scope B (nested within scope A) allocate its memory, so the ring must be specified at make_tensor() time (equal to the current scope's ring_id), and view operations inherit it. When submit_task finds that a tensor's type is OUTPUT, the task's ring_id must match the tensor's ring_id; otherwise an error must be reported.
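The ring-binding rule this review comment asks for can be sketched as follows. This is an illustrative reduction under assumed names: `make_tensor`, `view_of`, and `check_output_ring` are placeholders for the real APIs, and the `Tensor`/`Mod` types are stripped down to the one field the rule concerns.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint8_t TENSOR_RING_ID_NONE = 0xFF; // assumed sentinel value

struct Tensor {
    uint8_t ring_id = TENSOR_RING_ID_NONE;
};

enum class Mod { INPUT, OUTPUT, INOUT };

// The ring is bound at creation time (the enclosing scope's ring),
// not assigned later at submit time.
Tensor make_tensor(uint8_t current_scope_ring) {
    Tensor t;
    t.ring_id = current_scope_ring;
    return t;
}

// View operations inherit the base tensor's ring_id.
Tensor view_of(const Tensor& base) {
    return base;
}

// submit_task-side check: an OUTPUT tensor bound to a different ring
// than the submitting task is an error rather than a reassignment.
bool check_output_ring(const Tensor& t, Mod mod, uint8_t task_ring_id) {
    if (mod != Mod::OUTPUT) return true;
    return t.ring_id == task_ring_id;
}
```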

// Per-ring chain: entries are ordered newest-first (head-insert).
// A stale entry means all subsequent entries are also stale — break.
if (!entry_valid(*cur_entry)) {
    cur_entry = next_entry;
Collaborator

This needs to revert to the old logic: set next to nullptr. Whether to clean up the entry immediately can be decided later.

@jvjhfhg force-pushed the refactor-tensormap branch 2 times, most recently from d00c195 to ec51fd1, on March 18, 2026 at 09:52
- refactor owner TensorMap storage into per-ring buckets, pools,
  and cleanup tracking
- add a fallback tensormap for external tensors and cross-ring
  INOUT modifiers while keeping owner-ring history ring-local
- route lookup and removal across owner and fallback sources
  and bind make_tensor() to the current scope ring
- update paged attention to treat oi_batch as INOUT in the
  example and matching device test
@jvjhfhg force-pushed the refactor-tensormap branch from ec51fd1 to 0ef4a89 on March 18, 2026 at 11:00