Migrate `set_shape` recognition from Java IR-scan to `<method>` on Tensor classes in `tensorflow.xml`

## Motivation

`PythonTensorAnalysisEngine.getSetShapeCallsSyntactic` (introduced in [ponder-lab/ML#333](https://github.com/ponder-lab/ML/pull/333) as a near-term fix for wala/ML#509) recognizes `x.set_shape(s)` callsites by scanning every CGNode's IR for the `PythonPropertyRead("set_shape")` + invoke pattern. This works but lives in Java, separate from the existing TF API modeling layer (`tensorflow.xml`).

The natural home for the recognition is `tensorflow.xml`, in line with how `Dataset` already models its instance methods.

## Investigation Findings (2026-05-26)

A draft PR ([ponder-lab/ML#338](https://github.com/ponder-lab/ML/pull/338), now closed) explored declaring `<method name="set_shape">` directly on each Tensor/SparseTensor `<class>` block. Two structural problems surfaced:

1. **The trampoline indirection.** Declaring `<method>` on a `<class>` triggers WALA's `PythonInstanceMethodTrampolineTargetSelector`. User code's `tensor.set_shape(shape)` then invokes a TRAMPOLINE (`L$<class>/set_shape.trampoline<N>()LRoot;`, with the `$` prefix added at `PythonInstanceMethodTrampolineTargetSelector.java:239` and the `trampoline<numTotalParameters>` selector per `PythonMethodTrampolineTargetSelector.java:50`), which in turn dispatches to the underlying `do()`. The legacy `getShapeSourceCalls` machinery fails either way. Targeted at `do()`, it walks the wrong direction: it finds the trampoline, not user code. Targeted at the trampoline, it can't recover the receiver, because the trampoline's auto-generated body doesn't preserve the def-to-receiver aliasing that the legacy callable-as-attribute pattern relied on.

2. **Receiver vs. def semantics.** `set_shape`'s purpose is to MUTATE the receiver's tensor classification, whereas `getShapeSourceCalls` pins `call.getDef()` (the call's return value). The LEGACY mechanism worked only because `<class name="set_shape">.do()` had `<return value="self"/>`: in the callable-as-attribute context, `self` referred to the callable instance, which aliased through the pointer analysis (PA) to the receiver. With the trampoline indirection, that aliasing path is broken.
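
The receiver-versus-def distinction in item 2 is visible from user code alone. The following is a minimal sketch using a hypothetical stand-in class, `FakeTensor`, rather than TensorFlow itself. It assumes only that `set_shape` updates the receiver in place and returns `None`, which is how `tf.Tensor.set_shape` behaves.

```python
class FakeTensor:
    """Hypothetical stand-in for a tensor; not a TensorFlow class."""

    def __init__(self, shape):
        self.shape = shape

    def set_shape(self, shape):
        # Mutates the receiver; there is no meaningful return value.
        self.shape = shape


x = FakeTensor([None, None])

# `result` plays the role of call.getDef(): the value the call defines.
result = x.set_shape([2, 3])

# The receiver, not the def, is what carries the new shape.
receiver_shape = x.shape
```

An analysis that only tracks `result` has nothing to classify here, which is why the legacy summary had to return `self` and rely on aliasing back to the receiver.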

## Architecturally-Consistent Alternative: Dataset Pattern

The `Dataset` class doesn't declare its instance methods (`batch`, `map`, `shuffle`, etc.) as `<method>` blocks inside `<class name="Dataset">`. Instead, each is a STANDALONE callable class (`<class name="batch">`, `<class name="shuffle">`, etc.), and Dataset instances have these callables attached via `<putfield>` at every Dataset-producing endpoint (`from_tensor_slices.do`, `shuffle.do`, etc.).

This bypasses the trampoline entirely: PropertyRead-based attribute access on a putfield-attached callable invokes the callable's `do()` directly, no trampoline.

The legacy `set_shape` already uses this pattern on `FixedLenFeature` (`tensorflow.xml:2129`: `<putfield field="set_shape" ref="x" value="set_shape_callable"/>`). Extending the pattern to every Tensor allocation site would resolve wala/ML#550 without touching the trampoline mechanism.
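
To make the pattern concrete, here is a rough audit sketch that lists allocation sites where the `set_shape` callable is not yet attached. The XML fragment is hypothetical: the `<summary>` wrapper, the class name `Lexample/Tensor`, and the method layout are illustrative assumptions, and only the `<putfield>` attributes are copied from the `FixedLenFeature` line quoted above. The helper name `sites_missing_field` is likewise invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element layout and class names are illustrative only.
FRAGMENT = """
<summary>
  <method name="do">
    <new def="x" class="Lexample/Tensor"/>
    <putfield field="set_shape" ref="x" value="set_shape_callable"/>
    <return value="x"/>
  </method>
  <method name="do">
    <new def="y" class="Lexample/Tensor"/>
    <return value="y"/>
  </method>
</summary>
"""


def sites_missing_field(root, field, class_suffix="/Tensor"):
    """Return the defs of matching <new> sites that lack a <putfield> for `field`."""
    missing = []
    for method in root.iter("method"):
        # Collect every local that already has `field` attached in this method.
        attached = {
            p.get("ref") for p in method.iter("putfield") if p.get("field") == field
        }
        for new in method.iter("new"):
            is_target = new.get("class", "").endswith(class_suffix)
            if is_target and new.get("def") not in attached:
                missing.append(new.get("def"))
    return missing


missing = sites_missing_field(ET.fromstring(FRAGMENT), "set_shape")
```

In this fragment only the second allocation (`y`) is reported. Passing `class_suffix="/SparseTensor"` would cover the SparseTensor sites in the same way.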

## Proposed Migration

1. **Keep** the existing `<class name="set_shape">` callable at `tensorflow.xml:1660`.
2. **Add** `<putfield ref="x" field="set_shape" value="set_shape_callable"/>` to every `<new def="x" class="Ltensorflow/.../Tensor"/>` (and SparseTensor) site in `tensorflow.xml`. Estimated count: ~130 Tensor allocations.
3. **Update** `PythonTensorAnalysisEngine` to remove `getSetShapeCallsSyntactic` and restore the legacy `getShapeSourceCalls(set_shape, ...)` call now that every Tensor allocation has the attribute attached.
4. **Remove** the `cast` `pass_through` alias (already done by ponder-lab/ML#333, so no further change is needed here).
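
Step 2 is mechanical enough that it could be scripted. The following is a minimal sketch, not a vetted migration tool: the helper name `attach_set_shape` and the class name `Lexample/Tensor` are hypothetical, and it assumes a value named `set_shape_callable` is already in scope at each site, as in the `FixedLenFeature` case.

```python
import xml.etree.ElementTree as ET


def attach_set_shape(root, class_suffixes=("/Tensor", "/SparseTensor")):
    """Insert a set_shape <putfield> after each matching <new>; return the count."""
    added = 0
    # Snapshot the tree first so that insertions do not disturb the traversal.
    for parent in list(root.iter()):
        children = list(parent)
        for child in children:
            if child.tag != "new":
                continue
            if not child.get("class", "").endswith(class_suffixes):
                continue
            ref = child.get("def")
            # Skip sites that already have the field attached (keeps reruns safe).
            already = any(
                c.tag == "putfield"
                and c.get("field") == "set_shape"
                and c.get("ref") == ref
                for c in children
            )
            if already:
                continue
            putfield = ET.Element(
                "putfield",
                {"ref": ref, "field": "set_shape", "value": "set_shape_callable"},
            )
            parent.insert(list(parent).index(child) + 1, putfield)
            added += 1
    return added


# Hypothetical single-method fragment used to exercise the helper.
root = ET.fromstring(
    '<method name="do">'
    '<new def="x" class="Lexample/Tensor"/>'
    '<return value="x"/>'
    "</method>"
)
first_pass = attach_set_shape(root)
second_pass = attach_set_shape(root)  # nothing left to add on a rerun
tags = [child.tag for child in root]
```

One caveat if this were applied to the real file: `ElementTree`'s default parser discards comments and does not preserve the original formatting, so writing the tree back out would produce a noisy diff. A line-oriented edit may be preferable for the actual change.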

## Cost / Trade-Off

- **Volume**: ~130 `<putfield>` sites is significant. This is the same per-allocation duplication pattern that wala/ML#549 audited for Dataset/Model field blocks, found to be structurally load-bearing, and accepted as the current XML convention.
- **Mechanism**: identical to Dataset's existing approach; no new architectural pattern.
- **Future collapse**: once class-hierarchy resolution (wala/ML#118, wala/ML#107) lands, the ~130 putfields collapse to a single declaration on a base Tensor class. This issue then becomes a consumer of those frontend fixes.

## Related

- [ponder-lab/ML#333](https://github.com/ponder-lab/ML/pull/333) — the near-term Java IR-syntactic implementation; ships first.
- [ponder-lab/ML#338](https://github.com/ponder-lab/ML/pull/338) (closed) — the architectural exploration that surfaced the trampoline indirection.
- wala/ML#549 — prior audit accepting the per-allocation duplication pattern.
- wala/ML#118, wala/ML#107 — class-hierarchy resolution; the collapse path.
- wala/ML#555 — `TensorType.shapeArg`'s spurious `throws IOException` declaration (surfaced during PR #333 review; orthogonal cleanup).
