Skip to content

Restore per-context layer-output shape via configurable selective k-CFA#371

Merged
khatchad merged 10 commits into
masterfrom
ml-530-layer-output-shape
Jun 10, 2026
Merged

Restore per-context layer-output shape via configurable selective k-CFA#371
khatchad merged 10 commits into
masterfrom
ml-530-layer-output-shape

Conversation

@khatchad

@khatchad khatchad commented Jun 9, 2026

Copy link
Copy Markdown
Member

Summary

A user tf.keras.Model subclass invoked from multiple sites (e.g. a training loop and a test-set evaluation) merged into one context-insensitive call-graph node, collapsing its per-caller layer-output allocations and losing per-context tensor shape. Downstream, tf.argmax of a collapsed layer output leaked one caller's shape into another context and threw a NonBroadcastableShapesException in tf.equal, which is why Argmax.getDefaultShapes returned ⊤ to sidestep the regression.

This PR resolves that by making the targeted k-CFA depth configurable (wala#379) and applying deeper context to user model forward methods (wala#530), then restoring the precise tf.argmax/tf.argmin output shape.

Changes

  • Configurable depth (Make k-CFA depth configurable for framework API context sensitivity wala/ML#379): the hard-coded 2 on the targeted context selector becomes DEFAULT_TARGETED_CFA_DEPTH (default 2, fully back-compat), exposed as a PythonTensorAnalysisEngine constructor parameter. Clients supply a deeper value when allocations merge across call sites; only the targeted selector (framework APIs plus model methods) uses it, never the whole program.
  • Model-aware context (Restore precise output shape on per-context layer-output allocations (testNeuralNetwork style) wala/ML#530): the targeted predicate is extended to user tf.keras.Model forward methods so a model called from multiple sites is analyzed per caller, keeping its layer-output allocations distinct.
  • Precise tf.argmax/tf.argmin shape: Argmax.getDefaultShapes now delegates to ReduceMean's keepdims=false reduction (input shape with the axis dimension removed), since the per-context separation removes the broadcast regression that previously forced ⊤.
  • Test harness: depth-parameterized makeEngine/test overloads (mirroring the analogous knob in the Java streams tooling), so testNeuralNetwork1-4 opt into depth 4 while every other test stays at the default.

Detection: Heuristic vs CHA-subtype (Dormant Path)

The clean way to recognize a user model method is a class-hierarchy subtype check against tf.keras.Model. That check is wired but intentionally dormant: the front-end does not yet record class X(Model) inheritance, so a user model class resolves its superclass to Lobject rather than the Keras Model type (filed as wala#571, an existing front-end TODO). Until that lands, a structural heuristic over user call/__call__/predict methods drives the predicate. When wala#571 is resolved, the heuristic is deleted and isModelSubclassMethod takes over—both are present so the switch is a one-line change.

Testing

  • Full suite green (all modules, mvn test).
  • testNeuralNetwork1-4 run at depth 4: accuracy's y_pred is now the correct per-context union {(256, 10) float32, ? float32}, and the per-context multiplicity for testNeuralNetwork rises to 24.
  • The six tf.argmax/tf.argmin shape tests tighten from ⊤ to the precise (3,) the fixtures assert at runtime.

Closes wala#379.
Closes wala#530.

khatchad and others added 2 commits June 9, 2026 17:31
Lift the hard-coded 2-CFA depth on the targeted context selector to a
configurable `DEFAULT_TARGETED_CFA_DEPTH` (default 2, back-compat), exposed
as a `PythonTensorAnalysisEngine` constructor parameter so clients can
request deeper context when allocations merge across call sites.

Extend the targeted predicate beyond framework APIs to user `tf.keras.Model`
forward methods: a model invoked from multiple sites (train vs. test)
otherwise merges into one context-insensitive node, collapsing its
per-caller layer-output allocations and losing per-context shape. The
CHA-subtype check is wired but dormant because the front-end does not yet
record `class X(Model)` inheritance (superclass resolves to `Lobject`,
wala#571); a structural heuristic over user `call`/`__call__`/`predict`
methods is active until that lands.

The test harness gains depth-parameterized `makeEngine`/`test` overloads,
mirroring the Java streams tooling, so `testNeuralNetwork1-4` opt into
depth 4 while every other test stays at the default.

Closes wala#379.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
With per-context layer-output allocations now separated by the targeted
k-CFA depth, un-nulling `Argmax.getDefaultShapes` no longer regresses
`testNeuralNetwork*` via `ElementWiseOperation`'s cross-context cartesian
pair. `Argmax`/`Argmin` delegate to `ReduceMean`'s keepdims=false reduction,
emitting the input shape with the `axis` dimension removed.

The six argmax/argmin shape tests tighten from ⊤ to the precise `(3,)` shape
the fixtures assert at runtime, resolving the captured-imprecise TODOs that
were gated on this fix.

Closes wala#530.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 9, 2026 21:32

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR makes the call-graph context sensitivity for targeted framework/model methods configurable to restore per-caller tensor-shape precision (notably for user tf.keras.Model subclasses invoked from multiple sites), and then tightens tf.argmax/tf.argmin shape inference back to the precise reduced shape.

Changes:

  • Add PythonTensorAnalysisEngine.DEFAULT_TARGETED_CFA_DEPTH (default 2) and constructor parameters to configure the targeted k-CFA depth.
  • Extend the “targeted” predicate to include user model forward methods (via CHA-subtype check with a structural heuristic fallback).
  • Restore precise argmax/argmin output shapes and update tests to use deeper k-CFA (depth 4) for the affected neural-network fixtures.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
com.ibm.wala.cast.python.ml/source/com/ibm/wala/cast/python/ml/client/PythonTensorAnalysisEngine.java Introduces configurable targeted k-CFA depth and expands targeted method selection to cover user model forward methods.
com.ibm.wala.cast.python.ml/source/com/ibm/wala/cast/python/ml/client/Argmax.java Switches argmax default shape from ⊤/unknown to a reduction-based precise shape.
com.ibm.wala.cast.python.ml.test/source/com/ibm/wala/cast/python/ml/test/TestTensorflow2Model.java Updates neural-network tests to run with depth 4 and tightens argmax/argmin shape assertions to precise shapes.
com.ibm.wala.cast.python.ml.test/source/com/ibm/wala/cast/python/ml/test/TestPythonMLCallGraphShape.java Adds depth-parameterized makeEngine helper to let specific tests opt into deeper targeted context sensitivity.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov

codecov Bot commented Jun 9, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 87.91209% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.67%. Comparing base (6381dec) to head (be43d95).

Files with missing lines Patch % Lines
...t/python/ml/client/PythonTensorAnalysisEngine.java 78.94% 2 Missing and 6 partials ⚠️
...com/ibm/wala/cast/python/ml/client/ReduceMean.java 86.66% 2 Missing ⚠️
...wala/cast/python/ml/test/TestTensorflow2Model.java 96.77% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##             master     #371   +/-   ##
=========================================
  Coverage     71.66%   71.67%           
- Complexity     2712     2731   +19     
=========================================
  Files           272      272           
  Lines         20292    20347   +55     
  Branches       3273     3283   +10     
=========================================
+ Hits          14542    14583   +41     
- Misses         4459     4470   +11     
- Partials       1291     1294    +3     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

khatchad and others added 2 commits June 9, 2026 18:00
Address PR #371 review feedback.

- Reject a negative `targetedCfaDepth` at construction with an
  `IllegalArgumentException` via a shared `checkTargetedCfaDepth` guard,
  document the non-negative range, and cover it with a test.
- Document that `Argmax`/`Argmin` shape inference does not honor the
  deprecated TF 1.x `dimension` axis alias (it reduces as `axis=None`);
  honoring it precisely is tracked by wala#572.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Routing every user `call`/`__call__`/`predict` method through the targeted
k-CFA selector at the default depth added context-sensitivity suite-wide
without retaining enough caller frames to fix the chained-layer collapse
(which needs a depth above the default), inflating analysis time and call
graphs for every model test. Apply the model-method predicate only when the
caller opts into a depth greater than `DEFAULT_TARGETED_CFA_DEPTH`, so the
default-depth behavior matches the prior context-insensitive analysis while
`testNeuralNetwork*` (depth 4) keep the per-context layer-output fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 9, 2026 23:53

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

khatchad and others added 2 commits June 9, 2026 21:02
Several `findOrCreatePointsToSet` call sites in `PythonTensorAnalysisEngine`
(`getShapeSourceCalls`, `getSetShapeCallsSyntactic`, `getKeysDefinedByCall`)
and `TensorGeneratorFactory.getPointsToSetVariable` materialized points-to-set
variables for keys that can be implicitly represented. On an implicit key,
WALA's `PropagationSystem.findOrCreatePointsToSet` prints the entire call
graph's IR to stderr (a no-op-assertion debug path) and then continues. With
the deeper context sensitivity added by the preceding commits, a single
`testNeuralNetwork` hit this path ~911 times, dumping ~8.46M lines and
dominating CI log I/O (the `verify` step ran ~4.5x longer).

Guard these sites with `PropagationSystem.isImplicit`, matching the existing
guards elsewhere in the analysis; an implicit key has no explicit dataflow
variable to pin or track, so skipping it preserves behavior (full suite
unchanged, and `testNeuralNetwork`'s output drops from ~15.3M lines to ~43K).

Closes wala#573.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This reverts commit fc20a93.

The gate was added on the hypothesis that the model-method predicate caused
the CI slowdown. It did not: the slowdown was WALA dumping the entire call
graph's IR on implicitly-represented PointerKeys, fixed in the preceding
commit (wala#573). The gate is perf-neutral (measured), so drop it and let
the model-method predicate apply unconditionally for simpler code.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 10, 2026 01:03

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

khatchad and others added 2 commits June 10, 2026 10:06
`Argmax` delegates shape inference to `ReduceMean`, which reads its `keepdims`
argument at positional index 2. For `tf.math.argmax`, index 2 is `output_type`,
so a positional call like `tf.argmax(x, 0, tf.int32)` had its `output_type`
misread as `keepdims`: a dtype constant is neither boolean nor numeric, so
`ReduceMean` widened `keepdims` to `{false, true}` and unioned a spurious
`keepdims=true` shape (e.g. `(1, 3)`).

Extract `ReduceMean.getKeepDimsValues` and override it in `Argmax` to force
`{false}` — argmax/argmin have no `keepdims` parameter and always remove the
scanned axis. Add `testArgmaxOutputTypePositional` as a regression guard.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…wardMethod`

Replace the `"call"` and `"__call__"` string literals with the existing
`PythonTypes.CALLABLE_METHOD_NAME_FOR_KERAS_MODELS` and
`PythonTypes.CALLABLE_METHOD_NAME` constants so the heuristic and the
front-end share a single source of truth for the method names.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

khatchad and others added 2 commits June 10, 2026 15:44
…s in `ReduceMean`

`getKeepDimsValues`'s `ConstantKey` branch handled only `Boolean`, `Number`, and `null`, silently dropping any other constant and collapsing to `{false}`, contrary to the method's Javadoc. Add the missing `else` so such a constant widens to `{false, true}` for soundness, matching the non-constant case.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`Argmax.getDefaultShapes` delegates to `ReduceMean`, which resolved the input argument by the keyword name `input_tensor`. Argmax's parameter is named `input`, so a keyword call like `tf.math.argmax(input=x, axis=0)` failed argument resolution and threw `IllegalStateException` during shape inference. Make the input-parameter name and position overridable in `ReduceMean` and override them in `Argmax`, keeping the reduction logic in one place. Add `tf2_test_argmax_input_keyword.py` and `testArgmaxInputKeyword` as a regression guard.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@khatchad khatchad added this pull request to the merge queue Jun 10, 2026
Merged via the queue into master with commit 51d93bb Jun 10, 2026
13 checks passed
@khatchad khatchad deleted the ml-530-layer-output-shape branch June 10, 2026 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants