Skip to content

feat(analyzer): Enhance type-aware pattern matching and generic type handling#99

Open
misonijnik wants to merge 27 commits intomainfrom
misonijnik/match-generic-types
Open

feat(analyzer): Enhance type-aware pattern matching and generic type handling#99
misonijnik wants to merge 27 commits intomainfrom
misonijnik/match-generic-types

Conversation

@misonijnik
Copy link
Copy Markdown
Member

No description provided.

misonijnik and others added 27 commits April 20, 2026 15:52
Replace source pre-resolution approach with simpler plumbing fix:
preserve generic type args through the existing pipeline and match
against JIRTypedMethod at runtime. Covers method-level signatures
and call-site receiver generics via LocalVariableTypeTable.
- Add private matchType(JIRType) extension on SerializedTypeNameMatcher that
  performs structural generic comparison (ClassPattern with typeArgs vs JIRClassType)
  and array matching (Array vs JIRArrayType), falling back to erased name matching
  for backward compatibility
- Update resolveIsType() to extract typeArgs from ClassPattern normalizedTypeIs
  and pass them to TypeMatchesPattern for deferred evaluation at instruction level
- Add imports: JIRArrayType, JIRClassType, JIRType from org.opentaint.ir.api.jvm
…ceiver matching

- Fix resolveIsType() short-circuit: when typeArgs are present, skip
  mkTrue() early return and defer to TypeMatchesPattern deferred evaluation
- Add typedMethod: JIRTypedMethod? parameter to JIRBasicAtomEvaluator
  and JIRMarkAwareConditionRewriter (optional, backward-compatible)
- Extend typeMatchesPattern() to check generic type args via
  JIRTypedMethod.typeOf(LocalVariableNode) resolved from LocalVariableTypeTable
- Add matchType() and matchErasedName() helpers for SerializedTypeNameMatcher
  recursive matching against JIRType (class, array, wildcards)
- Pass returnType from SemgrepPatternAction.MethodSignature to automata
  MethodSignature in ActionListToAutomata.constructSignatureFormula()
- Unify returnType in MethodFormulaSimplifier.unify()
- Preserve returnType in notEvaluatedSignature()
- Move return type processing before early return in evaluateFormulaSignature()
- Fix normalizeAnyName() in ClassNameUtils to preserve typeArgs
- Store cp in TaintConfiguration for typed method resolution
- Add resolveTypedMethod() for eager generic type checking on Argument/Result
- Use erased class name (jIRClass.name) instead of typeName in matchType()
  to avoid matching against generic-param-decorated names
- Fix TypeAwarePatternTest to pass EXPECT_STATE_VAR for generic test
Adds six pattern-inside return-type / param-type matching sample rules
that exercise generic, array, nested-generic, wildcard and raw-type
forms introduced on this branch. Tests also pin current engine behavior
where the method-decl return-type specificity is effectively ignored
(A1/A3/A6 show ResponseEntity<X> rules match other parameterized and
raw forms of ResponseEntity).
The three samples previously pinned over-matching behavior as Positive
with explanatory comments. That made TypeAwarePatternTest report
"10/10 passing" while silently hiding the gap that this branch's
type-matching feature does not discriminate by concrete type argument
at method-decl return position.

Rename the pins:

- RuleWithGenericByteArrayReturnType:
PositiveResponseEntityStringPinsOverMatch
  → NegativeResponseEntityString (rule asks for ResponseEntity<byte[]>,
  sample method returns ResponseEntity<String>)
- RuleWithNestedGenericReturnType: PositiveFlatGenericPinsOverMatch
  → NegativeFlatGeneric (rule asks for ResponseEntity<List<String>>,
  sample method returns ResponseEntity<String>)
- RuleWithRawResponseEntity: PositiveParameterizedString and
  PositiveParameterizedByteArray → NegativeParameterizedString and
  NegativeParameterizedByteArray (rule uses raw ResponseEntity, sample
  methods return parameterized forms)

With honest labels, TypeAwarePatternTest reports 7 passing / 3
failing, the three failures surfacing the actual engine behavior on
this branch. The @disabled B11 in EngineGapsTest remains as a second
witness to the same gap on origin/main.
Before this change TypeAwarePatternTest's A1, A3, and A6 failed by
over-matching:

- A1 (rule `ResponseEntity<byte[]>`) matched `ResponseEntity<String>`
  methods — concrete type-arg specificity on the return type was
  ignored.
- A3 (rule `ResponseEntity<List<String>>`) matched
  `ResponseEntity<String>` methods — nested generic specificity was
  ignored.
- A6 (rule raw `ResponseEntity`) matched parameterized
  `ResponseEntity<String>` and `ResponseEntity<byte[]>` methods — raw
  vs parameterized collapsed to erased name.

The fix has four pieces:

1. SerializedSignatureMatcher.matchFunctionSignature now resolves the
   method via cp.typeOf(...).declaredMethods (JIRTypedMethod) so the
   structural `matchType(JIRType)` sees real generic type arguments.
   When typed resolution fails, it falls back to the pre-existing
   erased-name match on TypeName.typeName.

2. matchType's "no typeArgs on matcher" branch now requires the class
   type to be raw-like (no concrete type arguments) — otherwise a
   pattern like `ResponseEntity` would silently match
   `ResponseEntity<String>`. Raw-like means either no type arguments
   at all or only declared type variables / unbound wildcards (so a
   raw method whose resolved type surfaces its class's own type
   variable — `ResponseEntity<T>` — still matches the raw rule).

3. typeMatcher preserves the arity of a class-pattern's typeArgs list
   even when an inner metavariable / AnyType resolves to a null
   constraint. Previously `mapNotNull` silently dropped such entries,
   collapsing `ResponseEntity<$T>` into the raw form. Empty slots now
   become an `anyClassPattern()` matcher.

4. The wildcard `?` in pattern type arguments is now a first-class
   TypeName.WildcardTypeName instead of a dropped null, so
   `ResponseEntity<?>` keeps its arity through parsing and rewriting
   and gets converted to TypeNamePattern.AnyType by the pattern
   converter.

TypeAwarePatternTest now reports 10/10 passing (was 7/10 after the
honest-label conversion). Full :opentaint-java-querylang:test suite
stays green (no regressions elsewhere).
Add four new cases to TypeAwarePatternTest to surface remaining gaps in
handling generic function definitions:

- A8 Map<$K, String> mixed metavar + concrete (PASSES).
- A10 List<List<String>> deep nesting (PASSES for both flat and inner
  mismatch Negatives).
- A12 List<String> concrete parameter-position discrimination (PASSES).
- A13 ResponseEntity<java.lang.String> fully-qualified type argument
  (PASSES; FQN resolves to the same class as simple name).
Add seven new cases to TypeAwarePatternTest to further saturate coverage
of generic function-definition matching:

- A15 List<String>[] array of parameterized type (PASSES; array +
  inner-type arg discrimination works).
- A17 String vs Integer concrete return discrimination (PASSES).
- A19 List<Map<String, Integer>> nested generic in parameter position
  (PASSES; structural recursion works on parameter side as well as
  return side).
- A20 Class<$T> reflection-style parameter with metavar type arg
  (PASSES).
- A21 Collection<String> vs List<String> (PASSES after flipping the
  List<String> sample from Positive to Negative — observed: the engine
  uses exact-type matching at method-decl return position, not subtype
  widening; List<String> does NOT match a Collection<String> pattern).
- A22 Map<String, List<Integer>> nested mixed containers (PASSES; all
  four structural-mismatch negatives stay silent).
- A23 String[][] two-dim array return — dimension + element-type
  discrimination (PASSES).

Additional engine gap exposed:
- Subtype widening (A21) — the engine does exact-type matching; a
  Collection<T> pattern does NOT match a List<T>-returning method, in
  contrast to semgrep's widening semantics. Documented as a Negative.
Two parallel implementations of SerializedTypeNameMatcher.matchType
had grown up across modules:

- TaintConfiguration (opentaint-jvm-sast-dataflow): strict matcher
  with isRawLike() check for raw-vs-parameterized discrimination,
  uses PatternManager-cached name matching.
- JIRBasicAtomEvaluator (opentaint-jvm-dataflow): nested-only matcher
  with stateless Regex name matching.

Moved the recursion structure to configuration-rules-jvm (next to
SerializedTypeNameMatcher itself) as a single matchType(JIRType,
erasedMatch) extension. Each caller plugs in its own erased-name
matcher, so the PatternManager cache is preserved in the hot path
while the evaluator keeps its stateless Regex path.

Other cleanups in the same pass:

- Collapsed 4 repeated `if (typedType != null) matchType else match`
  branches in matchFunctionSignature into a matchTypedOrErased helper.
- Extracted resolveTypedPositionType to flatten the nested
  if/when/continue block in resolveIsType.
- Added kdoc on JIRMarkAwareConditionRewriter.typedMethod explaining
  that null silently disables generic-type-argument matching.
- Narrowed and documented the bare catch in resolveGenericType.
- Dropped WHAT-style comments left by the original type-matching
  implementation.

Net: -119 / +51 lines across the three consumer files, +61 lines in
one shared file. Behavior unchanged — TypeAwarePatternTest results
are identical pre- and post-refactor.
A pattern like `ResponseEntity<?>` previously matched concrete
parameterizations such as `ResponseEntity<String>` because wildcards
were lowered to an unconstrained "any class" matcher at the
type-argument slot. Introduce a dedicated wildcard representation —
`TypeNamePattern.WildcardType` in the query language and
`SerializedTypeNameMatcher.Wildcard` in the serialized matcher — so a
wildcard slot in the pattern matches only a `JIRUnboundWildcard` at the
same slot in code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pass-through rules like `java.util.List#get` returning `java.lang.Object`
were no longer matching since typed method resolution via
`cp.typeOf(method.enclosingClass)` surfaces declared return/parameter
types as `JIRTypeVariable` (e.g. `E`) rather than the erased class.
`erasedName()` fell through to `typeName`, producing the type-variable
symbol `"E"` instead of `"java.lang.Object"`, so every string-based
matcher missed. Map type variables and unbound wildcards to their
declared erasure via `jIRClass.name`, and extend the same lookup to
array element types.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@misonijnik misonijnik force-pushed the misonijnik/match-generic-types branch from e0d76b7 to 62580aa Compare April 21, 2026 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant