feat(analyzer): Enhance type-aware pattern matching and generic type handling#99
Open
misonijnik wants to merge 27 commits intomainfrom
Open
feat(analyzer): Enhance type-aware pattern matching and generic type handling#99misonijnik wants to merge 27 commits intomainfrom
misonijnik wants to merge 27 commits intomainfrom
Conversation
Replace source pre-resolution approach with simpler plumbing fix: preserve generic type args through the existing pipeline and match against JIRTypedMethod at runtime. Covers method-level signatures and call-site receiver generics via LocalVariableTypeTable.
- Add private matchType(JIRType) extension on SerializedTypeNameMatcher that performs structural generic comparison (ClassPattern with typeArgs vs JIRClassType) and array matching (Array vs JIRArrayType), falling back to erased name matching for backward compatibility - Update resolveIsType() to extract typeArgs from ClassPattern normalizedTypeIs and pass them to TypeMatchesPattern for deferred evaluation at instruction level - Add imports: JIRArrayType, JIRClassType, JIRType from org.opentaint.ir.api.jvm
…ceiver matching - Fix resolveIsType() short-circuit: when typeArgs are present, skip mkTrue() early return and defer to TypeMatchesPattern deferred evaluation - Add typedMethod: JIRTypedMethod? parameter to JIRBasicAtomEvaluator and JIRMarkAwareConditionRewriter (optional, backward-compatible) - Extend typeMatchesPattern() to check generic type args via JIRTypedMethod.typeOf(LocalVariableNode) resolved from LocalVariableTypeTable - Add matchType() and matchErasedName() helpers for SerializedTypeNameMatcher recursive matching against JIRType (class, array, wildcards)
- Pass returnType from SemgrepPatternAction.MethodSignature to automata MethodSignature in ActionListToAutomata.constructSignatureFormula() - Unify returnType in MethodFormulaSimplifier.unify() - Preserve returnType in notEvaluatedSignature() - Move return type processing before early return in evaluateFormulaSignature() - Fix normalizeAnyName() in ClassNameUtils to preserve typeArgs - Store cp in TaintConfiguration for typed method resolution - Add resolveTypedMethod() for eager generic type checking on Argument/Result - Use erased class name (jIRClass.name) instead of typeName in matchType() to avoid matching against generic-param-decorated names - Fix TypeAwarePatternTest to pass EXPECT_STATE_VAR for generic test
Adds six pattern-inside return-type / param-type matching sample rules that exercise generic, array, nested-generic, wildcard and raw-type forms introduced on this branch. Tests also pin current engine behavior where the method-decl return-type specificity is effectively ignored (A1/A3/A6 show ResponseEntity<X> rules match other parameterized and raw forms of ResponseEntity).
The three samples previously pinned over-matching behavior as Positive with explanatory comments. That made TypeAwarePatternTest report "10/10 passing" while silently hiding the gap that this branch's type-matching feature does not discriminate by concrete type argument at method-decl return position. Rename the pins: - RuleWithGenericByteArrayReturnType: PositiveResponseEntityStringPinsOverMatch → NegativeResponseEntityString (rule asks for ResponseEntity<byte[]>, sample method returns ResponseEntity<String>) - RuleWithNestedGenericReturnType: PositiveFlatGenericPinsOverMatch → NegativeFlatGeneric (rule asks for ResponseEntity<List<String>>, sample method returns ResponseEntity<String>) - RuleWithRawResponseEntity: PositiveParameterizedString and PositiveParameterizedByteArray → NegativeParameterizedString and NegativeParameterizedByteArray (rule uses raw ResponseEntity, sample methods return parameterized forms) With honest labels, TypeAwarePatternTest reports 7 passing / 3 failing, the three failures surfacing the actual engine behavior on this branch. The @disabled B11 in EngineGapsTest remains as a second witness to the same gap on origin/main.
Before this change TypeAwarePatternTest's A1, A3, and A6 failed by over-matching: - A1 (rule `ResponseEntity<byte[]>`) matched `ResponseEntity<String>` methods — concrete type-arg specificity on the return type was ignored. - A3 (rule `ResponseEntity<List<String>>`) matched `ResponseEntity<String>` methods — nested generic specificity was ignored. - A6 (rule raw `ResponseEntity`) matched parameterized `ResponseEntity<String>` and `ResponseEntity<byte[]>` methods — raw vs parameterized collapsed to erased name. The fix has four pieces: 1. SerializedSignatureMatcher.matchFunctionSignature now resolves the method via cp.typeOf(...).declaredMethods (JIRTypedMethod) so the structural `matchType(JIRType)` sees real generic type arguments. When typed resolution fails, it falls back to the pre-existing erased-name match on TypeName.typeName. 2. matchType's "no typeArgs on matcher" branch now requires the class type to be raw-like (no concrete type arguments) — otherwise a pattern like `ResponseEntity` would silently match `ResponseEntity<String>`. Raw-like means either no type arguments at all or only declared type variables / unbound wildcards (so a raw method whose resolved type surfaces its class's own type variable — `ResponseEntity<T>` — still matches the raw rule). 3. typeMatcher preserves the arity of a class-pattern's typeArgs list even when an inner metavariable / AnyType resolves to a null constraint. Previously `mapNotNull` silently dropped such entries, collapsing `ResponseEntity<$T>` into the raw form. Empty slots now become an `anyClassPattern()` matcher. 4. The wildcard `?` in pattern type arguments is now a first-class TypeName.WildcardTypeName instead of a dropped null, so `ResponseEntity<?>` keeps its arity through parsing and rewriting and gets converted to TypeNamePattern.AnyType by the pattern converter. TypeAwarePatternTest now reports 10/10 passing (was 7/10 after the honest-label conversion). Full :opentaint-java-querylang:test suite stays green (no regressions elsewhere).
Add four new cases to TypeAwarePatternTest to surface remaining gaps in handling generic function definitions: - A8 Map<$K, String> mixed metavar + concrete (PASSES). - A10 List<List<String>> deep nesting (PASSES for both flat and inner mismatch Negatives). - A12 List<String> concrete parameter-position discrimination (PASSES). - A13 ResponseEntity<java.lang.String> fully-qualified type argument (PASSES; FQN resolves to the same class as simple name).
Add seven new cases to TypeAwarePatternTest to further saturate coverage of generic function-definition matching: - A15 List<String>[] array of parameterized type (PASSES; array + inner-type arg discrimination works). - A17 String vs Integer concrete return discrimination (PASSES). - A19 List<Map<String, Integer>> nested generic in parameter position (PASSES; structural recursion works on parameter side as well as return side). - A20 Class<$T> reflection-style parameter with metavar type arg (PASSES). - A21 Collection<String> vs List<String> (PASSES after flipping the List<String> sample from Positive to Negative — observed: the engine uses exact-type matching at method-decl return position, not subtype widening; List<String> does NOT match a Collection<String> pattern). - A22 Map<String, List<Integer>> nested mixed containers (PASSES; all four structural-mismatch negatives stay silent). - A23 String[][] two-dim array return — dimension + element-type discrimination (PASSES). Additional engine gap exposed: - Subtype widening (A21) — the engine does exact-type matching; a Collection<T> pattern does NOT match a List<T>-returning method, in contrast to semgrep's widening semantics. Documented as a Negative.
Two parallel implementations of SerializedTypeNameMatcher.matchType had grown up across modules: - TaintConfiguration (opentaint-jvm-sast-dataflow): strict matcher with isRawLike() check for raw-vs-parameterized discrimination, uses PatternManager-cached name matching. - JIRBasicAtomEvaluator (opentaint-jvm-dataflow): nested-only matcher with stateless Regex name matching. Moved the recursion structure to configuration-rules-jvm (next to SerializedTypeNameMatcher itself) as a single matchType(JIRType, erasedMatch) extension. Each caller plugs in its own erased-name matcher, so the PatternManager cache is preserved in the hot path while the evaluator keeps its stateless Regex path. Other cleanups in the same pass: - Collapsed 4 repeated `if (typedType != null) matchType else match` branches in matchFunctionSignature into a matchTypedOrErased helper. - Extracted resolveTypedPositionType to flatten the nested if/when/continue block in resolveIsType. - Added kdoc on JIRMarkAwareConditionRewriter.typedMethod explaining that null silently disables generic-type-argument matching. - Narrowed and documented the bare catch in resolveGenericType. - Dropped WHAT-style comments left by the original type-matching implementation. Net: -119 / +51 lines across the three consumer files, +61 lines in one shared file. Behavior unchanged — TypeAwarePatternTest results are identical pre- and post-refactor.
A pattern like `ResponseEntity<?>` previously matched concrete parameterizations such as `ResponseEntity<String>` because wildcards were lowered to an unconstrained "any class" matcher at the type-argument slot. Introduce a dedicated wildcard representation — `TypeNamePattern.WildcardType` in the query language and `SerializedTypeNameMatcher.Wildcard` in the serialized matcher — so a wildcard slot in the pattern matches only a `JIRUnboundWildcard` at the same slot in code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pass-through rules like `java.util.List#get` returning `java.lang.Object` were no longer matching since typed method resolution via `cp.typeOf(method.enclosingClass)` surfaces declared return/parameter types as `JIRTypeVariable` (e.g. `E`) rather than the erased class. `erasedName()` fell through to `typeName`, producing the type-variable symbol `"E"` instead of `"java.lang.Object"`, so every string-based matcher missed. Map type variables and unbound wildcards to their declared erasure via `jIRClass.name`, and extend the same lookup to array element types. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e0d76b7 to
62580aa
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.