Implement SCA Reachability runtime detection: report vulnerable classes and callsites via telemetry#11352
Implement SCA Reachability runtime detection: report vulnerable classes and callsites via telemetry#11352jandro996 wants to merge 69 commits into
Conversation
Adds a new SCA Reachability subsystem that reports which vulnerable library classes were actually loaded at runtime, reducing false positives from static dependency scanning. Gated on DD_APPSEC_SCA_ENABLED. Key components: - Gradle task downloads GHSA enrichments from sca-reachability-database and generates sca_cves.json bundled in the agent jar at build time - ClassFileTransformer (observation-only) detects when vulnerable classes are loaded, resolves JAR versions via pom.properties, and checks semver ranges using ComparableVersion (Maven semantics) - ScaReachabilityCollector bridges the transformer and telemetry without circular dependencies, following the WafMetricCollector pattern - ScaReachabilityPeriodicAction reports hits on each app-dependencies-loaded heartbeat by adding reachability metadata to existing dependency entries
…n task The Gradle task now writes to src/main/resources/ and runs only when -PrefreshSca is passed or the file is absent, so CI builds never need network access to the private sca-reachability-database repo.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e607887e99
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
On Java 9+, the system classloader (jdk.internal.loader.ClassLoaders$AppClassLoader) no longer extends URLClassLoader, so the URLClassLoader chain walk misses all main classpath entries. Add a fallback that reads java.class.path to cover this case, deduplicating with a HashSet<URL> to avoid scanning the same JAR twice.
…rivate Test verifies: (1) system classloader is not URLClassLoader on Java 9+, and (2) findArtifactVersionInClasspath finds artifacts via java.class.path fallback. Applies to Java 9 and all subsequent JDKs (permanent JDK design change).
When sca_cves.json contains symbols with method != null, the transformer injects a static callback at method entry using ASM. The callback fires the first time the method is called and reports via ScaReachabilityCallback (bootstrap classloader, accessible from any application class). Key changes: - ScaReachabilityCallback in agent-bootstrap: bootstrap-visible callback with runtime dedup (vulnId|artifact|methodName) and handler registration - ScaReachabilityTransformer: injectMethodCallbacks() uses ByteBuddy ASM to inject INVOKESTATIC at first line number of each target method; processClass() routes class-level vs method-level symbols separately - ScaReachabilityHit: adds symbolName + line fields; existing constructor defaults to <clinit>/1 for class-level hits (backward compatible) - ScaReachabilityPeriodicAction: buildMetadataValue() now uses hit.symbolName() and hit.line() instead of hardcoded values - 6 tests: ASM injection, callback fires on method call only, dedup, multiple methods, safe method not reported, class-level unaffected
…rsion-unresolved Two cases required deferred retransformation: 1. Classes already loaded at startup (before transformer registered): the bytecode callback cannot be injected without retransformClasses() 2. Classes where DependencyResolver returned empty deps at load time (version not yet resolvable): empty results are now not cached to allow retries ScaReachabilityTransformer now stores Instrumentation and exposes performPendingRetransforms() called on each telemetry heartbeat via a Runnable callback in ScaReachabilityCollector.periodicWorkCallback. Classes are queued via: - pendingRetransform (Class<?> queue) from checkAlreadyLoadedClasses - pendingRetransformNames (String set) from processClass on empty deps
retransformClasses() always starts from the ORIGINAL class file bytes, not from the previously-transformed bytes. A dedup check in injectCallbacks() that blocked re-injection on the second pass caused the callback to be removed (the class was returned to its original, un-instrumented state). The authoritative dedup for method-level hits is ScaReachabilityCallback.reported (bootstrap-side), which persists across retransformations regardless of how many times transform() is called on the same class. Also update .claude-invariants.md: retransformClasses is now used (for method-level only), the cache constraint clarified, and the dedup invariant documents the two-level approach (transformer for class-level, bootstrap for method-level).
…avadoc, add retransform tests - performPendingRetransforms(): early return when instrumentation is null (unit test safety) - ScaReachabilityCollector: encapsulate periodicWorkCallback as private with getter/setter - ScaReachabilityTransformer class Javadoc: update dedup description from (vulnId,artifact) pair to (vulnId,artifact,symbolName) tuple; document two-level dedup strategy - Add 3 tests for performPendingRetransforms(): no-op with null inst, retransformClasses called for pending queue, no-op when both queues empty
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 82ea8065d9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…esolution P1: Replace StandardCharsets.UTF_8 with "UTF-8" string in ScaCveDatabase.load(). java.nio.* is forbidden during premain (bootstrap_design_guidelines.md) because it can trigger premature provider initialization before the app configures the runtime. P2: Add classpath fallback in resolveVersionForArtifact() for entries where the vulnerable artifact is an aggregator/starter POM whose watched classes live in a transitive dependency JAR (e.g., spring-boot-starter-web watches @controller but @controller is defined in spring-context.jar, not the starter). The new helper first checks the class's own JAR, then falls back to findArtifactVersionInClasspath with a hit cache (classpathArtifactCache). processPathA uses the same helper for consistency.
…IfPresent helper - Add CLASS_LEVEL_SYMBOL = "<clinit>" constant to avoid magic string repetition (appeared 3 times in the same class; a typo would silently produce wrong symbol names) - Extract reportClassLevelHitIfPresent(entry, version, internalClassName) helper to unify identical class-level symbol matching loops in processPathA, processPathB, and processClass — all three now delegate to the single helper
Move CLASS_LEVEL_SYMBOL = "<clinit>" to ScaReachabilityHit (internal-api) as a public constant so both the transformer (appsec) and the telemetry payload builder share the canonical definition without cross-module string duplication. The convenience constructor also uses the constant now. ScaReachabilityTransformer delegates to ScaReachabilityHit.CLASS_LEVEL_SYMBOL. Fix misleading comment in processClass: "We enqueue via classBeingRedefined is null here" → explains that classBeingRedefined is null on first class load, preventing direct Class<?> queuing, so scheduleRetransformByName is used instead.
…lback - ScaCveDatabase: move "java.nio.* forbidden in premain" comment from the imports block to inline at the InputStreamReader construction site (comments in imports are unusual and smola flags verbose placement) - ScaReachabilityTransformer.resolveVersionForArtifact: make package-private for testing; add 4 tests covering the two-step fallback: (1) version from classJarDeps directly (2) classpath fallback when classJarDeps is empty (transitive JAR case) (3) classpathArtifactCache hit on second call (4) null for absent artifact
- Remove empty visitCode() in MethodEntryInjector: the method only called super.visitCode() and its comment was misleading — the actual no-debug-info fallback injection is handled by ensureInjected() in the visitInsn/visitVarInsn/ visitMethodInsn/visitFieldInsn overrides, not here - Remove private CLASS_LEVEL_SYMBOL alias in ScaReachabilityTransformer: the constant is used in exactly one place (reportClassLevelHitIfPresent) and ScaReachabilityHit.CLASS_LEVEL_SYMBOL is self-documenting at that site; the alias added a private field with no benefit after the constant was moved to ScaReachabilityHit in a previous commit
|
@codex review |
|
Codex Review: Didn't find any major issues. 👍 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Per the RFC and Python implementation (dd-trace-py#17156), the telemetry payload path/symbol/line for method-level hits must report the APPLICATION FRAME that called the vulnerable method (the callsite), not the vulnerable method itself. ScaReachabilityCallback.onMethodHit() now walks Thread.getStackTrace() to find the first non-agent, non-JDK frame after the vulnerable class: ScaReachabilityCallback.onMethodHit (skip - us) com.foo.VulnerableClass.method (skip - vulnerable class) com.myapp.UserService.processRequest (CALLSITE - report this) The dotClassName/methodName params are still baked into the bytecode and used only for deduplication (vulnId|artifact|methodName key). The handler receives the callsite's class/method/line for telemetry. Fallback: if no application frame is found (e.g. called from JDK internals), reports the vulnerable symbol itself so the backend knows it was reached. Class-level hits (<clinit>) are unchanged — no callsite at class load time.
ScaReachabilityCallback (bootstrap) must stay minimal — complex logic does not belong there. Move findCallsite() to ScaReachabilitySystem which has access to internal-api utilities. The handler runs synchronously so the full call stack is still present: ScaReachabilitySystem handler ScaReachabilityCallback.onMethodHit <vulnerable method> <application callsite> ← reported Uses the same class-prefix predicate as AbstractStackWalker. isNotDatadogTraceStackElement (package-private, so replicated inline) to skip agent/JDK frames, consistent with the IAST trie-based filtering infrastructure used elsewhere in the codebase.
…ltering Make isNotDatadogTraceStackElement public in AbstractStackWalker so SCA Reachability can use the existing predicate directly rather than duplicating the 3 class-prefix conditions inline.
… path ScaReachabilitySystemCallsiteTest covers: - findCallsite returns null when vulnerable class is not on the stack (triggers fallback: handler reports the vulnerable symbol itself) - findCallsite skips the vulnerable class frame and returns the first non-agent frame above it (using java.lang.Thread as a non-agent class guaranteed to be at the top of getStackTrace()) Note on the method-level integration test: TargetClass is in com.datadog.appsec.sca.* (agent namespace) so AbstractStackWalker filters it as agent code and findCallsite() returns null. The test now documents this fallback behaviour explicitly. In production the vulnerable class is always a 3rd-party library (e.g. com.fasterxml.jackson.*) and the happy path fires correctly — verified by ScaReachabilitySystemCallsiteTest.
The stream().anyMatch() for detecting method-level symbols was computed for every entry unconditionally. It is only needed when version == null (deps not yet resolvable). Moving the check inside the version==null branch eliminates the stream allocation on the common path where the version resolves successfully.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6008ac9ca0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…cators JDK classes (e.g. java.sql.PreparedStatement, protectionDomain==null) are loaded by ANY app that uses JDBC, regardless of which driver is present. Using their presence to infer that a specific library (e.g. PostgreSQL) is "reachable" produces classpath-presence false positives, not runtime reachability signals. Entries that list JDK symbols (e.g. the PostgreSQL advisory) also include library-specific classes (e.g. org.postgresql.ds.PGSimpleDataSource) that Path A correctly detects when those classes are actually loaded. In checkAlreadyLoadedClasses(), classes with no code source (JDK/bootstrap) are now skipped silently. The invariants and KB are updated accordingly.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 24c53f05cc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
The smoke test app uses application.properties (not application.yml), so Spring Boot never triggers snakeyaml loading. ScaReachabilityInit instantiates Yaml in @PostConstruct to ensure class-level CVE detection at startup. Also removes the unused ScaReachabilityCollector draft and reverts TelemetrySystem to register both DependencyPeriodicAction and ScaReachabilityPeriodicAction (restoring dual-action mode). Smoke test now passes in ~4s with the default 30s timeout.
This comment has been minimized.
This comment has been minimized.
…ame, fix recordHit TOCTOU - Remove unused markPending() - the retry-via-re-mark flow was superseded by the emit-immediately approach in Step 3 of ScaReachabilityPeriodicAction - Inline private single-statement scheduleRetransformByName() at its only call site - Fix non-atomic containsKey+get in recordHit: use computeIfAbsent directly, which eliminates the race and avoids the unintended pendingReport=true side effect that unconditional registerCve() would have introduced - Update stale Javadoc in ScaReachabilityPeriodicAction referencing the old re-mark-pending retry strategy
…d (Option C)
When DD_APPSEC_SCA_ENABLED=true, ScaReachabilityPeriodicAction handles all
app-dependencies-loaded emission by merging DependencyService drains with CVE
registry state. DependencyPeriodicAction is skipped to avoid:
- Duplicate entries per dep per heartbeat
- DependencyService queue being drained before ScaReachabilityPeriodicAction
can see new JARs (which prevented knownDeps from being populated and caused
CVE entries to always emit without source/hash)
The previous attempt (commit b0421ab) failed system-tests due to a
markPending retry in Step 3 that could delay CVE emission by one heartbeat
cycle. That was fixed in 6925358 (immediate emit). The current code is safe.
…pection, StandardizedLogging
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7b449cff45
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
… symbols for same class An entry with symbols [Yaml.load, Yaml.loadAll] indexed once per symbol causing the same ScaEntry to appear twice for org/yaml/snakeyaml/Yaml. processClass then iterated the duplicate and injected bytecode callbacks twice per method, resulting in redundant ScaReachabilityCallback.onMethodHit() calls on every invocation. Fix: track seen class names per entry with a HashSet; add regression test.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 05e253778d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…classloader scenarios ScaReachabilityCallback.reported and ScaReachabilityTransformer.reportedHits keyed hits without the artifact version. In a multi-classloader setup (multiple WARs, OSGi) where two versions of the same library coexist, the first version's hit suppressed all subsequent versions for the same CVE and symbol. Both keys now include version: vulnId|artifact|version|class|method.
When the call chain is client -> lib_wrapper -> vulnerable_method, findCallsite was reporting lib_wrapper instead of the client frame. Add sca_stack_exclusion.trie (generated as ScaStackExclusionTrie) and apply it after the vulnerable-class boundary to skip known library packages, matching the approach used by IAST SinkModuleBase. Refactor findCallsite to a testable overload that accepts an explicit StackTraceElement[], and rewrite tests with synthetic stacks.
What Does This Do
Implements the SCA Reachability subsystem for the Java APM agent. When
DD_APPSEC_SCA_ENABLED=true, the agent detects at runtime which classes and methods from vulnerable library versions are actually loaded and invoked, and reports this viaapp-dependencies-loadedtelemetry using the RFC stateful heartbeat model.Build - Symbol database
generateScaCvesJsonindd-java-agent/appsec/build.gradledownloads GHSA enrichments from the privatesca-reachability-databaseand bundlessca_cves.jsonin the agent JAR. The committed copy is used in CI (no network access required at build time). Maintainers regenerate with./gradlew generateScaCvesJson -PrefreshSca.GhsaEnrichmentParser(buildSrc Kotlin): converts the GHSA enrichment format to the internal format - one record per Maven artifact, class symbols in JVM internal format (slashes). Filters to JVM language and maven ecosystem only.Detection - ClassFileTransformer
ScaReachabilityTransformer: observation-onlyClassFileTransformerinstalled byScaReachabilitySystem.start(). On each class load, looks up the class name inScaCveDatabase's index, resolves the artifact version from the JAR'spom.properties, and checks the version range.ScaReachabilityDependencyRegistrywithsymbol=<clinit>.ScaReachabilityCallback.onMethodHit()fires from bootstrap, walks the stack to find the application callsite, and records it viaScaReachabilityDependencyRegistry.ScaReachabilitySystem: entry point called fromAgent.javaby reflection (same pattern asAppSecSystem). Wires the callback handler and registers the transformer.ScaReachabilityCallback(bootstrap): dedup set + dispatch bridge between application classloader (injected bytecode) and agent classloader (handler). Must live in bootstrap to be visible from any classloader.pom.propertiesfirst, then classpath scan as fallback for aggregator artifacts (e.g. Spring Boot starters).VersionRangeParser: evaluates GHSA version range conditions (<,<=,>=,>,=, comma-separated AND).Telemetry - RFC stateful heartbeat
ScaReachabilityDependencyRegistry(internal-api): stateful registry for CVE state perartifact@version.registerCve()creates an entry withreached:[](class loaded, monitoring active);recordHit()stores the first callsite viaAtomicReference.compareAndSet(first-hit-wins, thread-safe).ScaReachabilityPeriodicAction(telemetry): on each heartbeat, drains bothDependencyService(new JARs) and the registry (CVE state changes), merges them into one entry per dependency. ReplacesDependencyPeriodicActionwhen SCA is enabled to avoid duplicateapp-dependencies-loadedentries.Dependency: new nullablereachabilityMetadatafield (list of stringified JSON per RFC).TelemetryRequestBody.writeDependency(): writes themetadataarray whenreachabilityMetadata != null.TelemetrySystem: registersScaReachabilityPeriodicActioninstead ofDependencyPeriodicActionwhenDD_APPSEC_SCA_ENABLED=true.Motivation
Implements APPSEC-62260 - SCA Reachability runtime detection per the RFC.
Additional Notes
sca_cves.jsonis committed to the repo becausesca-reachability-databaseis a private repo not accessible from CI without a token. Only the maintainer runs the refresh task.sca_cves.json(snakeyamlload/loadAll, xstreamfromXML, etc.) are manually curated for testing. The database format does not yet define method-level entries. When the database team adds this, the Gradle task will be updated and the manual entries removed.AbstractStackWalker.isNotDatadogTraceStackElementvisibility changed from package-private topublicto allow callsite filtering from theappsecmodule.ScaReachabilityTransformerpattern:findCallsite()usesThread.currentThread().getStackTrace()synchronously while the call stack still contains the vulnerable frame. This avoids any async coordination and matches the Python tracer approach.Contributor Checklist
type:and (comp:orinst:) labels in addition to any other useful labelsclose,fix, or any linking keywords when referencing an issueUse
solvesinstead, and assign the PR milestone to the issueJira ticket: APPSEC-62260
Note: Once your PR is ready to merge, add it to the merge queue by commenting
/merge./merge -ccancels the queue request./merge -f --reason "reason"skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.