Skip to content

Implement SCA Reachability runtime detection: report vulnerable classes and callsites via telemetry#11352

Open
jandro996 wants to merge 69 commits into
masterfrom
alejandro.gonzalez/sca-reachability
Open

Implement SCA Reachability runtime detection: report vulnerable classes and callsites via telemetry#11352
jandro996 wants to merge 69 commits into
masterfrom
alejandro.gonzalez/sca-reachability

Conversation

@jandro996
Copy link
Copy Markdown
Member

@jandro996 jandro996 commented May 12, 2026

What Does This Do

Implements the SCA Reachability subsystem for the Java APM agent. When DD_APPSEC_SCA_ENABLED=true, the agent detects at runtime which classes and methods from vulnerable library versions are actually loaded and invoked, and reports this via app-dependencies-loaded telemetry using the RFC stateful heartbeat model.

Build - Symbol database

  • New Gradle task generateScaCvesJson in dd-java-agent/appsec/build.gradle downloads GHSA enrichments from the private sca-reachability-database and bundles sca_cves.json in the agent JAR. The committed copy is used in CI (no network access required at build time). Maintainers regenerate with ./gradlew generateScaCvesJson -PrefreshSca.
  • GhsaEnrichmentParser (buildSrc Kotlin): converts the GHSA enrichment format to the internal format - one record per Maven artifact, class symbols in JVM internal format (slashes). Filters to JVM language and maven ecosystem only.

Detection - ClassFileTransformer

  • ScaReachabilityTransformer: observation-only ClassFileTransformer installed by ScaReachabilitySystem.start(). On each class load, looks up the class name in ScaCveDatabase's index, resolves the artifact version from the JAR's pom.properties, and checks the version range.
    • Class-level symbols: records a hit immediately via ScaReachabilityDependencyRegistry with symbol=<clinit>.
    • Method-level symbols: injects an ASM bytecode callback at each vulnerable method's entry point. On first invocation, ScaReachabilityCallback.onMethodHit() fires from bootstrap, walks the stack to find the application callsite, and records it via ScaReachabilityDependencyRegistry.
  • ScaReachabilitySystem: entry point called from Agent.java by reflection (same pattern as AppSecSystem). Wires the callback handler and registers the transformer.
  • ScaReachabilityCallback (bootstrap): dedup set + dispatch bridge between application classloader (injected bytecode) and agent classloader (handler). Must live in bootstrap to be visible from any classloader.
  • Version resolution uses a two-step strategy: JAR-local pom.properties first, then classpath scan as fallback for aggregator artifacts (e.g. Spring Boot starters).
  • VersionRangeParser: evaluates GHSA version range conditions (<, <=, >=, >, =, comma-separated AND).

Telemetry - RFC stateful heartbeat

  • ScaReachabilityDependencyRegistry (internal-api): stateful registry for CVE state per artifact@version. registerCve() creates an entry with reached:[] (class loaded, monitoring active); recordHit() stores the first callsite via AtomicReference.compareAndSet (first-hit-wins, thread-safe).
  • ScaReachabilityPeriodicAction (telemetry): on each heartbeat, drains both DependencyService (new JARs) and the registry (CVE state changes), merges them into one entry per dependency. Replaces DependencyPeriodicAction when SCA is enabled to avoid duplicate app-dependencies-loaded entries.
  • Dependency: new nullable reachabilityMetadata field (list of stringified JSON per RFC).
  • TelemetryRequestBody.writeDependency(): writes the metadata array when reachabilityMetadata != null.
  • TelemetrySystem: registers ScaReachabilityPeriodicAction instead of DependencyPeriodicAction when DD_APPSEC_SCA_ENABLED=true.

Motivation

Implements APPSEC-62260 - SCA Reachability runtime detection per the RFC.

Additional Notes

  • sca_cves.json is committed to the repo because sca-reachability-database is a private repo not accessible from CI without a token. Only the maintainer runs the refresh task.
  • Method-level symbols in sca_cves.json (snakeyaml load/loadAll, xstream fromXML, etc.) are manually curated for testing. The database format does not yet define method-level entries. When the database team adds this, the Gradle task will be updated and the manual entries removed.
  • AbstractStackWalker.isNotDatadogTraceStackElement visibility changed from package-private to public to allow callsite filtering from the appsec module.
  • Static ScaReachabilityTransformer pattern: findCallsite() uses Thread.currentThread().getStackTrace() synchronously while the call stack still contains the vulnerable frame. This avoids any async coordination and matches the Python tracer approach.

Contributor Checklist

Jira ticket: APPSEC-62260

Note: Once your PR is ready to merge, add it to the merge queue by commenting /merge. /merge -c cancels the queue request. /merge -f --reason "reason" skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.

jandro996 added 2 commits May 12, 2026 16:08
Adds a new SCA Reachability subsystem that reports which vulnerable
library classes were actually loaded at runtime, reducing false positives
from static dependency scanning. Gated on DD_APPSEC_SCA_ENABLED.

Key components:
- Gradle task downloads GHSA enrichments from sca-reachability-database
  and generates sca_cves.json bundled in the agent jar at build time
- ClassFileTransformer (observation-only) detects when vulnerable classes
  are loaded, resolves JAR versions via pom.properties, and checks semver
  ranges using ComparableVersion (Maven semantics)
- ScaReachabilityCollector bridges the transformer and telemetry without
  circular dependencies, following the WafMetricCollector pattern
- ScaReachabilityPeriodicAction reports hits on each app-dependencies-loaded
  heartbeat by adding reachability metadata to existing dependency entries
…n task

The Gradle task now writes to src/main/resources/ and runs only when
-PrefreshSca is passed or the file is absent, so CI builds never need
network access to the private sca-reachability-database repo.
@jandro996
Copy link
Copy Markdown
Member Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e607887e99

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

jandro996 added 6 commits May 13, 2026 10:40
On Java 9+, the system classloader (jdk.internal.loader.ClassLoaders$AppClassLoader)
no longer extends URLClassLoader, so the URLClassLoader chain walk misses all
main classpath entries. Add a fallback that reads java.class.path to cover
this case, deduplicating with a HashSet<URL> to avoid scanning the same JAR twice.
…rivate

Test verifies: (1) system classloader is not URLClassLoader on Java 9+, and
(2) findArtifactVersionInClasspath finds artifacts via java.class.path fallback.
Applies to Java 9 and all subsequent JDKs (permanent JDK design change).
When sca_cves.json contains symbols with method != null, the transformer
injects a static callback at method entry using ASM. The callback fires
the first time the method is called and reports via ScaReachabilityCallback
(bootstrap classloader, accessible from any application class).

Key changes:
- ScaReachabilityCallback in agent-bootstrap: bootstrap-visible callback
  with runtime dedup (vulnId|artifact|methodName) and handler registration
- ScaReachabilityTransformer: injectMethodCallbacks() uses ByteBuddy ASM
  to inject INVOKESTATIC at first line number of each target method;
  processClass() routes class-level vs method-level symbols separately
- ScaReachabilityHit: adds symbolName + line fields; existing constructor
  defaults to <clinit>/1 for class-level hits (backward compatible)
- ScaReachabilityPeriodicAction: buildMetadataValue() now uses
  hit.symbolName() and hit.line() instead of hardcoded values
- 6 tests: ASM injection, callback fires on method call only, dedup,
  multiple methods, safe method not reported, class-level unaffected
…rsion-unresolved

Two cases required deferred retransformation:
1. Classes already loaded at startup (before transformer registered): the bytecode
   callback cannot be injected without retransformClasses()
2. Classes where DependencyResolver returned empty deps at load time (version not yet
   resolvable): empty results are now not cached to allow retries

ScaReachabilityTransformer now stores Instrumentation and exposes
performPendingRetransforms() called on each telemetry heartbeat via a Runnable
callback in ScaReachabilityCollector.periodicWorkCallback. Classes are queued via:
- pendingRetransform (Class<?> queue) from checkAlreadyLoadedClasses
- pendingRetransformNames (String set) from processClass on empty deps
retransformClasses() always starts from the ORIGINAL class file bytes,
not from the previously-transformed bytes. A dedup check in injectCallbacks()
that blocked re-injection on the second pass caused the callback to be
removed (the class was returned to its original, un-instrumented state).

The authoritative dedup for method-level hits is ScaReachabilityCallback.reported
(bootstrap-side), which persists across retransformations regardless of how
many times transform() is called on the same class.

Also update .claude-invariants.md: retransformClasses is now used (for
method-level only), the cache constraint clarified, and the dedup invariant
documents the two-level approach (transformer for class-level, bootstrap for
method-level).
…avadoc, add retransform tests

- performPendingRetransforms(): early return when instrumentation is null (unit test safety)
- ScaReachabilityCollector: encapsulate periodicWorkCallback as private with getter/setter
- ScaReachabilityTransformer class Javadoc: update dedup description from (vulnId,artifact)
  pair to (vulnId,artifact,symbolName) tuple; document two-level dedup strategy
- Add 3 tests for performPendingRetransforms(): no-op with null inst, retransformClasses
  called for pending queue, no-op when both queues empty
@jandro996
Copy link
Copy Markdown
Member Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 82ea8065d9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread dd-java-agent/appsec/src/main/java/com/datadog/appsec/sca/ScaCveDatabase.java Outdated
jandro996 added 5 commits May 13, 2026 12:42
…esolution

P1: Replace StandardCharsets.UTF_8 with "UTF-8" string in ScaCveDatabase.load().
java.nio.* is forbidden during premain (bootstrap_design_guidelines.md) because
it can trigger premature provider initialization before the app configures the
runtime.

P2: Add classpath fallback in resolveVersionForArtifact() for entries where the
vulnerable artifact is an aggregator/starter POM whose watched classes live in a
transitive dependency JAR (e.g., spring-boot-starter-web watches @controller but
@controller is defined in spring-context.jar, not the starter). The new helper
first checks the class's own JAR, then falls back to findArtifactVersionInClasspath
with a hit cache (classpathArtifactCache). processPathA uses the same helper for
consistency.
…IfPresent helper

- Add CLASS_LEVEL_SYMBOL = "<clinit>" constant to avoid magic string repetition
  (appeared 3 times in the same class; a typo would silently produce wrong symbol names)
- Extract reportClassLevelHitIfPresent(entry, version, internalClassName) helper
  to unify identical class-level symbol matching loops in processPathA, processPathB,
  and processClass — all three now delegate to the single helper
Move CLASS_LEVEL_SYMBOL = "<clinit>" to ScaReachabilityHit (internal-api) as a public
constant so both the transformer (appsec) and the telemetry payload builder share the
canonical definition without cross-module string duplication. The convenience constructor
also uses the constant now. ScaReachabilityTransformer delegates to ScaReachabilityHit.CLASS_LEVEL_SYMBOL.

Fix misleading comment in processClass: "We enqueue via classBeingRedefined is null here"
→ explains that classBeingRedefined is null on first class load, preventing direct
Class<?> queuing, so scheduleRetransformByName is used instead.
…lback

- ScaCveDatabase: move "java.nio.* forbidden in premain" comment from the
  imports block to inline at the InputStreamReader construction site
  (comments in imports are unusual and smola flags verbose placement)

- ScaReachabilityTransformer.resolveVersionForArtifact: make package-private
  for testing; add 4 tests covering the two-step fallback:
  (1) version from classJarDeps directly
  (2) classpath fallback when classJarDeps is empty (transitive JAR case)
  (3) classpathArtifactCache hit on second call
  (4) null for absent artifact
- Remove empty visitCode() in MethodEntryInjector: the method only called
  super.visitCode() and its comment was misleading — the actual no-debug-info
  fallback injection is handled by ensureInjected() in the visitInsn/visitVarInsn/
  visitMethodInsn/visitFieldInsn overrides, not here

- Remove private CLASS_LEVEL_SYMBOL alias in ScaReachabilityTransformer: the
  constant is used in exactly one place (reportClassLevelHitIfPresent) and
  ScaReachabilityHit.CLASS_LEVEL_SYMBOL is self-documenting at that site;
  the alias added a private field with no benefit after the constant was moved
  to ScaReachabilityHit in a previous commit
@jandro996
Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 👍

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

jandro996 added 8 commits May 13, 2026 14:57
Per the RFC and Python implementation (dd-trace-py#17156), the telemetry
payload path/symbol/line for method-level hits must report the APPLICATION
FRAME that called the vulnerable method (the callsite), not the vulnerable
method itself.

ScaReachabilityCallback.onMethodHit() now walks Thread.getStackTrace() to
find the first non-agent, non-JDK frame after the vulnerable class:
  ScaReachabilityCallback.onMethodHit  (skip - us)
  com.foo.VulnerableClass.method       (skip - vulnerable class)
  com.myapp.UserService.processRequest (CALLSITE - report this)

The dotClassName/methodName params are still baked into the bytecode and
used only for deduplication (vulnId|artifact|methodName key). The handler
receives the callsite's class/method/line for telemetry.

Fallback: if no application frame is found (e.g. called from JDK internals),
reports the vulnerable symbol itself so the backend knows it was reached.

Class-level hits (<clinit>) are unchanged — no callsite at class load time.
ScaReachabilityCallback (bootstrap) must stay minimal — complex logic
does not belong there. Move findCallsite() to ScaReachabilitySystem
which has access to internal-api utilities.

The handler runs synchronously so the full call stack is still present:
  ScaReachabilitySystem handler
  ScaReachabilityCallback.onMethodHit
  <vulnerable method>
  <application callsite>  ← reported

Uses the same class-prefix predicate as AbstractStackWalker.
isNotDatadogTraceStackElement (package-private, so replicated inline)
to skip agent/JDK frames, consistent with the IAST trie-based filtering
infrastructure used elsewhere in the codebase.
…ltering

Make isNotDatadogTraceStackElement public in AbstractStackWalker so SCA
Reachability can use the existing predicate directly rather than duplicating
the 3 class-prefix conditions inline.
… path

ScaReachabilitySystemCallsiteTest covers:
- findCallsite returns null when vulnerable class is not on the stack
  (triggers fallback: handler reports the vulnerable symbol itself)
- findCallsite skips the vulnerable class frame and returns the first
  non-agent frame above it (using java.lang.Thread as a non-agent class
  guaranteed to be at the top of getStackTrace())

Note on the method-level integration test: TargetClass is in
com.datadog.appsec.sca.* (agent namespace) so AbstractStackWalker filters
it as agent code and findCallsite() returns null. The test now documents
this fallback behaviour explicitly. In production the vulnerable class is
always a 3rd-party library (e.g. com.fasterxml.jackson.*) and the happy
path fires correctly — verified by ScaReachabilitySystemCallsiteTest.
The stream().anyMatch() for detecting method-level symbols was computed
for every entry unconditionally. It is only needed when version == null
(deps not yet resolvable). Moving the check inside the version==null
branch eliminates the stream allocation on the common path where the
version resolves successfully.
@jandro996
Copy link
Copy Markdown
Member Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6008ac9ca0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…cators

JDK classes (e.g. java.sql.PreparedStatement, protectionDomain==null) are
loaded by ANY app that uses JDBC, regardless of which driver is present.
Using their presence to infer that a specific library (e.g. PostgreSQL) is
"reachable" produces classpath-presence false positives, not runtime
reachability signals.

Entries that list JDK symbols (e.g. the PostgreSQL advisory) also include
library-specific classes (e.g. org.postgresql.ds.PGSimpleDataSource) that
Path A correctly detects when those classes are actually loaded.

In checkAlreadyLoadedClasses(), classes with no code source (JDK/bootstrap)
are now skipped silently. The invariants and KB are updated accordingly.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 24c53f05cc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

jandro996 added 2 commits May 19, 2026 10:40
The smoke test app uses application.properties (not application.yml),
so Spring Boot never triggers snakeyaml loading. ScaReachabilityInit
instantiates Yaml in @PostConstruct to ensure class-level CVE detection
at startup.

Also removes the unused ScaReachabilityCollector draft and reverts
TelemetrySystem to register both DependencyPeriodicAction and
ScaReachabilityPeriodicAction (restoring dual-action mode).

Smoke test now passes in ~4s with the default 30s timeout.
@datadog-official

This comment has been minimized.

jandro996 added 6 commits May 19, 2026 14:40
…ame, fix recordHit TOCTOU

- Remove unused markPending() - the retry-via-re-mark flow was superseded
  by the emit-immediately approach in Step 3 of ScaReachabilityPeriodicAction
- Inline private single-statement scheduleRetransformByName() at its only call site
- Fix non-atomic containsKey+get in recordHit: use computeIfAbsent directly,
  which eliminates the race and avoids the unintended pendingReport=true side
  effect that unconditional registerCve() would have introduced
- Update stale Javadoc in ScaReachabilityPeriodicAction referencing the
  old re-mark-pending retry strategy
…d (Option C)

When DD_APPSEC_SCA_ENABLED=true, ScaReachabilityPeriodicAction handles all
app-dependencies-loaded emission by merging DependencyService drains with CVE
registry state. DependencyPeriodicAction is skipped to avoid:
  - Duplicate entries per dep per heartbeat
  - DependencyService queue being drained before ScaReachabilityPeriodicAction
    can see new JARs (which prevented knownDeps from being populated and caused
    CVE entries to always emit without source/hash)

The previous attempt (commit b0421ab) failed system-tests due to a
markPending retry in Step 3 that could delay CVE emission by one heartbeat
cycle. That was fixed in 6925358 (immediate emit). The current code is safe.
@jandro996
Copy link
Copy Markdown
Member Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7b449cff45

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread buildSrc/src/main/kotlin/datadog/gradle/sca/GhsaEnrichmentParser.kt
Comment thread dd-java-agent/appsec/src/main/java/com/datadog/appsec/sca/ScaCveDatabase.java Outdated
… symbols for same class

An entry with symbols [Yaml.load, Yaml.loadAll] indexed once per symbol causing
the same ScaEntry to appear twice for org/yaml/snakeyaml/Yaml. processClass then
iterated the duplicate and injected bytecode callbacks twice per method, resulting
in redundant ScaReachabilityCallback.onMethodHit() calls on every invocation.

Fix: track seen class names per entry with a HashSet; add regression test.
@jandro996 jandro996 marked this pull request as ready for review May 20, 2026 06:08
@jandro996 jandro996 requested review from a team as code owners May 20, 2026 06:08
@jandro996 jandro996 requested review from claponcet, dougqh and manuel-alvarez-alvarez and removed request for a team May 20, 2026 06:08
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 05e253778d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

jandro996 added 8 commits May 20, 2026 10:11
…classloader scenarios

ScaReachabilityCallback.reported and ScaReachabilityTransformer.reportedHits
keyed hits without the artifact version. In a multi-classloader setup (multiple
WARs, OSGi) where two versions of the same library coexist, the first version's
hit suppressed all subsequent versions for the same CVE and symbol.

Both keys now include version: vulnId|artifact|version|class|method.
When the call chain is client -> lib_wrapper -> vulnerable_method,
findCallsite was reporting lib_wrapper instead of the client frame.

Add sca_stack_exclusion.trie (generated as ScaStackExclusionTrie) and
apply it after the vulnerable-class boundary to skip known library
packages, matching the approach used by IAST SinkModuleBase.

Refactor findCallsite to a testable overload that accepts an explicit
StackTraceElement[], and rewrite tests with synthetic stacks.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp: asm waf Application Security Management (WAF) comp: telemetry Telemetry tag: override-groovy-enforcement Override the "Enforce Groovy Migration" check type: enhancement Enhancements and improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant