Component
strix/report/sarif.py — _class_keyword / _VULN_CLASS_KEYWORDS
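Environment
main (302efed); present since the SARIF emitter (feat(report): SARIF 2.1.0 emitter for CI / code-scanning integration #626)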
Description
_class_keyword scans _VULN_CLASS_KEYWORDS and returns the first substring match. The comment above the list documents the invariant:
Order matters — first match wins, so precise terms come before fuzzy ones … a future maintainer adding sloppy entries could collapse distinct findings to the same class hash.
The list itself violates that invariant: "denial of service" is listed before "regex denial of service", and the former is a substring of the latter. Because the first match wins, any finding whose title contains "Regex Denial of Service" resolves to the generic "denial of service" class and the more specific entry is never reached.
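The shadowing can be shown in isolation. The following is a minimal sketch, not the code in strix/report/sarif.py: `first_match` and the two-entry `KEYWORDS_AS_SHIPPED` list are hypothetical stand-ins, and lowercasing the title before matching is an assumption inferred from the behaviour described above.

```python
# Hypothetical two-entry excerpt; the real _VULN_CLASS_KEYWORDS is longer.
KEYWORDS_AS_SHIPPED = ["denial of service", "regex denial of service"]


def first_match(title, keywords):
    """Return the first keyword found in the lowercased title, else None."""
    lowered = title.lower()  # assumption: matching is case-insensitive
    for keyword in keywords:
        if keyword in lowered:
            return keyword
    return None


title = "Regex Denial of Service in email validator"

# The generic entry is a substring of the specific one, so it always wins.
print(first_match(title, KEYWORDS_AS_SHIPPED))  # denial of service
```

With this ordering, no title can ever select the second entry, since any title that contains it also contains the first.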
For locationless findings (synthetically anchored to SECURITY.md), the class keyword is the only differentiator in _primary_fingerprint (the synth_class: component). A ReDoS finding and a generic DoS finding on the same CWE therefore get the same partialFingerprints.primaryLocationLineHash, so a SARIF consumer such as GitHub code-scanning silently deduplicates one genuine finding away.
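To illustrate why an identical class keyword leads to an identical fingerprint, here is a rough sketch. `sketch_fingerprint`, its payload layout, and the choice of SHA-256 are all assumptions made for illustration; the real _primary_fingerprint is not reproduced here, and CWE-400 is only an example value.

```python
import hashlib


def sketch_fingerprint(cwe, class_keyword):
    """Hypothetical stand-in for _primary_fingerprint on a locationless finding.

    With no file or line to hash, the class keyword is the only input that
    can tell two findings on the same CWE apart.
    """
    payload = f"{cwe}|synth_class:{class_keyword}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


# As shipped: the ReDoS finding is classified as plain "denial of service".
redos_as_shipped = sketch_fingerprint("CWE-400", "denial of service")
generic_dos = sketch_fingerprint("CWE-400", "denial of service")

# With the specific keyword, the two findings hash differently.
redos_expected = sketch_fingerprint("CWE-400", "regex denial of service")

print(redos_as_shipped == generic_dos)  # True
print(redos_expected == generic_dos)    # False
```

Whatever the real hash inputs are, the point carries over: if the keyword is the only varying field, collapsing two keywords collapses two fingerprints.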
Steps to reproduce
from strix.report.sarif import _class_keyword
_class_keyword("Regex Denial of Service in email validator")
Expected
"regex denial of service" — and, end-to-end, two distinct fingerprints for a ReDoS + generic DoS pair.
Actual
"denial of service" — the ReDoS collapses into the generic DoS class, the two findings share a fingerprint, and one is deduplicated away.
Proposed fix
#662 — reorder so "regex denial of service" precedes "denial of service", with regression tests at the _class_keyword and write_sarif levels.
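Beyond the specific reorder, the documented invariant could also be checked mechanically. The sketch below is one possible guard and is not claimed to be part of #662: `shadowed_pairs` is a hypothetical helper that reports every earlier entry that is a substring of a later one, which is exactly the condition under which the later entry can never match.

```python
def shadowed_pairs(keywords):
    """Return (earlier, later) pairs where the earlier entry is a substring
    of a later one, making the later entry unreachable under first-match-wins."""
    return [
        (earlier, later)
        for i, earlier in enumerate(keywords)
        for later in keywords[i + 1:]
        if earlier in later
    ]


# Hypothetical two-entry excerpts of the list before and after the reorder.
before = ["denial of service", "regex denial of service"]
after = ["regex denial of service", "denial of service"]

print(shadowed_pairs(before))  # [('denial of service', 'regex denial of service')]
print(shadowed_pairs(after))   # []
```

A test asserting that `shadowed_pairs` returns an empty list for the real keyword list would catch this class of ordering mistake for any future entry, not only the ReDoS case.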