feat(scan): defense-in-depth triage hints (Item E) #211
Open
kurtpayne wants to merge 4 commits into
hints = compute_triage_hints([f], iocs)
h002 = [h for h in hints if h.id == "H002"]
assert len(h002) == 1
assert "novel-bad.io" in h002[0].detail
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #211      +/-   ##
==========================================
+ Coverage   75.87%   76.61%   +0.73%
==========================================
  Files          41       43       +2
  Lines        5994     6307     +313
==========================================
+ Hits         4548     4832     +284
- Misses       1446     1475      +29
Item C of the post-v4.7 pivot. Replaces "look at line 12" with
"look at the curl to evil.example.com on line 12" by post-processing
the model's output (no retraining required).
Adds:
- skillscan.models.Indicator — pydantic model with type/value/line/evidence
- skillscan.models.Finding.indicators — new optional list field
(default [], backward-compatible with all existing consumers)
- skillscan.indicators — extractor module with regex-based extractors
for 6 indicator types:
url — http(s) URLs (terminator-aware: shell substitution
$(...) and backticks don't get absorbed)
cve — CVE-YYYY-NNNN[NNN], also extracted from `reasoning`
text (model often cites CVEs not in skill body)
ip — IPv4 dotted-quad with octet validation; localhost
noise floor (127.x, 0.0.0.0)
domain — bare hostnames not already surfaced by URL extractor;
lookbehind blocks parent-domain dupes (`nist.gov`
inside `nvd.nist.gov`); 30-entry common-domain noise
floor (github, npm, pypi, anthropic, ...)
package — npm scoped (@scope/name) anywhere; pip/npm/yarn/pnpm
install command line capture (multi-package)
file_path — /etc, /var, /tmp, /usr, /root system paths;
../../traversal; ~/.dotfiles; Windows C:\
- 25 unit tests covering each extractor and a realistic-skill integration
- Wiring in ml_detector.py: extract_indicators() runs once per file
and the same list is attached to each label-specific Finding produced.
Wrapped in try/except — extractor failure never breaks the scanner.
Conservative posture: when in doubt, drop. False indicators are worse
than missing ones because they give downstream tooling bad targets to
act on. Cap is 50 indicators per finding.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
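A minimal sketch of the regex-based extraction described in the commit above, covering only the url, cve, and ip types. The `Indicator` dataclass is a hypothetical stand-in for the pydantic model, and the patterns, the noise-floor helper, and the function body are illustrative assumptions rather than the code in skillscan.indicators.

```python
import re
from dataclasses import dataclass


# Hypothetical stand-in for skillscan.models.Indicator (the real one is pydantic).
@dataclass(frozen=True)
class Indicator:
    type: str
    value: str
    line: int
    evidence: str


# Assumed patterns. The URL pattern stops at whitespace, quotes, backticks and
# shell-substitution characters so "$(...)" is not absorbed into the URL.
URL_RE = re.compile(r"https?://[^\s\"'`$()<>]+")
CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,7}\b")
IPV4_RE = re.compile(r"(?<![\d.])(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})(?![\d.])")

MAX_INDICATORS = 50  # cap stated in the commit message


def _is_noise_ip(octets):
    # Localhost noise floor: 127.x and 0.0.0.0 are dropped.
    return octets[0] == 127 or octets == [0, 0, 0, 0]


def extract_indicators(text: str) -> list[Indicator]:
    found: list[Indicator] = []
    seen: set[tuple[str, str]] = set()

    def add(kind: str, value: str, lineno: int, line: str) -> None:
        key = (kind, value)
        if key not in seen and len(found) < MAX_INDICATORS:
            seen.add(key)
            found.append(Indicator(kind, value, lineno, line.strip()))

    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in URL_RE.finditer(line):
            add("url", m.group(0).rstrip(".,;"), lineno, line)
        for m in CVE_RE.finditer(line):
            add("cve", m.group(0), lineno, line)
        for m in IPV4_RE.finditer(line):
            octets = [int(g) for g in m.groups()]
            # Octet validation: when in doubt, drop.
            if all(o <= 255 for o in octets) and not _is_noise_ip(octets):
                add("ip", m.group(0), lineno, line)
    return found
```

Deduplicating on (type, value) keeps the first line on which each indicator appears, which fits the "look at the curl to evil.example.com on line 12" goal.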
The discrete `confidence` field the model emits buckets at 0.9 / 0.95 /
1.0 (83% at 0.95), which makes it useless for thresholding because every
wrong prediction also lands at 0.95.
Eval data on v4.7's 431-file held-out set: all 4 model errors had
logit_confidence ∈ [0.58, 0.76]; all 426 correct predictions had
logit_confidence ≥ 0.99 except a handful in [0.80, 0.99]. Threshold 0.80
flags 100% of errors while accepting 94% of files.
Implementation:
- Load the GGUF with `logits_all=True` so llama-cpp-python returns
  per-token logprobs.
- Inference passes `logprobs=True, top_logprobs=5` alongside the existing
  GBNF grammar. Falls back gracefully (one-shot retry without logprobs)
  when an older llama-cpp-python rejects the args.
- _extract_logit_confidence() finds the verdict-starting token and
  softmaxes the logp(ben) vs logp(mal) entries to produce continuous
  P(predicted_verdict) ∈ [0, 1]. Handles the missing-from-top-K case
  with a soft floor.
- Surfaced as Finding.logit_confidence (Optional[float]). Older clients
  without logprobs payloads get None, so the field is fully
  backward-compatible.
Severity mapping is unchanged in this commit; logit_confidence is an
additional signal that downstream tooling can threshold against. Future
PR can fold it into severity demotion (e.g., MED → LOW when logit < 0.7).
Eval evidence: skillscan-corpus/eval_results/v47_logit_confidence_eval.json
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
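The two-way softmax over logp(ben) and logp(mal) can be sketched as follows. This is an illustration under stated assumptions, not the body of _extract_logit_confidence(): the input is assumed to be a plain dict mapping candidate tokens to logprobs at the verdict position, and the soft-floor value of -10.0 is a made-up placeholder.

```python
import math
from typing import Optional

# Assumed soft floor for a verdict token missing from the top-K list
# (the commit does not state the real value).
MISSING_LOGPROB = -10.0


def logit_confidence(top_logprobs: dict[str, float], predicted: str) -> Optional[float]:
    """Return P(predicted verdict) from a two-way softmax over 'ben' and 'mal'."""
    if predicted not in ("ben", "mal"):
        return None
    lp_ben = top_logprobs.get("ben", MISSING_LOGPROB)
    lp_mal = top_logprobs.get("mal", MISSING_LOGPROB)
    # Subtract the max before exponentiating for numerical stability.
    m = max(lp_ben, lp_mal)
    e_ben = math.exp(lp_ben - m)
    e_mal = math.exp(lp_mal - m)
    p_ben = e_ben / (e_ben + e_mal)
    return p_ben if predicted == "ben" else 1.0 - p_ben
```

Renormalising over only the two verdict tokens gives a continuous value even when the grammar forces one of them, which is what lets a threshold such as 0.80 separate the error band from the confident predictions reported above.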
Closes Item B's user-facing promise: "Enables --threshold 0.85 for CI
gates". The earlier commit added Finding.logit_confidence; this commit
makes it actually usable from the command line.
Adds --ml-threshold (also SKILLSCAN_ML_THRESHOLD env var, default 0.0).
When > 0, drops PINJ-ML-001 findings whose logit_confidence is below the
threshold. Advisory findings (PINJ-ML-NO-MODEL/STALE/LARGE-FILE/UNAVAIL)
are never filtered. Findings without logit_confidence (older clients) are
also never filtered, which keeps the flag backward-safe.
Recommended thresholds (per the 431-file v4.7 held-out eval):
  --ml-threshold 0.99   keeps 60%, all correct (strictest CI gate)
  --ml-threshold 0.90   keeps 87%, all correct
  --ml-threshold 0.80   keeps 94%, all correct, drops every model error
  --ml-threshold 0.70   keeps 97%, 99.5% correct (lenient)
Plumbed through scanner.scan() at three CLI callsites + the underlying
_scanner.scan(). No CLI flag at default = no behaviour change for
existing users.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
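The filtering rule for --ml-threshold can be sketched like this. The `Finding` dataclass is a hypothetical two-field stand-in and `apply_ml_threshold` is an illustrative name; only the finding IDs and the three keep/drop rules come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical minimal stand-in for skillscan.models.Finding.
@dataclass
class Finding:
    id: str
    logit_confidence: Optional[float] = None


def apply_ml_threshold(findings: list[Finding], threshold: float = 0.0) -> list[Finding]:
    """Drop PINJ-ML-001 findings whose logit_confidence is below threshold."""
    if threshold <= 0:
        return list(findings)  # default 0.0: no behaviour change
    kept = []
    for f in findings:
        if f.id != "PINJ-ML-001":
            kept.append(f)  # advisory and non-ML findings are never filtered
        elif f.logit_confidence is None:
            kept.append(f)  # older clients without logprobs: never filtered
        elif f.logit_confidence >= threshold:
            kept.append(f)
    return kept
```

Keeping findings that lack a confidence value errs toward reporting, so an older client never silently loses detections when the flag is set.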
Cross-layer recommendations surfaced in ScanReport.triage_hints. Hints
are advisory — they do NOT change the verdict or score. They run after
all detection layers (static rules, IOC, ML, optional trace) have fired
and emerge from the combined signal.
Hint types:
H001 ESCALATE_TO_TRACE — ML detected with logit_confidence < 0.7
AND no static-rule corroboration on the
same file. Recommend skillscan-trace for
behavioral verification.
H002 INTEL_GAP — Indicators (URL/domain/IP) extracted from
a finding aren't present in the IOC DB.
Recommend `skillscan intel refresh` or
manual verification.
H003 STRONG_CORROBORATION — ML + static rule both flagged the same
file. Informational hint surfacing the
multi-layer agreement.
Builds on Item B's logit_confidence (uncertainty signal) and Item C's
indicators (cross-reference target). Integration:
- models.py: TriageHint model + ScanReport.triage_hints field
(default [], strictly additive)
- triage_hints.py: pure-function compute_triage_hints(findings, iocs)
- analysis_pkg/_scanner.py: invoked after findings are collected,
wrapped in try/except — a hint bug never breaks the scanner
- 18 unit tests covering each hint, per-file isolation, advisory
suppression, IOC URL normalisation, edge cases
Future work (not in this PR): SARIF properties, text-output rendering
of hints in the CLI summary section.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
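The three hint rules can be sketched as a pure function over per-file groups of findings. Everything below is illustrative: the dataclasses are hypothetical stand-ins for the pydantic models, `iocs` is assumed to be a plain set of known-bad strings, and a finding is treated as "static" whenever its ID does not start with PINJ-ML-, which is a simplification of however the real code identifies layers.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Indicator:  # hypothetical stand-in
    type: str
    value: str


@dataclass
class Finding:  # hypothetical stand-in
    id: str
    file: str
    logit_confidence: Optional[float] = None
    indicators: list = field(default_factory=list)


@dataclass
class TriageHint:  # hypothetical stand-in; `id` and `detail` appear in the PR's tests
    id: str
    name: str
    file: str
    detail: str


def compute_triage_hints(findings, iocs):
    known = {v.lower() for v in iocs}
    by_file = {}
    for f in findings:
        by_file.setdefault(f.file, []).append(f)

    hints = []
    for path, group in by_file.items():
        ml = [f for f in group if f.id == "PINJ-ML-001"]
        # Assumption: anything outside the PINJ-ML-* family is a static-rule hit.
        static = [f for f in group if not f.id.startswith("PINJ-ML-")]
        low = [f for f in ml
               if f.logit_confidence is not None and f.logit_confidence < 0.7]
        if low and not static:
            hints.append(TriageHint("H001", "ESCALATE_TO_TRACE", path,
                                    "Low-confidence ML hit with no static corroboration"))
        unknown = sorted({
            ind.value for f in group for ind in f.indicators
            if ind.type in ("url", "domain", "ip") and ind.value.lower() not in known
        })
        if unknown:
            hints.append(TriageHint("H002", "INTEL_GAP", path,
                                    "Not in IOC DB: " + ", ".join(unknown)))
        if ml and static:
            hints.append(TriageHint("H003", "STRONG_CORROBORATION", path,
                                    "ML and static rules both flagged this file"))
    return hints
```

Grouping by file first gives the per-file isolation the commit mentions, and because the function only reads its inputs and returns a new list, it cannot alter the verdict or score.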
63a111b to
ef01c17
Summary
Item E of the post-v4.7 pivot. The scanner has 4 detection layers (static rules, IOC matching, ML, optional behavioral trace). Today they fire independently. This PR adds cross-layer triage hints — advisory recommendations surfaced in `ScanReport.triage_hints` after all layers have run.
Hints don't change the verdict or score. They tell the user why the verdict came out as it did and what action could improve confidence.
Hint types
Stack note
This PR depends on #205 (logit_confidence) and #208 (indicators). Both base commits are cherry-picked into this branch so the diff is self-contained. Once #205 and #208 land in main, this PR rebases cleanly (cherry-picks become no-ops).
Implementation
Test plan
Future work (not in this PR)
🤖 Generated with Claude Code