feat(scan): defense-in-depth triage hints (Item E)#211

Open: kurtpayne wants to merge 4 commits into main from feat/ml-defense-output
@kurtpayne
Owner

Summary

Item E of the post-v4.7 pivot. The scanner has 4 detection layers (static rules, IOC matching, ML, optional behavioral trace). Today they fire independently. This PR adds cross-layer triage hints — advisory recommendations surfaced in `ScanReport.triage_hints` after all layers have run.

Hints don't change the verdict or score. They tell the user why the verdict came out as it did and what action could improve confidence.

Hint types

| ID | Trigger | Recommendation |
|---|---|---|
| H001 ESCALATE_TO_TRACE | PINJ-ML-001 fires with `logit_confidence < 0.7` AND no static-rule corroboration on the same file | Run `skillscan-trace` for behavioral verification |
| H002 INTEL_GAP | Finding has indicators (URL/domain/IP) not present in the IOC database | Run `skillscan intel refresh` or manually verify |
| H003 STRONG_CORROBORATION | ML + static rule both fire on the same file | Informational: verdict is well-corroborated |

Stack note

This PR depends on #205 (logit_confidence) and #208 (indicators). Both base commits are cherry-picked into this branch so the diff is self-contained. Once #205 and #208 land in main, this PR rebases cleanly (cherry-picks become no-ops).

Implementation

  • `Indicator`, `logit_confidence`, `TriageHint` models in `models.py` (strictly additive — `ScanReport.triage_hints` defaults to `[]`).
  • `skillscan.triage_hints.compute_triage_hints(findings, iocs)` — pure function, easily unit-testable.
  • Wired into `analysis_pkg/_scanner.py` after findings are collected. Wrapped in `try/except` — a hint bug never breaks the scanner.
  • IOC value normalisation strips `http(s)://` scheme so URL indicators match domain-shaped IOCs.
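
The hint logic described above can be sketched as a pure function. This is a minimal illustration, not the code in `triage_hints.py`: the dataclasses are hypothetical stand-ins for the pydantic models, the `rule_id` and `file` field names are assumptions, any rule ID outside the `PINJ-ML-` prefix is assumed to be a static rule, and `normalise` also drops the URL path (an assumption beyond the scheme stripping the PR mentions).

```python
import re
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical stand-ins for the models in models.py (field names assumed).
@dataclass
class Indicator:
    type: str   # e.g. "url", "domain", "ip"
    value: str


@dataclass
class Finding:
    rule_id: str
    file: str
    logit_confidence: Optional[float] = None
    indicators: list = field(default_factory=list)


@dataclass
class TriageHint:
    id: str
    file: str
    detail: str


ML_RULE = "PINJ-ML-001"
LOW_CONFIDENCE = 0.7                      # H001 trigger from the hint table
NETWORK_TYPES = {"url", "domain", "ip"}   # indicator types checked for H002


def normalise(value):
    """Strip the http(s):// scheme (and, as an assumption, any path)."""
    host = re.sub(r"^https?://", "", value.strip().lower())
    return host.split("/", 1)[0]


def compute_triage_hints(findings, iocs):
    known = {normalise(v) for v in iocs}

    # Hints are computed per file, so group findings by path first.
    by_file = {}
    for f in findings:
        by_file.setdefault(f.file, []).append(f)

    hints = []
    for path, group in by_file.items():
        ml = [f for f in group if f.rule_id == ML_RULE]
        # Assumption: advisory PINJ-ML-* findings are neither ML hits nor static rules.
        static = [f for f in group if not f.rule_id.startswith("PINJ-ML-")]

        if ml and static:
            hints.append(TriageHint("H003", path, "ML and static rules agree"))
        elif any(f.logit_confidence is not None
                 and f.logit_confidence < LOW_CONFIDENCE for f in ml):
            hints.append(TriageHint("H001", path, "run skillscan-trace"))

        unknown = sorted(
            {normalise(i.value) for f in group for i in f.indicators
             if i.type in NETWORK_TYPES} - known
        )
        if unknown:
            hints.append(
                TriageHint("H002", path, "not in IOC database: " + ", ".join(unknown))
            )
    return hints
```

Because the function takes plain lists and returns a list, a caller such as the scanner can wrap the single call in `try/except` and fall back to an empty hint list without touching the verdict.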

Test plan

  • `SKILLSCAN_NO_USER_RULES=1 pytest tests/test_triage_hints.py -v` — 18/18 passing
  • Full suite (excluding 7 stale-rule tests already fixed by #204, "test: align test_rules.py with rules-snapshot at 2026.04.25 (unblocks main)"): 765 passed, 8 skipped
  • `ruff format` + `ruff check` — clean on all touched files
  • Manual smoke against the bundled v4 GGUF on a held-out skill: confirm the H001/H002/H003 trio surfaces correctly when `--ml-detect` runs

Future work (not in this PR)

  • SARIF integration: emit hints as additional rule properties
  • Text/CLI summary: add a "Recommendations" section after the findings table
  • Skillscan-trace integration: when H001 emits, the trace tool could deep-link back to the originating ML finding

🤖 Generated with Claude Code

hints = compute_triage_hints([f], iocs)
h002 = [h for h in hints if h.id == "H002"]
assert len(h002) == 1
assert "novel-bad.io" in h002[0].detail
@codecov
codecov Bot commented Apr 26, 2026

Codecov Report

❌ Patch coverage is 90.44586% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.61%. Comparing base (f72c00e) to head (ef01c17).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/skillscan/ml_detector.py | 75.80% | 15 Missing ⚠️ |
| src/skillscan/indicators.py | 94.67% | 9 Missing ⚠️ |
| src/skillscan/analysis_pkg/_scanner.py | 44.44% | 5 Missing ⚠️ |
| src/skillscan/triage_hints.py | 98.33% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #211      +/-   ##
==========================================
+ Coverage   75.87%   76.61%   +0.73%     
==========================================
  Files          41       43       +2     
  Lines        5994     6307     +313     
==========================================
+ Hits         4548     4832     +284     
- Misses       1446     1475      +29     

kurtpayne and others added 4 commits April 26, 2026 17:09
Item C of the post-v4.7 pivot. Replaces "look at line 12" with
"look at the curl to evil.example.com on line 12" by post-processing
the model's output (no retraining required).

Adds:
- skillscan.models.Indicator — pydantic model with type/value/line/evidence
- skillscan.models.Finding.indicators — new optional list field
  (default [], backward-compatible with all existing consumers)
- skillscan.indicators — extractor module with regex-based extractors
  for 6 indicator types:
    url       — http(s) URLs (terminator-aware: shell substitution
                $(...) and backticks don't get absorbed)
    cve       — CVE-YYYY-NNNN[NNN], also extracted from `reasoning`
                text (model often cites CVEs not in skill body)
    ip        — IPv4 dotted-quad with octet validation; localhost
                noise floor (127.x, 0.0.0.0)
    domain    — bare hostnames not already surfaced by URL extractor;
                lookbehind blocks parent-domain dupes (`nist.gov`
                inside `nvd.nist.gov`); 30-entry common-domain noise
                floor (github, npm, pypi, anthropic, ...)
    package   — npm scoped (@scope/name) anywhere; pip/npm/yarn/pnpm
                install command line capture (multi-package)
    file_path — /etc, /var, /tmp, /usr, /root system paths;
                ../../traversal; ~/.dotfiles; Windows C:\
- 25 unit tests covering each extractor and a realistic-skill integration
- Wiring in ml_detector.py: extract_indicators() runs once per file
  and the same list is attached to each label-specific Finding produced.
  Wrapped in try/except — extractor failure never breaks the scanner.

Conservative posture: when in doubt, drop. False indicators are worse
than missing ones because they give downstream tooling bad targets to
act on. Cap is 50 indicators per finding.
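
A minimal sketch of two of the extractors described above (cve and ip), assuming a line-by-line regex scan. The regexes, the tuple shape, and the function body are illustrative assumptions, not the code in `skillscan.indicators`; only the 50-indicator cap, the CVE shape, the octet validation, and the localhost noise floor come from the commit message.

```python
import re

MAX_INDICATORS = 50  # per-finding cap stated in the commit message

# CVE-YYYY-NNNN[NNN]: four-digit year, then four to seven digits.
_CVE = re.compile(r"\bCVE-\d{4}-\d{4,7}\b")

# Dotted quad that is not part of a longer dotted or numeric run.
_IPV4 = re.compile(
    r"(?<![\d.])(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})(?!\.?\d)"
)


def extract_indicators(text):
    """Return (type, value, line) tuples; when in doubt, drop the candidate."""
    found = []
    seen = set()

    def add(kind, value, lineno):
        if (kind, value) not in seen:
            seen.add((kind, value))
            found.append((kind, value, lineno))

    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in _CVE.finditer(line):
            add("cve", m.group(0), lineno)
        for m in _IPV4.finditer(line):
            octets = [int(g) for g in m.groups()]
            if any(o > 255 for o in octets):
                continue  # octet validation
            if octets[0] == 127 or m.group(0) == "0.0.0.0":
                continue  # localhost noise floor
            add("ip", m.group(0), lineno)

    return found[:MAX_INDICATORS]
```

Deduplicating on `(type, value)` keeps the first line on which each indicator appears, which is one reasonable reading of "look at the curl to evil.example.com on line 12".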

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The discrete `confidence` field the model emits buckets at 0.9 / 0.95 / 1.0
(83% at 0.95), which makes it useless for thresholding because every wrong
prediction also lands at 0.95. Eval data on v4.7's 431-file held-out set: all
4 model errors had logit_confidence ∈ [0.58, 0.76]; the 426 correct predictions
had logit_confidence ≥ 0.99, except a handful in [0.80, 0.99). A threshold of
0.80 flags 100% of errors while accepting 94% of files.

Implementation:
- Load the GGUF with `logits_all=True` so llama-cpp-python returns
  per-token logprobs.
- Inference passes `logprobs=True, top_logprobs=5` alongside the existing
  GBNF grammar. Falls back gracefully (one-shot retry without logprobs)
  when an older llama-cpp-python rejects the args.
- _extract_logit_confidence() finds the verdict-starting token and
  softmaxes the logp(ben) vs logp(mal) entries to produce continuous
  P(predicted_verdict) ∈ [0, 1]. Handles the missing-from-top-K case with
  a soft floor.
- Surfaced as Finding.logit_confidence (Optional[float]). Older clients
  without logprobs payloads get None — fully backward-compatible.
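
The two-way softmax described above can be sketched as follows. This is an illustration under assumptions, not the body of `_extract_logit_confidence()`: the input is assumed to be a token-to-logprob mapping for the verdict-starting position, tokens are matched by a `ben` / `mal` prefix, and the soft-floor value of -20.0 is invented for the example.

```python
import math

SOFT_FLOOR_LOGPROB = -20.0  # assumed floor for a verdict token missing from top-K


def logit_confidence(top_logprobs, predicted):
    """Return P(predicted verdict) from logp(ben) vs logp(mal).

    top_logprobs: dict mapping candidate token text to its log-probability
                  at the verdict-starting position (assumed shape).
    predicted:    "benign" or "malicious".
    """
    best = {"ben": SOFT_FLOOR_LOGPROB, "mal": SOFT_FLOOR_LOGPROB}
    for token, logprob in top_logprobs.items():
        key = token.strip().lower()
        for prefix in best:
            if key.startswith(prefix):
                best[prefix] = max(best[prefix], logprob)

    pred = best[predicted[:3].lower()]
    other = best["mal" if predicted.lower().startswith("ben") else "ben"]

    # Two-way softmax, written as a sigmoid of the logprob difference.
    return 1.0 / (1.0 + math.exp(other - pred))
```

For example, if the top-K entries give P(mal) = 0.6 and P(ben) = 0.3, renormalising over just those two verdicts yields 0.6 / 0.9, roughly 0.67, which would fall below the 0.7 cut-off used elsewhere in this PR.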

Severity mapping is unchanged in this commit; logit_confidence is an
additional signal that downstream tooling can threshold against. Future
PR can fold it into severity demotion (e.g., MED → LOW when logit < 0.7).

Eval evidence: skillscan-corpus/eval_results/v47_logit_confidence_eval.json

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes Item B's user-facing promise: "Enables --threshold 0.85 for CI gates".
The earlier commit added Finding.logit_confidence; this commit makes it
actually usable from the command line.

Adds --ml-threshold (also SKILLSCAN_ML_THRESHOLD env var, default 0.0).
When > 0, drops PINJ-ML-001 findings whose logit_confidence is below the
threshold. Advisory findings (PINJ-ML-NO-MODEL/STALE/LARGE-FILE/UNAVAIL)
are never filtered. Findings without logit_confidence (older clients) are
also never filtered — backward-safe.

Recommended thresholds (per the 431-file v4.7 held-out eval):
  --ml-threshold 0.99   — keeps 60%, all correct (strictest CI gate)
  --ml-threshold 0.90   — keeps 87%, all correct
  --ml-threshold 0.80   — keeps 94%, all correct, drops every model error
  --ml-threshold 0.70   — keeps 97%, 99.5% correct (lenient)

Plumbed through scanner.scan() at three CLI callsites + the underlying
_scanner.scan(). No CLI flag at default = no behaviour change for
existing users.
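
The filtering rule above can be sketched like this. It is a minimal illustration, not the scanner's code: the `Finding` dataclass and its field names are hypothetical, and giving the CLI value precedence over the environment variable is an assumption the commit message does not state.

```python
import os
from dataclasses import dataclass
from typing import Optional

ML_RULE = "PINJ-ML-001"


@dataclass
class Finding:  # hypothetical stand-in for skillscan.models.Finding
    rule_id: str
    logit_confidence: Optional[float] = None


def resolve_threshold(cli_value=None):
    """Assumed precedence: explicit CLI value, then env var, then 0.0."""
    if cli_value is not None:
        return cli_value
    return float(os.environ.get("SKILLSCAN_ML_THRESHOLD", "0.0"))


def apply_ml_threshold(findings, threshold):
    """Drop PINJ-ML-001 findings whose logit_confidence is below threshold."""
    if threshold <= 0:
        return list(findings)  # default: no behaviour change
    kept = []
    for f in findings:
        below = (
            f.rule_id == ML_RULE
            and f.logit_confidence is not None
            and f.logit_confidence < threshold
        )
        if not below:
            # Advisory findings and findings without logit_confidence pass through.
            kept.append(f)
    return kept
```

Only the one rule ID is ever filtered, so advisory findings such as PINJ-ML-NO-MODEL and findings from older clients survive any threshold.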

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cross-layer recommendations surfaced in ScanReport.triage_hints. Hints
are advisory — they do NOT change the verdict or score. They run after
all detection layers (static rules, IOC, ML, optional trace) have fired
and emerge from the combined signal.

Hint types:

  H001 ESCALATE_TO_TRACE    — ML detected with logit_confidence < 0.7
                              AND no static-rule corroboration on the
                              same file. Recommend skillscan-trace for
                              behavioral verification.

  H002 INTEL_GAP            — Indicators (URL/domain/IP) extracted from
                              a finding aren't present in the IOC DB.
                              Recommend `skillscan intel refresh` or
                              manual verification.

  H003 STRONG_CORROBORATION — ML + static rule both flagged the same
                              file. Informational hint surfacing the
                              multi-layer agreement.

Builds on Item B's logit_confidence (uncertainty signal) and Item C's
indicators (cross-reference target). Integration:

  - models.py: TriageHint model + ScanReport.triage_hints field
    (default [], strictly additive)
  - triage_hints.py: pure-function compute_triage_hints(findings, iocs)
  - analysis_pkg/_scanner.py: invoked after findings are collected,
    wrapped in try/except — a hint bug never breaks the scanner
  - 18 unit tests covering each hint, per-file isolation, advisory
    suppression, IOC URL normalisation, edge cases

Future work (not in this PR): SARIF properties, text-output rendering
of hints in the CLI summary section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kurtpayne kurtpayne force-pushed the feat/ml-defense-output branch from 63a111b to ef01c17 Compare April 27, 2026 00:09