fix(sec-core): degrade prompt scan to L1 when ML model not downloaded by jfeng18 · Pull Request #791 · alibaba/anolisa

jfeng18 · 2026-06-08T15:48:19Z

Summary

Fixes prompt injection scanning being non-functional on any host where the ML model was never downloaded (the default after install). Previously every prompt_scan errored with ModelLoadError instead of degrading to L1.

Fixes #790.

Root Cause

MLClassifier inherited is_available()->True, so the scanner treated L2 (ML) as mandatory-and-available even when the Llama-Prompt-Guard-2 model was absent. At scan time detect() raised ModelLoadError, and because ml_classifier was not in _OPTIONAL_DETECTORS, the whole scan errored — no fallback to L1 (regex).
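
The failure mode described above can be reproduced in a few lines. This is a minimal sketch, not the project's code: `BaseDetector`, the `model_present` flag, and the toy `scan()` function are hypothetical stand-ins. Only the names `MLClassifier`, `is_available()`, `detect()`, and `ModelLoadError` come from the PR description.

```python
class ModelLoadError(RuntimeError):
    """Stand-in for the project's ModelLoadError."""


class BaseDetector:
    """Hypothetical base class; only the is_available() default matters here."""

    name = "base"

    def is_available(self) -> bool:
        return True  # the inherited default described above

    def detect(self, text: str) -> str:
        raise NotImplementedError


class MLClassifier(BaseDetector):
    name = "ml_classifier"

    def __init__(self, model_present: bool) -> None:
        self.model_present = model_present  # simulates model files on disk

    def detect(self, text: str) -> str:
        if not self.model_present:
            raise ModelLoadError("model files not found")
        return "pass"


def scan(detector: BaseDetector, text: str) -> str:
    """Toy scan: trusts is_available(), so a missing model surfaces as an error."""
    if not detector.is_available():
        return "skipped"
    try:
        return detector.detect(text)
    except ModelLoadError:
        return "error"


verdict = scan(MLClassifier(model_present=False), "hello")
print(verdict)  # prints "error"
```

Because the detector reports itself as available, the skip branch is never taken and the load failure is only discovered at scan time.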

(torch/transformers ARE installed; only the model files were missing. Distinct from #680, which changes the cosh hook's fail-open→fail-ask behavior but not this model-availability gap.)

Changes

| File | Change |
| --- | --- |
| `detectors/ml_classifier.py` | Add `is_available()` override: probe torch/transformers importability + model presence |
| `models/model_manager.py` | New `is_model_downloaded()` predicate (reused by `_resolve_local_model_path`) |
| `scanner.py` | Add `ml_classifier` to `_OPTIONAL_DETECTORS` → skip to L1 when unavailable |
| `scanner.py` | `warmup()` bypasses the `is_available()` gate (builds detectors from config) so it can download a currently-unavailable model |
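
The availability probe in the first two rows might look roughly like the sketch below. Everything here is an assumption for illustration: the real `is_model_downloaded()` is a `ModelManager` method whose on-disk criteria are not stated in this PR, so the `config.json` marker file, the `deps_importable()` helper, and the constructor parameters are hypothetical. The demo swaps in the stdlib `json` module for torch/transformers so it runs anywhere.

```python
import importlib.util
import tempfile
from pathlib import Path


def is_model_downloaded(model_dir: Path) -> bool:
    """Assumed on-disk check: the directory holds a config.json marker file."""
    return (model_dir / "config.json").is_file()


def deps_importable(modules: tuple[str, ...]) -> bool:
    """True when every top-level module can be located without importing it."""
    return all(importlib.util.find_spec(m) is not None for m in modules)


class MLClassifier:
    def __init__(
        self,
        model_dir: Path,
        required_modules: tuple[str, ...] = ("torch", "transformers"),
    ) -> None:
        self.model_dir = model_dir
        self.required_modules = required_modules

    def is_available(self) -> bool:
        # Both conditions must hold: dependencies importable AND model on disk.
        return deps_importable(self.required_modules) and is_model_downloaded(
            self.model_dir
        )


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    clf = MLClassifier(model_dir, required_modules=("json",))
    before_download = clf.is_available()  # no marker file yet
    (model_dir / "config.json").write_text("{}")
    after_download = clf.is_available()  # marker file now present

print(before_download, after_download)  # prints "False True"
```

Using `importlib.util.find_spec` keeps the probe cheap, since it locates a module without importing it.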

Coupling note: the is_available() override and the _OPTIONAL_DETECTORS change must land together. With only the override, a detector whose is_available() returns False raises LayerNotAvailableError in the constructor (an eager crash) instead of degrading.
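
To illustrate why the two changes are coupled, here is a hedged sketch of a detector-selection step. `select_detectors()`, `StubDetector`, the `"regex"` detector name, and the exact log wording are hypothetical. Only `LayerNotAvailableError`, `ml_classifier`, and the idea of an optional-detector set come from this PR.

```python
import logging

logger = logging.getLogger("scanner_sketch")


class LayerNotAvailableError(RuntimeError):
    """Stand-in for the project's LayerNotAvailableError."""


class StubDetector:
    def __init__(self, name: str, available: bool) -> None:
        self.name = name
        self.available = available

    def is_available(self) -> bool:
        return self.available


def select_detectors(detectors, optional):
    """Keep available detectors; skip optional ones, fail on mandatory ones."""
    active = []
    for det in detectors:
        if det.is_available():
            active.append(det)
        elif det.name in optional:
            logger.warning("Detector %r is not available and will be skipped", det.name)
        else:
            raise LayerNotAvailableError(f"Detector {det.name!r} is not available")
    return active


regex = StubDetector("regex", available=True)  # hypothetical L1 detector name
ml = StubDetector("ml_classifier", available=False)

# Both changes applied: the ML layer is skipped and L1 still runs.
active_names = [d.name for d in select_detectors([regex, ml], optional={"ml_classifier"})]

# Only the is_available() override applied: the build step fails eagerly.
try:
    select_detectors([regex, ml], optional=set())
    crashed = False
except LayerNotAvailableError:
    crashed = True

print(active_names, crashed)  # prints "['regex'] True"
```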

What's NOT changed

- cosh hook / prompt_scanner_hook.py: covered by #680 (fix(sec-core): prompt scanner fail-ask on error)
- Daemon preload: covered by #786 (feat(sec-core): route prompt scan to daemon and add prompt model preload), a separate auto-warmup path
- No install-time auto-warmup added (out of scope; tracked separately)
- Pre-existing isort disorder in model_manager.py (pydantic import) left untouched: it exists on main and is out of scope

Verification (ECS, kernel 6.6.102+, Python 3.11)

| Scenario | Before | After |
| --- | --- | --- |
| Model absent, `--mode standard` | `verdict=error`, ModelLoadError | L1 fallback, warns + skips L2, no error |
| `scan-prompt warmup` | silently no-op (chicken-and-egg) | downloads model (1.04G) |
| Model present, injection text | n/a | DENY, jailbreak, 99.9% confidence |
| Model present, benign text | n/a | PASS |
| Unit tests | - | 287 passed |
| black 26.3.1 (CI version) | - | clean |
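
The `scan-prompt warmup` row reflects a chicken-and-egg problem: if warmup only touches detectors that already report themselves available, it can never download a missing model. A minimal sketch of the two behaviors follows, assuming a hypothetical `StubDetector` whose `warmup()` simulates the download. The function names `warmup_gated` and `warmup_ungated` are illustrative and not taken from `scanner.py`.

```python
class StubDetector:
    def __init__(self, name: str) -> None:
        self.name = name
        self.downloaded = False

    def is_available(self) -> bool:
        return self.downloaded

    def warmup(self) -> None:
        self.downloaded = True  # simulates fetching the model files


def warmup_gated(detectors):
    """Old behavior (as described): only warm up already-available detectors."""
    warmed = []
    for det in detectors:
        if det.is_available():
            det.warmup()
            warmed.append(det.name)
    return warmed


def warmup_ungated(detectors):
    """New behavior (as described): warm up every configured detector."""
    warmed = []
    for det in detectors:
        det.warmup()
        warmed.append(det.name)
    return warmed


gated_det = StubDetector("ml_classifier")
gated_result = warmup_gated([gated_det])

ungated_det = StubDetector("ml_classifier")
ungated_result = warmup_ungated([ungated_det])

print(gated_result, gated_det.is_available())      # prints "[] False"
print(ungated_result, ungated_det.is_available())  # prints "['ml_classifier'] True"
```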

Discriminating signal

# Before (ml_classifier mandatory): verdict=error "Detector 'ml_classifier' is not available"
# After: PASS via L1, log "Detector 'ml_classifier' ... will be skipped"

MLClassifier inherited is_available()->True, so the scanner treated L2 as mandatory-and-available even when the model was never downloaded. Every scan then raised ModelLoadError, leaving prompts unscanned.

- MLClassifier.is_available(): probe torch/transformers + model presence
- ModelManager.is_model_downloaded(): reusable on-disk predicate
- scanner: add ml_classifier to _OPTIONAL_DETECTORS so it skips to L1
- scanner.warmup(): bypass is_available() gate so it can download a model that is currently unavailable (chicken-and-egg)

E2E on ECS: model absent -> L1 fallback, no error; warmup downloads 1.04G; model present -> L2 catches injection at 99.9%. 287 unit tests pass.

Fixes alibaba#790.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…solution

- test_warmup_bypasses_availability_gate: warmup invokes detector.warmup() even when is_available() is False (covers scanner.py warmup new code)
- test_is_model_downloaded_{false,true}: covers ModelManager.is_model_downloaded
- test_resolve_local_model_path_returns_path: covers _resolve_local_model_path happy path after refactor to use is_model_downloaded()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jfeng18 · 2026-06-12T07:20:06Z

Hi @haosanzi, this PR degrades to L1 (regex) scanning when the ML model is unavailable, instead of skipping the scan entirely. As the prompt scan module owner, could you take a look when you have time?

jfeng18 requested review from RemindD, edonyzpc and kid9 as code owners June 8, 2026 15:48

github-actions Bot added the component:sec-core src/agent-sec-core/ label Jun 8, 2026

jfeng18 force-pushed the fix/promptscan-graceful-degrade branch from 7c47726 to 503719c Compare June 9, 2026 00:36

jfeng18 wants to merge 2 commits into alibaba:main from jfeng18:fix/promptscan-graceful-degrade