fix(sec-core): degrade prompt scan to L1 when ML model not downloaded #791

Open
jfeng18 wants to merge 2 commits into alibaba:main from jfeng18:fix/promptscan-graceful-degrade


@jfeng18 jfeng18 commented Jun 8, 2026


Summary

Fixes prompt injection scanning being non-functional on any host where the ML model was never downloaded (the default after install). Previously every prompt_scan errored with ModelLoadError instead of degrading to L1.

Fixes #790.

Root Cause

MLClassifier inherited is_available()->True, so the scanner treated L2 (ML) as mandatory-and-available even when the Llama-Prompt-Guard-2 model was absent. At scan time detect() raised ModelLoadError, and because ml_classifier was not in _OPTIONAL_DETECTORS, the whole scan errored — no fallback to L1 (regex).

(torch/transformers ARE installed; only the model files were missing. Distinct from #680, which changes the cosh hook's fail-open→fail-ask behavior but not this model-availability gap.)
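
The failure mode described above can be reproduced in a few lines. This is a minimal sketch, not the project's code: `BaseDetector`, the `model_present` flag, and the `scan()` helper are hypothetical stand-ins, and only the names `MLClassifier`, `is_available()`, `detect()`, and `ModelLoadError` come from the PR description.

```python
class ModelLoadError(RuntimeError):
    """Raised when the ML model files cannot be loaded."""


class BaseDetector:
    """Hypothetical base class: every detector claims to be usable by default."""

    name = "base"

    def is_available(self) -> bool:
        return True

    def detect(self, text: str) -> str:
        raise NotImplementedError


class MLClassifier(BaseDetector):
    """Stand-in for the L2 detector before the fix."""

    name = "ml_classifier"

    def __init__(self, model_present: bool) -> None:
        # Hypothetical flag simulating whether model files exist on disk.
        self.model_present = model_present

    # No is_available() override: the inherited True is returned
    # even when the model was never downloaded.

    def detect(self, text: str) -> str:
        if not self.model_present:
            raise ModelLoadError("model files not found")
        return "pass"


def scan(detector: BaseDetector, text: str) -> str:
    """Pre-fix shape: trust is_available(), then fail inside detect()."""
    if not detector.is_available():
        return "skipped"
    try:
        return detector.detect(text)
    except ModelLoadError:
        # The detector is treated as mandatory, so the whole scan errors.
        return "error"
```

With `MLClassifier(model_present=False)`, `scan()` returns `"error"` rather than `"skipped"`, because the availability check never gets a chance to report the missing model.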

Changes

| File | Change |
| --- | --- |
| detectors/ml_classifier.py | Add is_available() override: probe torch/transformers importability + model presence |
| models/model_manager.py | New is_model_downloaded() predicate (reused by _resolve_local_model_path) |
| scanner.py | Add ml_classifier to _OPTIONAL_DETECTORS → skip to L1 when unavailable |
| scanner.py | warmup() bypasses the is_available() gate (builds detectors from config) so it can download a currently-unavailable model |

Coupling note: the is_available() override and the _OPTIONAL_DETECTORS change must land together. With only the override, a detector whose is_available() returns False is still treated as mandatory, so the constructor raises LayerNotAvailableError (an eager crash) instead of degrading to L1.
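
The two coupled changes can be sketched together as follows. This is an illustration under stated assumptions, not the PR's diff: the directory-based `is_model_downloaded()` check, the `model_dir` attribute, `RegexDetector`, and `select_detectors()` are hypothetical, while `is_available()`, `_OPTIONAL_DETECTORS`, and `LayerNotAvailableError` are names taken from the description.

```python
import importlib.util
from pathlib import Path

# Detectors that may be skipped instead of failing the scan.
_OPTIONAL_DETECTORS = frozenset({"ml_classifier"})


class LayerNotAvailableError(RuntimeError):
    """Raised when a mandatory detector reports it is unavailable."""


def is_model_downloaded(model_dir: Path) -> bool:
    """Assumed predicate: the model directory exists and is not empty."""
    return model_dir.is_dir() and any(model_dir.iterdir())


class RegexDetector:
    """Stand-in for the L1 detector, which has no external requirements."""

    name = "regex"

    def is_available(self) -> bool:
        return True


class MLClassifier:
    """Stand-in for the L2 detector with the availability override."""

    name = "ml_classifier"

    def __init__(self, model_dir: Path) -> None:
        self.model_dir = model_dir

    def is_available(self) -> bool:
        # Probe importability without importing the heavy packages.
        deps_ok = all(
            importlib.util.find_spec(mod) is not None
            for mod in ("torch", "transformers")
        )
        return deps_ok and is_model_downloaded(self.model_dir)


def select_detectors(detectors, optional=_OPTIONAL_DETECTORS):
    """Keep available detectors; skip optional ones, fail on mandatory ones."""
    active = []
    for det in detectors:
        if det.is_available():
            active.append(det)
        elif det.name in optional:
            print(f"Detector '{det.name}' is not available and will be skipped")
        else:
            raise LayerNotAvailableError(
                f"Detector '{det.name}' is not available"
            )
    return active
```

Calling `select_detectors([RegexDetector(), MLClassifier(Path("/no/such/dir"))])` keeps only the regex detector. Passing `optional=frozenset()` for the same input raises `LayerNotAvailableError`, which is the eager crash the coupling note warns about.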

What's NOT changed

Verification (ECS, kernel 6.6.102+, Python 3.11)

| Scenario | Before | After |
| --- | --- | --- |
| Model absent, --mode standard | verdict=error, ModelLoadError | L1 fallback, warns + skips L2, no error |
| scan-prompt warmup | silently no-op (chicken-and-egg) | downloads model (1.04G) |
| Model present, injection text | n/a | DENY, jailbreak, 99.9% confidence |
| Model present, benign text | n/a | PASS |
| Unit tests | | 287 passed |
| black 26.3.1 (CI version) | | clean |
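
The warmup row is easiest to see with a toy detector. The sketch below is hypothetical throughout (`FakeMLDetector`, `warmup_gated()`, and `warmup_ungated()` are illustrative names, and the real download is replaced by a flag); it only shows why gating warmup on `is_available()` can never make an unavailable model available.

```python
class FakeMLDetector:
    """Toy detector: available only after warmup() has 'downloaded' the model."""

    name = "ml_classifier"

    def __init__(self) -> None:
        self.downloaded = False

    def is_available(self) -> bool:
        return self.downloaded

    def warmup(self) -> None:
        # Stands in for the real model download.
        self.downloaded = True


def warmup_gated(detectors):
    """Assumed pre-fix shape: only warm detectors that are already available."""
    warmed = []
    for det in detectors:
        if det.is_available():
            det.warmup()
            warmed.append(det.name)
    return warmed


def warmup_ungated(detectors):
    """Assumed post-fix shape: warm every configured detector."""
    warmed = []
    for det in detectors:
        det.warmup()
        warmed.append(det.name)
    return warmed
```

On a fresh `FakeMLDetector`, `warmup_gated()` warms nothing and the detector stays unavailable, while `warmup_ungated()` runs the download and flips `is_available()` to `True`.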

Discriminating signal

# Before (ml_classifier mandatory): verdict=error "Detector 'ml_classifier' is not available"
# After: PASS via L1, log "Detector 'ml_classifier' ... will be skipped"

MLClassifier inherited is_available()->True, so the scanner treated L2
as mandatory-and-available even when the model was never downloaded.
Every scan then raised ModelLoadError, leaving prompts unscanned.

- MLClassifier.is_available(): probe torch/transformers + model presence
- ModelManager.is_model_downloaded(): reusable on-disk predicate
- scanner: add ml_classifier to _OPTIONAL_DETECTORS so it skips to L1
- scanner.warmup(): bypass is_available() gate so it can download a
  model that is currently unavailable (chicken-and-egg)

E2E on ECS: model absent -> L1 fallback, no error; warmup downloads
1.04G; model present -> L2 catches injection at 99.9%. 287 unit tests pass.

Fixes alibaba#790.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jfeng18 jfeng18 requested review from RemindD, edonyzpc and kid9 as code owners June 8, 2026 15:48
@github-actions github-actions Bot added the component:sec-core src/agent-sec-core/ label Jun 8, 2026
…solution

- test_warmup_bypasses_availability_gate: warmup invokes detector.warmup()
  even when is_available() is False (covers scanner.py warmup new code)
- test_is_model_downloaded_{false,true}: covers ModelManager.is_model_downloaded
- test_resolve_local_model_path_returns_path: covers _resolve_local_model_path
  happy path after refactor to use is_model_downloaded()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jfeng18 jfeng18 force-pushed the fix/promptscan-graceful-degrade branch from 7c47726 to 503719c Compare June 9, 2026 00:36
jfeng18 commented Jun 12, 2026


Hi @haosanzi, this PR degrades to L1 (regex) scanning when the ML model is unavailable, instead of skipping the scan entirely. As the prompt scan module owner, could you take a look when you have time?



Development

Successfully merging this pull request may close these issues.

[sec-core] bug(sec-core): prompt_scan fails on every scan when ML model not downloaded (no graceful degradation)
