fix(sec-core): degrade prompt scan to L1 when ML model not downloaded#791
Open
jfeng18 wants to merge 2 commits into
MLClassifier inherited is_available() -> True, so the scanner treated L2 as mandatory-and-available even when the model was never downloaded. Every scan then raised ModelLoadError, leaving prompts unscanned.

- MLClassifier.is_available(): probe torch/transformers + model presence
- ModelManager.is_model_downloaded(): reusable on-disk predicate
- scanner: add ml_classifier to _OPTIONAL_DETECTORS so it skips to L1
- scanner.warmup(): bypass is_available() gate so it can download a model that is currently unavailable (chicken-and-egg)

E2E on ECS: model absent -> L1 fallback, no error; warmup downloads 1.04G; model present -> L2 catches injection at 99.9%. 287 unit tests pass.

Fixes alibaba#790.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
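For reviewers, a minimal sketch of the kind of availability probe this commit describes. It is not the code in the diff: the class name MLClassifierSketch, the helper deps_importable, the required_modules parameter, and the "config.json present" rule for deciding that a model is on disk are all assumptions made for illustration.

```python
import importlib.util
from pathlib import Path


def is_model_downloaded(model_dir) -> bool:
    """Hypothetical on-disk predicate: True if the model directory holds a config.json."""
    return (Path(model_dir) / "config.json").is_file()


def deps_importable(modules) -> bool:
    """True if every top-level module name can be located, without importing it."""
    return all(importlib.util.find_spec(name) is not None for name in modules)


class MLClassifierSketch:
    """Illustrative stand-in for MLClassifier; not the project's class."""

    def __init__(self, model_dir, required_modules=("torch", "transformers")):
        self.model_dir = Path(model_dir)
        self.required_modules = tuple(required_modules)

    def is_available(self) -> bool:
        # Available only when the dependencies resolve AND the model files are on disk.
        return deps_importable(self.required_modules) and is_model_downloaded(self.model_dir)
```

Using importlib.util.find_spec keeps the probe cheap, since it locates torch and transformers without actually importing them.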
…solution
- test_warmup_bypasses_availability_gate: warmup invokes detector.warmup()
even when is_available() is False (covers scanner.py warmup new code)
- test_is_model_downloaded_{false,true}: covers ModelManager.is_model_downloaded
- test_resolve_local_model_path_returns_path: covers _resolve_local_model_path
happy path after refactor to use is_model_downloaded()
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
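A rough sketch of what a test like test_warmup_bypasses_availability_gate could look like. ScannerSketch and its attributes are hypothetical stand-ins; the real scanner's constructor and internals are not shown in this thread.

```python
from unittest.mock import MagicMock


class ScannerSketch:
    """Illustrative scanner: scanning is gated on availability, warmup is not."""

    def __init__(self, configured_detectors):
        self._configured = list(configured_detectors)
        # Only detectors that report themselves available take part in scans.
        self._active = [d for d in self._configured if d.is_available()]

    def warmup(self):
        # Iterate over the configured detectors, not the active ones, so an
        # unavailable detector still gets the chance to download its model.
        for det in self._configured:
            det.warmup()


def test_warmup_bypasses_availability_gate():
    det = MagicMock()
    det.is_available.return_value = False

    scanner = ScannerSketch([det])
    scanner.warmup()

    det.warmup.assert_called_once()
    assert scanner._active == []
```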
7c47726 to
503719c
Compare
Hi @haosanzi, this PR degrades to L1 (regex) scanning when the ML model is unavailable, instead of skipping the scan entirely. As the prompt scan module owner, could you take a look when you have time?
Summary
Fixes prompt injection scanning being non-functional on any host where the ML model was never downloaded (the default after install). Previously every prompt_scan errored with ModelLoadError instead of degrading to L1.

Fixes #790.
Root Cause
MLClassifier inherited is_available() -> True, so the scanner treated L2 (ML) as mandatory-and-available even when the Llama-Prompt-Guard-2 model was absent. At scan time detect() raised ModelLoadError, and because ml_classifier was not in _OPTIONAL_DETECTORS, the whole scan errored, with no fallback to L1 (regex).

(torch/transformers ARE installed; only the model files were missing. Distinct from #680, which changes the cosh hook's fail-open to fail-ask behavior but not this model-availability gap.)
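To make the failure mode concrete, here is a simplified sketch of the gating logic, assuming the scanner selects detectors by is_available() and an optional-detector set. Only LayerNotAvailableError, ml_classifier, and the idea of an optional set come from this PR; RegexDetector, StubML, build_active_detectors, the "block"/"pass" verdict strings, and the regex itself are hypothetical.

```python
import re


class LayerNotAvailableError(RuntimeError):
    """Stand-in for the error of the same name in this PR."""


OPTIONAL_DETECTORS = frozenset({"ml_classifier"})


class RegexDetector:
    """L1: always available, pattern based (the pattern is illustrative only)."""
    name = "regex"

    def is_available(self):
        return True

    def detect(self, prompt):
        hit = re.search(r"ignore (all )?previous instructions", prompt, re.I)
        return "block" if hit else "pass"


class StubML:
    """L2 stand-in whose availability is set by the caller."""
    name = "ml_classifier"

    def __init__(self, available):
        self.available = available

    def is_available(self):
        return self.available

    def detect(self, prompt):
        return "pass"


def build_active_detectors(detectors, optional=OPTIONAL_DETECTORS):
    active = []
    for det in detectors:
        if det.is_available():
            active.append(det)
        elif det.name not in optional:
            # A mandatory layer that is unavailable fails eagerly.
            raise LayerNotAvailableError(det.name)
        # An optional layer that is unavailable is skipped: the scan degrades.
    return active


def scan(prompt, detectors, optional=OPTIONAL_DETECTORS):
    return {d.name: d.detect(prompt) for d in build_active_detectors(detectors, optional)}
```

With StubML(available=False) in the list, scan() returns only the regex verdict. Passing optional=frozenset() instead reproduces the eager LayerNotAvailableError, which is why an honest is_available() and the optional-set entry only work as a pair.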
Changes
- detectors/ml_classifier.py: is_available() override that probes torch/transformers importability and model presence
- models/model_manager.py: is_model_downloaded() predicate (reused by _resolve_local_model_path)
- scanner.py: add ml_classifier to _OPTIONAL_DETECTORS, so the scan skips to L1 when the model is unavailable
- scanner.py: warmup() bypasses the is_available() gate (builds detectors from config) so it can download a currently-unavailable model

Coupling note: the is_available() override and the _OPTIONAL_DETECTORS change must land together. Changing only the former makes a detector whose is_available() returns False raise LayerNotAvailableError in the constructor (an eager crash) instead of degrading.

What's NOT changed
- prompt_scanner_hook.py: that is the domain of #680 (fix(sec-core): prompt scanner fail-ask on error)
- model_manager.py (pydantic import): left untouched; it exists on main and is not in scope

Verification (ECS, kernel 6.6.102+, Python 3.11)
--mode standard
verdict=error, ModelLoadError
scan-prompt warmup
Discriminating signal