fix: resolve prompt injection detector homoglyph bypass (#102) #119
MayurKharat0390 wants to merge 1 commit into
📝 Walkthrough
This PR strengthens the prompt-injection detector by adding homoglyph-to-ASCII translation to its text normalization pipeline. A static character mapping translates visual lookalikes (Cyrillic/Greek/Latin variants) to ASCII equivalents, and the …
Changes
Homoglyph normalization for injection detection
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@agentwatch/core/injection.py`:
- Around line 30-92: _HOMOGLYPH_MAP is missing the Cyrillic small letter "м"
(U+043C), allowing strings like "proмpt" to bypass normalization; add the
mapping "\u043c": "m" to the _HOMOGLYPH_MAP dictionary so lowercase Cyrillic м
is normalized to Latin 'm' (update the existing _HOMOGLYPH_MAP definition in
agentwatch/core/injection.py).
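A gap like this can be spotted mechanically rather than by eye. The sketch below is a hypothetical audit helper, not code from this PR: `unmapped_lookalikes` and `partial_map` are illustrative names, and the two-entry map only stands in for the real `_HOMOGLYPH_MAP`. It reports every non-ASCII character that survives NFKC plus map translation.

```python
import unicodedata


def unmapped_lookalikes(text, homoglyph_map):
    """Return (char, code point, Unicode name) for each non-ASCII character
    left over after NFKC normalization and homoglyph translation."""
    table = str.maketrans(homoglyph_map)
    normalized = unicodedata.normalize("NFKC", text).translate(table)
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in normalized
        if not ch.isascii()
    ]


# Hypothetical stand-in for the real map: covers Cyrillic о and р, but not м.
partial_map = {"\u043e": "o", "\u0440": "p"}

# "proмpt" spelled with Cyrillic о (U+043E) and Cyrillic м (U+043C).
payload = "pr\u043e\u043cpt"

leftover = unmapped_lookalikes(payload, partial_map)
print(leftover)  # the Cyrillic м is reported as still unmapped

# Adding the mapping named in the review finding closes the gap.
fixed_map = {**partial_map, "\u043c": "m"}
print(unmapped_lookalikes(payload, fixed_map))  # []
```

Running a helper like this over the test payloads would flag any lookalike the map does not yet cover, assuming the detector is only meant to see ASCII after normalization.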
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 613daebc-03ec-435f-b1be-7b76b4a6f779
📒 Files selected for processing (2)
- agentwatch/core/injection.py
- tests/test_safety.py
@MayurKharat0390 Thanks for the contribution. This addresses a real security gap and the added regression test is appreciated. Before merge, could you please take a look at the remaining CodeRabbit finding regarding Cyrillic lowercase м (U+043C)? Since this PR is specifically focused on homoglyph bypass prevention, I'd like to make sure we aren't leaving an obvious bypass path uncovered. Once that's addressed, this should be ready for merge. 🚀
0ec813f to 1c84594
Hey @sreerevanth! I've addressed the CodeRabbit review finding by adding the mapping for Cyrillic lowercase м (U+043C) to Latin `m`. I also added a corresponding regression test case. All formatting checks and unit tests pass. It's ready for another look! 🚀
Problem
The `scan_text` function in `agentwatch/core/injection.py` relies on `unicodedata.normalize("NFKC", text)` under the assumption that it translates visual homoglyphs (such as Cyrillic or Greek lookalikes) to their canonical ASCII equivalents before the regex patterns are evaluated.
However, Unicode NFKC normalization never translates visual confusables across different scripts (e.g. mapping Cyrillic small letter `о` (U+043E) to Latin small letter `o` (U+006F)).
Since the injection detector's patterns are Latin ASCII-only, an attacker can bypass all prompt injection signatures by replacing Latin letters with visually identical Greek or Cyrillic homoglyphs.
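The NFKC behaviour described above is easy to confirm with the standard library alone. This is a minimal illustration, with `spoofed` and `fullwidth` as made-up sample strings: NFKC folds compatibility forms such as fullwidth Latin letters, but leaves Cyrillic lookalikes untouched.

```python
import unicodedata

latin = "ignore"

# Same word with Cyrillic о (U+043E) and е (U+0435) in place of Latin o and e.
spoofed = "ign\u043er\u0435"
spoofed_nfkc = unicodedata.normalize("NFKC", spoofed)

# The Cyrillic letters survive normalization, so the strings still differ.
print(spoofed_nfkc == latin)
print([unicodedata.name(ch) for ch in spoofed_nfkc if not ch.isascii()])

# Contrast: fullwidth Latin letters are compatibility forms, and NFKC
# does fold them to plain ASCII.
fullwidth = "\uff49\uff47\uff4e\uff4f\uff52\uff45"
fullwidth_nfkc = unicodedata.normalize("NFKC", fullwidth)
print(fullwidth_nfkc == latin)
```

The first comparison is false and the second is true, which is why a separate cross-script mapping step is needed on top of NFKC.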
Solution
- Added `_HOMOGLYPH_MAP` inside `agentwatch/core/injection.py`, which maps visually identical/confusable Greek, Cyrillic, and Latin Extended characters to standard Latin ASCII.
- Updated `_normalize(text)` to map these lookalike characters to their Latin counterparts after NFKC normalization.
Testing & Verification
- Added `test_injection_detector_homoglyph_bypass_prevention` in `tests/test_safety.py` to assert that visual lookalike payloads spelling out "ignore previous instructions", "reveal your prompt", and "new instructions:" are successfully blocked.
- `ruff` and all 265 unit/integration tests pass.
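The approach described in the Solution can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's actual code: `_HOMOGLYPH_MAP_SKETCH`, `normalize_sketch`, and `PATTERN` are hypothetical names, the map is a deliberately tiny subset, and the single regex stands in for the detector's real signature list.

```python
import re
import unicodedata

# Hypothetical subset for illustration; the real _HOMOGLYPH_MAP is larger.
_HOMOGLYPH_MAP_SKETCH = {
    "\u043e": "o",  # Cyrillic small letter o
    "\u0435": "e",  # Cyrillic small letter ie
    "\u0440": "p",  # Cyrillic small letter er
    "\u0441": "c",  # Cyrillic small letter es
    "\u043c": "m",  # Cyrillic small letter em
    "\u03bf": "o",  # Greek small letter omicron
}
_TABLE = str.maketrans(_HOMOGLYPH_MAP_SKETCH)


def normalize_sketch(text: str) -> str:
    """Apply NFKC first, then fold cross-script lookalikes to ASCII."""
    return unicodedata.normalize("NFKC", text).translate(_TABLE)


# Stand-in for one of the detector's Latin-only signatures (assumed form).
PATTERN = re.compile(r"ignore\s+previous\s+instructions", re.IGNORECASE)

# "ignore previous instructions" with Cyrillic о and е swapped in.
payload = "ign\u043er\u0435 pr\u0435vi\u043eus instructi\u043ens"

raw_hit = PATTERN.search(payload)
normalized_hit = PATTERN.search(normalize_sketch(payload))

print(raw_hit is None)             # the raw payload slips past the pattern
print(normalized_hit is not None)  # the normalized payload is caught
```

Translating after NFKC keeps the map small, since compatibility forms (fullwidth, ligatures, and similar) are already folded before the lookalike table is applied.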