Skip to content

fix: resolve prompt injection detector homoglyph bypass (#102)#119

Open
MayurKharat0390 wants to merge 1 commit into
sreerevanth:mainfrom
MayurKharat0390:fix/homoglyph-bypass
Open

fix: resolve prompt injection detector homoglyph bypass (#102)#119
MayurKharat0390 wants to merge 1 commit into
sreerevanth:mainfrom
MayurKharat0390:fix/homoglyph-bypass

Conversation

@MayurKharat0390
Copy link
Copy Markdown
Contributor

@MayurKharat0390 MayurKharat0390 commented Jun 2, 2026

Problem

The scan_text function in agentwatch/core/injection.py relies on unicodedata.normalize("NFKC", text) under the assumption that it translates visual homoglyphs (such as Cyrillic or Greek lookalikes) to their canonical ASCII equivalents before the regex patterns are evaluated.

However, Unicode NFKC normalization never translates visual confusables across different scripts (e.g. mapping Cyrillic small letter о (U+043E) to Latin small letter o (U+006F)).

Since the injection detector's patterns are Latin ASCII-only, an attacker can bypass all prompt injection signatures by replacing Latin letters with visually identical Greek or Cyrillic homoglyphs.

Solution

  • Introduced a translation map _HOMOGLYPH_MAP inside agentwatch/core/injection.py that maps visually identical/confusable Greek, Cyrillic, and Latin Extended characters to standard Latin ASCII.
  • Updated _normalize(text) to map these lookalike characters to their Latin counterparts after NFKC normalization.

Testing & Verification

  • Added a dedicated unit test test_injection_detector_homoglyph_bypass_prevention in tests/test_safety.py to assert that visual lookalike payloads spelling out "ignore previous instructions", "reveal your prompt", and "new instructions:" are successfully blocked.
  • All formatting check rules passed via ruff and all 265 unit/integration tests pass perfectly.

Summary by CodeRabbit

  • Bug Fixes

    • Improved prompt-injection detection to catch attempts that use Unicode lookalike characters and visual homoglyphs from multiple scripts, reducing bypass risk.
  • Tests

    • Added automated tests validating detection of injection attempts built with homoglyph and Unicode lookalike substitution patterns to ensure continued robustness.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 2, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 558823c5-22f3-4369-afda-24aa1aa93c57

📥 Commits

Reviewing files that changed from the base of the PR and between 0ec813f and 1c84594.

📒 Files selected for processing (2)
  • agentwatch/core/injection.py
  • tests/test_safety.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_safety.py

📝 Walkthrough

Walkthrough

This PR strengthens the prompt-injection detector by adding homoglyph-to-ASCII translation to its text normalization pipeline. A static character mapping translates visual lookalikes (Cyrillic/Greek/Latin variants) to ASCII equivalents, and the _normalize() function applies this after NFKC normalization. A new test validates detection of multi-homoglyph injection payloads.

Changes

Homoglyph normalization for injection detection

Layer / File(s) Summary
Homoglyph mapping and normalization logic
agentwatch/core/injection.py
_HOMOGLYPH_MAP (lines 30–92) maps Cyrillic, Greek, and Latin visual lookalikes to ASCII; _normalize() (lines 96–99) applies NFKC normalization then character replacement to neutralize lookalike substitutions.
Homoglyph bypass detection test
tests/test_safety.py
test_injection_detector_homoglyph_bypass_prevention() (lines 484–507) verifies scan_text detects injection payloads using multiple homoglyph patterns (Cyrillic а, Greek α, dotless-i) and allows benign instructions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

  • sreerevanth/AgentWatch#102: Directly addresses homoglyph-based injection bypass by implementing explicit visual-lookalike detection and mapping in the injection detector.

Possibly related PRs

  • sreerevanth/AgentWatch#66: Both PRs modify the injection detector's text normalization pipeline; PR #66 introduced NFKC normalization and bidi-control handling, and this PR adds homoglyph mapping as a downstream normalization step.

Suggested labels

security, bug, level: advanced, level3

Poem

🐰 I hop through text both near and far,
spotting twins that look like "a" or "α",
I map their masks back to plain ASCII,
chase sly homoglyphs out of the sky,
and nibble bad inputs—safe as pie.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately summarizes the main change: fixing a prompt injection detector bypass vulnerability caused by homoglyph substitution attacks.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agentwatch/core/injection.py`:
- Around line 30-92: _HOMOGLYPH_MAP is missing the Cyrillic small letter "м"
(U+043C), allowing strings like "proмpt" to bypass normalization; add the
mapping "\u043c": "m" to the _HOMOGLYPH_MAP dictionary so lowercase Cyrillic м
is normalized to Latin 'm' (update the existing _HOMOGLYPH_MAP definition in
agentwatch/core/injection.py).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 613daebc-03ec-435f-b1be-7b76b4a6f779

📥 Commits

Reviewing files that changed from the base of the PR and between 46cd2fa and 0ec813f.

📒 Files selected for processing (2)
  • agentwatch/core/injection.py
  • tests/test_safety.py

Comment thread agentwatch/core/injection.py
@sreerevanth
Copy link
Copy Markdown
Owner

@MayurKharat0390 Thanks for the contribution — this addresses a real security gap and the added regression test is appreciated.

Before merge, could you please take a look at the remaining CodeRabbit finding regarding Cyrillic lowercase м (U+043C)? Since this PR is specifically focused on homoglyph bypass prevention, I'd like to make sure we aren't leaving an obvious bypass path uncovered.

Once that's addressed, this should be ready for merge. 🚀

@MayurKharat0390
Copy link
Copy Markdown
Contributor Author

Hey @sreerevanth!

I've addressed the CodeRabbit review finding by adding the mapping for Cyrillic lowercase м (\u043c -> m) to _HOMOGLYPH_MAP inside agentwatch/core/injection.py.

I also added a corresponding regression test case (reveal your proмpt) under test_injection_detector_homoglyph_bypass_prevention to verify that visual homoglyph bypasses using small м are correctly normalized and detected.

All formatting rules and unit tests pass with complete success. It's ready for another look! 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants