🛡️ Sentinel: [HIGH] Fix RTLO Spoofing in Folder Names#163
🛡️ Sentinel: [HIGH] Fix RTLO Spoofing in Folder Names#163
Conversation
- Block Unicode Bidi control characters (RTLO, etc.) in folder names to prevent spoofing. - Block path separators (/ and \) to prevent potential path confusion. - Update tests/test_folder_validation.py with new test cases. Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
|
Merging to
|
Summary of ChangesHello @abhimehro, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a high-severity security vulnerability by enhancing folder name validation. It prevents homograph and spoofing attacks that leverage Unicode Bidi control characters and eliminates confusion caused by path separators in folder names. These changes significantly improve the security and integrity of folder naming conventions within the system. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
|
||
| # Case 6: Path Separators | ||
| slash_data = {"group": {"group": "Folder/Name"}} | ||
| assert main.validate_folder_data(slash_data, "http://slash.com") is False |
Check notice
Code scanning / Bandit
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. Note test
| assert main.validate_folder_data(slash_data, "http://slash.com") is False | ||
|
|
||
| backslash_data = {"group": {"group": "Folder\\Name"}} | ||
| assert main.validate_folder_data(backslash_data, "http://backslash.com") is False |
Check notice
Code scanning / Bandit
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. Note test
| # Case 7: Bidi Control Characters (RTLO) | ||
| # \u202e is Right-To-Left Override | ||
| rtlo_data = {"group": {"group": "SafeName\u202eexe.pdf"}} | ||
| assert main.validate_folder_data(rtlo_data, "http://rtlo.com") is False |
Check notice
Code scanning / Bandit
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. Note test
|
|
||
| # Case 8: Other Bidi Char | ||
| lre_data = {"group": {"group": "Name\u202a"}} | ||
| assert main.validate_folder_data(lre_data, "http://lre.com") is False |
Check notice
Code scanning / Bandit
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. Note test
|
|
||
| # Case 6: Path Separators | ||
| slash_data = {"group": {"group": "Folder/Name"}} | ||
| assert main.validate_folder_data(slash_data, "http://slash.com") is False |
Check notice
Code scanning / Bandit (reported by Codacy)
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. Note test
| assert main.validate_folder_data(slash_data, "http://slash.com") is False | ||
|
|
||
| backslash_data = {"group": {"group": "Folder\\Name"}} | ||
| assert main.validate_folder_data(backslash_data, "http://backslash.com") is False |
Check notice
Code scanning / Bandit (reported by Codacy)
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. Note test
| # Case 7: Bidi Control Characters (RTLO) | ||
| # \u202e is Right-To-Left Override | ||
| rtlo_data = {"group": {"group": "SafeName\u202eexe.pdf"}} | ||
| assert main.validate_folder_data(rtlo_data, "http://rtlo.com") is False |
Check notice
Code scanning / Bandit (reported by Codacy)
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. Note test
|
|
||
| # Case 8: Other Bidi Char | ||
| lre_data = {"group": {"group": "Name\u202a"}} | ||
| assert main.validate_folder_data(lre_data, "http://lre.com") is False |
Check notice
Code scanning / Bandit (reported by Codacy)
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. Note test
| @@ -0,0 +1,4 @@ | |||
| ## 2025-02-18 - [RTLO/Bidi Spoofing in Folder Names] | |||
Check notice
Code scanning / Remark-lint (reported by Codacy)
Warn when references to undefined definitions are found. Note
| @@ -0,0 +1,4 @@ | |||
| ## 2025-02-18 - [RTLO/Bidi Spoofing in Folder Names] | |||
Check notice
Code scanning / Remark-lint (reported by Codacy)
Warn when shortcut reference links are used. Note
There was a problem hiding this comment.
Pull request overview
This PR fixes a HIGH severity security vulnerability where folder names could contain Unicode Bidi control characters (particularly RTLO - Right-to-Left Override) and path separators, enabling spoofing attacks where malicious names like "safe\u202eexe.pdf" would render as "safepdf.exe", potentially misleading users about file types.
Changes:
- Enhanced
is_valid_folder_name()validation to explicitly block 9 Unicode Bidi control characters (U+202A through U+202E and U+2066 through U+2069) and path separators (/ and ) - Added comprehensive test cases covering path separators and multiple Bidi control characters
- Documented the vulnerability, learning, and prevention strategy in sentinel.md
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| main.py | Added blocking of path separators and Bidi control characters in folder name validation logic |
| tests/test_folder_validation.py | Added 4 new test cases (Cases 6-8) validating rejection of path separators and Bidi characters |
| .jules/sentinel.md | Created documentation entry describing the RTLO/Bidi spoofing vulnerability and mitigation |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| bidi_chars = { | ||
| "\u202a", | ||
| "\u202b", | ||
| "\u202c", | ||
| "\u202d", | ||
| "\u202e", | ||
| "\u2066", | ||
| "\u2067", | ||
| "\u2068", | ||
| "\u2069", | ||
| } |
There was a problem hiding this comment.
The bidi_chars set is recreated on every function call, which is inefficient. Consider moving it to module level as a constant (e.g., BIDI_CONTROL_CHARS) similar to how dangerous_chars is defined inline but could also be a constant. This would improve performance especially if this validation is called frequently.
| # Block Bidi control characters to prevent RTLO spoofing | ||
| # \u202a (LRE), \u202b (RLE), \u202c (PDF), \u202d (LRO), \u202e (RLO) | ||
| # \u2066 (LRI), \u2067 (RLI), \u2068 (FSI), \u2069 (PDI) | ||
| bidi_chars = { | ||
| "\u202a", | ||
| "\u202b", | ||
| "\u202c", | ||
| "\u202d", | ||
| "\u202e", | ||
| "\u2066", | ||
| "\u2067", | ||
| "\u2068", | ||
| "\u2069", | ||
| } |
There was a problem hiding this comment.
Consider also blocking U+200E (Left-to-Right Mark) and U+200F (Right-to-Left Mark) for completeness, as these can also be used for text direction manipulation attacks, though they are less severe than the override/embedding characters. While LRM and RLM are less dangerous than RTLO, blocking them would provide defense-in-depth for folder names where directional marks serve no legitimate purpose.
There was a problem hiding this comment.
Code Review
This pull request addresses a high-severity spoofing vulnerability (RTLO) in folder names by blocking Unicode Bidi control characters and path separators. The changes in main.py correctly implement the validation logic, and the accompanying tests in tests/test_folder_validation.py verify the fix. My review includes suggestions to improve performance in main.py by defining constant sets at the module level, and to enhance the test suite in tests/test_folder_validation.py for better coverage and maintainability. Specifically, I've pointed out that the test coverage for the newly blocked Bidi characters is incomplete, which is critical for a security fix. I've also suggested a small refactoring to reduce code duplication in the tests.
| # Case 7: Bidi Control Characters (RTLO) | ||
| # \u202e is Right-To-Left Override | ||
| rtlo_data = {"group": {"group": "SafeName\u202eexe.pdf"}} | ||
| assert main.validate_folder_data(rtlo_data, "http://rtlo.com") is False | ||
|
|
||
| # Case 8: Other Bidi Char | ||
| lre_data = {"group": {"group": "Name\u202a"}} | ||
| assert main.validate_folder_data(lre_data, "http://lre.com") is False |
There was a problem hiding this comment.
The test coverage for the Bidi control characters is incomplete. The fix in main.py blocks 9 different characters, but the tests only cover 2 of them (\u202e and \u202a). For a security-related change, it's crucial to ensure all blocked characters are explicitly tested to prevent regressions.
| # Case 7: Bidi Control Characters (RTLO) | |
| # \u202e is Right-To-Left Override | |
| rtlo_data = {"group": {"group": "SafeName\u202eexe.pdf"}} | |
| assert main.validate_folder_data(rtlo_data, "http://rtlo.com") is False | |
| # Case 8: Other Bidi Char | |
| lre_data = {"group": {"group": "Name\u202a"}} | |
| assert main.validate_folder_data(lre_data, "http://lre.com") is False | |
| # Case 7: Bidi Control Characters | |
| bidi_chars_to_test = { | |
| "\u202a", "\u202b", "\u202c", "\u202d", "\u202e", | |
| "\u2066", "\u2067", "\u2068", "\u2069", | |
| } | |
| for char in bidi_chars_to_test: | |
| data = {"group": {"group": f"SafeName{char}exploit.exe"}} | |
| assert main.validate_folder_data(data, "http://bidi.com") is False |
| dangerous_chars = set("<>\"'`") | ||
| if any(c in dangerous_chars for c in name): | ||
|
|
||
| # Block path separators to prevent confusion | ||
| dangerous_chars.update(["/", "\\"]) | ||
|
|
||
| # Block Bidi control characters to prevent RTLO spoofing | ||
| # \u202a (LRE), \u202b (RLE), \u202c (PDF), \u202d (LRO), \u202e (RLO) | ||
| # \u2066 (LRI), \u2067 (RLI), \u2068 (FSI), \u2069 (PDI) | ||
| bidi_chars = { | ||
| "\u202a", | ||
| "\u202b", | ||
| "\u202c", | ||
| "\u202d", | ||
| "\u202e", | ||
| "\u2066", | ||
| "\u2067", | ||
| "\u2068", | ||
| "\u2069", | ||
| } |
There was a problem hiding this comment.
For performance, the dangerous_chars and bidi_chars sets are being recreated on every call to is_valid_folder_name. Since this function can be called frequently, it's more efficient to define these as module-level constants to avoid the overhead of set creation on each invocation.
For example, you could define them at the top of the file and reference them in the function:
# At module level
_DANGEROUS_FOLDER_CHARS = set("<>\"'`/\\")
_BIDI_CHARS = {
"\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
"\u2066", "\u2067", "\u2068", "\u2069",
}
def is_valid_folder_name(name: str) -> bool:
# ... (initial checks)
if any(c in _DANGEROUS_FOLDER_CHARS or c in _BIDI_CHARS for c in name):
return False
return True| slash_data = {"group": {"group": "Folder/Name"}} | ||
| assert main.validate_folder_data(slash_data, "http://slash.com") is False | ||
|
|
||
| backslash_data = {"group": {"group": "Folder\\Name"}} | ||
| assert main.validate_folder_data(backslash_data, "http://backslash.com") is False |
There was a problem hiding this comment.
These two test cases for path separators are very similar. To reduce code duplication and improve maintainability, you can combine them into a single loop.
| slash_data = {"group": {"group": "Folder/Name"}} | |
| assert main.validate_folder_data(slash_data, "http://slash.com") is False | |
| backslash_data = {"group": {"group": "Folder\\Name"}} | |
| assert main.validate_folder_data(backslash_data, "http://backslash.com") is False | |
| # Case 6: Path Separators | |
| for separator in ("/", "\\"): | |
| data = {"group": {"group": f"Folder{separator}Name"}} | |
| assert main.validate_folder_data(data, "http://separator.com") is False |
Security: Block Unicode Bidi control characters to prevent homograph attacks - Block 11 Bidi control characters (LRE, RLE, PDF, LRO, RLO, LRI, RLI, FSI, PDI, LRM, RLM) - Block path separators (/ and \) to prevent confusion - Move dangerous character sets to module-level constants for performance - Add comprehensive test coverage for all blocked characters - Test characters in different positions (start, middle, end) - Add security documentation explaining the vulnerability Addresses feedback from PR #163 Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
Security: Block Unicode Bidi control characters to prevent homograph attacks - Block 11 Bidi control characters (LRE, RLE, PDF, LRO, RLO, LRI, RLI, FSI, PDI, LRM, RLM) - Block path separators (/ and \) to prevent confusion - Move dangerous character sets to module-level constants for performance - Add comprehensive test coverage for all blocked characters - Test characters in different positions (start, middle, end) - Add security documentation explaining the vulnerability Addresses feedback from PR #163 Co-authored-by: abhimehro <84992105+abhimehro@users.noreply.github.com>
🚨 Severity: HIGH
💡 Vulnerability: Folder names allowed Unicode Bidi control characters (e.g., RTLO), enabling Homograph/Spoofing attacks where a name like "safe\u202eexe.pdf" renders as "safepdf.exe". Also allowed path separators / and \ which could cause confusion.
🔧 Fix: Enhanced is_valid_folder_name to explicitly block Bidi control characters and path separators.
✅ Verification: Added new test cases in tests/test_folder_validation.py which pass.
PR created automatically by Jules for task 15718768312849110796 started by @abhimehro