replace bleach with nh3 for HTML sanitization #14442
Open
valentijnscholten wants to merge 10 commits into
bleach is deprecated and archived. nh3 is its Rust-backed successor, actively maintained and significantly faster.
…link
- Use escape() when building HTML in create_bleached_link so attribute values are properly encoded before nh3 parses them (prevents raw tags in href/title when user-supplied content contains HTML)
- Add rel="noopener noreferrer" to all expected link strings in tests (nh3 automatically injects this on target="_blank" links)
- Replace exact-output XSS assertion with semantic safety checks
nh3/ammonia does not re-escape < in attribute values when re-serializing, so passing escape()'d HTML through nh3.clean() still produced raw angle brackets in href/title. The function constructs trusted HTML itself, so nh3 is redundant here — escape() is sufficient and correct. Also adds rel="noopener noreferrer" explicitly and updates tests to match the new output including the exact XSS-escaped form.
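The escape-only approach described in this commit can be sketched as follows. This is a minimal illustration, not the code from the PR: `build_link` is a hypothetical stand-in for `create_bleached_link` (whose real signature is not shown here), and the standard library's `html.escape` is used in place of Django's `escape()` so the snippet runs without Django.

```python
from html import escape


def build_link(url, title):
    """Hypothetical stand-in for create_bleached_link; the real signature may differ."""
    # Escape both values so quotes and angle brackets cannot break out of the attributes.
    safe_url = escape(url, quote=True)
    safe_title = escape(title, quote=True)
    return (
        f'<a href="{safe_url}" target="_blank" '
        f'rel="noopener noreferrer" title="{safe_title}">{safe_title}</a>'
    )


# A user-supplied URL and title that both try to inject markup.
link = build_link('https://example.com/?q="><script>alert(1)</script>', "<b>report</b>")
print(link)
```

Because the function builds the markup itself, the only angle brackets left in the result belong to the `<a>` element. Note that this sketch only escapes values; it does not check the URL scheme.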
Maffooch
requested changes
Mar 11, 2026
Maffooch
left a comment
Contributor
Just a question - I'm not dying on this hill 😄
The `bleach` library has been replaced by [`nh3`](https://nh3.readthedocs.io/) for HTML sanitization. This is a drop-in replacement in most cases, but there are two minor behavioral changes to be aware of:

- **`style` attributes are no longer allowed.** `bleach` supported CSS property-level filtering (e.g. allowing only `color` or `font-weight`). `nh3` has no equivalent, so `style` attributes are stripped entirely to avoid allowing arbitrary CSS injection. Content that previously relied on inline styles (e.g. colored text in the login banner, background-color on markdown images) will lose that styling.
- **Disallowed tags are stripped rather than escaped.** Previously, a tag like `<script>` would be rendered as the literal text `<script>`. Now the tag is removed entirely and only its text content is kept. This is the correct behavior for a sanitizer.
Contributor
Since this is a security tool, won't this render XSS findings stored in dojo totally incomplete? It won't be clear what the original payload is if we strip out the tags entirely. It feels better to URL encode them for our specific use case.
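The difference between stripping and escaping can be illustrated with the standard library alone. This is only a sketch of the two strategies: it calls neither `bleach` nor `nh3`, and the helper names `strip_all_tags` and `escape_all_tags` are made up for the example.

```python
from html import escape
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and drops every tag (a stand-in for 'strip' behavior)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_all_tags(html):
    """Remove all tags and keep only the text between them."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def escape_all_tags(html):
    """Keep the markup visible as literal text by escaping &, < and >."""
    return escape(html, quote=False)


payload = "Payload: <img src=x onerror=alert(1)> end"
stripped = strip_all_tags(payload)
escaped = escape_all_tags(payload)
print(stripped)
print(escaped)
```

With stripping, the `onerror` handler disappears from the stored text; with escaping, the full payload stays readable but inert.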
Contributor
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Contributor
Conflicts have been resolved. A maintainer will review the pull request shortly.
Contributor
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Resolved conflict in requirements.txt: kept nh3 (replacing bleach) and updated celery[sqs] to 5.6.3 from upstream. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor
Conflicts have been resolved. A maintainer will review the pull request shortly.
Missed file in initial bleach→nh3 migration. Maps bleach.clean() params to nh3.clean() equivalents: protocols→url_schemes, lists→sets. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
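The parameter mapping mentioned in this commit (protocols→url_schemes, lists→sets) could look roughly like the helper below. It is a sketch, not code from the PR: `bleach_kwargs_to_nh3` is a hypothetical name, callable attribute filters are not handled, and the use of the `"*"` key for attributes allowed on every tag reflects my understanding of the `nh3.clean()` documentation.

```python
def bleach_kwargs_to_nh3(tags=None, attributes=None, protocols=None):
    """Translate bleach.clean()-style arguments into nh3.clean()-style keyword arguments."""
    kwargs = {}
    if tags is not None:
        # bleach accepted lists; nh3 expects a set of tag names.
        kwargs["tags"] = set(tags)
    if attributes is not None:
        if isinstance(attributes, dict):
            # Per-tag attribute lists become per-tag sets.
            kwargs["attributes"] = {tag: set(names) for tag, names in attributes.items()}
        else:
            # A flat list in bleach applied to every tag; assumed to map to the "*" key.
            kwargs["attributes"] = {"*": set(attributes)}
    if protocols is not None:
        # bleach's `protocols` corresponds to nh3's `url_schemes`.
        kwargs["url_schemes"] = set(protocols)
    return kwargs


nh3_kwargs = bleach_kwargs_to_nh3(
    tags=["a", "b"],
    attributes={"a": ["href", "title"]},
    protocols=["http", "https"],
)
print(nh3_kwargs)
# With nh3 installed, the result would be passed along as: nh3.clean(html, **nh3_kwargs)
```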
… nh3
- Replace bleach with nh3 in dojo/announcement/os_message.py (was causing ModuleNotFoundError in 132 rest-framework tests)
- Update sonarqube importer test assertions to match nh3 output, which adds rel="noopener noreferrer" to all links
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
nh3 adds rel="noopener noreferrer" to links; update assertion to match. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Bleach has been deprecated since Feb '23. Although it still works, a security product like Defect Dojo needs to switch to the go-to replacement: `nh3`, which is also faster.

Summary

- Replaces the `bleach` library with `nh3`, its actively maintained, Rust-backed successor
- Removes `bleach[css]` from `requirements.txt` and adds `nh3`
- Updates `dojo/utils.py`, `dojo/templatetags/display_tags.py`, and `dojo/templatetags/get_banner.py`
- Replaces `bleach.ALLOWED_TAGS` / `bleach.ALLOWED_ATTRIBUTES` with explicit constants (`_NH3_ALLOWED_TAGS`, `_NH3_ALLOWED_ATTRIBUTES`) shared between the two template tag modules

Note on `style` attribute: `bleach` supported CSS property-level filtering via `CSSSanitizer` (e.g. allowing only `color` and `font-weight`). `nh3` has no equivalent: it cannot filter individual CSS properties, only entire attributes. To avoid allowing arbitrary CSS injection, the `style` attribute is no longer permitted on any element. In practice this means inline styling (e.g. colored text in the login banner, background-color on images in markdown) will be stripped rather than sanitized. This is the safer trade-off.

Note on disallowed tag handling: `bleach` escaped disallowed tags (e.g. `<script>` → `&lt;script&gt;`), making them visible as literal text. `nh3` strips them entirely, keeping only the text content. This is the correct behavior for a sanitizer.