Skip to content

replace bleach with nh3 for HTML sanitization#14442

Open
valentijnscholten wants to merge 10 commits into
DefectDojo:devfrom
valentijnscholten:bleach-to-nh3
Open

replace bleach with nh3 for HTML sanitization#14442
valentijnscholten wants to merge 10 commits into
DefectDojo:devfrom
valentijnscholten:bleach-to-nh3

Conversation

@valentijnscholten

@valentijnscholten valentijnscholten commented Mar 4, 2026

Copy link
Copy Markdown
Member

Bleach has been deprecated since Feb '23. Although it still works, a security product like Defect Dojo needs to switch to the goto replacement: nh3 which is also faster.

Summary

  • Replaces the deprecated (archived) bleach library with nh3, its actively maintained, Rust-backed successor
  • Drops bleach[css] from requirements.txt and adds nh3
  • Updates all call sites in dojo/utils.py, dojo/templatetags/display_tags.py, and dojo/templatetags/get_banner.py
  • Replaces bleach.ALLOWED_TAGS / bleach.ALLOWED_ATTRIBUTES with explicit constants (_NH3_ALLOWED_TAGS, _NH3_ALLOWED_ATTRIBUTES) shared between the two template tag modules

Note on style attribute: bleach supported CSS property-level filtering via CSSSanitizer (e.g. allowing only color and font-weight). nh3 has no equivalent — it cannot filter individual CSS properties, only entire attributes. To avoid allowing arbitrary CSS injection, the style attribute is no longer permitted on any element. In practice this means inline styling (e.g. colored text in the login banner, background-color on images in markdown) will be stripped rather than sanitized. This is the safer trade-off.

Note on disallowed tag handling: bleach escaped disallowed tags (e.g. <script>&lt;script&gt;), making them visible as literal text. nh3 strips them entirely, keeping only the text content. This is the correct behavior for a sanitizer.

bleach is deprecated and archived. nh3 is its Rust-backed successor,
actively maintained and significantly faster.
@github-actions github-actions Bot added the ui label Mar 4, 2026
@valentijnscholten valentijnscholten added the affects_pro PRs that affect Pro and need a coordinated release/merge moment. label Mar 4, 2026
@valentijnscholten valentijnscholten changed the title chore(deps): replace bleach with nh3 for HTML sanitization replace bleach with nh3 for HTML sanitization Mar 4, 2026
@github-actions github-actions Bot added the docs label Mar 4, 2026
…link

- Use escape() when building HTML in create_bleached_link so attribute
  values are properly encoded before nh3 parses them (prevents raw tags
  in href/title when user-supplied content contains HTML)
- Add rel="noopener noreferrer" to all expected link strings in tests
  (nh3 automatically injects this on target="_blank" links)
- Replace exact-output XSS assertion with semantic safety checks
nh3/ammonia does not re-escape < in attribute values when re-serializing,
so passing escape()'d HTML through nh3.clean() still produced raw angle
brackets in href/title. The function constructs trusted HTML itself, so
nh3 is redundant here — escape() is sufficient and correct.

Also adds rel="noopener noreferrer" explicitly and updates tests to
match the new output including the exact XSS-escaped form.
@valentijnscholten valentijnscholten added this to the 2.57.0 milestone Mar 6, 2026

@mtesauro mtesauro left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved

@Maffooch Maffooch left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question - I'm not dying on this hill 😄

The `bleach` library has been replaced by [`nh3`](https://nh3.readthedocs.io/) for HTML sanitization. This is a drop-in replacement in most cases, but there are two minor behavioral changes to be aware of:

- **`style` attributes are no longer allowed.** `bleach` supported CSS property-level filtering (e.g. allowing only `color` or `font-weight`). `nh3` has no equivalent, so `style` attributes are stripped entirely to avoid allowing arbitrary CSS injection. Content that previously relied on inline styles (e.g. colored text in the login banner, background-color on markdown images) will lose that styling.
- **Disallowed tags are stripped rather than escaped.** Previously, a tag like `<script>` would be rendered as the literal text `&lt;script&gt;`. Now the tag is removed entirely and only its text content is kept. This is the correct behavior for a sanitizer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a security tool, wont this render XSS findings stored in dojo totally incomplete? It wont be be clear what the original payload is if we strip out the tags entirely. It feels better to URL encode them for our specific use case

@github-actions

Copy link
Copy Markdown
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions

Copy link
Copy Markdown
Contributor

Conflicts have been resolved. A maintainer will review the pull request shortly.

@github-actions

github-actions Bot commented Apr 1, 2026

Copy link
Copy Markdown
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@Maffooch Maffooch modified the milestones: 2.57.0, 2.58.0 Apr 3, 2026
@Maffooch Maffooch removed this from the 2.58.0 milestone May 1, 2026
Resolved conflict in requirements.txt: kept nh3 (replacing bleach) and
updated celery[sqs] to 5.6.3 from upstream.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown
Contributor

Conflicts have been resolved. A maintainer will review the pull request shortly.

Missed file in initial bleach→nh3 migration. Maps bleach.clean() params
to nh3.clean() equivalents: protocols→url_schemes, lists→sets.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
valentijnscholten and others added 2 commits June 16, 2026 20:22
… nh3

- Replace bleach with nh3 in dojo/announcement/os_message.py (was
  causing ModuleNotFoundError in 132 rest-framework tests)
- Update sonarqube importer test assertions to match nh3 output, which
  adds rel="noopener noreferrer" to all links

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
nh3 adds rel="noopener noreferrer" to links; update assertion to match.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@mtesauro mtesauro added this to the 3.1.0 milestone Jun 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

affects_pro PRs that affect Pro and need a coordinated release/merge moment. docs integration_tests parser ui unittests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants