Skip to content

Code review: CSP/XSS hardening, init race fix, i18n content extraction#23

Merged
BartM82 merged 3 commits into
mainfrom
review/code-review-fixes
May 14, 2026
Merged

Code review: CSP/XSS hardening, init race fix, i18n content extraction#23
BartM82 merged 3 commits into
mainfrom
review/code-review-fixes

Conversation

@sebdraven
Copy link
Copy Markdown
Member

Summary

Code review pass on the DIMA plugin. Five files touched, no behavior change in the happy path — fixes target robustness, CSP compatibility on host pages, and the fact that cleanText was silently stripping the very scripts (Cyrillic, Chinese, Arabic) the plugin is built to analyse.

UI / Security

  • CSP-safe modal (modules/uiManager.js): the inline onclick= / onerror= handlers were blocked by strict CSPs (GitHub, Twitter, …) — the "Fermer" button could become inert on those sites. Replaced by addEventListener bound to explicit IDs.
  • XSS defense in depth: added escapeHtml() and used it for every technique.{nom,index,description,tactic,phase,confidence}, matchedKeywords[].keyword, logoUrl and page-derived value injected through innerHTML. The technique DB is project-controlled but the previous template-literal interpolation made any future < in a description an XSS.
  • DOM-built button: createButton now assembles the floating badge from DOM nodes rather than innerHTML, and applies sanitizeHexColor to riskColor so an invalid hex can't produce a NaN gradient.
  • buttonCreated re-creation bug: removed the early-return that prevented re-creation after the button had been deleted (first call worked, all later calls left an empty corner).

Init / race

  • @keyframes declared globally (content.js): fadeIn, slideIn, slideInRight are now defined once at document-level. They were previously inside the modal's inline <style>, so the suspicious-site alert and details modal — which can open before the analysis modal — animated with no keyframes available.
  • checkDependencies waits for the suspicious-sites manager: that module initialises via a 100 ms setTimeout, so on the very first page the suspicious-site lookup was silently falling back to {isSuspicious:false}. content.js now waits for window.checkSuspiciousSite before booting the analyser.

Content extraction

  • cleanText keeps non-Latin scripts (modules/contentExtractor.js): the old regex used \w (ASCII) and stripped Cyrillic / Chinese / Arabic content. Switched to Unicode property escapes (\p{L}\p{N}\p{M}) — the plugin targets Russian/Chinese disinformation networks, so this was a functional gap, not a nit.
  • detectPageType tightened: now parses the URL and uses word boundaries (businessnews.com is no longer "news") with stems where useful (products still counts as commerce, actualités still as news, social sites matched by hostname rather than substring).

Noise / housekeeping

  • Suspicioussitesmanager.js logs gated behind window.DIMA_DEBUG. The extension runs on <all_urls> and was printing ~20 lines into every host page's console.
  • Typo fix in data/keywords.js: (?:cette|cette|this)(?:cette|this) in the TE0153 pattern.

Test plan

  • node --check on all modified files
  • VM smoke test loading every module — ContentExtractor.cleanText('Hello мир 你好 العالم <script>') returns Hello мир 你好 العالم script (Latin/non-Latin preserved, HTML chars stripped)
  • detectPageType table-test across 14 URL shapes (news/blog/social/commerce/general edge cases) — all pass
  • Load the unpacked extension in Chrome and:
    • Visit a regular news site — confirm score badge appears, modal opens, "Fermer" works, no console spam
    • Visit a site present in one of the DBs — confirm the suspicious-site alert appears with animation on first load
    • Visit a CSP-strict site (e.g. https://github.com) — confirm modal buttons still respond
    • Visit a non-Latin-script site — confirm keywords get matched against the actual content

🤖 Generated with Claude Code

Modal: replace inline onclick/onerror by addEventListener, escape all
technique/page-derived strings via escapeHtml() before innerHTML, harden
the gradient color with sanitizeHexColor().

UI button: drop the buttonCreated early-return that prevented re-creation
after deletion; build the button via DOM nodes instead of innerHTML.

content.js: declare all @Keyframes (fadeIn, slideIn, slideInRight) at
document-level so the suspicious-site alert and details modal get their
animation. Wait for suspiciousSitesManager in checkDependencies so the
suspicious-site check is never missed on first page load.

contentExtractor: switch cleanText to Unicode property escapes
(\p{L}\p{N}\p{M}) — the previous ASCII \w stripped Cyrillic / Chinese /
Arabic content, which is exactly what the plugin is meant to analyse.
Tighten detectPageType with word/stem boundaries and parse the URL so
"businessnews.com" no longer triggers the "news" branch.

Suspicioussitesmanager: gate all the verbose console.log calls behind a
window.DIMA_DEBUG flag — the extension runs on <all_urls> and was
spamming every host page's console.

keywords: drop the duplicated alternative `(cette|cette|this)` in the
TE0153 pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the extension’s UI against CSP/XSS issues, improves initialization robustness (suspicious-sites race + shared animations), and fixes content extraction so non‑Latin scripts are preserved for analysis.

Changes:

  • Reworked UI rendering to be more CSP-safe (no inline handlers) and added HTML escaping for analysis-derived values.
  • Improved init robustness by defining shared keyframes at document scope and waiting for the suspicious-sites API before booting.
  • Updated content extraction/page-type detection and fixed a minor keyword-regex typo.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
modules/uiManager.js CSP-safe event binding, DOM-built button, and escapeHtml() usage for modal content.
modules/Suspicioussitesmanager.js Gates verbose logs behind window.DIMA_DEBUG.
modules/contentExtractor.js Preserves non-Latin scripts in cleanText() and tightens detectPageType() classification.
data/keywords.js Fixes a duplicated alternation token in a regex pattern.
content.js Defines animation keyframes globally and makes init wait for checkSuspiciousSite.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread modules/uiManager.js
Comment on lines +813 to +815
<div style="background: white; padding: 16px; border-radius: 10px; border-left: 4px solid #e67e22; box-shadow: 0 2px 8px rgba(0,0,0,0.08);">
<div style="display: flex; justify-content: between; align-items: start; margin-bottom: 8px;">
<div style="flex: 1;">
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bonne pioche, merci. justify-content: between est en effet rejeté → remplacé par space-between aux deux occurrences (uiManager.js:814,827). J'en ai profité pour normaliser align-items: startflex-start partout (les keywords nus du box-alignment module sont moins largement supportés). Corrigé dans 80c09a2.

Comment thread modules/contentExtractor.js Outdated
Comment on lines +266 to +270
matchWord(pathAndQuery, "article") ||
matchStem(pathAndQuery, "actualit")
) return "news";

if (matchWord(hostname, "blog") || matchWord(pathAndQuery, "blog")) return "blog";
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellente remarque. matchWord("article") exige une frontière non-alphanumérique en fin → /articles/ ne matchait pas. Idem pour /blogs/. J'ai basculé article et blog (et le blog du hostname) sur matchStem dans 80c09a2. Smoke test:\n\n\n/articles/foo => news\n/article/foo => news\n/blogs/post => blog\n/blog/post => blog\nblogs.example.com => blog\nbusinessnews.example.com => general (non-régression)\n

Comment thread content.js Outdated
Comment on lines +114 to +122
@keyframes fadeIn {
from { opacity: 0; }
to { opacity: 1; }
}
@keyframes slideIn {
from { transform: translateY(30px); opacity: 0; }
to { transform: translateY(0); opacity: 1; }
}
@keyframes slideInRight {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Très juste, c'est exactement le genre de collision discrète qui aurait été pénible à diagnostiquer. Les quatre keyframes globaux sont préfixés dima* dans 80c09a2:\n\n- fadeIndimaFadeIn\n- slideIndimaSlideIn\n- slideInRightdimaSlideInRight\n- L'ancien dimaFadeIn (zoom du bouton flottant) → dimaFadeInScale pour libérer le nom court au profit du modal.\n\nTous les animation: ... dans uiManager.js sont mis à jour (5 occurrences).

Three issues raised by the Copilot review of PR #23.

uiManager — invalid CSS values
  `justify-content: between` is not a valid value (CSS ignores the
  declaration). Two occurrences in the techniques list of showModal()
  changed to `space-between`. Also normalised every `align-items: start`
  to the canonical `flex-start` for broader browser support — the bare
  `start` keyword is from the newer box-alignment module and is
  inconsistently supported.

contentExtractor — matchStem for article/blog
  detectPageType used matchWord("article") and matchWord("blog"), which
  required a non-alphanumeric boundary after the token. That broke for
  /articles/foo and /blogs/foo — very common URL segments — and
  silently classed those pages as "general", shifting their contextual
  technique weights. Switched both to matchStem (also for the `blog`
  hostname token, e.g. blogs.example.com).

content.js + uiManager — prefix shared keyframes
  fadeIn / slideIn / slideInRight live in the document-wide @Keyframes
  namespace. On a host page that defines its own animations with the
  same names, ours would clobber theirs (or vice-versa) and produce
  visible regressions outside the extension. Renamed to dimaFadeIn /
  dimaSlideIn / dimaSlideInRight (and the existing badge animation
  dimaFadeIn -> dimaFadeInScale to free the simpler name for the modal
  fade). Updated every `animation: ...` reference in uiManager.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment thread modules/uiManager.js
Comment on lines 53 to 54
const button = document.createElement('div');
button.id = 'dima-btn';
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bien vu. Le <div> est gardé (pour ne pas avoir à neutraliser les styles UA d'un <button> natif) mais doté d'une vraie sémantique de bouton dans 63a7651:

  • role="button" + tabindex="0" → focusable + reconnu par les lecteurs d'écran
  • aria-label qui annonce le score, le niveau et l'action (Activer pour ouvrir le rapport détaillé)
  • handler keydown qui matche Enter, (Space) et Spacebar (legacy) avec preventDefault sur Space pour éviter que la page hôte ne scrolle
  • handlers focus/blur qui dessinent un outline blanc (le user-agent ne stylise pas les div[role=button] par défaut)
  • le 🧠 décoratif passe en aria-hidden="true" pour ne pas être verbalisé deux fois

Couvert par 6 tests dans tests/accessibility.test.js côté PR #24.

Comment thread content.js
Comment on lines 148 to +152
window.CONTEXT_PATTERNS &&
window.ContentExtractor &&
window.TechniqueAnalyzer &&
window.UIManager
window.UIManager &&
typeof window.checkSuspiciousSite === "function"
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Touché, et c'est la cause racine plus que le symptôme. Le setTimeout(100) dans Suspicioussitesmanager.js était lui-même inutile : en MV3 avec run_at: document_end et un seul bloc content_scripts listé en ordre dans manifest.json, les fichiers s'exécutent en séquence — à l'exécution de ce module les bases (copycopDomains, …) sont déjà déclarées.

Dans 63a7651 :

  1. Suspicioussitesmanager.js : suppression du setTimeout(100). window.checkSuspiciousSite est publié de manière synchrone, donc content.js ne touche jamais la boucle de retry en fonctionnement normal.
  2. content.js : les trois console.log info-level (Script chargé, message de retry, Analyseur initialisé) passent derrière window.DIMA_DEBUG. Les console.error (échec de dépendances, exception d'init) restent inconditionnels.

Verrouillé par 2 tests dans tests/syncInit.test.js côté PR #24 : pas de setTimeout( dans le bootstrap, et window.checkSuspiciousSite est une fonction immédiatement après le chargement du module.

sebdraven added a commit that referenced this pull request May 14, 2026
contentExtractor.test.js — add three cases that exercise the matchStem
switch on `article` and `blog`:
  - /articles/foo -> news
  - /blogs/post   -> blog
  - blogs.example.com -> blog

animations.test.js — new file. Scans content.js for @Keyframes
declarations and uiManager.js for `animation:` references, asserting
that every name begins with `dima`. Prevents a future change from
re-introducing a generic keyframe name (fadeIn, slideIn, ...) that
would collide with the host page's CSS namespace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
content.js / Suspicioussitesmanager.js — kill the artificial init delay
  The previous fix expanded checkDependencies() to wait for
  window.checkSuspiciousSite, but that symbol was published inside a
  setTimeout(100ms) callback. Net effect: every page hit the retry loop
  for at least one tick and emitted a "Attente du chargement…" log,
  which the extension should not be doing on <all_urls>.

  Root cause was the setTimeout itself: in MV3 with `run_at:
  document_end` and a single content_scripts entry, JS files run in
  manifest order. By the time Suspicioussitesmanager.js executes, every
  database global is already defined — the 100 ms delay was just
  inherited from an older loading model. Removed; the manager now
  initialises synchronously, content.js's first checkDependencies()
  call passes, and the retry loop is back to being a pure safety net.

  Also gated the two remaining info-level console.log calls in
  content.js ("Script chargé", "Analyseur initialisé", retry message)
  behind window.DIMA_DEBUG. Errors stay visible unconditionally.

uiManager.js — keyboard-accessible floating badge
  The score badge was a <div> with a click handler — not focusable,
  not Enter/Space-activable, invisible to screen readers. Added
  role="button", tabindex="0", aria-label that reads the score and
  level and explains what activation does. Wired a keydown handler
  that responds to Enter and Space (preventDefault on Space to avoid
  scrolling the host page) and visible focus/blur outline so keyboard
  users see the focus ring (UA does not style div[role=button]). The
  brain emoji gets aria-hidden so screen readers don't speak it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sebdraven added a commit that referenced this pull request May 14, 2026
contentExtractor.test.js — add three cases that exercise the matchStem
switch on `article` and `blog`:
  - /articles/foo -> news
  - /blogs/post   -> blog
  - blogs.example.com -> blog

animations.test.js — new file. Scans content.js for @Keyframes
declarations and uiManager.js for `animation:` references, asserting
that every name begins with `dima`. Prevents a future change from
re-introducing a generic keyframe name (fadeIn, slideIn, ...) that
would collide with the host page's CSS namespace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sebdraven sebdraven requested a review from Copilot May 14, 2026 15:25
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

@BartM82 BartM82 merged commit e362502 into main May 14, 2026
1 check passed
sebdraven added a commit that referenced this pull request May 14, 2026
contentExtractor.test.js — add three cases that exercise the matchStem
switch on `article` and `blog`:
  - /articles/foo -> news
  - /blogs/post   -> blog
  - blogs.example.com -> blog

animations.test.js — new file. Scans content.js for @Keyframes
declarations and uiManager.js for `animation:` references, asserting
that every name begins with `dima`. Prevents a future change from
re-introducing a generic keyframe name (fadeIn, slideIn, ...) that
would collide with the host page's CSS namespace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants