dsaad68/zilense

Zilense · Chinese Dictionary

Hover or select any Chinese on a page → tone-colored pinyin, meaning, stroke order & a clean reader, right in the Chrome side panel.

Available in the Chrome Web Store

zilense.com · Privacy

A Chrome MV3 extension that lives in the side panel: hover any Chinese character on a page (or select a word) and its reading + meaning appear instantly, with tone-colored pinyin, search, stroke order, and a saved deck.

Recreated from a Claude Design prototype as a real extension with Vite + React + CRXJS, backed by CC-CEDICT (~125k entries). A pinned copy of the raw cedict_ts.u8 export is vendored at assets/cedict/ and parsed at build time (no install-time download), so the dictionary index is fully reproducible. CedPane (~74k public-domain names and proper nouns) is merged in at build time so people, places, and brands resolve too.
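To make the build-time parse concrete, here is a minimal sketch of reading one CC-CEDICT line. It relies on the published CC-CEDICT line layout (`Traditional Simplified [pin1 yin1] /gloss/gloss/`, with `#` comment lines). The function names and the returned object shape are assumptions for illustration and are not taken from assets/scripts/build-dict.mjs.

```javascript
// Sketch only: parseCedictLine and buildTradToSimp are hypothetical names.
// Parses one line of the form: 中國 中国 [Zhong1 guo2] /China/
function parseCedictLine(line) {
  if (!line || line.startsWith('#')) return null; // skip blanks and comments
  const m = /^(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+\/(.*)\/\s*$/.exec(line);
  if (!m) return null; // not a dictionary entry
  const [, traditional, simplified, pinyin, defs] = m;
  return { traditional, simplified, pinyin, definitions: defs.split('/') };
}

// Builds a traditional -> simplified lookup from parsed entries,
// keeping the first mapping seen for each traditional form.
function buildTradToSimp(entries) {
  const map = new Map();
  for (const e of entries) {
    if (e.traditional !== e.simplified && !map.has(e.traditional)) {
      map.set(e.traditional, e.simplified);
    }
  }
  return map;
}
```

A map like this is one way traditional input such as 中國 could resolve to the simplified headword; the real index may be organized differently.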

Install

Add Zilense to Chrome from the Chrome Web Store →

For development or to run an unpacked build, see Develop / build and Load in Chrome below.

Develop / build

npm install          # no network beyond the registry; no dictionary download
npm run fetch:fonts  # vendor Google Fonts -> src/sidepanel/fonts/ (one-off; committed)
npm run build:dict   # parse vendored CC-CEDICT -> src/data/cedict.json  (auto-runs on dev/build)
npm run lint         # ESLint (flat config)
npm run dev          # CRXJS dev server with hot reload  ->  dist/
npm run build        # production build  ->  dist/
npm test             # unit tests (Node's runner)
npm run test:e2e     # Playwright smoke test (loads dist/ in Chromium; build first)
npm run refresh:cedict  # (maintenance) pull a newer CC-CEDICT from MDBG into assets/cedict/

Load in Chrome

  1. npm run build (or npm run dev).
  2. Go to chrome://extensions, enable Developer mode.
  3. Load unpacked → select the dist/ folder.
  4. Click the Zilense toolbar icon → a small menu opens with Open side panel, Open in window, a Hover popup toggle, and Disable on this site.

Use

  • Hover a Chinese character on any page → it highlights and loads in the panel. Works on simplified and traditional pages (中國 resolves to 中国).
  • Select a word (or right-click → Look up “…” in Zilense) → whole-word reading + character breakdown. Surrounding punctuation is trimmed, and a phrase that isn’t one entry is segmented into its words.
  • Search by hanzi, pinyin (tone marks optional, e.g. nihao), or English, ranked so common/HSK words beat rare homographs.
  • 🔊 Pronounce plays Mandarin via the browser’s speech synthesis (when a zh-CN voice is installed).
  • ★ Save entries to your deck (persists via chrome.storage).
  • 🪟 Open in window (toolbar menu → Open in window) → the same dictionary in a detached, chromeless popup window that floats free of the side panel and stays put across tab switches. It shares your saved words, history, and settings, and gets the same live hover/selection lookups; clicking again focuses the existing window instead of opening another.
  • 📄 PDFs — open a PDF in Chrome and a small in-page toast offers Open in Zilense; it opens in a bundled PDF.js viewer where hover, lookup, and selection work just like on a web page. Scanned/image-only PDFs are recognized with offline OCR (bundled Tesseract.js + a Simplified-Chinese model — no network), so even photographed workbooks become hoverable.
  • 🎬 Video subtitles — pinyin + dual captions (experimental) → on YouTube (and Coursera), the on-video captions get the same treatment as the page: tone-colored pinyin above the Chinese and clickable words that look up in the panel. Turn on Dual subtitles to show Chinese on top and a second language below (defaults to English, switchable in the toolbar menu); when a video has no human track for the Chinese or the second line, YouTube's auto-captions / auto-translation fill in. The controls (on/off, dual, second language, pinyin) appear in the toolbar menu only on supported video sites, and there's a 字 gear on the player to pick languages per video.
    • ⚠️ Experimental. It relies on each site's player internals and YouTube's caption endpoints, which change often, so expect rough edges. Netflix support is planned for a later release.
  • 🎴 Flashcards (toolbar menu → Flashcards, opens a full-page tab) → study your starred words or any HSK 3.0 level with flip cards and keyboard shortcuts; progress is kept on the device.
    • HSK decks come straight from the official HSK 3.0 word lists — one card per list row, with that row's exact gloss, part of speech, and pinyin, so a word that carries two senses at a level (花 verb "to spend" / noun "flower") becomes two cards and the deck size matches the list.
    • Level scope — for an HSK deck, pick just this level or up to this level (cumulative, every band ≤ the one chosen).
    • Round setup — choose a study pool (all / unseen / recently missed / ever missed), a size, order (random or sequential), and the prompt direction (character → meaning or meaning → character). Optional toggles put pinyin on top of the character prompt and show part of speech on the answer (POS also appears on the front when a word has more than one sense in the deck, so you know which meaning is being asked).
    • Progress is per-device and can be exported / imported as JSON, or reset.
    • Export to Anki turns your starred words into a tab-separated file Anki imports directly (also available from the panel's Saved tab).
  • ⚙ Settings: accent color, Chinese face (sans/serif), pinyin tone colors; plus dark mode.
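The search item above says tone marks are optional (e.g. nihao). One way to support that is to reduce both the query and the indexed reading to the same toneless ASCII key before comparing. The sketch below is an assumption about how such a key could be computed; `pinyinKey` is a hypothetical name, not a function from src/lib/pinyin.js.

```javascript
// Sketch only: pinyinKey is a hypothetical helper.
// Reduces tone-marked or tone-numbered pinyin to a toneless ASCII key.
function pinyinKey(input) {
  return input
    .normalize('NFD')                 // split "ǐ" into "i" + combining mark
    .replace(/[\u0300-\u036f]/g, '')  // drop combining marks (tones, diaeresis)
    .toLowerCase()
    .replace(/v/g, 'u')               // "v" is a common keyboard stand-in for "ü"
    .replace(/[^a-z]/g, '');          // drop spaces, tone digits, "u:" colons
}
```

With this key, "nǐ hǎo", "ni3 hao3", and "nihao" all compare equal. Note that it folds ü into u, so nǚ and nu would collide; a real index might keep them distinct.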

Note: Chrome only lets the side panel open from a user gesture, so hovering can't auto-open it. Open the panel once (toolbar icon menu → Open side panel, or right-click → Look up …), and it then updates live as you hover and select. Prefer a free-floating window? Use Open in window instead: same UI, same live lookups, just not docked to the browser edge. The toolbar menu also lets you toggle the inline hover popup and disable hover on the current site (selection and pinning keep working there).
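The Open in window behavior described above (a second click focuses the existing window instead of opening another) can be sketched as follows. This is an illustration under stated assumptions, not the code in src/popup/: the storage key, window size, and page path are hypothetical, and `chromeApi` stands in for the global `chrome` object so the logic can be exercised outside the browser. The calls used (`storage.session.get/set`, `windows.create`, `windows.update`, `runtime.getURL`) are standard extension APIs.

```javascript
// Sketch only: WINDOW_KEY, the size, and the page path are assumptions.
const WINDOW_KEY = 'dictWindowId';

async function openDictionaryWindow(chromeApi) {
  const stored = await chromeApi.storage.session.get(WINDOW_KEY);
  const existingId = stored[WINDOW_KEY];
  if (existingId !== undefined) {
    try {
      // windows.update rejects if the window was closed in the meantime.
      await chromeApi.windows.update(existingId, { focused: true });
      return existingId;
    } catch {
      // Stale id: fall through and create a fresh window.
    }
  }
  const win = await chromeApi.windows.create({
    url: chromeApi.runtime.getURL('src/sidepanel/index.html?mode=window'),
    type: 'popup', // chromeless window without tabs or address bar
    width: 420,
    height: 720,
  });
  await chromeApi.storage.session.set({ [WINDOW_KEY]: win.id });
  return win.id;
}
```

Keeping the id in session storage means it survives service-worker restarts but is cleared when the browser closes, which matches the lifetime of the window itself.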

Keyboard shortcuts (rebind at chrome://extensions/shortcuts):

  • Ctrl + Shift + Y (⌘ + Shift + Y on macOS) → open the dictionary window.
  • Ctrl + Shift + E (⌘ + Shift + E on macOS) → open the side panel.

These launch the dictionary directly without going through the toolbar menu — the closest thing to opening it like an app. If a suggested key is already taken, Chrome leaves it unassigned; set your own on the shortcuts page.
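For reference, shortcuts like these are declared under the manifest's `commands` key. The fragment below is a sketch of what such a declaration could look like in manifest.config.js; the command names and descriptions are hypothetical, and only the key combinations come from the list above.

```javascript
// Sketch only: command names and descriptions are assumptions.
const commands = {
  'open-window': {
    suggested_key: { default: 'Ctrl+Shift+Y', mac: 'Command+Shift+Y' },
    description: 'Open the dictionary window',
  },
  'open-side-panel': {
    suggested_key: { default: 'Ctrl+Shift+E', mac: 'Command+Shift+E' },
    description: 'Open the side panel',
  },
};
```

The keys are only suggestions: Chrome assigns them when they are free, and users can rebind them at chrome://extensions/shortcuts.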

Architecture

Area File(s)
Manifest (MV3) manifest.config.js (CRXJS), vite.config.js
Data pipelines assets/scripts/build-dict.mjs → src/data/cedict.json (CC-CEDICT entries + traditional↔simplified maps + merged HSK/POS/char data + merged CedPane names/proper nouns); CedPane is fetched once at build time and cached, committed, as assets/cedpane/cedpane.json so later builds/tests are deterministic and offline. assets/scripts/convert-chars.mjs → assets/char-data/char-data.json (radical/components/strokes, from makemeahanzi); assets/scripts/fetch-fonts.mjs → src/sidepanel/fonts/ + fonts.css (vendored Google Fonts). assets/scripts/convert-hsk.mjs (npm run convert:hsk) → assets/hsk-vocab/hsk-data.json (HSK level + POS + official gloss, parsed from the committed .xls lists). All build/data scripts live under assets/.
Dictionary logic src/lib/dict-core.js (pure lookup/search/segment, unit-tested), src/lib/dict.js (loads the index, wraps core), src/lib/pinyin.js, src/lib/storage.js, src/lib/examples.js (Tatoeba)
Side panel UI src/sidepanel/ (App.jsx + components/, panel.css); the same page also serves the Open in window mode — App.jsx reads ?mode=window to draw its own brand header when not docked
Toolbar menu src/popup/ (index.html, popup.js, popup.css); opens the side panel or a detached window (chrome.windows.create, single-instance via chrome.storage.session), toggles hover popup / site-disable, and runs HSK highlight
Flashcards page src/flashcards/ (index.html, flashcards.js deck/round/setup logic + custom deck dropdown, progress.js local per-device progress, flashcards.css); HSK decks are built entirely from the bundled HSK lists (one card per sense, scope = just/​up-to a level), starred decks from saved words; src/lib/anki.js (pure TSV formatter)
On-page lookup src/content/hover-driver.js (shared cursor→highlight/popup/pin driver), src/content/content.js (initializes it + HSK highlight + Reader mode + the native-PDF "Open in Zilense" toast), content.css
Dual subtitles + pinyin src/content/subs/ (loaded on demand by content.js, OFF by default): index.js gate, engine.js runtime, platforms.js adapters (YouTube, Coursera), overlay.js ruby overlay + word lookup, subs-core.js pure tokens/cues/track helpers, yt-hook.js MAIN-world caption-track reader. Settings under mydict.subs, toggled from the toolbar popup. Phase 1 annotates the shown track with pinyin + clickable words; Phase 2 shows two real YouTube tracks at once (same-origin timedtext, no new permission; auto-captions/auto-translation are separate opt-ins)
PDF viewer src/pdfviewer/ (main.js PDF.js render + lazy pages + text-layer selection, reuses the hover driver; ocr.js/ocr-core.js offline Tesseract OCR for scanned pages; target.js URL parsing, pdfviewer.css). OCR engine + chi_sim model bundled under public/tesseract/
Background src/background/service-worker.js (panel open, context menus, hover/segment lookup, PDF redirect rule + open-pdf navigation)
Tests test/*.test.mjs (npm test, Node's built-in runner: dict-core, pinyin, content-core, manifest, storage-helpers, reader-stash, anki, cedpane, familiarity, grammar, hsk-data, meanings, word-family, subs-core, hover-driver + pdf-target + ocr-core for the PDF/OCR logic; DOM logic, reader-extract, word-walk, subs-overlay, subs-engine, via happy-dom); e2e/*.spec.js (npm run test:e2e, Playwright extension tests: panel.spec.js, popup.spec.js, reader.spec.js, flashcards.spec.js, and pdfviewer.spec.js — text-layer render, hover-to-pin, the native-PDF toast, scanned-PDF OCR, and selection)
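The table lists src/lib/anki.js as a pure TSV formatter. As a rough illustration of what such a formatter involves, here is a minimal sketch; the function name and the entry shape (`hanzi`, `pinyin`, `meaning`) are assumptions, and the real module may emit different or additional columns.

```javascript
// Sketch only: toAnkiTsv and the entry fields are hypothetical.
// Produces one tab-separated line per entry for Anki's text import.
function toAnkiTsv(entries) {
  // Tabs and newlines inside a field would break the column layout,
  // so collapse them to single spaces.
  const clean = (value) => String(value ?? '').replace(/[\t\r\n]+/g, ' ').trim();
  return entries
    .map((e) => [e.hanzi, e.pinyin, e.meaning].map(clean).join('\t'))
    .join('\n');
}
```

Because the function only maps data to a string, it can be unit-tested without any browser or storage setup.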

Known limitations / next steps

  • Stroke order: stroke data is fetched on demand from the jsdelivr CDN (the hanzi-writer library itself is bundled). Bundling all ~9.6k stroke files locally would add ~25 MB, so that is deferred. Network access is needed only when you expand the Stroke order section.

  • Data beyond CC-CEDICT (CC-CEDICT only has simplified/traditional, pinyin, defs, classifiers): HSK 3.0 level + part of speech + official English gloss live in the committed assets/hsk-vocab/hsk-data.json (parsed from the HSK 3.0 .xls word lists by npm run convert:hsk); radical / components / stroke count come from makemeahanzi (npm run convert:chars, re-downloads its source on demand); example sentences come live from Tatoeba. Traditional→simplified maps are built into cedict.json so traditional input resolves. Still missing: word frequency (a level proxy is used in ranking) and bundled audio (pronunciation uses the browser's TTS). The raw CC-CEDICT source (assets/cedict/cedict_ts.u8) is vendored and pinned, so the derived cedict.json, though git-ignored and rebuilt on dev/build/test, is reproducible offline from committed inputs.

  • Names & proper nouns come from CedPane (public domain), merged into cedict.json at build time and tagged as proper nouns. To keep a name homograph from beating the everyday word, proper nouns are ranked below ordinary/HSK words in search and the word family unless the query is an exact full match (dict-core.js). Adding ~74k name entries grows cedict.json from ~14 MB to ~20 MB (gzip ~7 MB); it is still loaded once, on demand, by the service worker and the panel. CedPane's already-tone-marked pinyin is shown as-is but is not yet indexed for toneless (ASCII) pinyin search, so names are found by characters or English, not by pinyin.

  • Permissions: host_permissions is scoped to the two services the panel fetches from (tatoeba.org, cdn.jsdelivr.net); page injection for hover lookup comes from content_scripts matches, which needs no host permission. minimum_chrome_version is 116 (Side Panel API floor is 114; 116 makes opening the panel from a page-side user gesture reliable). For PDFs: optional_host_permissions: ['*://*/*'] is requested on demand (only the specific PDF's origin, when you choose to open it in the viewer), not granted at install; and content_security_policy.extension_pages adds 'wasm-unsafe-eval' so the bundled Tesseract OCR WebAssembly can run (still 'self' only — no remote code). PDFs are opened manually (in-page toast / right-click), so there is no navigation-redirect permission.

  • PDF viewer & OCR: navigated PDFs open in Chrome's native viewer, which has no hoverable text — so the content script (which does run on the PDF tab's top frame) shows an "Open in Zilense" toast that reopens the PDF in a bundled PDF.js viewer with a real text layer. Image-only/scanned PDFs are OCR'd with bundled Tesseract.js + the chi_sim model (under public/tesseract/, ~9 MB, fully offline); recognition runs per visible page. On OCR'd pages the hover/pin highlight overlays are suppressed (recognized text can't align pixel-perfectly with the image) while the popup and lookup still work. file:// PDFs need the manual "Allow access to file URLs" toggle.

  • Fonts: three typefaces, each with a job. Noto Sans SC is used for all Chinese glyphs (switchable to Noto Serif SC in Settings), Source Serif 4 (variable, optical-size axis) for English content, and the system UI sans for functional chrome (labels, badges, tabs, pinyin). They are self-hosted: npm run fetch:fonts vendors the woff2 files into src/sidepanel/fonts/ and writes fonts.css (linked from index.html), so the extension carries no remote stylesheet/font dependency (MV3-friendly, works offline). CJK ships as many unicode-range subset files; the browser fetches only the ranges a page uses at runtime. The committed fonts add ~44 MB to the repo and the build emits ~10 MB into dist/ (deduped), so dist/ is ~25 MB; most of the rest is the dictionary JSON.

  • Licensing: the application code is MIT (LICENSE). Bundled and fetched third-party data, fonts, and libraries keep their own licenses: CC-CEDICT (CC BY-SA 4.0, attribution + share-alike), CedPane (public domain / Unlicense), makemeahanzi (Arphic Public License + LGPL), Noto SC / Source Serif 4 (SIL OFL 1.1), Tatoeba (CC BY 2.0 FR), Readability (Apache-2.0), React (MIT). Full attribution is in THIRD-PARTY-NOTICES.md, which is copied into dist/ on build and linked from the side-panel Settings. The privacy policy and project site are published from docs/ via GitHub Pages.

  • On-page hover does greedy longest-match word detection (à la the Zhongwen extension): the content script collects the forward run of characters under the cursor, across adjacent text nodes, so a word split over inline elements (新<span>闻</span>) still matches, and the panel returns the longest matching word and how many characters to highlight (新闻 → the 2-char word, not 新 + 闻).
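The greedy longest-match idea in the last item can be sketched in a few lines. This is an illustration under assumptions, not the code in src/lib/dict-core.js: `dict` is modeled as a plain Set of known words, and the function names and the `maxLen` lookahead bound are hypothetical.

```javascript
// Sketch only: names, the Set-based dictionary, and maxLen are assumptions.
// Returns the longest dictionary word that starts at the beginning of `run`,
// plus how many characters it covers (for highlighting).
function longestMatch(run, dict, maxLen = 8) {
  const chars = Array.from(run); // iterate by code point, not UTF-16 unit
  for (let len = Math.min(maxLen, chars.length); len >= 1; len--) {
    const candidate = chars.slice(0, len).join('');
    if (dict.has(candidate)) return { word: candidate, length: len };
  }
  return null;
}

// Segments a phrase by repeatedly taking the longest match;
// characters with no entry are emitted on their own.
function segment(text, dict, maxLen = 8) {
  const chars = Array.from(text);
  const words = [];
  let i = 0;
  while (i < chars.length) {
    const hit = longestMatch(chars.slice(i, i + maxLen).join(''), dict, maxLen);
    words.push(hit ? hit.word : chars[i]);
    i += hit ? hit.length : 1;
  }
  return words;
}
```

Given a dictionary containing 新, 闻, and 新闻, `longestMatch('新闻很', dict)` picks the two-character word, and the same routine applied repeatedly splits a selected phrase into its words. Greedy matching can mis-segment ambiguous strings, which is a known trade-off of this approach.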

About

🔍 Zilense: a Chrome side-panel Chinese dictionary. Hover or select any hanzi for pinyin, meaning, stroke order & examples. MV3 · Vite + React + CRXJS · CC-CEDICT.
