Textagent
diff --git a/README.md b/README.md
diff --git a/ai-worker-docling.js b/ai-worker-docling.js
@@ -47,7 +47,7 @@
 | **💾 Disk Workspace** | Folder-backed storage via File System Access API — "Open Folder" in sidebar header; `.md` files read/written directly to disk; `.textagent/workspace.json` manifest; debounced autosave ("💾 Saved to disk" indicator); refresh from disk for external edits; disconnect to revert to localStorage; auto-reconnect on reload via IndexedDB handles; unified action modal for rename/duplicate/delete with confirmation; Chromium-only (hidden in unsupported browsers) |
 | **📈 Finance Dashboard** | Stock/crypto/index dashboard templates with live TradingView charts; dynamic grid via `data-var-prefix` (add/remove tickers in `@variables` table, grid auto-adjusts); configurable chart range (`1M`, `12M`, `36M`), interval (`D`, `W`, `M`), and EMA period (default 52); interactive 1M/1Y/3Y range + 52D/52W/52M EMA toggle buttons; `@variables` table persists after ⚡ Vars for re-editing; JS code block generates grid HTML from variables |
 | **Extras** | Auto-save (localStorage + cloud), table of contents, image paste, 106+ templates (12 categories: AI, Agents, Coding, Creative, Documentation, Finance, Maths, PPT, Project, Quiz, Tables, Technical), template variable substitution (`$(varName)` with auto-detect), table spreadsheet tools (sort, filter, stats, chart, add row/col, inline cell edit, CSV/MD export), content statistics, modular codebase (13+ JS modules), fully responsive mobile UI with scrollable Quick Action Bar (Files, Search, TOC, Share, Copy, Tools, AI, Model, Upload, Help) and formatting toolbar, multi-file workspace sidebar, compact header mode with collapsible Tools dropdown (Presentation, Zen, Word Wrap, Focus, Voice, Dark Mode, Preview Theme), Clear All / Clear Selection buttons (undoable via Ctrl+Z) |
-| **Dev Tooling** | ESLint + Prettier (lint, format:check), Playwright test suite — 191 tests across smoke, feature, integration, dev, performance, and QA categories (import, export, share, view-mode, editor, email-to-self, secure share, startup timing, export integrity, persistence, module loading, disk workspace, context memory, exec engine, build validation, load-time, accessibility), pre-commit changelog enforcement, GitHub Actions CI |
+| **Dev Tooling** | ESLint + Prettier (lint, format:check), Playwright test suite — 299 tests across smoke, feature, integration, dev, regression, performance, quality, and security categories (import, export, share, view-mode, editor, email-to-self, secure share, startup timing, export integrity, persistence, module loading, disk workspace, context memory, exec engine, build validation, load-time, accessibility, video player, TTS, STT, file converters, stock widget, embed grid, model registry, static analysis, code smell, XSS hardening), pre-commit changelog enforcement, GitHub Actions CI |
 
 ## 🤖 AI Assistant
 
@@ -456,6 +456,7 @@ TextAgent has undergone significant evolution since its inception. What started
 
 | Date | Commits | Feature / Update |
 |------|---------|-----------------|
+| **2026-03-12** | — | 🧪 **Comprehensive Test Suite** — 12 new Playwright spec files (108 tests) across 5 categories targeting past 3 days of code changes: **Functional** — unit tests for video player (URL detection, HTML builders, embed grid), TTS engine (API surface, state), speech commands (DOM elements, language selector), file converters (MD/CSV/JSON/XML/HTML import), stock widget (rendering, sandbox, double-render prevention); integration tests for embed grid pipeline and AI_MODELS registry. **Regression** — 12 tests pinning recent bug fixes (file upload crash, template confirmation, stock variable, embed rendering, mermaid stability, dark mode, XSS). **Performance** — module init timing (TTS/STT/video/stock/converter < 5–8s), complex render < 5s, embed grid < 3s. **Static Analysis** — ESLint, file size < 100KB, debugger/eval detection, CSS !important audit, IIFE patterns, worker files, HTTPS enforcement. **Security** — embed grid XSS (javascript:/data: URI), video player HTML escaping, YouTube privacy mode, TradingView sandbox, Vimeo DNT, link security, CSP validation. Total test count: 299 |
 | **2026-03-12** | — | 🎤 **Voxtral STT** — [Voxtral Mini 3B](https://huggingface.co/textagent/Voxtral-Mini-3B-2507-ONNX) as primary speech-to-text engine on WebGPU (~2.7 GB, q4, 13 languages, streaming partial output via `TextStreamer`); Whisper Large V3 Turbo as WASM fallback (~800 MB, q8); `voxtral-worker.js` new WebWorker with `VoxtralForConditionalGeneration` + `VoxtralProcessor`; `speechToText.js` WebGPU detection + dual-worker routing; download consent popup (`showSttConsentPopup`) with model name/size/privacy info before first download; `STT_CONSENTED` localStorage key; model duplicated to `textagent/` HuggingFace org with `onnx-community/` fallback |
 | **2026-03-12** | — | 🛡️ **Code Audit Fixes** — sandboxed `jsAdapter` in `exec-sandbox.js` (was raw `eval()` on main thread, now iframe-sandboxed); `mirror-models.sh` model IDs updated to `textagent`, Kokoro v1.0→v1.1-zh, GitLab refs removed; Whisper speech worker forwarded user's language selection instead of hardcoded English; shared `ai-worker-common.js` module extracts `TOKEN_LIMITS` + `buildMessages()` from 3 workers; cloud workers load as ES modules |
 | **2026-03-12** | — | 🏠 **Model Hosting Migration** — all 7 ONNX models (Qwen 3.5 0.8B/2B/4B, Qwen 3 4B Thinking, Whisper Large V3 Turbo, Kokoro 82M v1.0/v1.1-zh) duplicated to self-owned [`textagent` HuggingFace org](https://huggingface.co/textagent); model IDs updated from `onnx-community/` to `textagent/` across all workers; automatic fallback to `onnx-community/` namespace if textagent models unavailable; GitLab mirror removed from runtime code |
 
@@ -0,0 +1,243 @@
+/**
+ * AI Worker — Granite Docling 258M (IBM) — Document OCR
+ *
+ * Converts document images to structured Markdown/HTML using
+ * IBM's Granite Docling vision-language model via Transformers.js.
+ *
+ * Uses AutoModelForVision2Seq + AutoProcessor from Transformers.js.
+ * Supports WebGPU acceleration.
+ *
+ * Message interface:
+ *   setModelId  → configure model ID before loading
+ *   load        → download and initialise model
+ *   process     → run document OCR on an image
+ *   ping/pong   → health check
+ */
+
+const TRANSFORMERS_URL = "https://cdn.jsdelivr.net/npm/@huggingface/transformers@4.0.0-next.6";
+
+// Model config
+let MODEL_ID = "onnx-community/granite-docling-258M-ONNX";
+let MODEL_LABEL = "Granite Docling 258M";
+
+// Dynamically loaded modules
+let AutoProcessor = null;
+let AutoModelForVision2Seq = null;
+let load_image = null;
+let TextStreamer = null;
+
+// Runtime state
+let processor = null;
+let model = null;
+let device = "wasm"; // will upgrade to webgpu if available
+
+/**
+ * Initialize the model: load processor + model
+ */
+async function loadModel() {
+    try {
+        // 1. Import Transformers.js
+        if (!AutoProcessor) {
+            self.postMessage({ type: "status", message: "Loading AI libraries..." });
+            try {
+                const transformers = await import(TRANSFORMERS_URL);
+                AutoProcessor = transformers.AutoProcessor;
+                AutoModelForVision2Seq = transformers.AutoModelForVision2Seq;
+                load_image = transformers.load_image;
+                TextStreamer = transformers.TextStreamer;
+            } catch (importError) {
+                self.postMessage({
+                    type: "error",
+                    message: `Failed to load AI libraries: ${importError.message}`,
+                });
+                return;
+            }
+        }
+
+        // 2. Check WebGPU
+        if (typeof navigator !== "undefined" && navigator.gpu) {
+            const adapter = await navigator.gpu.requestAdapter();
+            if (adapter) device = "webgpu";
+        }
+
+        // 3. Load processor
+        self.postMessage({ type: "status", message: `Loading ${MODEL_LABEL} processor...` });
+        processor = await AutoProcessor.from_pretrained(MODEL_ID, {
+            progress_callback: (progress) => {
+                if (progress.status === "progress") {
+                    self.postMessage({
+                        type: "progress",
+                        file: progress.file || "processor",
+                        loaded: progress.loaded || 0,
+                        total: progress.total || 0,
+                        progress: progress.progress || 0,
+                    });
+                } else if (progress.status === "initiate") {
+                    self.postMessage({
+                        type: "status",
+                        message: `Downloading ${progress.file || "model"}...`,
+                    });
+                }
+            },
+        });
+
+        // 4. Load model
+        self.postMessage({ type: "status", message: `Loading ${MODEL_LABEL} model (${device.toUpperCase()})...` });
+        model = await AutoModelForVision2Seq.from_pretrained(MODEL_ID, {
+            dtype: "fp32",
+            device: device,
+            progress_callback: (progress) => {
+                if (progress.status === "progress") {
+                    self.postMessage({
+                        type: "progress",
+                        file: progress.file || "model",
+                        loaded: progress.loaded || 0,
+                        total: progress.total || 0,
+                        progress: progress.progress || 0,
+                    });
+                } else if (progress.status === "initiate") {
+                    self.postMessage({
+                        type: "status",
+                        message: `Downloading ${progress.file || "model"}...`,
+                    });
+                }
+            },
+        });
+
+        self.postMessage({ type: "loaded", device: device });
+    } catch (error) {
+        self.postMessage({
+            type: "error",
+            message: `Failed to load Docling model: ${error.message}`,
+        });
+    }
+}
+
+/**
+ * Process a document image and convert to structured text
+ * @param {object} options
+ * @param {string} options.imageData - Base64 data URL or URL to the image
+ * @param {string} options.outputFormat - 'docling', 'markdown', 'html', or 'text'
+ * @param {boolean} options.doImageSplitting - Split image into patches for more accuracy
+ * @param {string} options.messageId
+ */
+async function processDocument({ imageData, outputFormat = 'docling', doImageSplitting = false, messageId }) {
+    if (!model || !processor) {
+        self.postMessage({
+            type: "error",
+            message: "Model not loaded. Please wait for the model to finish loading.",
+            messageId,
+        });
+        return;
+    }
+
+    try {
+        self.postMessage({ type: "status", message: "Processing document...", messageId });
+
+        // Load image
+        const image = await load_image(imageData);
+
+        // Build prompt based on output format
+        let promptText = "Convert this page to docling.";
+        if (outputFormat === 'markdown') {
+            promptText = "Convert this page to markdown.";
+        } else if (outputFormat === 'html') {
+            promptText = "Convert this page to html.";
+        } else if (outputFormat === 'text') {
+            promptText = "Extract all text from this page.";
+        }
+
+        // Create messages
+        const messages = [
+            {
+                role: "user",
+                content: [
+                    { type: "image" },
+                    { type: "text", text: promptText },
+                ],
+            },
+        ];
+
+        // Apply chat template and process inputs
+        const text = processor.apply_chat_template(messages, { add_generation_prompt: true });
+        const inputs = await processor(text, [image], {
+            do_image_splitting: doImageSplitting,
+        });
+
+        // Generate with streaming
+        const generated_ids = await model.generate({
+            ...inputs,
+            max_new_tokens: 4096,
+            streamer: new TextStreamer(processor.tokenizer, {
+                skip_prompt: true,
+                skip_special_tokens: false,
+                callback_function: (token) => {
+                    self.postMessage({ type: "token", token: token, messageId });
+                },
+            }),
+        });
+
+        // Decode final output
+        const generated_texts = processor.batch_decode(
+            generated_ids.slice(null, [inputs.input_ids.dims.at(-1), null]),
+            { skip_special_tokens: true },
+        );
+
+        const result = generated_texts[0] || "";
+
+        self.postMessage({
+            type: "complete",
+            text: result,
+            messageId,
+        });
+    } catch (error) {
+        self.postMessage({
+            type: "error",
+            message: `Document processing failed: ${error.message}`,
+            messageId,
+        });
+    }
+}
+
+// Listen for messages from the main thread
+self.addEventListener("message", async (event) => {
+    const { type, messageId } = event.data;
+
+    switch (type) {
+        case "setModelId":
+            MODEL_ID = event.data.modelId || MODEL_ID;
+            MODEL_LABEL = event.data.modelLabel || MODEL_LABEL;
+            break;
+        case "load":
+            await loadModel();
+            break;
+        case "process":
+            await processDocument(event.data);
+            break;
+        // Also support 'generate' for compatibility with the standard worker interface
+        case "generate": {
+            const attachments = event.data.attachments || [];
+            const imageAtt = attachments.find(a => a.type === 'image');
+            if (imageAtt) {
+                await processDocument({
+                    imageData: imageAtt.data,
+                    outputFormat: 'markdown',
+                    doImageSplitting: false,
+                    messageId,
+                });
+            } else {
+                self.postMessage({
+                    type: "error",
+                    message: "Granite Docling requires a document image. Please attach an image.",
+                    messageId,
+                });
+            }
+            break;
+        }
+        case "ping":
+            self.postMessage({ type: "pong" });
+            break;
+        default:
+            console.warn("Unknown message type:", type);
+    }
+});
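
The message interface documented in the header of `ai-worker-docling.js` (`load`, `process`, and the `status` / `progress` / `token` / `complete` / `error` replies) can be driven from the main thread roughly as sketched below. This is a minimal illustration, not code from this change: `createDoclingClient`, its `load` and `process` methods, and the `docling-N` message IDs are hypothetical names, and the helper assumes only that `worker` exposes `postMessage`, `addEventListener`, and `removeEventListener`.

```javascript
// Hypothetical main-thread helper; the names below are illustrative and
// are not part of the PR. It wraps the worker's message protocol in promises.
function createDoclingClient(worker) {
    let nextId = 0;

    // Resolve on the first message accepted by `isDone`, reject on an
    // "error" message, and forward everything else to `onEvent`.
    // `isMine` filters out messages that belong to other requests.
    function waitFor(isDone, isMine, onEvent) {
        return new Promise((resolve, reject) => {
            const handler = (event) => {
                const msg = event.data;
                if (!isMine(msg)) return;
                if (msg.type === "error") {
                    worker.removeEventListener("message", handler);
                    reject(new Error(msg.message));
                } else if (isDone(msg)) {
                    worker.removeEventListener("message", handler);
                    resolve(msg);
                } else if (onEvent) {
                    onEvent(msg); // status, progress, or token updates
                }
            };
            worker.addEventListener("message", handler);
        });
    }

    return {
        // Sends "load" and resolves with the worker's "loaded" message.
        load(onEvent) {
            // Load-time replies carry no messageId, so accept only untagged messages.
            const done = waitFor(
                (m) => m.type === "loaded",
                (m) => m.messageId === undefined,
                onEvent,
            );
            worker.postMessage({ type: "load" });
            return done;
        },

        // Sends "process" and resolves with the final text from "complete".
        process(imageData, { outputFormat = "markdown", doImageSplitting = false, onToken } = {}) {
            const messageId = `docling-${++nextId}`;
            const done = waitFor(
                (m) => m.type === "complete",
                (m) => m.messageId === messageId,
                (m) => {
                    if (m.type === "token" && onToken) onToken(m.token);
                },
            );
            worker.postMessage({ type: "process", imageData, outputFormat, doImageSplitting, messageId });
            return done.then((m) => m.text);
        },
    };
}
```

In a browser, `worker` would be a `Worker` created from the new file (the exact path and worker options depend on how the app loads it, which this diff does not show). A caller would then `await client.load(...)` once and call `client.process(dataUrl, { onToken })` per image. Tagging each request with its own `messageId` lets concurrent requests share one worker without mixing up their token streams.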