Stop Chrome downloading files a crawl links to #33

Merged: tamnd merged 1 commit into main from fix-download-deny on Jun 16, 2026

tamnd (Owner) commented Jun 16, 2026

An extensionless link is queued as a page, so the page worker navigated to it in headless Chrome. When the link served a binary file such as a zip or a CSV, Chrome saved it to the user's Downloads folder, a surprising side effect of running a clone. This is issue #32.
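To make the failure mode concrete, here is a minimal Go sketch of a URL-only classifier of the kind described above. The function `queueKind` and the `pageExts` list are hypothetical, not kage's code; the point is only that a path without an extension gives the classifier nothing to go on, so it defaults to the page worker.

```go
package main

import (
	"net/url"
	"path"
	"strings"
)

// pageExts lists path extensions assumed to denote HTML pages.
// Hypothetical: kage's real rules may differ.
var pageExts = map[string]bool{
	".html": true,
	".htm":  true,
	".php":  true,
	".asp":  true,
	".aspx": true,
}

// queueKind is a hypothetical classifier that decides, from the URL alone,
// whether a discovered link goes to the page worker or the asset downloader.
// A path with no extension cannot be classified by name, so it defaults to
// "page"; that default is what sends a zip or CSV served from an
// extensionless URL into headless Chrome.
func queueKind(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	// path.Ext only inspects the last path element, so "/v1.2/export" has no extension.
	ext := strings.ToLower(path.Ext(u.Path))
	if ext == "" || pageExts[ext] {
		return "page", nil
	}
	return "asset", nil
}
```

For example, `queueKind("https://example.com/reports/export?id=7")` returns `"page"` even if that URL actually serves a zip, which is why a name-based rule alone cannot prevent the download.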

Two layers of fix:

  • Deny Chrome-initiated downloads browser-wide. kage fetches every asset through its own downloader and never needs the browser to write a file, so a Chrome download is only ever an accident.
  • Watch the main document's response and, when it is not HTML, return a typed ErrNotHTML. The page worker catches it and reroutes the URL to the asset downloader, where the existing size and media policy decides whether to localise it or leave it on the live web. This second layer is defense in depth: it covers the case even if the deny call is unsupported on some Chrome builds.

Verified manually against the two URLs from the issue. The zip and the CSV both land under the mirror's reserved tree as assets, and nothing is written to ~/Downloads.

Tests added: a unit table for the HTML content-type check, a browser integration test that asserts a zip and a CSV come back as ErrNotHTML while an HTML page still renders, and a clone integration test that a linked non-HTML target is fetched as an asset rather than saved as a page.
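The PR's integration tests drive headless Chrome, which is not reproduced here. As a lighter Go sketch of the same routing rule, the code below stands up an `httptest` server with one HTML page and two extensionless non-HTML targets, then routes each by its response Content-Type. The paths, bodies, and the `routeFor` helper are made up for illustration; it uses a plain GET, so it checks the routing decision, not Chrome's download behaviour.

```go
package main

import (
	"mime"
	"net/http"
	"net/http/httptest"
)

// newFixtureServer serves one HTML page and two extensionless non-HTML
// targets, standing in for the zip and CSV URLs from issue #32.
func newFixtureServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/page", func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "text/html; charset=utf-8")
		w.Write([]byte("<!doctype html><title>ok</title>"))
	})
	mux.HandleFunc("/archive", func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "application/zip")
		w.Write([]byte("PK\x03\x04")) // zip local-file-header magic bytes
	})
	mux.HandleFunc("/export", func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "text/csv")
		w.Write([]byte("a,b\n1,2\n"))
	})
	return httptest.NewServer(mux)
}

// routeFor fetches url and reports where a crawler should send it:
// "page" for HTML responses, "asset" for everything else.
func routeFor(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	mt, _, err := mime.ParseMediaType(resp.Header.Get("Content-Type"))
	if err == nil && (mt == "text/html" || mt == "application/xhtml+xml") {
		return "page", nil
	}
	return "asset", nil
}
```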

Refs #32

An extensionless link is queued as a page, so the page worker navigated to
it in headless Chrome. When such a link served a binary file such as a zip
or a CSV, Chrome saved it to the user's Downloads folder, a surprising side
effect of a clone (issue #32).

Deny Chrome-initiated downloads browser-wide, since kage fetches every asset
through its own downloader and never needs the browser to write a file. Then
watch the main document's response, and when it is not HTML, return a typed
ErrNotHTML so the page worker reroutes the URL to the asset downloader, where
the existing size and media policy decides whether to localise it or leave it
on the live web.

Verified against the two URLs from the issue, a zip and a CSV: both land
under the mirror's reserved tree and nothing is written to Downloads.

tamnd merged commit 5cbb7f8 into main on Jun 16, 2026
9 checks passed
tamnd deleted the fix-download-deny branch on Jun 16, 2026 at 03:13