Stop Chrome downloading files a crawl links to #33
Merged
An extensionless link is queued as a page, so the page worker navigated to it in headless Chrome. When such a link served a binary, a zip or a CSV, Chrome saved the file to the user's Downloads folder, a surprise side effect of a clone (issue #32). Deny Chrome-initiated downloads browser-wide, since kage fetches every asset through its own downloader and never needs the browser to write a file. Then watch the main document's response, and when it is not HTML, return a typed ErrNotHTML so the page worker reroutes the URL to the asset downloader, where the existing size and media policy decides whether to localise it or leave it on the live web. Verified against the two URLs from the issue, a zip and a CSV: both land under the mirror's reserved tree and nothing is written to Downloads.
An extensionless link is queued as a page, so the page worker navigated to it in headless Chrome. When the link served a binary, a zip or a CSV, Chrome saved the file to the user's Downloads folder, a surprise side effect of running a clone. This is issue #32.
Two layers of fix:
1. Deny Chrome-initiated downloads browser-wide. kage fetches every asset through its own downloader and never needs the browser to write a file.
2. Watch the main document's response and, when it is not HTML, return a typed ErrNotHTML. The page worker catches it and reroutes the URL to the asset downloader, where the existing size and media policy decides whether to localise it or leave it on the live web.
The second layer is defense in depth: it covers the case even if the deny call is unsupported on some Chrome build.
Verified manually against the two URLs from the issue. The zip and the CSV both land under the mirror's reserved tree as assets, and nothing is written to ~/Downloads.
Tests added: a unit table for the HTML content-type check, a browser integration test that asserts a zip and a CSV come back as ErrNotHTML while an HTML page still renders, and a clone integration test that a linked non-HTML target is fetched as an asset rather than saved as a page.
Refs #32
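For illustration, an HTML content-type check with a small table of cases might look like the sketch below. The helper name `isHTML`, the accepted media types, and the choice to treat a missing or malformed header as non-HTML are all assumptions, and the check in this change may differ.

```go
package main

import (
	"fmt"
	"mime"
)

// isHTML reports whether a Content-Type header value names an HTML document.
// Hypothetical helper: a missing or malformed header is treated as non-HTML here.
func isHTML(contentType string) bool {
	// ParseMediaType strips parameters such as charset and lowercases the type.
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}
	return mediaType == "text/html" || mediaType == "application/xhtml+xml"
}

func main() {
	// Table of cases in the style of a Go table-driven test.
	cases := []struct {
		header string
		want   bool
	}{
		{"text/html", true},
		{"Text/HTML; charset=UTF-8", true},
		{"application/xhtml+xml", true},
		{"application/zip", false},
		{"text/csv", false},
		{"", false},
	}
	for _, c := range cases {
		fmt.Printf("%q: got %v, want %v\n", c.header, isHTML(c.header), c.want)
	}
}
```

Parsing with `mime.ParseMediaType` rather than comparing raw strings means a header such as `text/html; charset=utf-8` is still recognised as HTML.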