This repository was archived by the owner on Mar 24, 2026. It is now read-only.
I would like to propose a few enhancements to the built-in Scraper to improve performance and reduce unnecessary work when scraping large ROM libraries.
⸻
Multithreaded media downloads (respecting ScreenScraper account limits)
Currently, the Scraper appears to download media (box art, screenshots, videos, etc.) sequentially. For large libraries this makes scraping very slow.
Proposal:
Allow the Scraper to download media using multiple threads. However, the number of concurrent threads should be automatically limited based on the user’s ScreenScraper account tier (free, registered, or supporter), as defined by ScreenScraper’s own connection limits.
Ideally, ArkOS would:
• Query the ScreenScraper API to detect the user’s allowed number of simultaneous connections
• Set the Scraper’s download thread pool to match that limit
• Never exceed the user’s ScreenScraper quota
This would maximize download speed while remaining fully compliant with ScreenScraper’s usage policies.
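To make the sizing rule concrete, here is a minimal Python sketch of the idea. It assumes the ScreenScraper user-info response exposes an allowed-connections field (called `maxthreads` here, which is an assumption about the API payload) and sizes a standard thread pool from it, clamped so a bad value can never exceed a sane cap:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def allowed_threads(user_info: dict, default: int = 1, cap: int = 8) -> int:
    """Derive the download thread count from ScreenScraper account info.

    'maxthreads' is the assumed field name in the user-info response;
    fall back to a single thread if it is missing or malformed, and
    clamp to `cap` as a safety limit.
    """
    try:
        return max(1, min(int(user_info.get("maxthreads", default)), cap))
    except (TypeError, ValueError):
        return default

def download_all(urls, fetch, user_info):
    """Download media concurrently, never exceeding the account's limit.

    `fetch` is whatever function performs one download; the pool size is
    the only thing tied to the ScreenScraper quota.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=allowed_threads(user_info)) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results
```

Since the pool size is read once per session, a user upgrading their account tier would automatically get the higher limit on the next scrape without any manual configuration.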
⸻
Cache CRC/MD5 for ROMs that have missing media
From what I understand, the Scraper computes a CRC or MD5 for each ROM and compares it against online databases to find matching metadata and media.
For ROMs where no media is found, the same CRC/MD5 is recalculated every time the Scraper runs, which is unnecessary work.
Proposal:
Store the CRC/MD5 of ROMs that failed to find media in a local cache or database.
On future runs, the Scraper could reuse this stored hash instead of recomputing it, making repeated scans much faster.
This would be especially helpful for:
• Large ROM sets
• Systems with slow storage or CPUs
• Repeated scraping sessions
⸻
“Snooze” or retry delay for failed matches
When no matching media is found for a ROM, the Scraper retries it every time a scan is run, even if nothing has changed on the server side.
Proposal:
Add a configurable “snooze” period for ROMs with failed matches, for example:
• None (always retry)
• 1 week
• 2 weeks
• 1 month
The Scraper would only retry these ROMs once the selected period has passed.
This would:
• Reduce unnecessary API calls
• Speed up subsequent scraping runs
• Avoid repeatedly querying the same unmatched ROMs
⸻
Taken together, these changes would significantly improve:
• Scraping speed
• CPU and I/O efficiency
• API usage
• Overall user experience for large collections
Thank you for considering these ideas.