This repository was archived by the owner on Mar 24, 2026. It is now read-only.
I would like to propose a few enhancements to the built-in Scraper to improve performance and reduce unnecessary work when scraping large ROM libraries.
⸻
Multithreaded media downloads (respecting ScreenScraper account limits)
Currently, the Scraper appears to download media (box art, screenshots, videos, etc.) sequentially. For large libraries this makes scraping very slow.
Proposal:
Allow the Scraper to download media using multiple threads. However, the number of concurrent threads should be automatically limited based on the user’s ScreenScraper account tier (free, registered, or supporter), as defined by ScreenScraper’s own connection limits.
Ideally, ArkOS would:
• Query the ScreenScraper API to detect the user’s allowed number of simultaneous connections
• Set the Scraper’s download thread pool to match that limit
• Never exceed the user’s ScreenScraper quota
This would maximize download speed while remaining fully compliant with ScreenScraper’s usage policies.
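To make the sizing rule concrete, here is a minimal Python sketch of the idea. It assumes the ScreenScraper user-info response exposes an allowed-connections field (called `maxthreads` here, which is an assumption about the API payload) and sizes a standard thread pool from it, clamped so a bad value can never exceed a sane cap:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def allowed_threads(user_info: dict, default: int = 1, cap: int = 8) -> int:
    """Derive the download thread count from ScreenScraper account info.

    'maxthreads' is the assumed field name in the user-info response;
    fall back to a single thread if it is missing or malformed, and
    clamp to `cap` as a safety limit.
    """
    try:
        return max(1, min(int(user_info.get("maxthreads", default)), cap))
    except (TypeError, ValueError):
        return default

def download_all(urls, fetch, user_info):
    """Download media concurrently, never exceeding the account's limit.

    `fetch` is whatever function performs one download; the pool size is
    the only thing tied to the ScreenScraper quota.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=allowed_threads(user_info)) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results
```

Since the pool size is read once per session, a user upgrading their account tier would automatically get the higher limit on the next scrape without any manual configuration.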
⸻
Cache CRC/MD5 for ROMs that have missing media
From what I understand, the Scraper computes a CRC or MD5 for each ROM and compares it against online databases to find matching metadata and media.
For ROMs where no media is found, the same CRC/MD5 is recalculated every time the Scraper runs, which is unnecessary work.
Proposal:
Store the CRC/MD5 of ROMs that failed to find media in a local cache or database.
On future runs, the Scraper could reuse this stored hash instead of recomputing it, making repeated scans much faster.
This would be especially helpful for:
• Large ROM sets
• Systems with slow storage or CPUs
• Repeated scraping sessions
⸻
“Snooze” or retry delay for failed matches
When no matching media is found for a ROM, the Scraper retries it every time a scan is run, even if nothing has changed on the server side.
Proposal:
Add a configurable “snooze” period for ROMs with failed matches, for example:
• None (always retry)
• 1 week
• 2 weeks
• 1 month
The Scraper would only retry these ROMs once the selected period has passed.
This would:
• Reduce unnecessary API calls
• Speed up subsequent scraping runs
• Avoid repeatedly querying the same unmatched ROMs
⸻
Taken together, these changes would significantly improve:
• Scraping speed
• CPU and I/O efficiency
• API usage
• Overall user experience for large collections
Thank you for considering these ideas.