⚡ Bolt: Parallelize URL validation in warm_up_cache #129
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -469,17 +469,25 @@ def fetch_folder_data(url: str) -> Dict[str, Any]: | |
|
|
||
| def warm_up_cache(urls: Sequence[str]) -> None: | ||
| urls = list(set(urls)) | ||
| - urls_to_fetch = [u for u in urls if u not in _cache and validate_folder_url(u)] | ||
| + # Optimization: Filter out already cached URLs (content check) | ||
| + urls_to_fetch = [u for u in urls if u not in _cache] | ||
| if not urls_to_fetch: | ||
| return | ||
|
|
||
| total = len(urls_to_fetch) | ||
| if not USE_COLORS: | ||
| log.info(f"Warming up cache for {total} URLs...") | ||
|
|
||
| + # Helper function to validate AND fetch in the worker thread | ||
| + # Validation involves DNS lookups (blocking I/O), so parallelization is critical. | ||
| + def _validate_and_fetch(url: str) -> None: | ||
| + if validate_folder_url(url): | ||
| + _gh_get(url) | ||
|
Comment on lines +483 to +485
|
||
|
|
||
| completed = 0 | ||
| with concurrent.futures.ThreadPoolExecutor() as executor: | ||
| - futures = {executor.submit(_gh_get, url): url for url in urls_to_fetch} | ||
| + # Submit task that does both validation and fetch | ||
| + futures = {executor.submit(_validate_and_fetch, url): url for url in urls_to_fetch} | ||
|
|
||
| if USE_COLORS: | ||
| sys.stderr.write(f"\r{Colors.CYAN}⏳ Warming up cache: 0/{total}...{Colors.ENDC}") | ||
|
|
||
The progress counter now includes URLs that fail validation, which may be misleading. In the previous implementation, `total` counted only URLs that passed validation. Now it counts all non-cached URLs, including those that fail validation and are never fetched. Consider updating the progress messages to clarify this, or decrementing the total for URLs that fail validation.
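
One possible shape for the second option, as a rough sketch rather than a drop-in patch: the worker returns whether it actually fetched, and the loop shrinks `total` whenever validation fails. The function name `warm_up_with_progress` and the `validate`, `fetch`, and `on_progress` parameters are hypothetical stand-ins for `validate_folder_url`, `_gh_get`, and the existing progress output; cache filtering and error handling are left out.

```python
import concurrent.futures
from typing import Callable, Optional, Sequence, Tuple


def warm_up_with_progress(
    urls: Sequence[str],
    validate: Callable[[str], bool],  # stand-in for validate_folder_url
    fetch: Callable[[str], object],   # stand-in for _gh_get
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> Tuple[int, int]:
    """Validate and fetch in worker threads; return (fetched, skipped)."""

    def _validate_and_fetch(url: str) -> bool:
        # True only when the URL passed validation and was fetched.
        if not validate(url):
            return False
        fetch(url)
        return True

    pending = list(dict.fromkeys(urls))  # de-duplicate, keeping order
    total = len(pending)
    fetched = 0
    skipped = 0
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [executor.submit(_validate_and_fetch, u) for u in pending]
        for future in concurrent.futures.as_completed(futures):
            if future.result():
                fetched += 1
            else:
                # Validation failed: drop this URL from the denominator
                # so the progress line never promises a fetch that won't happen.
                skipped += 1
                total -= 1
            if on_progress is not None:
                on_progress(fetched, total)
    return fetched, skipped
```

With this shape the displayed denominator can shrink while the run is in progress, but the final line always reads `n/n` for the URLs that were really fetched. Note that `future.result()` re-raises any exception from `fetch`, so the real code would still need whatever error handling the existing loop applies.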