parallelize distance calculation during get_patches#49
Merged
shruthivis merged 4 commits intoisblab:mainfrom Apr 14, 2026
Merged
parallelize distance calculation during get_patches#49shruthivis merged 4 commits intoisblab:mainfrom
get_patches#49shruthivis merged 4 commits intoisblab:mainfrom
Conversation
This comment was marked as off-topic.
This comment was marked as off-topic.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/patch_computer.py`:
- Around line 38-47: calc_distance_matrix currently materializes
list(itertools.combinations(args, 2)) which is O(n^2) memory; instead stream
combinations in chunks and feed those chunks to the worker pool. Replace the
eager list with an iterator = itertools.combinations(args, 2) and create a small
chunking generator (using itertools.islice) that yields fixed-size lists of
pairs; then use Pool.map or pool.imap over that chunk generator to call
worker_calc_distance (keeping initializer=initialize_worker and
initargs=(coords, radius)) and flatten the returned batches into mean_dist. This
preserves existing helpers (calc_distance_matrix, worker_calc_distance,
initialize_worker) but avoids building the full combinations list in memory.
- Line 38: Default argument evaluation uses os.cpu_count() at import time and
can raise TypeError if it returns None; change the three functions (including
calc_distance_matrix) to accept cores=None and compute the actual core count
inside the function using a module-level default. Add a module-level
DEFAULT_CORES like: DEFAULT_CORES = min(max((os.cpu_count() or 1) - 1, 1), 16)
and inside each function set cores = DEFAULT_CORES if cores is None (rather than
calling os.cpu_count() in the signature). This ensures safe handling when
os.cpu_count() returns None and avoids import-time errors.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5f046d4a-cadc-40f3-ac68-86c9af8d6440
📒 Files selected for processing (2)
src/main.pysrc/patch_computer.py
- from coderabbit suggestion
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Addressing #48
Pool.mapNote
Vector operations similar to get_pairwise_map is not appropriate here due to potential out of memory issues. We do not know how large
coordscan be.I've crosschecked the results with the previous implementation, they match.
This implementation gives ~4-5 fold improvement in speed.
Summary by CodeRabbit