
parallelize distance calculation during get_patches#49

Merged
shruthivis merged 4 commits into isblab:main from omgol411:main
Apr 14, 2026

Conversation

@omgol411
Contributor

@omgol411 omgol411 commented Apr 13, 2026

Addressing #48

  • Surface-surface distance calculation is parallelized using Pool.map

Note

Vector operations similar to get_pairwise_map are not appropriate here due to potential out-of-memory issues; we do not know how large coords can be.

I've cross-checked the results against the previous implementation; they match.

This implementation gives a ~4-5× speedup.
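A minimal sketch of the `Pool.map` approach described here, using the helper names that appear in the review below (`initialize_worker`, `worker_calc_distance`, `calc_distance_matrix`). The per-bead `radius` array and the exact surface-surface distance formula are assumptions for illustration, not the repository's actual code:

```python
import itertools
import os
from multiprocessing import Pool

import numpy as np

# Worker-process globals, set once per worker by the initializer so the
# (potentially large) coords array is not pickled with every task.
_coords = None
_radius = None


def initialize_worker(coords, radius):
    global _coords, _radius
    _coords = coords
    _radius = radius


def worker_calc_distance(pair):
    # Surface-surface distance for one bead pair: center-center distance
    # minus the two bead radii (assumed formula).
    i, j = pair
    return float(np.linalg.norm(_coords[i] - _coords[j])
                 - _radius[i] - _radius[j])


def calc_distance_matrix(coords, radius, cores=None):
    cores = cores or os.cpu_count() or 1
    pairs = itertools.combinations(range(len(coords)), 2)
    with Pool(cores, initializer=initialize_worker,
              initargs=(coords, radius)) as pool:
        return pool.map(worker_calc_distance, pairs)
```

The initializer/initargs pattern sends the shared read-only data to each worker once, instead of serializing it with every pair.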

Summary by CodeRabbit

  • Performance
    • Optimized computation efficiency by implementing multi-core processing support, allowing the system to utilize available CPU cores for significantly faster task execution.

@omgol411 omgol411 self-assigned this Apr 13, 2026
@omgol411 omgol411 requested a review from shruthivis April 13, 2026 16:49
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/patch_computer.py`:
- Around line 38-47: calc_distance_matrix currently materializes
list(itertools.combinations(args, 2)) which is O(n^2) memory; instead stream
combinations in chunks and feed those chunks to the worker pool. Replace the
eager list with an iterator = itertools.combinations(args, 2) and create a small
chunking generator (using itertools.islice) that yields fixed-size lists of
pairs; then use Pool.map or pool.imap over that chunk generator to call
worker_calc_distance (keeping initializer=initialize_worker and
initargs=(coords, radius)) and flatten the returned batches into mean_dist. This
preserves existing helpers (calc_distance_matrix, worker_calc_distance,
initialize_worker) but avoids building the full combinations list in memory.
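The chunked-streaming idea above could be sketched as follows. The `chunked` helper is hypothetical; only `itertools.combinations` and `itertools.islice` come from the comment itself:

```python
import itertools


def chunked(iterable, size):
    # Yield successive fixed-size lists from an iterator without ever
    # materializing the full O(n^2) combinations list in memory.
    it = iter(iterable)
    while batch := list(itertools.islice(it, size)):
        yield batch


# Inside calc_distance_matrix it might be used roughly like:
#   pairs = itertools.combinations(range(len(coords)), 2)
#   mean_dist = []
#   for batch in pool.imap(worker_calc_distance_batch, chunked(pairs, 10_000)):
#       mean_dist.extend(batch)
```

Each worker call then processes one fixed-size batch of pairs, and only `cores`-ish batches are in flight at a time.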
- Line 38: Default argument evaluation uses os.cpu_count() at import time and
can raise TypeError if it returns None; change the three functions (including
calc_distance_matrix) to accept cores=None and compute the actual core count
inside the function using a module-level default. Add a module-level
DEFAULT_CORES like: DEFAULT_CORES = min(max((os.cpu_count() or 1) - 1, 1), 16)
and inside each function set cores = DEFAULT_CORES if cores is None (rather than
calling os.cpu_count() in the signature). This ensures safe handling when
os.cpu_count() returns None and avoids import-time errors.
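The suggested module-level default could look like this sketch; `resolve_cores` is a hypothetical helper illustrating the in-function fallback each public function would apply:

```python
import os

# os.cpu_count() can return None; fall back to 1, leave one core free,
# and cap at 16, as the review suggests.
DEFAULT_CORES = min(max((os.cpu_count() or 1) - 1, 1), 16)


def resolve_cores(cores=None):
    # Called inside each function instead of evaluating os.cpu_count()
    # in the signature at import time.
    return DEFAULT_CORES if cores is None else cores
```

Because the default is resolved at call time, `cores=None` is always safe even on platforms where `os.cpu_count()` returns None.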

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5f046d4a-cadc-40f3-ac68-86c9af8d6440

📥 Commits

Reviewing files that changed from the base of the PR and between 9150df4 and 77953e4.

📒 Files selected for processing (2)
  • src/main.py
  • src/patch_computer.py

Comment thread src/patch_computer.py Outdated
Comment thread src/patch_computer.py Outdated
@shruthivis shruthivis merged commit c49b05b into isblab:main Apr 14, 2026
2 checks passed
