Skip to content

Fix addressing #38 and #39#40

Closed
omgol411 wants to merge 0 commit intoisblab:mainfrom
omgol411:main
Closed

Fix addressing #38 and #39#40
omgol411 wants to merge 0 commit intoisblab:mainfrom
omgol411:main

Conversation

@omgol411
Copy link
Copy Markdown
Contributor

@omgol411 omgol411 commented Apr 12, 2026

Addressing #38 , output_dir argument from run_prism was removed

For addressing #39 , the following changes were made

  • imap with chunksize > 1 to speed up both density and bead spread calculation

Summary by CodeRabbit

  • Refactor

    • Simplified parallel processing to a single streamlined workflow with improved chunking and CPU utilization
    • Introduced a helper to consolidate per-item computation and conversion steps
    • Standardized output directory handling to always create and use a single output path
  • Style

    • Cleaned up file-read indentation for consistent formatting

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 12, 2026

📝 Walkthrough

Walkthrough

A new helper get_bead_spread(...) was added and the parallel workflow in src/main.py was simplified to a single Pool.imap that computes bead spread per index with an explicit chunksize derived from cores_ = min(os.cpu_count() - 1, args.cores). Output directory creation now always calls os.makedirs(output_dir, exist_ok=True) after defaulting output_dir = args.output. The top-level ihm_parser import was removed in favor of inline parse_ihm_models import. Minor reindentation in src/ihm_parser.py only.

Changes

Cohort / File(s) Summary
Bead spread & parallel flow
src/main.py
Added get_bead_spread(i, coords, mass, radius, grid, voxel_size, n_breaks). Replaced two-stage parallel map (density then spread) with a single Pool(...).imap(get_bead_spread, ...) using explicit chunksize computed from cores_ = min(os.cpu_count() - 1, args.cores) and divmod(..., cores_*4).
Output directory handling
src/main.py
Defaulted output_dir = args.output and unconditionally call os.makedirs(output_dir, exist_ok=True); all subsequent writes use output_dir. Removed prior conditional/existing-directory reassignment logic.
Import adjustments
src/main.py
Removed unconditional top-level ihm_parser import; parse_ihm_models is imported inline where needed.
IHM parser formatting
src/ihm_parser.py
Reindented the with open(..., encoding='utf8') / ihm.reader.read(...) block to align with surrounding try/except; no functional changes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰
I hopped through code with careful care,
Merged spread and density into one fair share,
Cores trimmed tidy, chunks sliced right,
Folders made ready in the night,
A rabbit’s joy — computations light!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Fix addressing #38 and #39' references the issue numbers but is vague and does not clearly summarize the main technical changes (refactored multiprocessing with imap/chunksize, simplified density/bead spread calculation, output directory handling). Revise the title to be more descriptive of the actual changes, such as 'Refactor multiprocessing with imap chunksize and combine density/bead spread calculations' or similar.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/main.py`:
- Line 85: The run_prism signature was changed and no longer accepts a per-call
output path, breaking IHM callers that pass f"output_{ist}_{img}" and expect
artifacts in that folder; restore per-call output handling by adding an output
parameter to run_prism (e.g., def run_prism(coords, mass, radius, ps_names,
args, output=None)) and, at the start of run_prism, set args.output = output if
output is provided (falling back to the existing args.output when None) so
existing behavior remains backward compatible; ensure callers in ihm_parser.py
that currently pass a per-model folder continue to pass it and any other
invocations without an output argument still work.
- Around line 98-101: Replace the broken NumPy usage and len() calls by
computing an integer worker count safely: get system_cpus = os.cpu_count() or 1,
then compute worker_count = max(1, min(system_cpus - 1, args.cores)) (ensure
system_cpus-1 doesn't go below 1), use that integer in the chunksize calculation
(chunksize, extra = divmod(coords.shape[1], worker_count * 4)) and adjust the
extra-handling branch; refer to the existing variables cores_, chunksize,
coords.shape, and args.cores when making this change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f675bc40-8d53-4e7c-8780-d491f77d818d

📥 Commits

Reviewing files that changed from the base of the PR and between 37f1bce and ccddf3b.

📒 Files selected for processing (1)
  • src/main.py

Comment thread src/main.py Outdated
Comment thread src/main.py Outdated
Comment on lines +98 to +101
cores_ = np.min(os.cpu_count() - 1, args.cores)
chunksize, extra = divmod(coords.shape[1], len(cores_) * 4)
if extra:
chunksize += 1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

What is the signature of numpy.min, and does np.min(x, y)compare two scalar values or treatyas theaxisparameter? Also, what happens in Python when callinglen() on an integer?

💡 Result:

The signature of numpy.min (np.min) is np.min(a, axis=None, out=None, keepdims=, initial=, where=True). It computes the minimum of the array elements. np.min(x, y) where x and y are scalars treats y as the axis parameter (likely resulting in an error if y is not a valid axis index, e.g., TypeError or IndexError), not as a second value to compare. To compare two scalars, use np.minimum(x, y) instead. Calling len on an integer in Python raises TypeError: object of type 'int' has no len. Integers are not iterable or sequence types that support len.

Citations:


🏁 Script executed:

cd /tmp && find . -name "main.py" -type f 2>/dev/null | head -5

Repository: isblab/prism

Length of output: 38


🏁 Script executed:

git ls-files | grep -E "main\.py|src/" | head -20

Repository: isblab/prism

Length of output: 342


🏁 Script executed:

wc -l src/main.py

Repository: isblab/prism

Length of output: 71


🏁 Script executed:

sed -n '95,105p' src/main.py

Repository: isblab/prism

Length of output: 361


🏁 Script executed:

grep -n "args.cores\|argparse\|add_argument.*cores" src/main.py | head -20

Repository: isblab/prism

Length of output: 272


🏁 Script executed:

head -50 src/main.py

Repository: isblab/prism

Length of output: 1990


Fix the worker-count/chunk-size calculation before opening the pool.

np.min(os.cpu_count() - 1, args.cores) misuses the NumPy API—the second argument is treated as the axis parameter, not a value to compare. Line 99 then calls len() on the resulting scalar, which will raise TypeError: object of type 'int' has no len(). This code will fail before either Pool(...) starts, and it can also produce 0 workers on single-core or unknown-CPU hosts.

Suggested fix
-	cores_ = np.min(os.cpu_count() - 1, args.cores)
-	chunksize, extra = divmod(coords.shape[1], len(cores_) * 4)
-	if extra:
-        chunksize += 1
+	cpu_count = os.cpu_count() or 1
+	usable_cores = cpu_count - 1 if cpu_count > 1 else 1
+	cores_ = max(1, min(usable_cores, args.cores))
+	chunksize, extra = divmod(coords.shape[1], cores_ * 4)
+	if extra:
+		chunksize += 1
+	chunksize = max(1, chunksize)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cores_ = np.min(os.cpu_count() - 1, args.cores)
chunksize, extra = divmod(coords.shape[1], len(cores_) * 4)
if extra:
chunksize += 1
cpu_count = os.cpu_count() or 1
usable_cores = cpu_count - 1 if cpu_count > 1 else 1
cores_ = max(1, min(usable_cores, args.cores))
chunksize, extra = divmod(coords.shape[1], cores_ * 4)
if extra:
chunksize += 1
chunksize = max(1, chunksize)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main.py` around lines 98 - 101, Replace the broken NumPy usage and len()
calls by computing an integer worker count safely: get system_cpus =
os.cpu_count() or 1, then compute worker_count = max(1, min(system_cpus - 1,
args.cores)) (ensure system_cpus-1 doesn't go below 1), use that integer in the
chunksize calculation (chunksize, extra = divmod(coords.shape[1], worker_count *
4)) and adjust the extra-handling branch; refer to the existing variables
cores_, chunksize, coords.shape, and args.cores when making this change.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/main.py (1)

65-85: ⚠️ Potential issue | 🔴 Critical

Harden worker/chunksize calculation and use the computed worker count.

Line 65 can still fail when os.cpu_count() is None, and on single-core hosts it can yield 0, which makes Line 66 divide by zero. Also, Line 80 should use the computed worker count, not args.cores.

Proposed fix
-	cores_ = min(os.cpu_count() - 1, args.cores)
-	chunksize, extra = divmod(coords.shape[1], cores_ * 4)
+	cpu_count = os.cpu_count() or 1
+	usable_cores = max(1, cpu_count - 1)
+	cores_ = max(1, min(usable_cores, args.cores))
+	chunksize, extra = divmod(coords.shape[1], cores_ * 4)
 	if extra:
 		chunksize += 1
+	chunksize = max(1, chunksize)
@@
-	with Pool(args.cores) as p:
+	with Pool(cores_) as p:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main.py` around lines 65 - 85, Protect against os.cpu_count() returning
None and single-core hosts by computing workers = max(1, (os.cpu_count() or 1) -
1, 1) (or equivalent) and use that variable (e.g., cores_ or workers)
everywhere; ensure the divmod denominator uses workers * 4 and never zero,
compute chunksize with a safe fallback (e.g., if coords.shape[1] == 0 then
chunksize = 1) and only increment chunksize when extra > 0, and finally pass the
computed workers to Pool(...) instead of args.cores; update references to
_part_func, chunksize, extra and the Pool(...) call to use the new workers
variable consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/main.py`:
- Line 145: The --return_spread argument is defined with action='store_true' but
default=True so it is always True; update the parser.add_argument call for
"--return_spread" to make it togglable by either setting default=False when
keeping action='store_true' or switching to argparse.BooleanOptionalAction to
provide paired --return-spread / --no-return-spread flags (referencing the
parser.add_argument invocation for "--return_spread" and the action parameter).

---

Duplicate comments:
In `@src/main.py`:
- Around line 65-85: Protect against os.cpu_count() returning None and
single-core hosts by computing workers = max(1, (os.cpu_count() or 1) - 1, 1)
(or equivalent) and use that variable (e.g., cores_ or workers) everywhere;
ensure the divmod denominator uses workers * 4 and never zero, compute chunksize
with a safe fallback (e.g., if coords.shape[1] == 0 then chunksize = 1) and only
increment chunksize when extra > 0, and finally pass the computed workers to
Pool(...) instead of args.cores; update references to _part_func, chunksize,
extra and the Pool(...) call to use the new workers variable consistently.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7e10a549-ec36-447b-b4ed-b62334f48355

📥 Commits

Reviewing files that changed from the base of the PR and between ccddf3b and 2f2371f.

📒 Files selected for processing (2)
  • src/ihm_parser.py
  • src/main.py
✅ Files skipped from review due to trivial changes (1)
  • src/ihm_parser.py

Comment thread src/main.py Outdated
help="Type of input: npz/pdb/cif/rmf/dcd.",
required=True, type=str)
parser.add_argument("--voxel_size", "-v", help="Voxel size for density calculations", default=4, type=int)
parser.add_argument("--return_spread", "-rs", help="Return the spread bead_spread", action='store_true', default = True)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's check if src/main.py exists and examine the area around line 145
if [ -f src/main.py ]; then
  echo "=== Checking src/main.py around line 145 ==="
  wc -l src/main.py
  sed -n '140,150p' src/main.py
else
  echo "src/main.py not found at expected location"
  # Try to find main.py
  find . -name "main.py" -type f
fi

Repository: isblab/prism

Length of output: 1113


🏁 Script executed:

# Search for any related --return-spread or --no-return-spread arguments
rg "return.spread|return_spread" --max-count 20

Repository: isblab/prism

Length of output: 2391


🏁 Script executed:

# Look for other argparse action='store_true' patterns to understand the codebase's conventions
rg "action=.*store" -A 1 src/main.py

Repository: isblab/prism

Length of output: 269


--return_spread flag is non-toggleable and always evaluates to True.

With action='store_true' and default=True, the argument will be True whether the flag is provided or absent, making it impossible for users to disable this feature. Either change to default=False if toggling is desired, or use argparse.BooleanOptionalAction (Python 3.9+) with paired flags like --return-spread and --no-return-spread.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main.py` at line 145, The --return_spread argument is defined with
action='store_true' but default=True so it is always True; update the
parser.add_argument call for "--return_spread" to make it togglable by either
setting default=False when keeping action='store_true' or switching to
argparse.BooleanOptionalAction to provide paired --return-spread /
--no-return-spread flags (referencing the parser.add_argument invocation for
"--return_spread" and the action parameter).

@omgol411 omgol411 requested a review from shruthivis April 12, 2026 16:13
Copy link
Copy Markdown
Collaborator

@shruthivis shruthivis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff has too many unrelated changes in too many unrelated lines.
Stick to 1 change in 1 PR. Submit a PR addressing only the imap issue with minimal code changes to make it work as expected.
Don't move around lines unless required (e.g. bead_spread=[] initialization).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants