feat: add multiprocessing to sampling module #26
Merged
pFornagiel merged 23 commits into main on Mar 17, 2026
Contributor
Pull request overview
Adds joblib-based multiprocessing to Basin-Hopping sampling to speed up LON construction while maintaining reproducibility across different n_jobs settings, plus accompanying API/docs/test updates.
Changes:
- Introduces parallel (`joblib`) execution for independent Basin-Hopping runs with per-run `SeedSequence` RNGs.
- Adds `n_jobs` to `BasinHoppingSamplerConfig` and `verbose` progress output via `tqdm`.
- Updates docs/examples and adds tests validating sequential/parallel reproducibility.
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| src/lonkit/sampling.py | Refactors sampling into single/sequential/parallel paths; adds n_jobs + verbose; module-level worker for joblib. |
| src/lonkit/step_size.py | Updates perturbation calls to pass an RNG explicitly. |
| tests/test_parallel_sampling.py | Adds reproducibility tests comparing sequential vs parallel results. |
| pyproject.toml | Adds joblib and tqdm as required dependencies. |
| uv.lock | Adds joblib to lockfile and project dependency metadata. |
| docs/user-guide/sampling.md | Updates defaults and examples to match the new sampling API. |
| docs/user-guide/examples.md | Updates examples to use the new sample() return object. |
| docs/api/step_size.md | Adds API docs page for the step size module. |
| docs/api/index.md | Links the step size module into the API index. |
| README.md | Updates quickstart to use sample() + sample_to_lon(). |
Comments suppressed due to low confidence (1)
src/lonkit/sampling.py:72
`n_jobs` is a new public config parameter but `__post_init__` doesn't validate it. At minimum, reject `n_jobs == 0` (joblib treats this as invalid) and consider validating that non-`None` values are ints to fail fast with a clear error.

    def __post_init__(self) -> None:
        if self.n_iter_no_change is not None and self.n_iter_no_change <= 0:
            raise ValueError("n_iter_no_change must be positive or None.")
        if self.max_iter is not None and self.max_iter <= 0:
            raise ValueError("max_iter must be positive or None.")
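One way the suggested check could look is sketched below. `SamplerConfigSketch` is a hypothetical stand-in for `BasinHoppingSamplerConfig`, the default values are placeholders, and the error messages for `n_jobs` are assumptions rather than what the PR actually ships.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SamplerConfigSketch:
    """Hypothetical stand-in for BasinHoppingSamplerConfig (relevant fields only)."""

    n_iter_no_change: int | None = 1000  # placeholder default, not taken from the PR
    max_iter: int | None = None
    n_jobs: int | None = 1

    def __post_init__(self) -> None:
        if self.n_iter_no_change is not None and self.n_iter_no_change <= 0:
            raise ValueError("n_iter_no_change must be positive or None.")
        if self.max_iter is not None and self.max_iter <= 0:
            raise ValueError("max_iter must be positive or None.")
        if self.n_jobs is not None:
            # bool is a subclass of int, so it is rejected explicitly.
            if isinstance(self.n_jobs, bool) or not isinstance(self.n_jobs, int):
                raise TypeError("n_jobs must be an int or None.")
            if self.n_jobs == 0:
                raise ValueError("n_jobs must be a non-zero int or None.")
```

Negative values such as `-1` are left valid here because joblib uses them to mean "relative to the number of CPUs".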
WojtAcht (Member) reviewed on Mar 15, 2026 and left a comment:
Please update the version and the changelog.
Collaborator
Author
@WojtAcht, I also bumped the version to 0.2.0 and added a short changelog entry. Take a look, if you want, to check that everything is alright before I merge.
Description
This PR introduces parallel Basin-Hopping sampling using `joblib`, which speeds up LON construction by running multiple independent sampling runs concurrently across multiple CPU cores. The PR also includes some API changes and default parameter tuning.

I tried to keep the structure of `sampling.py` as close to the previous implementation as possible, but running the sampling procedure in separate processes required some larger changes. Most notably, I created the `_run_single_bh_in_worker` function, which is needed because of how joblib handles parallelism: the function run by the multiprocessing runner has to be memory-independent from other objects in the file, so a module-level function (instead of a method) that creates a new `BasinHoppingSampler` object was required.

The parallel sampling procedure was benchmarked against the single-threaded implementation and another implementation, as discussed internally. It proves to be efficient enough.
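The idea behind a module-level worker with per-run seeding can be sketched roughly as follows. This is not the PR's code: `run_worker_sketch` and `sample_sketch` are hypothetical names, a toy random walk stands in for a Basin-Hopping run, and the runs are executed in a shuffled order to imitate parallel scheduling instead of actually calling joblib.

```python
import numpy as np


def run_worker_sketch(seed_seq: np.random.SeedSequence, n_steps: int) -> float:
    """Hypothetical module-level worker: builds its own RNG, shares no state."""
    rng = np.random.default_rng(seed_seq)
    x = 0.0
    for _ in range(n_steps):
        x += rng.normal()  # toy random walk in place of a Basin-Hopping run
    return x


def sample_sketch(seed: int, n_runs: int, order: list[int]) -> list[float]:
    """Execute the runs in the given order; return results indexed by run."""
    # One independent child seed per run, derived from the single root seed.
    children = np.random.SeedSequence(seed).spawn(n_runs)
    results = [0.0] * n_runs
    for i in order:
        results[i] = run_worker_sketch(children[i], n_steps=100)
    return results


sequential = sample_sketch(seed=42, n_runs=4, order=[0, 1, 2, 3])
shuffled = sample_sketch(seed=42, n_runs=4, order=[2, 0, 3, 1])
print(sequential == shuffled)
```

Because each run draws only from the generator built from its own child seed, the result of run `i` does not depend on when or where it executes. With joblib, the loop would presumably be replaced by something like `Parallel(n_jobs=...)(delayed(run_worker_sketch)(child, 100) for child in children)`, which returns results in input order.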
Changes Made
- `src/lonkit/sampling.py`
  - `BasinHoppingSamplerConfig`
    - Added `n_jobs: int | None = 1` parameter to control parallel execution.
  - `BasinHoppingSampler`
    - `_single_bh_run()`: executes a single Basin-Hopping run
    - `_sequential_bh()`: runs sequentially (when `effective_jobs = 1`)
    - `_parallel_bh()`: runs in parallel using `joblib` (when `effective_jobs != 1`)
    - Added `verbose: bool = False` parameter to `sample()` for progress display using `tqdm`
    - `_perturbation()` now accepts `rng` as a parameter instead of using an instance-level RNG
    - `_rng` was dropped, because results need to be reproducible across processes, which required `SeedSequence`
  - Module-level worker
    - `_run_single_bh_in_worker()`: module-level function for `joblib` pickling compatibility
- `src/lonkit/step_size.py`
  - Updated `_perturbation()` calls to pass the `rng` parameter for consistency with the sampling changes
- `tests/test_parallel_sampling.py` (new file)
- Documentation
- Dependencies
  - Added `joblib>=1.3.0` and `tqdm` as required dependencies
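How `effective_jobs` is derived from `n_jobs` is not shown in this description. The sketch below is one plausible resolution, assuming joblib's documented convention that `None` means one job and that a negative value `n` means `n_cpus + 1 + n`. The function name, the `cpu_count` override, and the cap at the number of runs are all assumptions made for illustration.

```python
from __future__ import annotations

import os


def resolve_effective_jobs_sketch(
    n_jobs: int | None, n_runs: int, cpu_count: int | None = None
) -> int:
    """Hypothetical helper: map an n_jobs setting to a concrete worker count."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    if n_jobs is None:
        jobs = 1  # joblib treats None as a single job outside a parallel context
    elif n_jobs == 0:
        raise ValueError("n_jobs must be a non-zero int or None.")
    elif n_jobs < 0:
        jobs = max(1, cpus + 1 + n_jobs)  # -1 -> all CPUs, -2 -> all but one
    else:
        jobs = n_jobs
    # Assumption: there is no point in starting more workers than runs.
    return min(jobs, n_runs)


def choose_path_sketch(n_jobs: int | None, n_runs: int) -> str:
    """Pick the sequential or parallel path, mirroring the split listed above."""
    effective_jobs = resolve_effective_jobs_sketch(n_jobs, n_runs)
    return "sequential" if effective_jobs == 1 else "parallel"
```

With the default `n_jobs=1` this always takes the sequential path, so existing callers would see no process start-up overhead.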