memory limit to a parsl worker in HPCExecutor #464

Merged
lewisjared merged 5 commits into main from hpc_memory_constraints
Feb 25, 2026

Conversation

@minxu74
Contributor

@minxu74 minxu74 commented Oct 14, 2025

Description

Checklist

Please confirm that this pull request has done the following:

  • Tests added
  • Documentation added (where applicable)
  • Changelog item added to changelog/

@codecov

codecov bot commented Oct 14, 2025

Codecov Report

❌ Patch coverage is 84.61538% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...ckages/climate-ref/src/climate_ref/executor/hpc.py 84.61% 2 Missing ⚠️
Flag Coverage Δ
core 93.20% <84.61%> (-0.03%) ⬇️
providers 89.65% <ø> (ø)

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
...ckages/climate-ref/src/climate_ref/executor/hpc.py 63.46% <84.61%> (+1.02%) ⬆️

@minxu74 minxu74 requested a review from lewisjared February 24, 2026 02:45
@minxu74
Contributor Author

minxu74 commented Feb 24, 2026

@lewisjared It is ready for your review. I changed the hard-coded memory limit to a dynamic one, set via an environment variable, MEMORY_LIMIT_PARSL_JOB_GB. The memory constraint is implemented by setting the soft limit of the RLIMIT_AS resource. It is not very robust, but a test run on NERSC showed that it is effective at avoiding OOM errors.
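As a rough illustration of the mechanism described above (a sketch, not the PR's actual code; the wrapper name `with_memory_limit` is borrowed from the review discussion and its body here is an assumption), the cap can be applied with Python's standard `resource` module:

```python
import os
import resource


def with_memory_limit(func):
    """Illustrative sketch: cap a worker's address space before running func.

    Reads MEMORY_LIMIT_PARSL_JOB_GB (a size in GB) and lowers the soft
    RLIMIT_AS limit accordingly. The real implementation in
    climate_ref/executor/hpc.py may differ in detail.
    """

    def wrapper(*args, **kwargs):
        limit_gb = os.environ.get("MEMORY_LIMIT_PARSL_JOB_GB")
        if limit_gb:
            soft = int(float(limit_gb) * 1024**3)  # GB -> bytes
            _, hard = resource.getrlimit(resource.RLIMIT_AS)
            # Only the soft limit is lowered; the hard limit is untouched.
            # Allocations past `soft` then fail with MemoryError inside the
            # worker instead of triggering the kernel OOM killer.
            resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
        return func(*args, **kwargs)

    return wrapper
```

Because only the soft limit is lowered, a process that knows what it is doing could raise it again, which matches the "not very robust" caveat above.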

@minxu74 minxu74 changed the title [WIP] memory limit to a parsl worker in HPCExecutor memory limit to a parsl worker in HPCExecutor Feb 24, 2026
Contributor

@lewisjared lewisjared left a comment


Nice. This with_memory_limit function might be useful for the celery workers as well. Currently those resource limits are applied at the Kubernetes layer.

Does it OOM when it hits the soft limit?

@lewisjared lewisjared merged commit ea0bee8 into main Feb 25, 2026
26 of 27 checks passed
@lewisjared lewisjared deleted the hpc_memory_constraints branch February 25, 2026 22:24
@minxu74
Contributor Author

minxu74 commented Feb 26, 2026

@lewisjared No. When a parsl worker hits the soft limit, a MemoryError is raised in that worker, but the other parsl workers are unaffected and keep running. The constraint keeps the total memory used by all parsl workers on a node under the node's physical memory, which prevents node-level OOM errors. An OOM on any node of an HPC job kills the workers across all of its nodes and aborts the entire job.
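The containment behavior described above can be demonstrated in isolation (a hedged sketch; `demo_soft_limit` is an illustrative helper, not part of the PR, and RLIMIT_AS enforcement is Linux behavior, not guaranteed on every platform):

```python
import resource


def demo_soft_limit(limit_gb: float = 1.0) -> str:
    """Lower the soft RLIMIT_AS cap, then try to allocate past it.

    Under the cap, Python's allocator fails with a catchable MemoryError
    instead of the kernel killing the process, so one worker can fail
    while its sibling workers on the same node keep running.
    """
    cap = int(limit_gb * 1024**3)
    old_soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (cap, hard))
    try:
        _ = bytearray(2 * cap)  # deliberately request past the cap
        return "allocated"
    except MemoryError:
        return "MemoryError"
    finally:
        # Restore the original soft limit; raising it back up is allowed
        # as long as it does not exceed the hard limit.
        resource.setrlimit(resource.RLIMIT_AS, (old_soft, hard))
```

On Linux the oversized allocation fails inside the process as a MemoryError, which is exactly why a per-worker soft limit avoids the node-level OOM killer described above.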
