
Improved Slurm backend for job submission #5177

Open
besserox wants to merge 4 commits into easybuilders:develop from besserox:new_slurm_backend

Conversation

Contributor

@besserox besserox commented Apr 20, 2026

This is an improved version of the Slurm backend for job submission (option --job).

Same functionalities

  • Submit each build as an individual Slurm job
  • Handle dependencies between Slurm jobs
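As a rough illustration of how a dependency between Slurm jobs can be expressed, the sketch below builds an `sbatch` command line using the `--dependency=afterok:...` option. The helper name `sbatch_command` and the choice of `afterok` are assumptions made for this example; the backend in this PR may construct its command differently.

```python
def sbatch_command(script, job_name, depends_on=()):
    """Build an sbatch command line (hypothetical helper, not the PR's code).

    depends_on: Slurm job IDs that must finish successfully first.
    """
    # --parsable makes sbatch print only the job ID, which is easy to capture
    cmd = ["sbatch", "--parsable", f"--job-name={job_name}"]
    if depends_on:
        # afterok: start only once all listed jobs have completed with exit code 0
        cmd.append("--dependency=afterok:" + ":".join(str(j) for j in depends_on))
    cmd.append(script)
    return cmd


print(sbatch_command("build_b.sh", "B", depends_on=[101, 102]))
# ['sbatch', '--parsable', '--job-name=B', '--dependency=afterok:101:102', 'build_b.sh']
```

The returned list could be passed to `subprocess.run`, with the captured job ID reused as a dependency for later submissions.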

New functionalities

  • Respect the --job-max-jobs setting, and manage a submission queue accordingly
  • Track the state of the Slurm jobs and print the current status according to the option --job-polling-interval
  • Print a summary of the status of all build jobs at the end
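The queueing behavior listed above could look roughly like the following. This is a minimal sketch, not the code from this PR: `run_queue` and `fake_poll` are hypothetical names, submission is simulated by moving a job name into a set, and dependencies are left out. A real backend would call `sbatch` where noted and sleep for the polling interval between status checks.

```python
from collections import deque


def run_queue(jobs, max_jobs, poll):
    """Keep at most max_jobs in flight; return (final states, peak in flight).

    poll: callable taking the in-flight job names and returning a dict
          {job name: final state} for those that have finished.
    """
    if max_jobs < 1:
        raise ValueError("max_jobs must be at least 1")
    pending = deque(jobs)
    running = set()
    final = {}
    peak = 0
    while pending or running:
        # Fill free slots (a real backend would run sbatch here)
        while pending and len(running) < max_jobs:
            running.add(pending.popleft())
        peak = max(peak, len(running))
        # A real backend would sleep for the polling interval before this call
        for job, state in poll(frozenset(running)).items():
            running.discard(job)
            final[job] = state
    return final, peak


def fake_poll(running):
    # Stand-in for a Slurm query: one in-flight job finishes per poll
    return {min(running): "COMPLETED"}


states, peak = run_queue(["a", "b", "c", "d", "e"], max_jobs=2, poll=fake_poll)
print(peak)    # 2
print(states)  # every job mapped to 'COMPLETED'
```

The `states` dictionary is the kind of data an end-of-run summary can be printed from.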

Change of behavior

  • Terminal output is different: any tool that relies on the output generated by the --job Slurm option will break.
  • Synchronous behavior: with this updated backend, EasyBuild waits for all build jobs to finish before it exits and reports the status of the whole build. This is similar to the natural behavior of EasyBuild (without --job) and to the behavior of other backends such as GC3Pie. It differs from the current behavior of the Slurm backend, where EasyBuild exits once all the jobs have been submitted.

If this change of behavior is too disruptive to be merged as it is, I can suggest two options:

  • Option A: When --job --job-backend Slurm is used without --job-max-jobs, keep the old (asynchronous) behavior. When it is used with --job-max-jobs, use the new (synchronous) behavior. However, I think this could be a bit confusing.
  • Option B: Instead of updating the Slurm backend, make this a completely new backend with a new name, for example SlurmQueue.

Use of AI tools

  • I wrote the code myself, based on the existing Slurm backend, and a prototype version developed by @ekieffer.
  • I used Ollama with model Qwen3.5-35b to review and improve my code.

- Submit each build as an individual Slurm job (same as before)
- Handle dependencies between Slurm jobs (same as before)
- Respect the --job-max-jobs setting, and manage a queue accordingly (new)
- Track the state of the Slurm jobs and print a summary at the end (new)
- Synchronous behavior, ie block until the end of the execution (new)
super().__init__(*args, **kwargs)

# Add maximum jobs submitted to a queue
self.job_polling_interval = build_option('job_polling_interval')
Contributor


I see no default value for polling interval, what is it? It should be impossible to run with e.g. 0, else the code will spam SLURM. Some number of minutes would seem like a sensible minimum?

The comment above polling interval refers to maximum jobs, which is actually set below.

Contributor Author


Thanks for checking:

  • The default value for job_polling_interval is 30s and it is set in tools/options.py line 951.
  • I added an additional check to make sure we don't get anything less than 1s.
  • I fixed the mixed-up comments.

I have been running it many times with the default polling interval of 30s without any issue.
I believe it is fine to let users choose a low value if they wish, even though I would not recommend it.
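For readers following the thread, a lower bound on the polling interval could be enforced along these lines. This is only a sketch under assumptions: the function name, the constants, and the choice to clamp silently (rather than raise an error or print a warning) are hypothetical and may not match what the PR does.

```python
MIN_POLLING_INTERVAL = 1.0      # seconds; lower bound mentioned in the reply above
DEFAULT_POLLING_INTERVAL = 30.0  # seconds; default mentioned in the reply above


def effective_polling_interval(requested):
    """Return the interval to use, never below MIN_POLLING_INTERVAL."""
    if requested is None:
        return DEFAULT_POLLING_INTERVAL
    # Clamp instead of failing, so a value such as 0 cannot flood the scheduler
    return max(float(requested), MIN_POLLING_INTERVAL)


print(effective_polling_interval(0))     # 1.0
print(effective_polling_interval(None))  # 30.0
print(effective_polling_interval(120))   # 120.0
```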

Contributor

@hattom hattom Apr 27, 2026


I see now the default value was already there -- I thought this was a new parameter, so was expecting to see it defined in this PR.
I'll resolve this thread (if I can).

@besserox besserox force-pushed the new_slurm_backend branch from 50fde24 to 35f2340 Compare April 27, 2026 09:18
@bartoldeman bartoldeman added this to the next release (5.3.1?) milestone May 6, 2026
@bartoldeman bartoldeman added the enhancement and AI-assisted (AI-assisted contributions) labels May 6, 2026


3 participants