sbatch submission failures do not continue #1
Open
trickytank wants to merge 1 commit into nathanhaigh:master
Owner
Thanks for the contribution! I've had similar sporadic failures. How about a generic retry function like this:

    function retry {
      local n=1
      local max=5     # maximum number of attempts
      local delay=5   # seconds to wait between attempts
      while true; do
        "$@" && break || {
          if [[ $n -lt $max ]]; then
            >&2 echo "WARN: Command ($*) failed on attempt $n/$max:"
            sleep "$delay"
          else
            >&2 echo "ERROR: Command ($*) failed after $n attempts."
            exit 1
          fi
          ((n++))
        }
      done
    }
    # Let a failed sbatch fail the whole pipeline instead of being masked by cut's exit status.
    set -o pipefail
    JOBID=$(retry sbatch ${DEP_STRING} ${SBATCH_ARGS} "$@" | cut -f4 -d' ')
    echo -n "${JOBID}"

This has the advantage of not getting stuck in an infinite loop, as it gives up after 5 failed attempts. What do you think?
Author
It's much nicer to have a generic retry function. For my purposes I'd set the max to a large value, as there have been periods of ~15 minutes during which submission has failed.
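One way to make the limit adjustable without editing the function is to read it from the environment. This is only a sketch: `RETRY_MAX` and `RETRY_DELAY` are hypothetical variable names that are not part of the existing script, and the defaults simply mirror the values suggested above.

```shell
#!/usr/bin/env bash
# Sketch: retry whose limits come from the environment.
# RETRY_MAX and RETRY_DELAY are hypothetical names (assumptions, not existing settings).
retry() {
    local max="${RETRY_MAX:-5}"      # attempts before giving up
    local delay="${RETRY_DELAY:-5}"  # seconds to wait between attempts
    local n=1
    # Run the command given as arguments until it succeeds.
    until "$@"; do
        if [ "$n" -ge "$max" ]; then
            >&2 echo "ERROR: Command ($*) failed after $n attempts."
            return 1
        fi
        >&2 echo "WARN: Command ($*) failed on attempt $n/$max; retrying in ${delay}s."
        sleep "$delay"
        n=$((n + 1))
    done
}
```

With `RETRY_MAX=200` and `RETRY_DELAY=5`, the function would keep trying for roughly 16 minutes of waiting (199 sleeps of 5 s) before giving up, which should cover the ~15 minute outages while still being bounded.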
This is to prevent errors from sbatch causing trouble.
I sometimes have the following error from sbatch:
This leaves JOBID as an empty string, which later causes an error in sacct. The error never resolves, because the status script assumes the job is still running.
This PR fixes the problem by retrying until the job is properly submitted. There is a 10-second wait between submission attempts, as failures appear to cluster in time.
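The approach described here could be sketched roughly as follows. This is not the PR's actual diff: `submit_until_jobid` and `SUBMIT_DELAY` are hypothetical names, and the sketch assumes the job ID is the fourth space-separated field of the submission output (as in sbatch's usual "Submitted batch job <id>" line).

```shell
#!/usr/bin/env bash
# Sketch: resubmit until the submission output yields a numeric job ID.
# submit_until_jobid and SUBMIT_DELAY are hypothetical names (assumptions).
submit_until_jobid() {
    local delay="${SUBMIT_DELAY:-10}"  # seconds between attempts; 10 mirrors the wait described above
    local out jobid
    while true; do
        # Run the submission command passed as arguments and capture its stdout.
        if out=$("$@"); then
            jobid=$(printf '%s\n' "$out" | cut -f4 -d' ')
            case "$jobid" in
                '' | *[!0-9]*) ;;                    # empty or non-numeric: treat as a failed submission
                *) printf '%s' "$jobid"; return 0 ;; # print the ID without a trailing newline
            esac
        fi
        >&2 echo "WARN: submission failed ($*); retrying in ${delay}s."
        sleep "$delay"
    done
}
```

It would be called as `JOBID=$(submit_until_jobid sbatch ${DEP_STRING} ${SBATCH_ARGS} "$@")`. Checking that the ID is numeric guards against the empty-string case, but note that this version loops indefinitely if submission never succeeds.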