Getting More Memory In Recovery Jobs
ltrestka edited this page Mar 14, 2024 · 2 revisions
There are two approaches:
- For jobs with fixed or no inputs (e.g. event generation), use autorelease to restart jobs that get held for exceeding memory.
- For jobs that read SAM datasets, use recovery launches.
Note that for this to work smoothly, jobs that go over memory must not stay in "Held" status
forever. You can avoid this by setting:
[stage_whatever]
...
submit.line_1 = +PeriodicRemove=JobStatus==5&&HoldReasonCode==26&&CurrentTime-EnteredCurrentStatus>3600
in your fife_launch config or, if you are not using fife_launch, by adding
--line '+PeriodicRemove=JobStatus==5&&HoldReasonCode==26&&CurrentTime-EnteredCurrentStatus>3600'
to your jobsub_submit parameters.
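To make the removal condition concrete, here is a small Python sketch that mirrors the PeriodicRemove expression as a plain predicate. It only illustrates the logic; it is not how HTCondor evaluates ClassAds (for instance, it ignores ClassAd UNDEFINED semantics), and the function and constant names are hypothetical. JobStatus 5 means Held, 26 is the HoldReasonCode the expression matches, and 3600 is a one-hour grace period in seconds.

```python
# Illustrative mirror of the PeriodicRemove expression:
#   JobStatus==5 && HoldReasonCode==26 && CurrentTime-EnteredCurrentStatus>3600
# Names are hypothetical; HTCondor evaluates the real expression as a ClassAd.

HELD = 5               # HTCondor JobStatus value for a held job
HOLD_REASON_CODE = 26  # hold reason code matched by the expression
GRACE_SECONDS = 3600   # how long a job may sit in Held before it is removed


def should_remove(job_status, hold_reason_code, current_time, entered_current_status):
    """Return True when the job would match the PeriodicRemove expression."""
    return (
        job_status == HELD
        and hold_reason_code == HOLD_REASON_CODE
        and current_time - entered_current_status > GRACE_SECONDS
    )


if __name__ == "__main__":
    # Held with code 26 for two hours: removed.
    print(should_remove(5, 26, current_time=7200, entered_current_status=0))
    # Held with code 26 for exactly one hour: not yet removed (the test is strictly greater).
    print(should_remove(5, 26, current_time=3600, entered_current_status=0))
    # Running job (JobStatus 2): never matches.
    print(should_remove(2, None, current_time=7200, entered_current_status=0))
```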
You can add recovery launches to your JobTypes, including ones that override launch options to request more memory. If you are using fife_launch, you can do this as follows:
- Open the campaign in the GUI Campaign editor.
- Double-click the job type.
- Change its name (for example, append _with_mem_recovery).
- Click the Edit button next to Recoveries.
- Pick proj_status as the recovery type.
- Click the Edit button on the right to edit the Param Overrides.
- In the param editor, set the override for submit.memory for fife_launch.
- Click Accept/OK in each popup.
- Check the stages that use that job type so they get the new one, then press Done.
- Press Save for the whole campaign.
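Conceptually, a Param Override on a recovery launch replaces one `section.key` setting of the fife_launch config (here `submit.memory`) for that launch only. The Python sketch below models this with the standard-library `configparser`; it is not the actual code of fife_launch or the Campaign editor, and the base config, the helper names, and the `2000MB`/`4000MB` values are hypothetical.

```python
import configparser

# Hypothetical base config; only the [submit] memory setting matters here.
BASE_CONFIG = """
[submit]
memory = 2000MB
"""


def apply_override(cfg, dotted_key, value):
    """Set a 'section.key' override such as 'submit.memory' on a parsed config."""
    section, _, key = dotted_key.partition(".")
    if not cfg.has_section(section):
        cfg.add_section(section)
    cfg.set(section, key, value)


def launch_config(overrides):
    """Build the effective config for one launch from the base plus its overrides."""
    cfg = configparser.ConfigParser()
    cfg.read_string(BASE_CONFIG)
    for dotted_key, value in overrides.items():
        apply_override(cfg, dotted_key, value)
    return cfg


if __name__ == "__main__":
    first_launch = launch_config({})
    recovery_launch = launch_config({"submit.memory": "4000MB"})  # hypothetical value
    print(first_launch["submit"]["memory"])
    print(recovery_launch["submit"]["memory"])
```

In this model the base config is rebuilt for every launch, so the larger memory request applies only to the recovery launch and the original launch keeps its usual request.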