-
Notifications
You must be signed in to change notification settings - Fork 3
Expand file tree
/
Copy pathgpu-job.submit
More file actions
41 lines (32 loc) · 1.5 KB
/
gpu-job.submit
File metadata and controls
41 lines (32 loc) · 1.5 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
# Job requirements - ensure we are running on a newer GPU
# node and have enough resources to execute our code
# Tensorflow also requires AVX instruction set
Requirements = GPUs_Capability >= 8.0 && GPUs_DriverVersion > 11.0 && HAS_AVX2 == True && OSG_HOST_KERNEL_VERSION >= 31000
request_cpus = 1
request_gpus = 1
request_memory = 8 GB
request_disk = 1 GB
# Container image to run the job in
container_image = /cvmfs/singularity.opensciencegrid.org/opensciencegrid/tensorflow-gpu:2.2-cuda-10.1/
# Executable is the program your job will run It's often useful
# to create a shell script to "wrap" your actual work.
Executable = job-wrapper.sh
Arguments =
# Inputs/outputs - in this case we just need our python code.
# If you leave out transfer_output_files, all generated files comes back
transfer_input_files = test.py
#transfer_output_files =
# Error and output are the error and output channels from your job
# that HTCondor returns from the remote host.
Error = $(Cluster).$(Process).error
Output = $(Cluster).$(Process).output
# The LOG file is where HTCondor places information about your
# job's status, success, and resource consumption.
Log = $(Cluster).log
# Send the job to Held state on failure.
#on_exit_hold = (ExitBySignal == True) || (ExitCode != 0)
# Periodically retry the jobs every 1 hour, up to a maximum of 5 retries.
#periodic_release = (NumJobStarts < 5) && ((CurrentTime - EnteredCurrentStatus) > 60*60)
# queue is the "start button" - it launches any jobs that have been
# specified thus far.
queue 1