API simplification #7

@raggleton

Now that people have had a chance to use the classes, I'm thinking about simplifying their interface slightly.
Typically I find myself needing only 2 locations:

  • a directory on /storage (or similar) to put condor files, dag files, and logs
  • a directory on HDFS to put input/output files (already covered by hdfs_mirror_dir)

The JobSet class has many constructor arguments, especially for STDOUT/STDERR/LOG output (which themselves have separate args for directory and filename). The reason why I did it this way is to make it easy to use the same dir for all 3, but different filenames. (Of course, one could just use os.path.join() to avoid this!)

Would it therefore be worth slimming down the interface? For example, having

JobSet(...
    storage_dir="/storage/abc1234/ntuple_31_10_16/",
    filename="cmsRun_091011.condor",
    logname="logs/cmsRun.$(cluster).$(process).log",
    ...)

to replace

JobSet(...
    filename='/storage/abc1234/ntuple_31_10_16/cmsRun_091011.condor',
    out_dir='/storage/abc1234/ntuple_31_10_16/logs', out_file='cmsRun.$(cluster).$(process).out',
    err_dir='/storage/abc1234/ntuple_31_10_16/logs', err_file='cmsRun.$(cluster).$(process).err',
    log_dir='/storage/abc1234/ntuple_31_10_16/logs', log_file='cmsRun.$(cluster).$(process).log',
    ...)

and inferring the stdout/err files from the logname field?
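To make the inference concrete, here is a minimal sketch of how the stdout/stderr names could be derived from a single logname. The helper name `infer_output_files` and its return format are hypothetical and not part of the current classes; it assumes the logname ends in `.log` and that `storage_dir` is the base directory proposed above.

```python
import os


def infer_output_files(storage_dir, logname):
    """Hypothetical helper: derive stdout/stderr/log paths from one logname.

    Assumes logname is relative to storage_dir and normally ends in '.log'.
    """
    suffix = ".log"
    # Strip only a trailing '.log'; otherwise use the whole name as the stem,
    # so condor macros such as $(process) are never mistaken for an extension.
    stem = logname[:-len(suffix)] if logname.endswith(suffix) else logname
    return {
        "out": os.path.join(storage_dir, stem + ".out"),
        "err": os.path.join(storage_dir, stem + ".err"),
        # The log path is kept exactly as the user supplied it.
        "log": os.path.join(storage_dir, logname),
    }


files = infer_output_files(
    "/storage/abc1234/ntuple_31_10_16/",
    "logs/cmsRun.$(cluster).$(process).log",
)
for kind, path in sorted(files.items()):
    print(kind, path)
```

Anyone who needs different directories for the three files could still be supported by keeping the existing per-file arguments as optional overrides.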

Basically, I don't want people to be put off by a multitude of args that are often set to the same or very similar values. However, I don't want to remove support for someone's particular workflow! (Suggestions for other simplifications are welcome as well.)
