`jobman` -- Run multiple jobs using multiple executors in parallel.

(This is very, very alpha software.)

The jobman command line tool is a simple job runner that you might typically use if you have a small collection of desktop machines that you want to run jobs on. Jobs and job runners are specified in two separate YAML files.

For example, you might have two desktop machines with two GPUs each. You can use this for running jobs using:

runners:
  thor0: ssh thor podman run -e NVIDIA_VISIBLE_DEVICES=0 registry:5000/pytorch-img {cmd}
  thor1: ssh thor podman run -e NVIDIA_VISIBLE_DEVICES=1 registry:5000/pytorch-img {cmd}
  odin0: ssh odin podman run -e NVIDIA_VISIBLE_DEVICES=0 registry:5000/pytorch-img {cmd}
  odin1: ssh odin podman run -e NVIDIA_VISIBLE_DEVICES=1 registry:5000/pytorch-img {cmd}

You actually probably want the script/command on stdin:

pre: echo running on the valhalla cluster
oninput: true
runners:
  thor0: ssh thor podman run -e NVIDIA_VISIBLE_DEVICES=0 registry:5000/pytorch-img /bin/bash
  thor1: ssh thor podman run -e NVIDIA_VISIBLE_DEVICES=1 registry:5000/pytorch-img /bin/bash
  odin0: ssh odin podman run -e NVIDIA_VISIBLE_DEVICES=0 registry:5000/pytorch-img /bin/bash
  odin1: ssh odin podman run -e NVIDIA_VISIBLE_DEVICES=1 registry:5000/pytorch-img /bin/bash

You specify jobs like this:

jobs:
  - python3 train.py --seed 0
  - python3 train.py --seed 1
  - python3 train.py --seed 2
  ...
  - python3 train.py --seed 100

Or like this:

pre: |
  echo startup: test jobs
  rm -rf /tmp/mylogs
logdir: /tmp/mylogs
template:
  command: echo {i}
  range: 10

Now you can run your jobs with:

$ jobman -j jobs.yaml -o logs

Jobman will execute your jobs on all the different runners and place the log files in the logs subdirectory.

You can also just specify jobs from the command line. This will run 100 jobs with seed 0..99:

$ jobman -T 'python3 train.py --seed={i}' -R 100 -o logs

Usage

Usage:
  jobman [OPTIONS]

Application Options:
  -v, --verbose       Verbose output
  -w, --wait=         Wait after each job completion.
  -r, --runners=      Runners file (default: env JOBMAN_RUNNERS or runners.yaml)
  -l, --line-buffer=  Line buffer size. (default: 1)
  -t, --line-timeout= Line timeout. (default: 1)
  -o, --log-dir=      Log directory.
  -j, --jobs=         Jobs file
  -T, --template=     Command template
  -R, --range=        Range used with command template.

Help Options:
  -h, --help          Show this help message

Other

multitail -q is useful for watching the log directory

TODO

override some command line options from jobs file
add pre: and post: section to both runners and jobs
add error handling options and retry options
allow non-shell, list-based invocation of commands
allow templating and multiline scripts inside jobs
command on stdin
jsonrun
job specs -- input, output, commandline, time limit, resource limit (structure and YAML)
one-per-runner jobs (for testing, preparing at startup)
matching job tags with runner tags (for resource requirements), e.g. #BIG

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
Makefile		Makefile
README.md		README.md
go.mod		go.mod
go.sum		go.sum
jobs.yaml		jobs.yaml
main.go		main.go
runners.yaml		runners.yaml
shell.nix		shell.nix

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

`jobman` -- Run multiple jobs using multiple executors in parallel.

Usage

Other

TODO

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

jobman -- Run multiple jobs using multiple executors in parallel.

Usage

Other

TODO

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

`jobman` -- Run multiple jobs using multiple executors in parallel.

Packages