- Set Up and Installation
- Download the Data
- Develop your Submission
- Run your Submission
- Score your Submission
- Submit your Submission
To get started you will have to make a few decisions and install the repository along with its dependencies. Specifically:
- Decide whether you would like to develop your submission in PyTorch or JAX.
- Set up your workstation or VM. We recommend using a setup similar to the
  benchmarking hardware. The specs of the benchmarking machines are:
  - 4x A100 (40 GB) GPUs
  - 240 GB of RAM
  - 2 TB of storage (for datasets).
- Install the `algoperf` package and its dependencies, either in a Python virtual environment or in a Docker (recommended) or Singularity/Apptainer container.
Prerequisites:
- Python minimum requirement >= 3.11
- CUDA 12.1
- NVIDIA Driver version 535.104.05
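To check that your machine meets these prerequisites, you can use the standard NVIDIA/CUDA tools (assuming they are on your `PATH`):

```bash
# The driver version is reported in the header of the nvidia-smi output
nvidia-smi
# CUDA toolkit version, if the toolkit is installed locally
nvcc --version
```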
To set up a virtual environment and install this repository:
- Create a new environment, e.g. via `conda` or `virtualenv`:

  ```bash
  sudo apt-get install python3-venv
  python3 -m venv env
  source env/bin/activate
  ```

- Clone this repository:

  ```bash
  git clone https://github.com/mlcommons/algorithmic-efficiency.git
  cd algorithmic-efficiency
  ```
- Run the following pip3 install commands based on your chosen framework to install `algoperf` and its dependencies.

  For JAX:

  ```bash
  pip3 install -e '.[pytorch_cpu]'
  pip3 install -e '.[jax_gpu]' -f 'https://storage.googleapis.com/jax-releases/jax_cuda_releases.html'
  pip3 install -e '.[full]'
  ```
  For PyTorch:

  Note: the command below assumes you have CUDA 12.1 installed locally. This is the default in the provided Docker image. We recommend matching this CUDA version, but if you decide to run with a different local CUDA version, please find the appropriate wheel URL to pass to the `pip install` command for `pytorch`.

  ```bash
  pip3 install -e '.[jax_cpu]'
  pip3 install -e '.[pytorch_gpu]' -f 'https://download.pytorch.org/whl/cu121'
  pip3 install -e '.[full]'
  ```
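To sanity-check the installation, one option (assuming the package's import name is `algoperf`, as used above; adjust the framework import to whichever one you installed) is:

```bash
# Quick check that the package imports cleanly
python3 -c "import algoperf; print('algoperf imported OK')"
# For a PyTorch GPU install:
python3 -c "import torch; print('CUDA available:', torch.cuda.is_available())"
# For a JAX GPU install:
python3 -c "import jax; print(jax.devices())"
```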
Per workload installations

You can also install the requirements for individual workloads, e.g. via

```bash
pip3 install -e '.[librispeech]'
```

or for all workloads at once via

```bash
pip3 install -e '.[full]'
```

We recommend using a Docker container to ensure a similar environment to our scoring and testing environments. Alternatively, a Singularity/Apptainer container can also be used (see instructions below).
Prerequisites:
- NVIDIA Driver version >= 535.104.05
- NVIDIA Container Toolkit so that the containers can locate the NVIDIA drivers and GPUs. See instructions in the NVIDIA Docker documentation.
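To verify that Docker can access the GPUs through the NVIDIA Container Toolkit, a quick check is to run `nvidia-smi` inside a CUDA base image (the image tag below is only an example):

```bash
# Should print the same GPU table as running nvidia-smi on the host
docker run --rm --gpus all nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi
```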
- Clone this repository:

  ```bash
  cd ~ && git clone https://github.com/mlcommons/algorithmic-efficiency.git
  ```

- Build the Docker image:

  ```bash
  cd algorithmic-efficiency/docker
  docker build -t <docker_image_name> . --build-arg framework=<framework>
  ```
The `framework` flag can be either `pytorch`, `jax` or `both`. Specifying the framework will install the framework-specific dependencies. The `docker_image_name` is arbitrary.
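For example, to build a JAX-only image (the image name `algoperf_jax` is arbitrary, as noted above):

```bash
cd algorithmic-efficiency/docker
docker build -t algoperf_jax . --build-arg framework=jax
```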
To use the Docker container as an interactive virtual environment, you can run a
container mounted to your local data and code directories and execute the bash
program. This may be useful if you are in the process of developing a
submission.
- Run a detached Docker container. The `container_id` will be printed if the container is run successfully.

  ```bash
  docker run -t -d \
    -v $HOME/data/:/data/ \
    -v $HOME/experiment_runs/:/experiment_runs \
    -v $HOME/experiment_runs/logs:/logs \
    -v $HOME/algorithmic-efficiency:/algorithmic-efficiency \
    --gpus all \
    --ipc=host \
    <docker_image_name> \
    --keep_container_alive true
  ```
  Note: You may have to use double quotes around the `algorithmic-efficiency` path in the mounting `-v` flag. If the above command fails, try replacing the line

  ```bash
  -v $HOME/algorithmic-efficiency:/algorithmic-efficiency \
  ```

  with

  ```bash
  -v $HOME"/algorithmic-efficiency:/algorithmic-efficiency" \
  ```
- Open a bash terminal:

  ```bash
  docker exec -it <container_id> /bin/bash
  ```
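When you are done with the interactive container, it can be stopped and removed with standard Docker commands, for example:

```bash
# Stop the detached container, then remove it
docker stop <container_id>
docker rm <container_id>
```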
Since many compute clusters don't allow the usage of Docker due to security
concerns and instead encourage the use of
Singularity/Apptainer (formerly
Singularity, now called Apptainer), we also provide an Apptainer recipe (located
at `docker/Singularity.def`) that can be used to build an image by running

```bash
singularity build --fakeroot <singularity_image_name>.sif Singularity.def
```

Note that this can take several minutes. Then, to start a shell session with GPU
support (by using the `--nv` flag), we can run

```bash
singularity shell --bind $HOME/data:/data,$HOME/experiment_runs:/experiment_runs \
  --nv <singularity_image_name>.sif
```

Note the `--bind` flag which, similarly to Docker, allows binding specific paths
on the host system into the container, as explained in the
Singularity User Guide.
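If you prefer a non-interactive run, `singularity exec` can launch the submission runner directly. The following is a rough sketch assuming the same bind mounts as above and that you invoke it from (or bind) your clone of the repository; adjust the paths to your setup:

```bash
# Sketch only: bind paths and the repository location depend on your setup.
singularity exec --nv \
  --bind $HOME/data:/data,$HOME/experiment_runs:/experiment_runs \
  <singularity_image_name>.sif \
  python3 submission_runner.py \
    --framework=jax \
    --workload=mnist \
    --experiment_dir=/experiment_runs \
    --experiment_name=<experiment_name> \
    --submission_path=<path_to_submission_module> \
    --tuning_search_space=<path_to_tuning_search_space>
```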
Also note that we generated `Singularity.def` automatically from the
`Dockerfile` using spython, as follows:

```bash
pip3 install spython
cd algorithmic-efficiency/docker
python scripts/singularity_converter.py -i Dockerfile -o Singularity.def
```

Users that wish to customize their images are invited to check and modify the
`Singularity.def` recipe and the `singularity_converter.py` script.
The workloads in this benchmark use 6 different datasets across 9 workloads. You may choose to download only some of the datasets while developing your submission, but your submission will be scored across all 9 workloads. For instructions on obtaining and setting up the datasets, see dataset/README.
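As a rough sketch, assuming you use the repository's dataset setup script (`datasets/dataset_setup.py`; the exact flags are documented in the dataset README and may differ from what is shown here), fetching one of the smaller datasets could look like:

```bash
# Hypothetical example; consult the dataset README for the authoritative flags.
mkdir -p ~/data
python3 datasets/dataset_setup.py \
  --data_dir ~/data \
  --ogbg
```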
To develop a submission you will write a Python module containing your training algorithm. Your training algorithm must implement a set of predefined API methods for the initialization and update steps.
Make a submissions subdirectory to store your submission modules, e.g.
`algorithmic-efficiency/submissions/my_submissions`.
You can find examples of submission modules under
`algorithmic-efficiency/algorithms`.
A submission for the external tuning ruleset will consist of a submission module and a
tuning search space definition.
- Copy the template submission module `algorithms/template/submission.py` into your submissions directory, e.g. into `algorithmic-efficiency/my_submissions` (a concrete set of commands is sketched after this list).
- Implement at least the methods in the template submission module. Feel free to use helper functions and/or modules as you see fit. Make sure you adhere to the competition rules. Check out the guidelines for allowed and disallowed submissions, and pay special attention to the software dependencies rule.
- Add a tuning configuration, e.g. a `tuning_search_space.json` file, to your submission directory. For the tuning search space you can either:

  - Define the set of feasible points by defining a value for `feasible_points` for the hyperparameters:

    ```json
    {
      "learning_rate": {
        "feasible_points": [0.999]
      }
    }
    ```

    For a complete example see tuning_search_space.json.
  - Define a range of values for quasirandom sampling by specifying `min`, `max` and `scaling` keys for the hyperparameter:

    ```json
    {
      "weight_decay": {
        "min": 5e-3,
        "max": 1.0,
        "scaling": "log"
      }
    }
    ```

    For a complete example see tuning_search_space.json.
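Putting the above together, a minimal way to lay out these files is sketched below (the directory and file names follow the examples above; adapt them to your own layout):

```bash
# From the repository root: create a submission directory, copy the template
# submission module, and add a tuning search space file to be filled in as
# described above.
mkdir -p submissions/my_submissions
cp algorithms/template/submission.py submissions/my_submissions/submission.py
touch submissions/my_submissions/tuning_search_space.json
```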
From your virtual environment or interactively running Docker container, run your
submission with `submission_runner.py`:

JAX: to score your submission on a workload, from the `algorithmic-efficiency` directory run:

```bash
python3 submission_runner.py \
    --framework=jax \
    --workload=mnist \
    --experiment_dir=<path_to_experiment_dir> \
    --experiment_name=<experiment_name> \
    --submission_path=submissions/my_submissions/submission.py \
    --tuning_search_space=<path_to_tuning_search_space>
```

PyTorch: to score your submission on a workload, from the `algorithmic-efficiency` directory run:
```bash
python3 submission_runner.py \
    --framework=pytorch \
    --workload=<workload> \
    --experiment_dir=<path_to_experiment_dir> \
    --experiment_name=<experiment_name> \
    --submission_path=<path_to_submission_module> \
    --tuning_search_space=<path_to_tuning_search_space>
```

We recommend using PyTorch's
Distributed Data Parallel (DDP)
when using multiple GPUs on a single node. You can initialize DDP with `torchrun`.
For example, on a single host with 4 GPUs, simply replace `python3` in the above
command with:

```bash
torchrun --redirects 1:0,2:0,3:0 --standalone --nnodes=1 --nproc_per_node=N_GPUS
```

where `N_GPUS` is the number of available GPUs on the node.
So the complete command is:
```bash
torchrun --redirects 1:0,2:0,3:0 \
    --standalone \
    --nnodes=1 \
    --nproc_per_node=N_GPUS \
    submission_runner.py \
    --framework=pytorch \
    --workload=<workload> \
    --experiment_dir=<path_to_experiment_dir> \
    --experiment_name=<experiment_name> \
    --submission_path=<path_to_submission_module> \
    --tuning_search_space=<path_to_tuning_search_space>
```

Alternatively, you can run your submission through the Docker container's entrypoint script, which provides the following flags:
- `--dataset`: can be 'imagenet', 'fastmri', 'librispeech', 'criteo1tb', 'wmt', 'finewebedu', or 'ogbg'. Setting this flag will download data if `~/data/<dataset>` does not exist on the host machine. Required for running a submission.
- `--framework`: can be either 'pytorch' or 'jax'. If you just want to download data, this flag is required for `-d imagenet` since we have two versions of data for imagenet. This flag is also required for running a submission.
- `--submission_path`: path to the submission file on the container filesystem. If this flag is set, the container will run a submission, so it is required for running a submission.
- `--tuning_search_space`: path to the file containing the tuning search space on the container filesystem. Required for running a submission.
- `--experiment_name`: name of the experiment. Required for running a submission.
- `--workload`: can be 'imagenet_resnet', 'imagenet_jax', 'librispeech_deepspeech', 'librispeech_conformer', 'ogbg', 'wmt', 'fastmri', 'finewebedu_lm', or 'criteo1tb'. Required for running a submission.
- `--max_global_steps`: maximum number of steps to run the workload for. Optional.
- `--keep_container_alive`: can be true or false. If `true`, the container will not be killed automatically. This is useful for developing or debugging.
To run the Docker container that will run the submission runner, run:

```bash
docker run -t -d \
  -v $HOME/data/:/data/ \
  -v $HOME/experiment_runs/:/experiment_runs \
  -v $HOME/experiment_runs/logs:/logs \
  --gpus all \
  --ipc=host \
  <docker_image_name> \
  --dataset <dataset> \
  --framework <framework> \
  --submission_path <submission_path> \
  --tuning_search_space <tuning_search_space> \
  --experiment_name <experiment_name> \
  --workload <workload> \
  --keep_container_alive <keep_container_alive>
```

This will print the container ID to the terminal.
To find the container IDs of running containers:

```bash
docker ps
```

To see the output of the entrypoint script:

```bash
docker logs <container_id>
```

To enter a bash session in the container:

```bash
docker exec -it <container_id> /bin/bash
```

To score your submission, we run it across all workloads, studies, and trials as described in the rules. In other words, the total number of runs expected for official scoring is:
- for external tuning ruleset: 135 = 9 (workloads) x 3 (studies) x 5 (trials)
- for self-tuning ruleset: 27 = 9 (workloads) x 3 (studies)
To run a number of studies and trials over all workloads using Docker containers for each run:
```bash
python scoring/run_workloads.py \
  --framework <framework> \
  --experiment_name <experiment_name> \
  --docker_image_url <docker_image_url> \
  --submission_path <submission_path> \
  --tuning_search_space <tuning_search_space> \
  --held_out_workloads_config_path held_out_workloads_example.json \
  --num_studies <num_studies> \
  --seed <rng_seed>
```

Note that to run the above script you will need at least the `jax_cpu` and
`pytorch_cpu` installations of the algorithmic-efficiency package.
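For example, assuming you install from the repository root, both CPU extras can be added in a single command:

```bash
# Install the JAX and PyTorch CPU extras needed by the scoring scripts
pip3 install -e '.[jax_cpu,pytorch_cpu]'
```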
During submission development, it might be useful to do faster, approximate
scoring (e.g. without 3 different studies or when some trials are missing), so
the scoring scripts allow some flexibility. To simulate official scoring, pass
the `--strict=True` flag to `score_submissions.py`. To get the raw scores and
performance profiles of a group of submissions or a single submission:

```bash
python score_submissions.py --submission_directory <directory_with_submissions> --output_dir <output_dir> --compute_performance_profiles
```

We provide the scores and performance profiles for the paper baseline algorithms in the "Baseline Results" section of Benchmarking Neural Network Training Algorithms.
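A minimal sketch combining the above, using the `--strict=True` flag mentioned earlier to mimic official scoring requirements (flag placement is assumed to mirror the command shown; check the script's `--help` output for the full set of options):

```bash
# Approximate official scoring by enforcing the strict run requirements
python score_submissions.py \
  --submission_directory <directory_with_submissions> \
  --output_dir <output_dir> \
  --compute_performance_profiles \
  --strict=True
```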
To submit your submission, please create a PR on the submission repository. You can find more details in the submission repository's How to Submit section. The working group will review your PR and select the most promising submissions for scoring.
Good Luck!