A100 scoring changes #916

Open
aahladc wants to merge 7 commits into dev from a100_scoring_changes

@aahladc aahladc commented Mar 19, 2026

Updates the scripts to work with the new round of submissions, which run on A100 GPUs on Slurm.

priyakasimbeg and others added 7 commits March 9, 2026 13:13
1. Add finewebedu workload.
2. Update num_trials and num_studies to be flag-defined, since they can vary between the self-tuning and external tuning rulesets.
3. Have every run use a different seed for more variability.
1. Ensure any variable can be passed in via flags. Folks shouldn't have to edit the file and hardcode variables for any reason.
2. Pass max global steps via a flag.
3. Update some default values for the new submission (repo/image/config file/logs bucket)
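
The flag-driven settings and per-run seeding described in the commit messages above could look roughly like the following. This is a minimal sketch, not the code in this PR: it uses `argparse` for self-containment, and `--base_seed`, the default values, and the `run_seeds` helper are hypothetical names introduced for illustration. Only `num_trials`, `num_studies`, and a max global steps flag are mentioned in the commits.

```python
import argparse
import random


def parse_args(argv=None):
    """Read run settings from flags so nothing has to be hardcoded in the file."""
    parser = argparse.ArgumentParser(description="Sketch of flag-defined run settings.")
    # Placeholder defaults; the real defaults depend on the tuning ruleset.
    parser.add_argument("--num_studies", type=int, default=3)
    parser.add_argument("--num_trials", type=int, default=5)
    parser.add_argument("--max_global_steps", type=int, default=None)
    # Hypothetical flag: a base seed from which per-run seeds are derived.
    parser.add_argument("--base_seed", type=int, default=0)
    return parser.parse_args(argv)


def run_seeds(base_seed, num_studies, num_trials):
    """Return one distinct seed per (study, trial) run, reproducible from base_seed."""
    rng = random.Random(base_seed)
    # random.sample draws without replacement, so no two runs share a seed.
    seeds = rng.sample(range(2**31), num_studies * num_trials)
    return {
        (study, trial): seeds[study * num_trials + trial]
        for study in range(num_studies)
        for trial in range(num_trials)
    }


if __name__ == "__main__":
    args = parse_args()
    for run, seed in run_seeds(args.base_seed, args.num_studies, args.num_trials).items():
        print(run, seed)
```

Deriving every seed from one base value keeps the runs distinct while still letting a whole set of studies be reproduced from a single flag.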
The script is meant to be used only on the Slurm cluster. It enforces a specific directory structure and checks for it at startup. If the directory structure is not as expected, it throws an error and explains the structure it expects.

It also includes a dry-run flag that runs the job for 10 steps; a command showing how to use it is at the top of the file.
Also update the README file to explain this script.
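
The startup check and dry-run behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the script from this PR: the directory names in `EXPECTED_DIRS` and both function names are hypothetical, since the PR text does not list the layout the script requires. The 10-step dry-run limit is the one stated in the commit message.

```python
from pathlib import Path

# Hypothetical layout; the directories the real script requires are not listed in this PR.
EXPECTED_DIRS = ("data", "experiments", "submissions")
DRY_RUN_STEPS = 10  # The commit message states that a dry run executes 10 steps.


def check_layout(root):
    """Raise an error that describes the expected layout if any directory is missing."""
    root = Path(root)
    missing = [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    if missing:
        expected = "\n".join(f"  {root / name}/" for name in EXPECTED_DIRS)
        raise FileNotFoundError(
            f"Missing directories: {', '.join(missing)}\n"
            f"Expected directory structure:\n{expected}"
        )


def effective_max_steps(max_global_steps, dry_run):
    """Cap the run at DRY_RUN_STEPS when dry_run is set; otherwise keep the flag value."""
    return DRY_RUN_STEPS if dry_run else max_global_steps
```

Failing before any job is submitted, with the expected layout in the error message, means a misconfigured checkout surfaces immediately rather than partway through a cluster run.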
1. Update base workloads to 9 (with finewebedu).
2. Remove all logic related to test targets, since they are no longer used. Work only with validation targets.
3. Fix step time computation.
@aahladc aahladc requested a review from a team as a code owner March 19, 2026 20:50
@github-actions

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅