Plot losses against elapsed training time via --x-axis flag #2501
Open
florianscheidl wants to merge 88 commits into ecmwf:develop from florianscheidl:flo/plot-training-time-axis
Commits
644273a
feat: add timing metrics (startup, training, overall)
florianscheidl 92120b8
docs: add agent structure with skills, tasks, and docs
florianscheidl df69dcb
docs: add skills review cycle for periodic compactification
florianscheidl 7f0648f
configs
florianscheidl 8fe45a0
Merge branch 'feature/timing-metrics' into ekfs/scaling-plots-20260417
florianscheidl dd55fb0
Remove hermes tool tracking for now
florianscheidl 09b6e82
Try duration metrics
florianscheidl da3c29b
Update metrics, store after each mini-epoch
florianscheidl fc9a111
Refactor configs/streams
florianscheidl cfc4c62
Extract scaling data
florianscheidl 82b503a
Script to generate scaling plots
florianscheidl 70053b1
Script update
florianscheidl 0c2df97
Repeat data in mini epoch
florianscheidl 2c79d28
corrected time window length
florianscheidl 6374986
Merge branch 'ekfs/scaling-plots-20260417' of github.com:florianschei…
florianscheidl b5d70f6
Lower to 512 samples per mini epoch
florianscheidl f46828c
Updated extraction script
florianscheidl 89ac519
Merge branch 'ekfs/scaling-plots-20260417' of github.com:florianschei…
florianscheidl 7cad6b5
Log time more often
florianscheidl 30ac102
Fix training start scope
florianscheidl 5e7f63e
Minimal validation
florianscheidl 2be95c6
Increase samples_per_mini_epoch to 1024
florianscheidl 93b203b
Final training duration and terminal/metric logging
florianscheidl 2b708e3
log metrics after mini-epoch
florianscheidl 0d8407d
Log metrics after mini-epoch, change schema
florianscheidl 422fc60
MEtric typo
florianscheidl f63cba9
Logging refactor
florianscheidl b596c14
Update extraction script
florianscheidl 42ba646
NNode extraction
florianscheidl c9fa64d
Logs path
florianscheidl ccfbc64
Wait until all training complete and wait with validation until logs …
florianscheidl c0f96b7
Log seconds rather than hours
florianscheidl 701eb00
Merge branch 'ekfs/scaling-plots-20260417' of github.com:florianschei…
florianscheidl e6475e9
Measure dataset advancement time
florianscheidl 6fd001f
LR scheduler lower bounds
florianscheidl 313cec6
At least two warmup steps
florianscheidl aa4d399
Len per rank at least 1 to avoid zero division error
florianscheidl 7956c52
Write csv for easier viewing
florianscheidl cf659e1
Extraction and plotting
florianscheidl 177df79
Remove parent dir creation
florianscheidl 8a4bc56
more detailed extraction script
florianscheidl bca6d3d
Remove overall time logging
florianscheidl 9436811
Cleanup trainer
florianscheidl 21c1575
Metrics extraction and plot generation scripts
florianscheidl e67616a
Add efficiency factor in plot
florianscheidl b1e4ea4
RM checkpoint and log metrics at last iteration
florianscheidl c89fe20
Detailed metrics
florianscheidl c14b749
Merge branch 'ekfs/scaling-plots-20260417' of github.com:florianschei…
florianscheidl b0bc6c2
Remove barrier and extra logging on last batch
florianscheidl 133ee4c
trainer code cleanup
florianscheidl ec665da
Lower bound beta2 in adam
florianscheidl 2088311
update script for scaling plots, loss as separate entry point
florianscheidl 5bd88d9
specify nodes in scaling data script
florianscheidl 7e1ae1c
Update extract scaling data
florianscheidl b42432c
Add pyarrow
florianscheidl 6d5683b
Update script for scaling plots
florianscheidl 0396290
Update to generating scaling plots
florianscheidl 86093d7
Merge branch 'develop' into ekfs/scaling-plots-20260417
florianscheidl 6800262
Move scaling scripts to package
florianscheidl c5af276
init refactor
florianscheidl e760c13
Setup and linting
florianscheidl 556106e
Updated plot generation script
florianscheidl b93813d
Update readme
florianscheidl b2fe866
Fewer diffs
florianscheidl c574514
no gitignore changes
florianscheidl dad5462
Refactor logging and move time for mini epoch logging outside loop
florianscheidl 4f11519
Formatting and style fixes
florianscheidl b02b38f
Update config
florianscheidl 55d8219
Avoid duplicate metrics
florianscheidl 904713d
Fix lint issues
florianscheidl 9ecd544
t_training in __init__
florianscheidl 9f02dc1
Renamed metric
florianscheidl bfd5424
Merge branch 'develop' into ekfs/scaling-plots-20260417
clessig 0785e3b
mv performance package
florianscheidl e982776
Plot losses against elapsed training time via --x-axis flag
florianscheidl 367454d
Fewer changes
florianscheidl d0f851c
rm configs
florianscheidl 2f25ee8
Remove startup time
florianscheidl 198b542
Remove startup time
florianscheidl 6f7ff39
Merge branch 'develop' into flo/plot-training-time-axis
florianscheidl 0de8660
Formatting and removed time per epoch
florianscheidl 11bcd2b
Merge branch 'flo/plot-training-time-axis' into ekfs/scaling-plots-20…
florianscheidl 39f8075
Undo pyproject change
florianscheidl 41b1fbf
Merge branch 'develop' into ekfs/scaling-plots-20260417
florianscheidl 61ad9cf
ploting changes wip
florianscheidl e099f84
Undo pyproject changes
florianscheidl a45c255
Merge branch 'ekfs/scaling-plots-20260417' of github.com:florianschei…
florianscheidl 68c0428
Merge branch 'ekfs/scaling-plots-20260417' into flo/plot-training-tim…
florianscheidl
For plot_train, timing should start here. That would also avoid modifying run_train.
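One possible reading of this suggestion, as a minimal sketch: the caller starts and stops the clock around the training call, so the training routine itself needs no timing code. The names `timed_call` and `fake_train` are hypothetical stand-ins and do not come from the repository; the real `run_train` and `plot_train` signatures are not shown in this PR view.

```python
import time
from typing import Callable


def timed_call(
    fn: Callable[[], None],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Run fn and return the elapsed seconds, measured entirely by the caller."""
    t_start = clock()  # timing starts in the caller, not inside fn
    fn()
    return clock() - t_start


def fake_train() -> None:
    """Hypothetical stand-in for a training routine; contains no timing code."""
    sum(range(1000))


if __name__ == "__main__":
    elapsed_s = timed_call(fake_train)
    print(f"training took {elapsed_s:.6f} s")
```

Passing the clock as a parameter keeps the measurement testable with a fake clock. Whether this matches the intended call site depends on the line the comment is attached to, which is not visible here.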