Enhance CPU simulator logging and throughput metrics #937
Code Review
This pull request introduces profiling hooks and improves throughput tracking in the Llama 3.1 CPU simulation script by excluding checkpointing overhead from step time calculations and adding local/global throughput metrics. The review feedback highlights critical issues with this implementation: a potential division-by-zero error if the calculated step time is zero or negative, and inaccurate step time tracking on non-zero ranks in DDP environments because checkpoint saving and deletion are primarily executed on rank 0. To address these, the reviewer suggests guarding against non-positive step times and broadcasting checkpoint durations from rank 0 to all other ranks.
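The guard the reviewer asks for can be sketched as follows. This is a minimal illustration, not the PR's code: the function name `step_throughput`, its parameters, and the choice to report zeros for an unusable step are all assumptions, as is the simplification that every rank processes the same number of samples, so global throughput is local throughput times the world size.

```python
from typing import Dict


def step_throughput(
    samples_per_rank: int,
    world_size: int,
    raw_step_time: float,
    checkpoint_time: float = 0.0,
) -> Dict[str, float]:
    """Compute per-step throughput with checkpoint overhead excluded.

    Hypothetical helper: names and return shape are illustrative only.
    """
    # Remove checkpointing overhead from the measured wall-clock step time.
    step_time = raw_step_time - checkpoint_time

    # Guard: timer jitter or an over-estimated checkpoint duration can make
    # the adjusted time zero or negative, which would otherwise raise
    # ZeroDivisionError or yield a meaningless negative rate.
    if step_time <= 0.0:
        return {
            "step_time": 0.0,
            "local_throughput": 0.0,
            "global_throughput": 0.0,
        }

    local = samples_per_rank / step_time
    return {
        "step_time": step_time,
        "local_throughput": local,
        # Assumes all ranks process the same number of samples per step.
        "global_throughput": local * world_size,
    }
```

In a DDP run, `checkpoint_time` would also need to be the same on every rank. One option is to share the value measured on rank 0 with a collective such as `torch.distributed.broadcast` before calling a helper like this; that step is omitted here to keep the sketch dependency-free.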
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #937 +/- ##
=======================================
Coverage 89.77% 89.77%
=======================================
Files 16 16
Lines 3569 3569
=======================================
Hits 3204 3204
Misses 365 365
Description
This PR adds logging enhancements and throughput calculation fixes to the macrobenchmark CPU simulator in gcsfs.
Key Changes
- `step_time`: excludes checkpointing overhead to prevent skewed throughput numbers during checkpoint steps.
- `on_train_epoch_start`: used to ensure the first batch captures the initial data loading delay.
- `local_throughput` and `global_throughput`: reported per optimizer step.
- `FitLoop.setup_data` and `_PrefetchDataFetcher.__iter__`: profiled (worker spawn times).
- `datasets.load_dataset`: profiled to measure HF dataset preparation time.
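One way profiling hooks like those listed above could be attached is a small timing wrapper. The sketch below is an assumption about the general technique, not the PR's implementation; `with_timing`, `label`, and the `timings` dictionary are hypothetical names.

```python
import functools
import time
from typing import Any, Callable, Dict, List


def with_timing(
    fn: Callable[..., Any],
    label: str,
    timings: Dict[str, List[float]],
) -> Callable[..., Any]:
    """Wrap fn so each call appends its wall-clock duration to timings[label]."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Record the duration even if fn raises, so failed calls
            # still show up in the profile.
            elapsed = time.perf_counter() - start
            timings.setdefault(label, []).append(elapsed)

    return wrapper
```

A wrapper of this kind could be applied by rebinding the target before training starts, for example assigning `with_timing(datasets.load_dataset, "load_dataset", timings)` back to `datasets.load_dataset`. Whether the PR patches these functions this way or uses another mechanism is not stated in the description.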