Enhance CPU simulator logging and throughput metrics by yuxin00j · Pull Request #937 · fsspec/gcsfs

yuxin00j · 2026-06-26T03:04:25Z

Description

This PR adds logging enhancements and throughput calculation fixes to the macrobenchmark CPU simulator in gcsfs.

Key Changes

Checkpoint Overhead Exclusion: Checkpointing duration (both saving and removing) is now tracked and deducted from step_time to prevent skewed throughput numbers during checkpoint steps.
Accurate Data Loading Metrics: The step timer is initialized at on_train_epoch_start to ensure the first batch captures the initial data loading delay.
Detailed Throughput Logging: Emits both local_throughput and global_throughput per optimizer step.
Targeted Profiling Hooks: Adds profiler hooks to isolate and measure FitLoop.setup_data and _PrefetchDataFetcher.__iter__ (worker spawn times).
Dataset Load Timing: Injects a timer around datasets.load_dataset to measure HF dataset preparation time.
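The step-time accounting described in these changes can be sketched as follows. This is a minimal illustration and not the PR's code: the `StepThroughputTracker` class, its method names, and the `samples_per_step` and `world_size` parameters are all hypothetical. It also assumes that global throughput is local throughput multiplied by the number of ranks, which holds only if every rank processes the same number of samples per step.

```python
import time
from contextlib import contextmanager


class StepThroughputTracker:
    """Hypothetical helper: step timing with checkpoint overhead deducted."""

    def __init__(self, samples_per_step, world_size=1, clock=time.perf_counter):
        self.samples_per_step = samples_per_step
        self.world_size = world_size
        self._clock = clock
        self._step_start = None
        self._checkpoint_seconds = 0.0

    def start_epoch(self):
        # Start the timer at epoch start so the first step's measurement
        # includes the initial data loading delay.
        self._step_start = self._clock()
        self._checkpoint_seconds = 0.0

    @contextmanager
    def checkpointing(self):
        # Accumulate time spent saving or removing checkpoints.
        start = self._clock()
        try:
            yield
        finally:
            self._checkpoint_seconds += self._clock() - start

    def end_step(self):
        now = self._clock()
        # Deduct checkpoint time so checkpoint steps do not skew throughput.
        step_time = (now - self._step_start) - self._checkpoint_seconds
        self._step_start = now
        self._checkpoint_seconds = 0.0
        # Report zero throughput instead of dividing by a non-positive time.
        local = self.samples_per_step / step_time if step_time > 0 else 0.0
        return {
            "step_time": step_time,
            "local_throughput": local,
            # Assumption: all ranks process the same number of samples.
            "global_throughput": local * self.world_size,
        }
```

In a training callback, `start_epoch()` would be called from a hook such as `on_train_epoch_start`, checkpoint saves and removals would be wrapped in `with tracker.checkpointing():`, and `end_step()` would be called once per optimizer step.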

gemini-code-assist

Code Review

This pull request introduces profiling hooks and improves throughput tracking in the Llama 3.1 CPU simulation script by excluding checkpointing overhead from step time calculations and adding local/global throughput metrics. The review feedback highlights critical issues with this implementation: a potential division-by-zero error if the calculated step time is zero or negative, and inaccurate step time tracking on non-zero ranks in DDP environments because checkpoint saving and deletion are primarily executed on rank 0. To address these, the reviewer suggests guarding against non-positive step times and broadcasting checkpoint durations from rank 0 to all other ranks.
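The two suggestions in this review could look roughly like the sketch below. It is an assumption-laden illustration rather than the code that was merged: `sync_checkpoint_duration` and `safe_throughput` are hypothetical names, and the broadcast assumes a process group whose backend supports CPU tensors (for example gloo), which seems plausible for a CPU simulation but is not stated in the PR.

```python
def sync_checkpoint_duration(local_duration: float) -> float:
    """Return rank 0's checkpoint duration on every rank.

    Falls back to the local value when torch is not installed or no
    process group has been initialized (single-process runs).
    """
    try:
        import torch
        import torch.distributed as dist
    except ImportError:
        return local_duration
    if not (dist.is_available() and dist.is_initialized()):
        return local_duration
    # Assumption: the backend can broadcast CPU tensors (e.g. gloo).
    duration = torch.tensor([local_duration], dtype=torch.float64)
    dist.broadcast(duration, src=0)
    return float(duration.item())


def safe_throughput(samples: int, step_time: float) -> float:
    """Guard against zero or negative step times before dividing."""
    if step_time <= 0:
        return 0.0
    return samples / step_time
```

Broadcasting from rank 0 matters because the other ranks wait at the next collective while rank 0 writes or deletes the checkpoint, so their wall-clock step time includes the wait even though they recorded no checkpoint work themselves.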

codecov · 2026-06-26T03:10:20Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.77%. Comparing base (381c33e) to head (817e3d2).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #937   +/-   ##
=======================================
  Coverage   89.77%   89.77%           
=======================================
  Files          16       16           
  Lines        3569     3569           
=======================================
  Hits         3204     3204           
  Misses        365      365

Update cpu sim with logging enhancements and throughput fixes

e1b372e

gemini-code-assist Bot reviewed Jun 26, 2026

Fix linting errors in cpu sim

e4aa27b

yuxin00j changed the title ~~Update cpu sim with logging enhancements and throughput fixes~~ Enhance CPU simulator logging and throughput metrics Jun 26, 2026

yuxin00j marked this pull request as ready for review June 26, 2026 03:34

yuxin00j requested a review from zhixiangli June 26, 2026 03:34

zhixiangli reviewed Jun 26, 2026

Comment thread ...s/perf/macrobenchmarks/workloads/hf-pytorch-lightning-cpu/helm_chart/llama_3_1_8b_cpu_sim.py

Comment thread ...s/perf/macrobenchmarks/workloads/hf-pytorch-lightning-cpu/helm_chart/llama_3_1_8b_cpu_sim.py

Add upstream PyTorch Lightning PR link to TODO comment

817e3d2

zhixiangli merged commit 3bd3383 into fsspec:main Jun 26, 2026
10 checks passed

zhixiangli merged 3 commits into fsspec:main from yuxin00j:apply-sim-logging

2 participants
