[#1804] Re-implement pytorch profiler pr#2504

Open
florianscheidl wants to merge 64 commits into ecmwf:develop from florianscheidl:flo/revisit-profiler-pr

florianscheidl commented Jun 15, 2026

Contributor

Description

After #1804 was reverted, we slightly rewrote the profiler PR, which should avoid the issue with the profiling config.

Important: wait for #2310 (review) to be merged first.

Issue Number

Fixes #1804

Is this PR a draft? Mark it as draft.

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have listed the run_id(s) in a comment: launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the Mattermost channels and/or a design doc
    • for dependency changes: the Mattermost software development channel

@github-actions github-actions Bot added the infra (Issues related to infrastructure) and performance (Work related to performance improvements) labels Jun 15, 2026

@grassesi grassesi left a comment

Contributor

I have not run it yet, but I am pretty sure there are still some configuration issues. Please see the comments for more details. I am also happy to discuss the implementation details I suggested.

Comment thread src/weathergen/run_train.py
Comment thread packages/common/src/weathergen/common/config.py Outdated
Comment thread packages/common/src/weathergen/common/config.py Outdated
Comment thread src/weathergen/train/trainer.py Outdated
Comment thread src/weathergen/train/trainer.py Outdated
@github-project-automation github-project-automation Bot moved this to In Progress in WeatherGen-dev Jun 16, 2026
Labels

infra: Issues related to infrastructure
performance: Work related to performance improvements

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

Implement torch.profiler

3 participants