Drastically improve efficiency of topography generation in tripole region#49
Merged
micaeljtoliveira merged 1 commit intoCOSIMA:mainfrom Apr 2, 2025
Merged
Conversation
In the tripole, we use the make_topo_gen routine. This was allocating two arrays with 20,000,000 elements to hold topography for the patch. The target grid is divided into blocks of 100x25 points, and for each point in these blocks, for ever block, both of the large arrays were being completely zeroed. This meant that performance was almost completely dominated by memset. Isolating to just 10 of these blocks (but including NetCDF input and output), I saw 98.2% of the CPU time spent in memset, taking over 2 minutes to process. With this patch, it takes about 5 seconds to process the blocks, where most of that time is spent in KD tree generation and NetCDF input/output. An entire invocation of gen_topo from GEBCO to a 1/10 full-globe grid went down from around 2 hours to 6 minutes.
Contributor
|
Awesome, thanks @angus-g! It looks like this will produce bitwise identical results to the original, right? |
Contributor
Author
|
It should do, yes. |
Contributor
|
@angus-g Great work! I suspected the code could be made faster, but never imagined it would be something this simple. |
micaeljtoliveira
approved these changes
Apr 2, 2025
Contributor
|
@angus-g Regarding the use of the Fortran stdlib, I think this would be fine. The more projects use stdlib the better ;) So feel free to create a PR with those changes. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
In the tripole, we use the make_topo_gen routine. This was allocating two arrays with 20,000,000 elements to hold topography for the patch. The target grid is divided into blocks of 100x25 points, and for each point in these blocks, for every block, both of the large arrays were being completely zeroed. This meant that performance was almost completely dominated by memset.
Isolating to just 10 of these blocks (but including NetCDF input and output), I saw 98.2% of the CPU time spent in memset, taking over 2 minutes to process. With this patch, it takes about 5 seconds to process the same blocks, where most of that time is spent in KD tree generation and NetCDF input/output. An entire invocation of gen_topo from GEBCO to a 1/10 full-globe grid went down from around 2 hours to 6 minutes.
The other performance win would be to use a selection (e.g. from fortran stdlib) rather than sorting algorithm for the lines like
I haven't introduced that because of the extra dependency, but it shaves the runtime down to 4:45 with the same conditions as the tests above (an extra minute or so). All told, with compiler optimisations, this PR, and using selection instead of sort, the 1/10 topography generation is down from 2 hours to 200 seconds.