Err stats by JaeseungYeom · Pull Request #76 · llnl/AMS

JaeseungYeom · 2024-06-27T19:21:28Z

Takes advantage of ranks left idle by load imbalance to compute extra physics-model outputs, so that they can be compared against the surrogate outputs.
Currently, the average and variation of the surrogate-output error over the chosen validation points are reported at every evaluation iteration.
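A minimal sketch of the per-iteration statistic described above. It assumes the error at a validation point is the absolute difference between the surrogate and physics outputs, and it reports the population variance as the "variation"; the PR does not pin down either choice here. `ErrorStats` is a hypothetical name, not an AMS type. Welford's update keeps the running mean numerically stable, and `merge` uses the standard pairwise combination, so per-rank statistics could be folded into a global one.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper (not the AMS implementation): running mean and
// variance of the surrogate error, using Welford's algorithm.
struct ErrorStats {
  std::size_t count = 0;
  double mean = 0.0;
  double m2 = 0.0;  // sum of squared deviations from the running mean

  // Add one validation point; the error is assumed to be |surrogate - physics|.
  void add(double surrogate, double physics) {
    const double err = std::fabs(surrogate - physics);
    ++count;
    const double delta = err - mean;
    mean += delta / static_cast<double>(count);
    m2 += delta * (err - mean);
  }

  // Combine statistics gathered elsewhere (e.g. on another rank) using the
  // pairwise update: M2 = M2_a + M2_b + delta^2 * n_a * n_b / n.
  void merge(const ErrorStats& other) {
    if (other.count == 0) return;
    if (count == 0) { *this = other; return; }
    const double na = static_cast<double>(count);
    const double nb = static_cast<double>(other.count);
    const double n = na + nb;
    const double delta = other.mean - mean;
    mean += delta * nb / n;
    m2 += other.m2 + delta * delta * na * nb / n;
    count += other.count;
  }

  // Population variance of the error; 0 when no points were added.
  double variance() const {
    return count > 0 ? m2 / static_cast<double>(count) : 0.0;
  }
};
```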

koparasy · 2024-06-28T13:40:03Z

src/AMSlib/wf/workflow.hpp

        comm(MPI_COMM_NULL),
 #endif
-        ePolicy(AMSExecPolicy::AMS_UBALANCED)
+        ePolicy(AMSExecPolicy::AMS_UBALANCED),


@koparasy fix this before we land it in develop.

koparasy · 2024-06-28T16:03:22Z

@JaeseungYeom Thank you for the PR!

I am afraid this version has too much data movement across ranks. The predicate array, although boolean, can be fairly large in a real large-scale simulation, and gathering and scattering it across the distributed system will be a bottleneck. How about this solution instead:

// Assumes comm, rank, world_size, and num_of_my_physics_evaluations (an int) are in scope;
// uses <mpi.h>, <algorithm>, <numeric>, <random>, and <vector>.
const int root = 0;
std::vector<int> physics_evaluations_per_rank(world_size);
// Root gathers only the number of physics evaluations per rank.
MPI_Gather(&num_of_my_physics_evaluations, 1, MPI_INT,
           physics_evaluations_per_rank.data(), 1, MPI_INT, root, comm);

std::vector<int> additional_evaluations_per_rank(world_size, 0);
if (rank == root) {
    int total_evaluations = std::accumulate(physics_evaluations_per_rank.begin(),
                                            physics_evaluations_per_rank.end(), 0);
    // ceil(total_evaluations / world_size) * world_size - total_evaluations, in integer arithmetic.
    int per_rank = (total_evaluations + world_size - 1) / world_size;
    int load_balance_slack = per_rank * world_size - total_evaluations;
    std::vector<int> flipped_indexes(load_balance_slack);
    // Randomly select which indexes we will flip (the slack is 0 when total_evaluations is 0).
    if (load_balance_slack > 0) {
        std::mt19937 gen(std::random_device{}());
        std::uniform_int_distribution<int> sample(0, total_evaluations - 1);
        for (int& idx : flipped_indexes) idx = sample(gen);
    }
    std::sort(flipped_indexes.begin(), flipped_indexes.end());
    int j = 0;
    int running_sum = 0;  // evaluations owned by ranks 0..j-1
    for (int flipped_index : flipped_indexes) {
        // Search among ranks for the one whose range [running_sum, running_sum + count) holds this index.
        while (flipped_index >= running_sum + physics_evaluations_per_rank[j]) {
            running_sum += physics_evaluations_per_rank[j];
            ++j;
        }
        additional_evaluations_per_rank[j]++;
    }

    // Now additional_evaluations_per_rank contains how many extra evaluations need to happen on every rank.
}
int my_evaluations = 0;
MPI_Scatter(additional_evaluations_per_rank.data(), 1, MPI_INT,
            &my_evaluations, 1, MPI_INT, root, comm);
// Now every rank has in my_evaluations the number of additional predicates it needs to toggle.

// Every rank toggles and operates independently.
...

The algorithm is probably not correct, but I tried to use meaningful variable names, to the extent possible, so that you can follow it. The key idea is to limit the amount of data sent over the network. This code uses one Gather and one Scatter, and the message size grows only linearly with the number of ranks. Your code, in contrast, uses one Gather, one Scatter, one Gatherv, and one Scatterv. The Gatherv and Scatterv may end up sending a lot of data, because their size follows the predicate array rather than the number of ranks, and we cannot sustain that in the inner loop.
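The root-side step of the suggestion can be checked in isolation, without MPI. The following is a sketch, not AMS code: `distribute_slack` is a hypothetical helper that takes the gathered per-rank counts, computes the slack with the same formula as the suggestion, samples that many global indexes with replacement, and maps each sorted index to its owning rank in a single pass. Only one integer per rank goes in and one comes out, which is what keeps the Gather and Scatter messages proportional to the number of ranks.

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical root-side helper: given how many evaluations each rank
// performs, return how many extra evaluations each rank should take on.
std::vector<int> distribute_slack(const std::vector<int>& evals_per_rank,
                                  std::mt19937& gen) {
  std::vector<int> extra(evals_per_rank.size(), 0);
  const int world_size = static_cast<int>(evals_per_rank.size());
  const int total =
      std::accumulate(evals_per_rank.begin(), evals_per_rank.end(), 0);
  if (total == 0) return extra;  // also covers world_size == 0

  // Slack left when the total work is split evenly:
  // ceil(total / world_size) * world_size - total.
  const int per_rank = (total + world_size - 1) / world_size;
  const int slack = per_rank * world_size - total;

  // Draw `slack` global indexes uniformly from [0, total - 1], then sort.
  std::uniform_int_distribution<int> pick(0, total - 1);
  std::vector<int> flipped(slack);
  for (int& idx : flipped) idx = pick(gen);
  std::sort(flipped.begin(), flipped.end());

  // Walk the sorted indexes once; rank r owns [offset, offset + evals_per_rank[r]).
  int r = 0;
  int offset = 0;
  for (int idx : flipped) {
    while (idx >= offset + evals_per_rank[r]) {
      offset += evals_per_rank[r];
      ++r;
    }
    ++extra[r];
  }
  return extra;
}
```

Because the indexes are drawn with replacement, as in the suggestion, each rank's expected share of the extra evaluations is proportional to its share of `total`; ranks with zero evaluations never receive any.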

Let me know if I can help further.

Thank you!

JaeseungYeom · 2024-07-11T16:34:58Z

I got the idea. It is an excellent suggestion. I will implement something along these lines.
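Once each rank knows its count, it can pick the predicates to toggle on its own, with no further communication. A minimal sketch of that local step, assuming a `std::vector<bool>` predicate array; `flip_random_predicates` and its `from` parameter are hypothetical, and which predicate value marks a surrogate versus a physics evaluation is left to the caller rather than assumed. It returns the flipped indexes so that outputs can later be compared at exactly those points.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

// Hypothetical per-rank helper: flip up to `k` entries of `predicate` that
// currently equal `from`, chosen uniformly at random without replacement.
// Returns the flipped indexes in increasing order.
std::vector<std::size_t> flip_random_predicates(std::vector<bool>& predicate,
                                                bool from, std::size_t k,
                                                std::mt19937& gen) {
  // Collect the local candidates; this never leaves the rank.
  std::vector<std::size_t> candidates;
  for (std::size_t i = 0; i < predicate.size(); ++i)
    if (predicate[i] == from) candidates.push_back(i);

  // Cannot flip more entries than there are candidates.
  k = std::min(k, candidates.size());

  // std::sample (C++17) selects k elements without replacement and, for
  // forward iterators such as these, preserves their relative order.
  std::vector<std::size_t> chosen;
  chosen.reserve(k);
  std::sample(candidates.begin(), candidates.end(), std::back_inserter(chosen),
              k, gen);

  for (std::size_t i : chosen) predicate[i] = !from;
  return chosen;
}
```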

JaeseungYeom force-pushed the err_stats branch 3 times, most recently from 572817a to b2dbb6f on June 27, 2024 21:26

koparasy reviewed Jun 28, 2024

JaeseungYeom force-pushed the err_stats branch from b2dbb6f to 91d1160 on August 8, 2024 17:27

JaeseungYeom force-pushed the err_stats branch from 91d1160 to 7e6584d on August 22, 2024 16:49

JaeseungYeom force-pushed the err_stats branch from 7eacdb6 to 5517f92 on September 19, 2024 11:45

JaeseungYeom added 5 commits September 19, 2024 13:32

initial version of error statistics computation

f241892

initial version of error statistics computation

96a6b88

update the test to make executer use multiple MPI ranks

f859bd1

Compute error statistics on the points originally evaluated by the physics model

5ae6a2c

locally choose predicates to flip. Root only determines the number of flips per rank.

e519e62

JaeseungYeom force-pushed the err_stats branch from 5517f92 to 1fad494 on September 19, 2024 20:32

Fix the error regarding MPI usage before initialization

695ad26

JaeseungYeom force-pushed the err_stats branch from 1fad494 to 695ad26 on September 19, 2024 21:12

JaeseungYeom wants to merge 6 commits into llnl:develop from JaeseungYeom:err_stats

2 participants
