While we have optimized particle functions such as accelerate, update and distribute, they are all stuck with the same data structure (Population), which I believe is quite suboptimal. My guesstimate is that a 5-10x performance boost on the particle part could be obtained by rewriting it from scratch.
The main performance loss, I believe, comes from excessive pointer dereferencing. An illustration of how much the right data structure matters was the performance gain we saw when we removed one layer of dereferencing by making Particle a memory-contiguous structure instead of a class with vectors. Likewise, Population has many small Cell objects, each with vectors of particles, vectors of basis functions, etc. This is not very lightweight for such a performance-critical part of the code. If we could reduce all variables in Population to only one level of dereferencing, I believe it would run much faster. We could for instance have one long vector containing all particles, and mark each particle with the cell it belongs to. We could store all cell-related values in one long vector per quantity. For example, rather than having a vector of Cells, each with a vector of vertex_coordinates, we just have a vector of vertex_coordinates directly in Population. For 3D, the vertex coordinates of Cell c are vertex_coordinates[4*c+i] where i is 0, 1, 2 or 3, i.e. they are stored contiguously instead of behind many dereferences. Inlined accessor functions can be put in the Population declaration in the .h file to hide the arithmetic needed to find the right index, as this will probably be entirely optimized away by the compiler. One could then write pop.vertex_coordinates(c), etc.
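To make the proposed layout concrete, here is a minimal sketch of what a flattened Population could look like. It is an illustration only, not the existing code: the member names vertex_coords_ and particles, the Point alias, the fields of Particle and the helper particles_per_cell are all hypothetical, and it assumes that a 3D cell has 4 vertices with 3 coordinates each.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; the real Particle has other members.
using Point = std::array<double, 3>;

struct Particle {
    Point x;           // position
    Point v;           // velocity
    std::size_t cell;  // index of the cell the particle belongs to
};

class Population {
public:
    // Assumption: 3D cells with 4 vertices each.
    static constexpr std::size_t vertices_per_cell = 4;

    explicit Population(std::size_t num_cells)
        : vertex_coords_(vertices_per_cell * num_cells) {}

    std::size_t num_cells() const {
        return vertex_coords_.size() / vertices_per_cell;
    }

    // Inlined accessors hiding the index arithmetic: each returns a pointer
    // to the first of the 4 consecutive vertices of cell c.
    Point* vertex_coordinates(std::size_t c) {
        assert(c < num_cells());
        return vertex_coords_.data() + vertices_per_cell * c;
    }
    const Point* vertex_coordinates(std::size_t c) const {
        assert(c < num_cells());
        return vertex_coords_.data() + vertices_per_cell * c;
    }

    // One long vector holding all particles, each tagged with its cell.
    std::vector<Particle> particles;

private:
    // One long vector per cell-related quantity; vertex i of cell c
    // is stored at index 4*c + i.
    std::vector<Point> vertex_coords_;
};

// Example of a loop over the flat particle vector: count particles per cell.
inline std::vector<std::size_t> particles_per_cell(const Population& pop) {
    std::vector<std::size_t> counts(pop.num_cells(), 0);
    for (const Particle& p : pop.particles) {
        ++counts[p.cell];
    }
    return counts;
}
```

With this layout, pop.vertex_coordinates(c)[i] reaches vertex i of cell c through a single dereference into contiguous memory, and loops over particles touch one array instead of one small vector per Cell. Whether particles should additionally be kept sorted by cell is a separate design question that this sketch leaves open.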