Performance optimization for time stepping in cylindrical coordinates for m ≠ 0 by oskooi · Pull Request #3158 · NanoComp/meep

oskooi · 2026-03-01T00:13:07Z

Context

The FDTD time-stepping hot path calls step_db() and update_eh() twice per timestep each (once for B/H, once for D/E). The inner loop kernels in step_generic.cpp / step_generic_stride1.cpp are already well-optimized with OpenMP (PLOOP_OVER_IVECS), SIMD vectorization (PS1LOOP), restrict pointers, and specialized code paths for each PML/conductivity combination.

However, the cylindrical-coordinate (Dcyl) m≠0 code in step_db.cpp (lines 178-293) is a major performance bottleneck:

It uses serial LOOP_OVER_VOL_OWNED0 instead of parallel PLOOP variants.
Every grid point constructs an ivec object via IVEC_LOOP_ILOC, calls .r(), and performs a floating-point division – yet the r-coordinate only changes with the outer (R) loop.
The KSTRIDE_DEF PML-index setup is redundantly placed inside the loop body instead of outside.

These issues make cylindrical simulations with m≠0 significantly slower than they need to be.

Optimizations

1. Precompute reciprocal-r table for cylindrical m≠0 loops (HIGH IMPACT)

File: src/step_db.cpp, lines 178-293

Before the 8 special-case loops, precompute a 1D array of the_m / r_int values indexed by the R loop dimension. Inside the loop, replace:

IVEC_LOOP_ILOC(gv, here);
realnum rinv = the_m / here.r();

with a direct lookup using the R loop index. This eliminates per-point ivec construction, .r() method dispatch, and floating-point division. The R loop index maps to loop_i1 for Dcyl (R is outermost, Z is innermost/stride-1).

Implementation: after computing the_m (line 200) and is (line 193), determine the R dimension's loop parameters from gv.yucky_direction() and precompute rinv_arr[ir] = the_m / (is.in_direction(R) + 2 * ir) for ir = 0..nr-1.

2. Parallelize cylindrical m≠0 loops (HIGH IMPACT)

File: src/step_db.cpp, lines 203-291

Replace all 8 instances of serial LOOP_OVER_VOL_OWNED0(gv, cc, i) with PLOOP_OVER_VOL_OWNED(gv, cc, i) (or PS1LOOP_OVER_VOL_OWNED0 when stride-1). This enables OpenMP multi-threading for the cylindrical m≠0 field updates, which are currently the only serial loops on the hot path.

With the precomputed rinv table (optimization 1), the loop body becomes pure array arithmetic, ideal for parallization and SIMD.

3. Host KSTRIDE_DEF outside loops (MEDIUM IMPACT)

File: src/step_db.cpp, lines 203-291

Move KSTRIDE_DEF(dsig, k, is, gv) and KSTRIDE_DEF(dsigu, ku, is, gv) from inside the loop body to before the loop, matching the pattern used in step_generic.cpp. Currently these define const variables every iteration; hoisting them:

Reduces redundant computation (even if compiler may hoist them for serial loops)
Is required for correct OpenMP parallel behavior (avoids per-thread redundant setup)
Matches the established pattern from the well-optimized step_generic.cpp

4. Add SIMD hint to f_rderiv_int inner loop (MEDIUM IMPACT)

File: src/step_db.cpp, lines 109-117

The Z inner loop in the f_rderiv_int computation has no cross-iteration dependencies within a given R row. Add IVDEP pragma before the inner loop to enable auto-vectorization.

5. Use memset for f_rderiv_int initialization (LOW IMPACT)

Replace explicit zero-initialization loop with memset.

… for m != 0

oskooi · 2026-03-01T00:34:43Z

src/step_db.cpp

+        const int nr_loop = (gv.big_corner().in_direction(R) - is_r) / 2 + 1;
+        realnum *rinv_arr = new realnum[nr_loop];
+        for (int ir = 0; ir < nr_loop; ++ir)
+          rinv_arr[ir] = the_m / (is_r + 2 * ir);


A suggested rewrite to guard against division by zero at r = 0:

for (int ir = 0; ir < nr_loop; ++ir) { int r_idx = is_r + 2 * ir; rinv_arr[ir] = (r_idx != 0) ? the_m / r_idx : 0; }

I would rather leave it as-is.

We only divide by zero here in cases where we aren't actually going to use rinv_arr[0] (for the Z component, we have special update rules at $r=0$), so it is actually a good thing that we are storing Inf in that case — if we ever accidentally use it, we will immediately notice because the infinities will propagate. Whereas storing zero instead would cause such a bug to be silently hidden.

However, it might be nice to put a comment in the code:

// intentionally store Inf for rinv_arr[0] if !is_r, since we should never use that value

codecov-commenter · 2026-03-01T00:36:28Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.90%. Comparing base (f29a8c7) to head (8fb234a).
⚠️ Report is 111 commits behind head on master.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #3158      +/-   ##
==========================================
+ Coverage   73.81%   73.90%   +0.09%     
==========================================
  Files          18       18              
  Lines        5423     5454      +31     
==========================================
+ Hits         4003     4031      +28     
- Misses       1420     1423       +3

see 2 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

stevengj · 2026-03-01T01:54:42Z

Have you benchmarked this change? What is the performance improvement?

oskooi · 2026-03-01T04:22:35Z

The changes in this PR do seem to provide a modest speedup. I ran an experiment on my local machine (Intel Xeon i7-7700K, serial Meep built from source: a474edd) for OMP_NUM_THREADS of 1, 2, and 3 using the example of Tutorial/Nonaxisymmetric Dipole Sources involving a ring-current source positioned at $r = 3.5$ μm with $m = 6$. The script is here.

Benchmarking results are shown in the table below.

src/step_db.cpp

oskooi · 2026-03-07T19:05:40Z

There is a decent speedup with the latest changes (8fb234a) using the same benchmarking test:

stevengj · 2026-03-07T19:09:22Z

What is the speedup compared to the current master branch?

oskooi · 2026-03-08T00:38:52Z

Here are the benchmarking results for the master branch (a53a3ed). This PR does indeed provide a significant improvement in runtime even for a single thread.

Note: benchmarking results were obtained using Meep compiled using single-precision floating point.

performance optimization for time stepping in cylindrical coordinates…

757ee44

… for m != 0

oskooi added the enhancement label Mar 1, 2026

oskooi commented Mar 1, 2026

View reviewed changes

stevengj reviewed Mar 6, 2026

View reviewed changes

src/step_db.cpp Show resolved Hide resolved

replace rinv as precomputed array with inline calculation

8fb234a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance optimization for time stepping in cylindrical coordinates for m ≠ 0#3158

Performance optimization for time stepping in cylindrical coordinates for m ≠ 0#3158
oskooi wants to merge 2 commits intoNanoComp:masterfrom
oskooi:cyl_timestep_enhancements

oskooi commented Mar 1, 2026

Uh oh!

oskooi Mar 1, 2026

Uh oh!

stevengj Mar 6, 2026

Uh oh!

codecov-commenter commented Mar 1, 2026 •

edited

Loading

Uh oh!

stevengj commented Mar 1, 2026

Uh oh!

oskooi commented Mar 1, 2026

Uh oh!

Uh oh!

oskooi commented Mar 7, 2026

Uh oh!

stevengj commented Mar 7, 2026 •

edited

Loading

Uh oh!

oskooi commented Mar 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

oskooi commented Mar 1, 2026

Uh oh!

oskooi Mar 1, 2026

Choose a reason for hiding this comment

Uh oh!

stevengj Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

codecov-commenter commented Mar 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

stevengj commented Mar 1, 2026

Uh oh!

oskooi commented Mar 1, 2026

Uh oh!

Uh oh!

oskooi commented Mar 7, 2026

Uh oh!

stevengj commented Mar 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

oskooi commented Mar 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov-commenter commented Mar 1, 2026 •

edited

Loading

stevengj commented Mar 7, 2026 •

edited

Loading