Iterative streaming: drive ptycho's ML_grad, match holoscan-framework (#65) by gwbischof · Pull Request #66 · NSLS2/holoptycho

Garrett Bischof (gwbischof) · 2026-06-15T00:32:09Z

What

Fixes the iterative streaming reconstruction (issue #65). The engine was a hand-reimplementation of the per-batch recon math that only did DM — the wrong algorithm — which is why the live recon was broken. This makes StreamingPtychoRecon a thin streaming wrapper that drives ptycho master's verbatim ML_grad code, and brings the streaming path back in line with the known-good holoscan-framework beamline branch.

Depends on NSLS2/ptycho#87 (ML_grad algorithm, additive) + NSLS2/ptycho#88 (accumulate_*_mode_local kernels), both merged. pixi.lock repins ptycho master.

Changes

Drive ptycho's ML_grad, delete the DM reimplementation. _update_psi → recon_gpu_launch(recon_ml_grad_trans_gpu_single); the hand-written DM batch + DM kernels/buffers are gone.
Inherit all the math from ptycho — zero duplication. recon_ml_grad, recon_gpu_launch, calc_prb_obj, forward/back_prop, and accumulate_obj/prb + gather_obj/prb are bound from ptycho_trans (verbatim, NSLS2/ptycho#87). The paraphrased _accumulate_*/_gather_* are deleted.
Correct the kernel pairing. ptycho master's accumulate_*_mode kernels index product_d globally (full array); HXN's accumulate_*_gpu_single pass a per-batch slice and need local-z indexing. gpu_setup binds the *_local kernels (#88) so the method+kernel pair is faithful to HXN. (Dedup tracked in NSLS2/ptycho#89.)
Remove init_product_range. Vestigial DM dual-state seeding; ML_grad recomputes product_d every batch, so it's a no-op — and holoscan-framework has no equivalent.
Dashboard: transparent no-data background. No-data object pixels (no probe coverage) are NaN'd in the live snapshot so the dashboard renders them transparent instead of colormapping the seed value. Scoped to the live copy; saved reconstructions keep their finite background. write_live's guard updated to allow background NaN while still catching divergence.
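
The kernel-pairing point above can be illustrated with a small NumPy sketch. Everything in it is a hypothetical stand-in (a 1-D array instead of product_d, plain Python functions instead of the CUDA kernels); it only shows why a routine that indexes by global point number must not be handed a per-batch slice.

```python
import numpy as np

# Hypothetical stand-in: one value per scan point (the real arrays have more dimensions).
n_points = 8
product_full = np.arange(n_points, dtype=np.float64) * 10.0


def accumulate_global(product, start, stop):
    """Sum points [start, stop), indexing `product` by global point number."""
    return sum(product[z] for z in range(start, stop))


def accumulate_local(product_slice, start, stop):
    """Sum points [start, stop), indexing a per-batch slice from zero."""
    return sum(product_slice[z - start] for z in range(start, stop))


start, stop = 4, 8
batch_slice = product_full[start:stop]

expected = product_full[start:stop].sum()
global_on_full = accumulate_global(product_full, start, stop)   # correct pairing
local_on_slice = accumulate_local(batch_slice, start, stop)     # correct pairing

# Mispairing: global indexing applied to a per-batch slice runs off the end.
try:
    accumulate_global(batch_slice, start, stop)
    mispaired_ok = True
except IndexError:
    mispaired_ok = False
```

NumPy raises on the mispaired call; a GPU kernel making the same mistake would more likely read the wrong memory without any error, which is consistent with a reconstruction that runs but looks wrong.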

Method-by-method review vs holoscan-framework

Every streaming-state method was compared against the HXN branch holoscan-framework ran. Result: all functionally equivalent or a required-for-main deviation — the only genuine divergences were the kernel mispairing and init_product_range, both fixed above. Required deviations: single-GPU (no MPI/NCCL), in-process numpy snapshots, probe-from-prb_path-file, clear_region_enabled gate.

Validation

✅ Offline engine test on dumped scan-411993 inputs: reconstructs the clean striped specimen (ePIE-matching), all-finite.
✅ Live replay (scan 411993): object resolves correctly, matches holoscan-framework; dashboard background transparent.
✅ tests/ green (2 pre-existing tiled-404 smoke failures aside).


…lidation Standalone ML (maximum-likelihood gradient) and ePIE on dumped/cached inputs, used to confirm the data/geometry are correct independent of the engine. Dev tooling for validating the #65 library-engine migration.

) The streaming engine was a hand-reimplementation of the per-batch recon math that only implemented DM, the wrong algorithm. The HXN beamline runs ML_grad, which is why pipeline-DM matched the offline-DM (both blobby) while an offline ePIE reconstructed the same inputs cleanly.

Fix: StreamingPtychoRecon now drives ptycho master's real ML_grad methods (added additively in NSLS2/ptycho#87) instead of reimplementing them:

- gpu_setup binds recon_gpu_launch / recon_ml_grad_trans_gpu_single / calc_prb_obj_gpu / forward_prop_gpu / back_prop_gpu onto the instance via getattr(ptycho_trans, name).__get__(self): verbatim beamline code, no paraphrase. The import is deferred to gpu_setup so the pure-Python unit tests still construct a recon without a GPU / ptycho / matplotlib.
- Binds the ML_grad kernels (get_amp, ml_calc_grad, multiply_with_mode) and allocates tmp1_d/tmp2_d; drops the DM-only kernels/buffers (multiply_and_sum, dm_update_mode_amp*, dev_d/power_d/fft_total_d) and the hand-written _recon_dm_batch.
- _update_psi -> recon_gpu_launch(recon_ml_grad_trans_gpu_single).
- _gather_obj/_gather_prb switch to the ML_grad grad_upd path (additive obj += obj_update/prb_norm), since update_psi_gpu sets grad_upd=True for alg=='ML_grad' (DM used the replace form). Keeps the streaming divide-by-zero guard on the object gather.

The object/probe accumulate + gather stay as local single-rank ports (no MPI); only the streaming wrapper lives here. Net −128 lines (reimplemented math gone).

pixi.lock: repin ptycho master 488c476 -> 50c11f9 (picks up #87).

Validation: scripts/offline_engine_test.py on the dumped scan-411993 inputs reconstructs the clean striped specimen matching the offline ePIE reference (vs the old DM blob); object stays finite across 200 iters; tests/ green (2 pre-existing tiled smoke failures aside).

Refs #65.
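
The binding pattern named in this commit message, getattr(cls, name).__get__(self), can be sketched in isolation. Donor and Wrapper below are hypothetical classes invented for the illustration; they stand in for ptycho_trans and the streaming wrapper and share nothing with the real code beyond the pattern.

```python
class Donor:
    """Hypothetical stand-in for the library class that owns the methods."""

    def scale(self, x):
        return self.factor * x

    def scale_twice(self, x):
        # Calls another method through `self`, so that one must be bound too.
        return self.scale(self.scale(x))


class Wrapper:
    """Hypothetical thin wrapper that borrows Donor's methods unchanged."""

    def __init__(self, factor):
        self.factor = factor

    def setup(self):
        # Deferred binding: Donor is not touched until setup() runs.
        for name in ("scale", "scale_twice"):
            func = getattr(Donor, name)              # plain function on the class
            setattr(self, name, func.__get__(self))  # bound method, self = this instance


w = Wrapper(factor=3)
w.setup()
result = w.scale_twice(2)
```

Because the borrowed functions only see self, the wrapper has to provide every attribute and helper method they touch, which is why the commit binds the whole call chain rather than a single entry point.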

Stop duplicating the object/probe update math in holoptycho. The streaming engine now drives ptycho master's verbatim accumulate_obj_gpu(+single), gather_obj_gpu, accumulate_prb_gpu(+single), gather_prb_gpu by binding them onto the instance (alongside the already-inherited ML_grad algorithm) — the paraphrased _accumulate_*/_gather_* methods are deleted. The inherited HXN accumulate_*_gpu_single pass a per-batch product_d slice, so gpu_setup binds kernel_accumulate_obj/prb_mode to the *_local kernels (NSLS2/ptycho#88) whose product_d indexing matches; master's global-z kernels stay for master's own recon. Dedup tracked in NSLS2/ptycho#89 (TODO in code). Single-GPU flags (comm=None, use_NCCL/use_CUDA_MPI=False) make the inherited gather take the no-MPI path; grad_upd=True selects the ML_grad additive update. pixi.lock: repin ptycho master -> 9d6d1fb (#88). Offline engine test on the dumped scan-411993 inputs reconstructs the clean striped specimen — now via ptycho's verbatim math, no holoptycho duplication. Refs #65.
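
The difference between the additive (grad_upd) update and the replace form can be sketched with NumPy. The function below is a hypothetical single-rank stand-in, not ptycho's gather_obj_gpu; the additive expression obj += obj_update/prb_norm comes from the commit text, while the exact replace expression and the form of the divide-by-zero guard are assumptions.

```python
import numpy as np


def gather_obj(obj, obj_update, prb_norm, grad_upd):
    """Hypothetical single-rank gather; returns a new object array."""
    # Guard (assumed form): skip pixels whose normalisation is zero.
    safe = prb_norm > 0
    step = np.zeros_like(obj)
    np.divide(obj_update, prb_norm, out=step, where=safe)
    if grad_upd:
        return obj + step             # additive, gradient-style update
    return np.where(safe, step, obj)  # replace form (assumed expression)


obj = np.array([1.0 + 0j, 1.0 + 0j, 1.0 + 0j])
obj_update = np.array([0.5 + 0j, 2.0 + 0j, 4.0 + 0j])
prb_norm = np.array([1.0, 4.0, 0.0])

additive = gather_obj(obj, obj_update, prb_norm, grad_upd=True)
replaced = gather_obj(obj, obj_update, prb_norm, grad_upd=False)
```

In both modes the third pixel, whose normalisation is zero, keeps its previous value instead of becoming Inf or NaN.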

init_product_range seeded product_d = prb*obj for newly-activated streaming points — it was added (2 days ago) to prop up the DM dual state, which is a running variable DM accumulates across iterations. ML_grad has no such dual: recon_ml_grad_trans_gpu_single recomputes prb_obj fresh and overwrites its product_d slice every batch (writing v before any read), so newly-activated points get a correct product_d on their first iteration with no seeding. holoscan-framework (HXN) has no equivalent — its one_iter/update_psi only seeds product_d at it==0, never for mid-stream new points, relying on the algorithm to overwrite. Removing this matches that behavior and drops dead DM-era code. Verified no-op for ML_grad; offline reconstruction unchanged. Refs #65.
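
The no-op argument can be checked with a toy model. run_batch below is a hypothetical stand-in for a per-batch step that overwrites its product slice before reading it; it is not the ptycho routine, and the shapes are arbitrary.

```python
import numpy as np


def run_batch(product, prb, obj_patches, start, stop):
    """Hypothetical batch step: overwrite the slice first, then read it."""
    product[start:stop] = prb * obj_patches[start:stop]  # write
    return product[start:stop].sum()                     # read


rng = np.random.default_rng(0)
prb = rng.normal(size=(4, 4))
obj_patches = rng.normal(size=(6, 4, 4))

seeded = prb * obj_patches                      # seeded with prb*obj up front
unseeded = np.full_like(obj_patches, np.nan)    # deliberately garbage contents

out_seeded = run_batch(seeded.copy(), prb, obj_patches, 2, 5)
out_unseeded = run_batch(unseeded, prb, obj_patches, 2, 5)
```

The two results agree even though one buffer started as NaN, because the prior contents of the slice are never read. An algorithm that carries the product across iterations, as described for DM, would not have this property.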

…und) The iterative object's uncovered pixels stay at the uniform seed value (0.99*exp(-0.1j)); the dashboard colormapped them as if they were data (yellow amplitude / blue-green phase). Mark them NaN instead so the dashboard renders them transparent (its array renderer already maps NaN -> alpha 0). snapshot() now NaNs object pixels with no probe coverage (prb_norm_l == alpha; accumulate_obj_gpu seeds prb_norm to alpha then adds |prb|^2 only where the probe overlapped). Done on a copy so the stored mmap_obj and any saved reconstruction keep their finite seed background — the NaN is scoped to the live dashboard path only, and _update_mmap stays byte-identical to HXN. tiled_writer.write_live's finite-guard previously skipped any frame with a single NaN, which would skip every frame once the background is NaN. It now allows object background NaN while still skipping genuine divergence: a non-finite probe (always fully covered), any Inf in the object, or an object with no finite pixels at all. Verified on scan-411993 inputs: ~79% NaN background, covered region finite with no Inf, guard does not skip the frame. Refs #65.
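
A minimal NumPy sketch of the two pieces described here: masking uncovered pixels on a copy, and a guard that tolerates background NaN. The function names, the alpha value, and the array shapes are hypothetical; only the conditions (prb_norm equal to alpha marks no coverage; skip on a non-finite probe, any Inf in the object, or no finite object pixel) are taken from the text.

```python
import numpy as np


def live_snapshot(obj, prb_norm, alpha):
    """Return a copy of `obj` with uncovered pixels set to NaN."""
    out = obj.copy()                  # the stored object is left untouched
    out[prb_norm == alpha] = np.nan   # norm still at its seed value: no coverage
    return out


def should_skip_frame(obj, prb):
    """Hypothetical guard: True means the frame should not be written."""
    if not np.isfinite(prb).all():    # the probe is always fully covered
        return True
    if np.isinf(obj).any():           # Inf signals divergence; NaN may be background
        return True
    return not np.isfinite(obj).any() # no finite pixel left at all


alpha = 1e-8                          # hypothetical seed value for prb_norm
seed = 0.99 * np.exp(-0.1j)
obj = np.full((4, 4), seed, dtype=np.complex128)
prb_norm = np.full((4, 4), alpha)
prb_norm[1:3, 1:3] += 2.0             # only the centre was illuminated
prb = np.ones((2, 2), dtype=np.complex128)

snap = live_snapshot(obj, prb_norm, alpha)
diverged = snap.copy()
diverged[1, 1] = np.inf
```

The exact-equality test against alpha works here because uncovered pixels are never modified after seeding; a tolerance would be needed if they were.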

…otation) holoscan-framework's _preprocess_diffraction hardcoded np.rot90(amplitude, axes=(2,1)) (== rot90_cw) on every diffraction frame before fftshift. The live pipeline split orientation into configurable knobs (detector_orientation, dp_orient, dp_orient_iterative), and the iterative engine's DP copy ended up with no rotation (dp_orient_iterative defaulted to None / disabled). The amplitude still reconstructed (positions consistent), but the object PHASE came out wrong. Set the --dp-orient-iterative default to rot90_cw so the iterative engine's DP is pinned to holoscan-framework's orientation. reorient_d4 applies it as an absolute D4 (independent of the shared dp_orient), so it is correct for both replay (detector_orientation=identity) and live. Flows to the replay via build_full_config (add_reconstruction_arguments) and to the CLI/live path the same way. Pass --dp-orient-iterative identity to opt out and follow the shared dp_orient + autodetect. Validated on scan 411993: phase corrects with rot90_cw. Refs #65.
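
The equivalence between np.rot90(amplitude, axes=(2,1)) and a clockwise quarter turn of each frame can be confirmed on a tiny stack. The array below is made up for the illustration, and the fftshift axes are an assumption (the text only says the rotation happens before fftshift).

```python
import numpy as np

# A stack of two tiny 2x3 "frames"; axis 0 is the frame index.
stack = np.arange(12).reshape(2, 2, 3)

# Rotating from axis 2 towards axis 1 turns each frame clockwise.
rotated = np.rot90(stack, axes=(2, 1))

# Equivalent per-frame form: k=-1 is one clockwise quarter turn.
per_frame = np.stack([np.rot90(frame, k=-1) for frame in stack])

# Rotation first, then fftshift over the two image axes (axes assumed).
shifted = np.fft.fftshift(rotated, axes=(-2, -1))
```

The first frame [[0, 1, 2], [3, 4, 5]] becomes [[3, 0], [4, 1], [5, 2]]: the left column, read bottom to top, ends up as the top row.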

test_iterative_flags_default_unset asserted dp_orient_iterative defaults to None; the default is now 'rot90_cw' (restores holoscan-framework's engine-DP rotation, fixing the reconstructed phase). The direction-override defaults are unchanged (still None). The emission-guard test still passes (the `if args.dp_orient_iterative is not None` guard is retained). Refs #65.
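
The interaction between the new default and the retained None guard can be sketched with argparse. The parser and the emit_overrides helper are hypothetical simplifications, not the project's add_reconstruction_arguments or build_full_config; only the flag name, its default, and the guard condition come from the text.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # Default pins the engine DP to a clockwise quarter turn.
    parser.add_argument("--dp-orient-iterative", default="rot90_cw")
    return parser


def emit_overrides(args):
    """Collect config overrides, skipping anything left as None."""
    overrides = {}
    if args.dp_orient_iterative is not None:
        overrides["dp_orient_iterative"] = args.dp_orient_iterative
    return overrides


default_args = build_parser().parse_args([])
opt_out_args = build_parser().parse_args(["--dp-orient-iterative", "identity"])
```

With a non-None default the guard always passes for parsed arguments, so it only matters for callers that build the namespace themselves and leave the field as None.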

AGENTS.md and README documented dp_orient_iterative as unset/disabled by default with "no engine rotation is needed — leave unset". That conclusion was wrong: the missing rotation only surfaced once the ML_grad engine produced a clean phase (DM was too blobby to reveal it). The default is now rot90_cw, restoring holoscan-framework's hardcoded np.rot90(axes=(2,1)) on the engine DP. Updated the config table, the orientation list, the engine-conventions section, and the README flag description. Refs #65.

Garrett Bischof (gwbischof) added 6 commits June 13, 2026 13:20

offline_engine_test: add --from-cache mode for engine validation from the fetch cache

27e7e4b

Garrett Bischof (gwbischof) changed the title ~~Streaming engine: drive ptycho's ML_grad instead of the reimplemented DM (#65)~~ Iterative streaming: drive ptycho's ML_grad, match holoscan-framework (#65) Jun 15, 2026

Garrett Bischof (gwbischof) added 3 commits June 15, 2026 00:25

Garrett Bischof (gwbischof) merged commit 21a61d3 into main Jun 15, 2026
5 checks passed

Garrett Bischof (gwbischof) deleted the feat/streaming-ml-grad branch June 15, 2026 04:33

Garrett Bischof (gwbischof) merged 9 commits into main from feat/streaming-ml-grad