Iterative streaming: drive ptycho's ML_grad, match holoscan-framework (#65) #66
Merged
…lidation Standalone ML (maximum-likelihood gradient) and ePIE runs on dumped/cached inputs, used to confirm that the data and geometry are correct independently of the engine. Dev tooling for validating the #65 library-engine migration.
…) The streaming engine was a hand-reimplementation of the per-batch recon math that only implemented DM, which is the wrong algorithm. The HXN beamline runs ML_grad. That is why pipeline-DM matched offline-DM (both blobby), while an offline ePIE run reconstructed the same inputs cleanly.

Fix: StreamingPtychoRecon now drives ptycho master's real ML_grad methods (added additively in NSLS2/ptycho#87) instead of reimplementing them:
- gpu_setup binds recon_gpu_launch / recon_ml_grad_trans_gpu_single / calc_prb_obj_gpu / forward_prop_gpu / back_prop_gpu onto the instance via getattr(ptycho_trans, name).__get__(self), so the beamline code runs verbatim with no paraphrase. The import is deferred to gpu_setup so the pure-Python unit tests can still construct a recon without a GPU, ptycho, or matplotlib.
- It binds the ML_grad kernels (get_amp, ml_calc_grad, multiply_with_mode) and allocates tmp1_d/tmp2_d; it drops the DM-only kernels/buffers (multiply_and_sum, dm_update_mode_amp*, dev_d/power_d/fft_total_d) and the hand-written _recon_dm_batch.
- _update_psi -> recon_gpu_launch(recon_ml_grad_trans_gpu_single).
- _gather_obj/_gather_prb switch to the ML_grad grad_upd path (additive obj += obj_update/prb_norm), since update_psi_gpu sets grad_upd=True for alg=='ML_grad' (DM used the replace form). The streaming divide-by-zero guard on the object gather is kept.

The object/probe accumulate + gather stay as local single-rank ports (no MPI); only the streaming wrapper lives here. Net −128 lines (reimplemented math gone).

pixi.lock: repin ptycho master 488c476 -> 50c11f9 (picks up #87).

Validation: scripts/offline_engine_test.py on the dumped scan-411993 inputs reconstructs the clean striped specimen, matching the offline ePIE reference (vs the old DM blob); the object stays finite across 200 iters; tests/ green (2 pre-existing tiled smoke failures aside). Refs #65.
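The `getattr(ptycho_trans, name).__get__(self)` binding can be illustrated in plain Python. This is a minimal sketch, not holoptycho's code: `StreamingReconSketch`, `_load_ptycho_trans`, `BORROWED`, and the two toy methods are hypothetical, and the toy class replaces the real deferred ptycho import so the example runs without a GPU or ptycho installed.

```python
def _load_ptycho_trans():
    """Hypothetical stand-in for the deferred `ptycho_trans` import done in gpu_setup.

    A toy class is returned so this sketch runs without ptycho or a GPU.
    """

    class ptycho_trans:  # lowercase only to mirror the name used in the description
        def forward_prop_gpu(self, x):
            return x * self.scale

        def back_prop_gpu(self, x):
            return x / self.scale

    return ptycho_trans


# Hypothetical subset of the method names that get borrowed.
BORROWED = ("forward_prop_gpu", "back_prop_gpu")


class StreamingReconSketch:
    """Hypothetical stand-in for StreamingPtychoRecon."""

    def __init__(self, scale):
        # Nothing GPU- or ptycho-related happens here, so unit tests can
        # construct the object freely.
        self.scale = scale

    def gpu_setup(self, loader=_load_ptycho_trans):
        ptycho_trans = loader()  # the import is deferred until GPU setup
        for name in BORROWED:
            # getattr on the class yields the plain function; __get__(self)
            # binds it to this object, which is not a ptycho_trans instance.
            setattr(self, name, getattr(ptycho_trans, name).__get__(self))


recon = StreamingReconSketch(scale=2.0)
recon.gpu_setup()
print(recon.forward_prop_gpu(3.0))  # 6.0
```

Because the borrowed function is bound rather than copied, the method body stays byte-for-byte the upstream one and only reads attributes (`self.scale` here) from the streaming object.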
Stop duplicating the object/probe update math in holoptycho. The streaming engine now drives ptycho master's verbatim accumulate_obj_gpu(+single), gather_obj_gpu, accumulate_prb_gpu(+single), and gather_prb_gpu by binding them onto the instance (alongside the already-inherited ML_grad algorithm); the paraphrased _accumulate_*/_gather_* methods are deleted.

The inherited HXN accumulate_*_gpu_single methods pass a per-batch product_d slice, so gpu_setup binds kernel_accumulate_obj/prb_mode to the *_local kernels (NSLS2/ptycho#88), whose product_d indexing matches; master's global-z kernels stay for master's own recon. Dedup is tracked in NSLS2/ptycho#89 (TODO in code).

Single-GPU flags (comm=None, use_NCCL/use_CUDA_MPI=False) make the inherited gather take the no-MPI path; grad_upd=True selects the ML_grad additive update.

pixi.lock: repin ptycho master -> 9d6d1fb (#88). The offline engine test on the dumped scan-411993 inputs reconstructs the clean striped specimen, now via ptycho's verbatim math with no holoptycho duplication. Refs #65.
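The global-z versus local-z pairing can be sketched in NumPy. Assumptions: the per-point update is simplified to adding `conj(prb) * product[z]` into the object patch at `positions[z]` (the real CUDA kernels also handle probe modes and the probe-norm accumulation), and both function names are hypothetical. Only the indexing is the point: a kernel that expects the full array indexes with global z, while one fed a `product_d[start:end]` slice must use `z - start`.

```python
import numpy as np


def accumulate_obj_global(obj_upd, product, prb, positions, z_range):
    """Global-z pairing: `product` is the FULL (n_points, h, w) array."""
    h, w = prb.shape
    for z in z_range:
        y, x = positions[z]
        obj_upd[y:y + h, x:x + w] += np.conj(prb) * product[z]


def accumulate_obj_local(obj_upd, product_batch, prb, positions, start):
    """Local-z pairing: `product_batch` is product[start:end]; row 0 is point `start`."""
    h, w = prb.shape
    for local_z in range(product_batch.shape[0]):
        y, x = positions[start + local_z]  # positions remain globally indexed
        obj_upd[y:y + h, x:x + w] += np.conj(prb) * product_batch[local_z]


rng = np.random.default_rng(0)
n, h, w = 6, 4, 4
positions = [(i, i) for i in range(n)]
prb = rng.standard_normal((h, w)) + 1j * rng.standard_normal((h, w))
product = rng.standard_normal((n, h, w)) + 1j * rng.standard_normal((n, h, w))
start, end = 2, 5

full = np.zeros((n + h, n + w), dtype=complex)
accumulate_obj_global(full, product, prb, positions, range(start, end))

batched = np.zeros_like(full)
accumulate_obj_local(batched, product[start:end], prb, positions, start)
print(np.allclose(full, batched))  # True
```

Handing the slice to the global-z function instead reads row `z` of a 3-row slice (the wrong point) and then runs past its end with an `IndexError`; that is the mispairing the `*_local` kernels avoid.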
init_product_range seeded product_d = prb*obj for newly activated streaming points. It was added (2 days ago) to prop up the DM dual state, a running variable that DM accumulates across iterations. ML_grad has no such dual: recon_ml_grad_trans_gpu_single recomputes prb_obj fresh and overwrites its product_d slice every batch (writing it before any read), so newly activated points get a correct product_d on their first iteration without seeding. holoscan-framework (HXN) has no equivalent: its one_iter/update_psi seeds product_d only at it==0, never for mid-stream new points, and relies on the algorithm to overwrite. Removing init_product_range matches that behavior and drops dead DM-era code. Verified a no-op for ML_grad; the offline reconstruction is unchanged. Refs #65.
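Why dropping the seeding is a no-op can be shown with a toy comparison. Both step functions are hypothetical simplifications: `overwrite_step` stands in for the ML_grad pattern of writing the batch's `product_d` slice from probe * object before reading it, and `running_step` stands in for a running state that reads the previous value. It is a toy, not the real DM update.

```python
import numpy as np


def overwrite_step(product, prb, obj_patches, sl):
    """ML_grad-like pattern (simplified): write the slice, then read it."""
    product[sl] = prb * obj_patches[sl]  # write first; the old value is never read
    return product[sl].copy()


def running_step(product, prb, obj_patches, sl):
    """Toy running-state pattern (NOT the real DM update): reads the old value."""
    product[sl] = 0.5 * product[sl] + 0.5 * prb * obj_patches[sl]
    return product[sl].copy()


rng = np.random.default_rng(1)
n, h, w = 5, 3, 3
prb = rng.standard_normal((h, w)) + 1j * rng.standard_normal((h, w))
patches = rng.standard_normal((n, h, w)) + 1j * rng.standard_normal((n, h, w))
new_points = slice(2, 4)  # points activated mid-stream

seeded = np.zeros((n, h, w), dtype=complex)
seeded[new_points] = prb * patches[new_points]  # the prb*obj seeding init_product_range did
unseeded = np.zeros((n, h, w), dtype=complex)

print(np.allclose(overwrite_step(seeded.copy(), prb, patches, new_points),
                  overwrite_step(unseeded.copy(), prb, patches, new_points)))  # True
print(np.allclose(running_step(seeded.copy(), prb, patches, new_points),
                  running_step(unseeded.copy(), prb, patches, new_points)))  # False
```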
…und) The iterative object's uncovered pixels stay at the uniform seed value (0.99*exp(-0.1j)), and the dashboard colormapped them as if they were data (yellow amplitude / blue-green phase). Mark them NaN instead so the dashboard renders them transparent (its array renderer already maps NaN -> alpha 0).

snapshot() now NaNs object pixels with no probe coverage (prb_norm_l == alpha; accumulate_obj_gpu seeds prb_norm to alpha and then adds |prb|^2 only where the probe overlapped). This is done on a copy, so the stored mmap_obj and any saved reconstruction keep their finite seed background; the NaN is scoped to the live dashboard path only, and _update_mmap stays byte-identical to HXN.

tiled_writer.write_live's finite-guard previously skipped any frame containing a single NaN, which would skip every frame once the background is NaN. It now allows object-background NaN while still skipping genuine divergence: a non-finite probe (always fully covered), any Inf in the object, or an object with no finite pixels at all.

Verified on scan-411993 inputs: ~79% NaN background, covered region finite with no Inf, and the guard does not skip the frame. Refs #65.
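The coverage mask and the relaxed guard can be sketched in NumPy. This assumes, as described, that `prb_norm` is seeded to exactly `alpha` and only covered pixels receive `|prb|^2`, so exact equality identifies uncovered pixels. `snapshot_object` and `should_skip_frame` are hypothetical names, not holoptycho's `snapshot()` or `write_live`.

```python
import numpy as np


def snapshot_object(obj, prb_norm, alpha):
    """Display copy of `obj` with never-covered pixels set to NaN.

    Assumes prb_norm was seeded to exactly `alpha` and only covered pixels
    had |prb|**2 added, so equality marks "no probe coverage".
    """
    shown = obj.copy()  # the stored object keeps its finite seed background
    shown[prb_norm == alpha] = np.nan
    return shown


def should_skip_frame(obj, prb):
    """Skip only on genuine divergence; NaN object background is allowed."""
    if not np.all(np.isfinite(prb)):
        return True  # the probe is always fully covered, so any NaN/Inf is real
    if np.any(np.isinf(obj)):
        return True  # Inf anywhere in the object means divergence
    if not np.any(np.isfinite(obj)):
        return True  # nothing finite left to display
    return False


alpha = 1e-3
obj = np.full((4, 4), 0.99 * np.exp(-0.1j))  # uniform seed value
prb_norm = np.full((4, 4), alpha)
prb_norm[1:3, 1:3] += 1.0  # only the central 2x2 block was covered
shown = snapshot_object(obj, prb_norm, alpha)
print(np.isnan(shown).sum(), should_skip_frame(shown, np.ones((2, 2), complex)))  # 12 False
```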
…otation) holoscan-framework's _preprocess_diffraction hardcoded np.rot90(amplitude, axes=(2,1)) (== rot90_cw) on every diffraction frame before fftshift. The live pipeline split orientation into configurable knobs (detector_orientation, dp_orient, dp_orient_iterative), and the iterative engine's DP copy ended up with no rotation (dp_orient_iterative defaulted to None / disabled). The amplitude still reconstructed (positions were consistent), but the object PHASE came out wrong.

Set the --dp-orient-iterative default to rot90_cw so the iterative engine's DP is pinned to holoscan-framework's orientation. reorient_d4 applies it as an absolute D4 operation (independent of the shared dp_orient), so it is correct for both replay (detector_orientation=identity) and live. The default flows to the replay via build_full_config (add_reconstruction_arguments) and to the CLI/live path the same way. Pass --dp-orient-iterative identity to opt out and follow the shared dp_orient + autodetect.

Validated on scan 411993: the phase is corrected with rot90_cw. Refs #65.
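NumPy's `np.rot90` rotates from the first listed axis toward the second, so `axes=(2, 1)` on an `(n, rows, cols)` stack rotates each frame clockwise, the same as `k=-1` with `axes=(1, 2)`. A minimal sketch; `rot90_cw_stack` and the `ENGINE_DP_ORIENT` table are hypothetical and cover only the two names mentioned here, not `reorient_d4`'s full D4 set.

```python
import numpy as np


def rot90_cw_stack(frames):
    """Rotate each (rows, cols) frame of an (n, rows, cols) stack clockwise."""
    return np.rot90(frames, k=-1, axes=(1, 2))


# Hypothetical name -> operation table covering only the two names used here.
ENGINE_DP_ORIENT = {
    "identity": lambda frames: frames,
    "rot90_cw": rot90_cw_stack,
}

frames = np.arange(12).reshape(2, 2, 3)
# holoscan-framework's hardcoded call and the clockwise helper agree.
print(np.array_equal(np.rot90(frames, axes=(2, 1)), ENGINE_DP_ORIENT["rot90_cw"](frames)))  # True
print(rot90_cw_stack(np.array([[[1, 2], [3, 4]]]))[0].tolist())  # [[3, 1], [4, 2]]
```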
test_iterative_flags_default_unset asserted dp_orient_iterative defaults to None; the default is now 'rot90_cw' (restores holoscan-framework's engine-DP rotation, fixing the reconstructed phase). The direction-override defaults are unchanged (still None). The emission-guard test still passes (the `if args.dp_orient_iterative is not None` guard is retained). Refs #65.
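The default-plus-guard arrangement can be sketched with `argparse`. Only the flag name, its new default, and the `is not None` guard come from the description; `add_iterative_orient_argument` and `build_engine_config` are hypothetical stand-ins for the relevant parts of `add_reconstruction_arguments` and the config emission.

```python
import argparse


def add_iterative_orient_argument(parser):
    """Hypothetical stand-in for the flag's registration in add_reconstruction_arguments."""
    parser.add_argument(
        "--dp-orient-iterative",
        default="rot90_cw",
        help="Absolute orientation for the iterative engine's DP copy; "
        "pass 'identity' to follow the shared dp_orient instead.",
    )
    return parser


def build_engine_config(args):
    """Hypothetical config emission that keeps the `is not None` guard."""
    config = {}
    if args.dp_orient_iterative is not None:
        config["dp_orient_iterative"] = args.dp_orient_iterative
    return config


def test_dp_orient_iterative_default():
    parser = add_iterative_orient_argument(argparse.ArgumentParser())
    args = parser.parse_args([])
    assert args.dp_orient_iterative == "rot90_cw"
    assert build_engine_config(args) == {"dp_orient_iterative": "rot90_cw"}
    opt_out = parser.parse_args(["--dp-orient-iterative", "identity"])
    assert opt_out.dp_orient_iterative == "identity"


test_dp_orient_iterative_default()
```

Keeping the guard means a caller that still passes `None` programmatically emits nothing, which is why the emission-guard test survives the default change.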
AGENTS.md and README documented dp_orient_iterative as unset/disabled by default with "no engine rotation is needed — leave unset". That conclusion was wrong: the missing rotation only surfaced once the ML_grad engine produced a clean phase (DM was too blobby to reveal it). The default is now rot90_cw, restoring holoscan-framework's hardcoded np.rot90(axes=(2,1)) on the engine DP. Updated the config table, the orientation list, the engine-conventions section, and the README flag description. Refs #65.
What
Fixes the iterative streaming reconstruction (issue #65). The engine was a hand-reimplementation of the per-batch recon math that only did DM, the wrong algorithm, which is why the live recon was broken. This makes `StreamingPtychoRecon` a thin streaming wrapper that drives ptycho master's verbatim ML_grad code, and brings the streaming path back in line with the known-good holoscan-framework beamline branch.

Depends on NSLS2/ptycho#87 (ML_grad algorithm, additive) + NSLS2/ptycho#88 (`accumulate_*_mode_local` kernels), both merged. `pixi.lock` repins ptycho master.

Changes
- `_update_psi` → `recon_gpu_launch(recon_ml_grad_trans_gpu_single)`; the hand-written DM batch + DM kernels/buffers are gone.
- `recon_ml_grad`, `recon_gpu_launch`, `calc_prb_obj`, `forward/back_prop`, and `accumulate_obj/prb` + `gather_obj/prb` are bound from `ptycho_trans` (verbatim, NSLS2/ptycho#87). The paraphrased `_accumulate_*`/`_gather_*` are deleted.
- Master's `accumulate_*_mode` kernels index `product_d` globally (full array); HXN's `accumulate_*_gpu_single` pass a per-batch slice and need local-z indexing. `gpu_setup` binds the `*_local` kernels (#88) so the method+kernel pair is faithful to HXN. (Dedup tracked in NSLS2/ptycho#89.)
- Dropped `init_product_range`: vestigial DM dual-state seeding. ML_grad recomputes `product_d` every batch, so it's a no-op, and holoscan-framework has no equivalent.
- `write_live`'s guard is updated to allow background NaN while still catching divergence.

Method-by-method review vs holoscan-framework
Every streaming-state method was compared against the HXN branch that holoscan-framework ran. Result: each is either functionally equivalent or a required-for-main deviation; the only genuine divergences were the kernel mispairing and `init_product_range`, both fixed above. Required deviations: single-GPU (no MPI/NCCL), in-process numpy snapshots, probe loaded from the `prb_path` file, and the `clear_region_enabled` gate.

Validation
`tests/` green (2 pre-existing tiled-404 smoke failures aside).