
Iterative streaming: drive ptycho's ML_grad, match holoscan-framework (#65) #66

Merged
Garrett Bischof (gwbischof) merged 9 commits into main from feat/streaming-ml-grad
Jun 15, 2026

@gwbischof Garrett Bischof (gwbischof) commented Jun 15, 2026

Collaborator

What

Fixes the iterative streaming reconstruction (issue #65). The engine was a hand-reimplementation of the per-batch recon math that only did DM — the wrong algorithm — which is why the live recon was broken. This makes StreamingPtychoRecon a thin streaming wrapper that drives ptycho master's verbatim ML_grad code, and brings the streaming path back in line with the known-good holoscan-framework beamline branch.

Depends on NSLS2/ptycho#87 (ML_grad algorithm, additive) + NSLS2/ptycho#88 (accumulate_*_mode_local kernels), both merged. pixi.lock repins ptycho master.

Changes

  1. Drive ptycho's ML_grad, delete the DM reimplementation. _update_psi -> recon_gpu_launch(recon_ml_grad_trans_gpu_single); the hand-written DM batch + DM kernels/buffers are gone.
  2. Inherit all the math from ptycho — zero duplication. recon_ml_grad, recon_gpu_launch, calc_prb_obj, forward/back_prop, and accumulate_obj/prb + gather_obj/prb are bound from ptycho_trans (verbatim, NSLS2/ptycho#87). The paraphrased _accumulate_*/_gather_* are deleted.
  3. Correct the kernel pairing. ptycho master's accumulate_*_mode kernels index product_d globally (full array); HXN's accumulate_*_gpu_single pass a per-batch slice and need local-z indexing. gpu_setup binds the *_local kernels (#88) so the method+kernel pair is faithful to HXN. (Dedup tracked in NSLS2/ptycho#89.)
  4. Remove init_product_range. Vestigial DM dual-state seeding; ML_grad recomputes product_d every batch, so it's a no-op — and holoscan-framework has no equivalent.
  5. Dashboard: transparent no-data background. No-data object pixels (no probe coverage) are NaN'd in the live snapshot so the dashboard renders them transparent instead of colormapping the seed value. Scoped to the live copy; saved reconstructions keep their finite background. write_live's guard updated to allow background NaN while still catching divergence.

Method-by-method review vs holoscan-framework

Every streaming-state method was compared against the HXN branch holoscan-framework ran. Result: all functionally equivalent or a required-for-main deviation — the only genuine divergences were the kernel mispairing and init_product_range, both fixed above. Required deviations: single-GPU (no MPI/NCCL), in-process numpy snapshots, probe-from-prb_path-file, clear_region_enabled gate.

Validation

  • ✅ Offline engine test on dumped scan-411993 inputs: reconstructs the clean striped specimen (ePIE-matching), all-finite.
  • Live replay (scan 411993): object resolves correctly, matches holoscan-framework; dashboard background transparent.
  • tests/ green (2 pre-existing tiled-404 smoke failures aside).

…lidation

Standalone ML (maximum-likelihood gradient) and ePIE on dumped/cached
inputs, used to confirm the data/geometry are correct independent of the
engine. Dev tooling for validating the #65 library-engine migration.
)

The streaming engine was a hand-reimplementation of the per-batch recon math
that only implemented DM — the wrong algorithm. The HXN beamline runs ML_grad,
which is why pipeline-DM matched the offline-DM (both blobby) while an offline
ePIE reconstructed the same inputs cleanly.

Fix: StreamingPtychoRecon now drives ptycho master's real ML_grad methods
(added additively in NSLS2/ptycho#87) instead of reimplementing them:

  - gpu_setup binds recon_gpu_launch / recon_ml_grad_trans_gpu_single /
    calc_prb_obj_gpu / forward_prop_gpu / back_prop_gpu onto the instance via
    getattr(ptycho_trans, name).__get__(self) — verbatim beamline code, no
    paraphrase. The import is deferred to gpu_setup so the pure-Python unit
    tests still construct a recon without a GPU / ptycho / matplotlib.
  - Binds the ML_grad kernels (get_amp, ml_calc_grad, multiply_with_mode) and
    allocates tmp1_d/tmp2_d; drops the DM-only kernels/buffers
    (multiply_and_sum, dm_update_mode_amp*, dev_d/power_d/fft_total_d) and the
    hand-written _recon_dm_batch.
  - _update_psi -> recon_gpu_launch(recon_ml_grad_trans_gpu_single).
  - _gather_obj/_gather_prb switch to the ML_grad grad_upd path (additive
    obj += obj_update/prb_norm), since update_psi_gpu sets grad_upd=True for
    alg=='ML_grad' (DM used the replace form). Keeps the streaming
    divide-by-zero guard on the object gather.
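
The binding step in the first bullet can be sketched as below. This is a minimal illustration rather than holoptycho's actual code: `Donor` stands in for `ptycho_trans`, and `StreamingWrapper`, `bind_methods`, and the method bodies are hypothetical names invented for the example. Only the `getattr(cls, name).__get__(self)` pattern is taken from the commit message.

```python
class Donor:
    """Stand-in for ptycho_trans (hypothetical; the real class lives in ptycho)."""

    def recon_gpu_launch(self, step):
        # The donor method only sees `self`, so it runs against whatever
        # instance it has been bound to.
        return step()

    def recon_ml_grad_trans_gpu_single(self):
        return f"ML_grad batch on {type(self).__name__}"


class StreamingWrapper:
    """Hypothetical thin wrapper that does not inherit from Donor."""

    def bind_methods(self, donor_cls, names):
        for name in names:
            # A plain function is a descriptor: calling __get__(self) on it
            # returns a method bound to this instance.
            setattr(self, name, getattr(donor_cls, name).__get__(self))

    def _update_psi(self):
        return self.recon_gpu_launch(self.recon_ml_grad_trans_gpu_single)


wrapper = StreamingWrapper()
wrapper.bind_methods(Donor, ["recon_gpu_launch", "recon_ml_grad_trans_gpu_single"])
result = wrapper._update_psi()
```

Because the bound methods are stored on the instance, the donor class only needs to be importable at the moment `bind_methods` runs, which is consistent with deferring the import to gpu_setup.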

The object/probe accumulate + gather stay as local single-rank ports (no MPI);
only the streaming wrapper lives here. Net −128 lines (reimplemented math gone).

pixi.lock: repin ptycho master 488c476 -> 50c11f9 (picks up #87).

Validation: scripts/offline_engine_test.py on the dumped scan-411993 inputs
reconstructs the clean striped specimen matching the offline ePIE reference
(vs the old DM blob); object stays finite across 200 iters; tests/ green
(2 pre-existing tiled smoke failures aside).

Refs #65.

Stop duplicating the object/probe update math in holoptycho. The streaming
engine now drives ptycho master's verbatim accumulate_obj_gpu(+single),
gather_obj_gpu, accumulate_prb_gpu(+single), gather_prb_gpu by binding them onto
the instance (alongside the already-inherited ML_grad algorithm) — the
paraphrased _accumulate_*/_gather_* methods are deleted.

The inherited HXN accumulate_*_gpu_single pass a per-batch product_d slice, so
gpu_setup binds kernel_accumulate_obj/prb_mode to the *_local kernels
(NSLS2/ptycho#88) whose product_d indexing matches; master's global-z kernels
stay for master's own recon. Dedup tracked in NSLS2/ptycho#89 (TODO in code).
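
The kernel pairing issue can be illustrated on the CPU with NumPy. This is a sketch under stated assumptions, not the CUDA kernels themselves: `accumulate`, `points`, and the array shapes are hypothetical, and the only property taken from the text is that a caller passing a per-batch `product_d` slice needs a kernel that indexes that slice from zero.

```python
import numpy as np

N_POINTS, FRAME = 4, 2
# One FRAME x FRAME exit-wave product per scan point; the values encode the point id.
product = np.stack([np.full((FRAME, FRAME), z + 1.0) for z in range(N_POINTS)])
points = [(0, 0), (0, 1), (1, 0), (1, 1)]  # hypothetical (y, x) corner per point


def accumulate(obj, frames, batch, local):
    """Add each frame of the batch into obj, indexing frames locally or globally."""
    for i, z in enumerate(batch):
        y, x = points[z]
        idx = i if local else z  # local: position within the slice; global: point id
        obj[y:y + FRAME, x:x + FRAME] += frames[idx]
    return obj


batch = [2, 3]
# Global indexing paired with the full array, local indexing paired with the slice.
ref = accumulate(np.zeros((3, 3)), product, batch, local=False)
loc = accumulate(np.zeros((3, 3)), product[2:4], batch, local=True)

# Mispairing: global indexing applied to a per-batch slice.
try:
    accumulate(np.zeros((3, 3)), product[2:4], batch, local=False)
    mispaired_ok = True
except IndexError:
    mispaired_ok = False
```

NumPy raises on the out-of-range index. A GPU kernel generally performs no such bounds check, so the same mispairing there would be expected to read the wrong memory silently.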

Single-GPU flags (comm=None, use_NCCL/use_CUDA_MPI=False) make the inherited
gather take the no-MPI path; grad_upd=True selects the ML_grad additive update.
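
The difference between the additive (grad_upd) gather and a replace-style gather can be sketched as follows. The function below is an illustration, not the inherited `gather_obj_gpu`: the guard condition `prb_norm > 0` and the exact replace form are assumptions, while the additive `obj += obj_update/prb_norm` form comes from the earlier commit message.

```python
import numpy as np


def gather_obj(obj, obj_update, prb_norm, grad_upd):
    """Single-rank gather sketch; names mirror the text, body is illustrative."""
    covered = prb_norm > 0  # assumed guard condition (not taken from the source)
    step = np.zeros_like(obj)
    # Divide only where the normalisation is non-zero; elsewhere step stays 0.
    np.divide(obj_update, prb_norm, out=step, where=covered)
    if grad_upd:
        return obj + step  # ML_grad: additive update
    return np.where(covered, step, obj)  # assumed DM-style replace form


obj = np.array([1 + 0j, 1 + 0j])
obj_update = np.array([4 + 0j, 4 + 0j])
prb_norm = np.array([2.0, 0.0])  # second pixel has zero normalisation

additive = gather_obj(obj, obj_update, prb_norm, grad_upd=True)
replaced = gather_obj(obj, obj_update, prb_norm, grad_upd=False)
```

In both branches the pixel with zero normalisation keeps its previous value, which is the behaviour a divide-by-zero guard is meant to provide.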

pixi.lock: repin ptycho master -> 9d6d1fb (#88).

Offline engine test on the dumped scan-411993 inputs reconstructs the clean
striped specimen — now via ptycho's verbatim math, no holoptycho duplication.

Refs #65.

init_product_range seeded product_d = prb*obj for newly-activated streaming
points — it was added (2 days ago) to prop up the DM dual state, which is a
running variable DM accumulates across iterations. ML_grad has no such dual:
recon_ml_grad_trans_gpu_single recomputes prb_obj fresh and overwrites its
product_d slice every batch (writing it before any read), so newly-activated
points get a correct product_d on their first iteration with no seeding.
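
The write-before-read argument can be checked with a small NumPy model. This is a sketch, not the GPU code: `ml_grad_batch` is a hypothetical stand-in that only reproduces the "recompute prb*obj and overwrite the slice" step described above, and the shapes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
prb = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
obj_patches = rng.standard_normal((3, 2, 2)) + 0j  # one object patch per scan point


def ml_grad_batch(product, start, stop):
    """Overwrite the batch slice with a fresh prb*obj before anything reads it."""
    product[start:stop] = prb * obj_patches[start:stop]
    return product[start:stop]


# Seeded the way init_product_range did, versus left as NaN (never seeded).
seeded = prb * obj_patches
unseeded = np.full((3, 2, 2), np.nan, dtype=complex)

for product in (seeded, unseeded):
    ml_grad_batch(product, 0, 3)
```

Filling the unseeded buffer with NaN makes any stale read visible: if the batch step consumed the old contents, the NaN would propagate into the result.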

holoscan-framework (HXN) has no equivalent — its one_iter/update_psi only
seeds product_d at it==0, never for mid-stream new points, relying on the
algorithm to overwrite. Removing this matches that behavior and drops dead
DM-era code.

Verified no-op for ML_grad; offline reconstruction unchanged.

Refs #65.

…und)

The iterative object's uncovered pixels stay at the uniform seed value
(0.99*exp(-0.1j)); the dashboard colormapped them as if they were data
(yellow amplitude / blue-green phase). Mark them NaN instead so the
dashboard renders them transparent (its array renderer already maps NaN ->
alpha 0).

snapshot() now NaNs object pixels with no probe coverage (prb_norm_l == alpha;
accumulate_obj_gpu seeds prb_norm to alpha then adds |prb|^2 only where the
probe overlapped). Done on a copy so the stored mmap_obj and any saved
reconstruction keep their finite seed background — the NaN is scoped to the
live dashboard path only, and _update_mmap stays byte-identical to HXN.
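
A minimal sketch of the masking step, assuming NumPy arrays on the host. `live_snapshot` is a hypothetical name, and the exact-equality test against alpha simply follows the `prb_norm_l == alpha` condition quoted above.

```python
import numpy as np


def live_snapshot(obj, prb_norm, alpha):
    """Return a copy of obj with uncovered pixels set to NaN; obj is untouched."""
    view = obj.copy()
    # prb_norm still equals its seed value alpha wherever no probe overlapped.
    view[prb_norm == alpha] = np.nan
    return view


alpha = 1e-3
seed = 0.99 * np.exp(-0.1j)
obj = np.full((2, 2), seed)
prb_norm = np.array([[alpha, alpha + 1.0],
                     [alpha, alpha + 2.0]])  # left column never covered

snap = live_snapshot(obj, prb_norm, alpha)
```

Only the returned copy carries NaN, so anything saved from the stored object keeps its finite seed background.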

tiled_writer.write_live's finite-guard previously skipped any frame with a
single NaN, which would skip every frame once the background is NaN. It now
allows object background NaN while still skipping genuine divergence: a
non-finite probe (always fully covered), any Inf in the object, or an object
with no finite pixels at all.
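
The three skip conditions can be written out as a small predicate. This is a sketch of the rule as described, not the code in tiled_writer: `should_skip_frame` is a hypothetical name.

```python
import numpy as np


def should_skip_frame(obj, prb):
    """Return True for genuine divergence, False when only the background is NaN."""
    if not np.isfinite(prb).all():
        return True  # the probe is always fully covered, so any NaN/Inf is real
    if np.isinf(obj).any():
        return True  # Inf is never produced by the background masking
    if not np.isfinite(obj).any():
        return True  # nothing finite left to display
    return False


prb = np.ones((2, 2), dtype=complex)
background_nan = np.array([[np.nan, 1 + 0j], [np.nan, 2 + 0j]])
with_inf = np.array([[np.inf, 1 + 0j], [np.nan, 2 + 0j]])
all_nan = np.full((2, 2), np.nan, dtype=complex)
bad_prb = np.array([[np.nan, 1 + 0j], [1 + 0j, 1 + 0j]])
```

With these inputs, a frame whose only non-finite pixels are NaN background is kept, while each of the other three cases is skipped.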

Verified on scan-411993 inputs: ~79% NaN background, covered region finite
with no Inf, guard does not skip the frame.

Refs #65.

@gwbischof Garrett Bischof (gwbischof) changed the title from "Streaming engine: drive ptycho's ML_grad instead of the reimplemented DM (#65)" to "Iterative streaming: drive ptycho's ML_grad, match holoscan-framework (#65)" on Jun 15, 2026

…otation)

holoscan-framework's _preprocess_diffraction hardcoded
np.rot90(amplitude, axes=(2,1)) (== rot90_cw) on every diffraction frame before
fftshift. The live pipeline split orientation into configurable knobs
(detector_orientation, dp_orient, dp_orient_iterative), and the iterative
engine's DP copy ended up with no rotation (dp_orient_iterative defaulted to
None / disabled). The amplitude still reconstructed (positions consistent), but
the object PHASE came out wrong.

Set the --dp-orient-iterative default to rot90_cw so the iterative engine's DP
is pinned to holoscan-framework's orientation. reorient_d4 applies it as an
absolute D4 (independent of the shared dp_orient), so it is correct for both
replay (detector_orientation=identity) and live. Flows to the replay via
build_full_config (add_reconstruction_arguments) and to the CLI/live path the
same way. Pass --dp-orient-iterative identity to opt out and follow the shared
dp_orient + autodetect.
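
The claim that `np.rot90(amplitude, axes=(2,1))` is a clockwise quarter turn of each frame can be checked directly. The snippet is a sketch on a tiny stack; applying fftshift over the last two axes afterwards is an assumption about the preprocessing, not something the commit message specifies.

```python
import numpy as np

# Stack of two 2x2 "diffraction frames"; axis 0 is the frame index.
frames = np.arange(8, dtype=float).reshape(2, 2, 2)

# The hardcoded holoscan-framework call quoted above.
rotated = np.rot90(frames, axes=(2, 1))

# The same rotation written as an explicit clockwise (k=-1) turn in the (1, 2) plane.
clockwise = np.rot90(frames, k=-1, axes=(1, 2))

# Assumed follow-up step: shift the zero frequency over the two frame axes.
shifted = np.fft.fftshift(rotated, axes=(-2, -1))
```

For the first frame `[[0, 1], [2, 3]]` the rotated result is `[[2, 0], [3, 1]]`, which is the clockwise orientation.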

Validated on scan 411993: phase corrects with rot90_cw.

Refs #65.

test_iterative_flags_default_unset asserted dp_orient_iterative defaults to
None; the default is now 'rot90_cw' (restores holoscan-framework's engine-DP
rotation, fixing the reconstructed phase). The direction-override defaults are
unchanged (still None). The emission-guard test still passes (the
`if args.dp_orient_iterative is not None` guard is retained).
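
How a non-None default interacts with the retained guard can be sketched with argparse. Only the `--dp-orient-iterative` flag, its `rot90_cw` default, and the `is not None` guard come from the text. `build_parser`, `emit_overrides`, and `--example-override` are hypothetical stand-ins, with the last one representing an override flag that keeps a None default.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--dp-orient-iterative", default="rot90_cw")
    # Hypothetical stand-in for an override flag that keeps a None default.
    parser.add_argument("--example-override", default=None)
    return parser


def emit_overrides(args):
    """Emit only the options that are set, mirroring the retained guard."""
    overrides = {}
    if args.dp_orient_iterative is not None:
        overrides["dp_orient_iterative"] = args.dp_orient_iterative
    if args.example_override is not None:
        overrides["example_override"] = args.example_override
    return overrides


default_cfg = emit_overrides(build_parser().parse_args([]))
opt_out_cfg = emit_overrides(
    build_parser().parse_args(["--dp-orient-iterative", "identity"])
)
```

With the new default the guard passes even when the flag is omitted, so the rotation is emitted unless the caller explicitly passes another value such as identity.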

Refs #65.

AGENTS.md and README documented dp_orient_iterative as unset/disabled by
default with "no engine rotation is needed — leave unset". That conclusion was
wrong: the missing rotation only surfaced once the ML_grad engine produced a
clean phase (DM was too blobby to reveal it). The default is now rot90_cw,
restoring holoscan-framework's hardcoded np.rot90(axes=(2,1)) on the engine DP.
Updated the config table, the orientation list, the engine-conventions section,
and the README flag description.

Refs #65.

@gwbischof Garrett Bischof (gwbischof) merged commit 21a61d3 into main Jun 15, 2026
5 checks passed
@gwbischof Garrett Bischof (gwbischof) deleted the feat/streaming-ml-grad branch June 15, 2026 04:33