perf: opt-in (EnableFast) order-relaxed threading — cutter + contour#81
Open
akaszynski wants to merge 2 commits into
Branch updated from `f07f271` to `e2551f3`.
…Cutter

Adds an OPT-IN, non-exact fast mode. By default fvtk stays a byte-exact drop-in: the linear-grid plane cutter runs serially and matches stock VTK exactly. Calling fvtk.EnableFast() (or setting FVTK_FAST=1) opts in to the multithreaded fast path, whose output is order-relaxed: the same cells and points, but cell ORDER depends on thread scheduling.

Mechanism:
- fvtk::FastModeEnabled() / fvtk::RunFastFilterParallel() (Common/Core): RunFastFilterParallel threads via the re-entrancy-guarded RunSafeFilterParallel ONLY when FastModeEnabled() is true (env FVTK_FAST, read live so it can be toggled at runtime); otherwise it runs the body serially (byte-exact).
- vtk3DLinearGridPlaneCutter: its EXECUTE_SMPFOR macro now uses RunFastFilterParallel. Profiling (py-spy --native) put 35% of vtkCutter self-time in this filter's ExtractEdges; under fvtk's Sequential SMP backend it ran serially.
- fvtk.EnableFast() / DisableFast() / IsFastEnabled() Python API (package init): EnableFast() and DisableFast() set and clear FVTK_FAST.

Why opt-in: the threaded triangle emission composites per-thread buffers in thread order, so output CELL ORDER differs from the sequential reference AND varies from run to run. Points, interpolated point scalars, and the constant plane normal are thread-INVARIANT. Default-off preserves byte-exactness for users who depend on it; EnableFast() trades cell-order reproducibility for the threaded speedup.

Test infra (order-relaxed mesh-equality gate):
- compare.py: order-relaxed mode (points and point data compared strictly; cells compared as a multiset keyed by (group/celltype, connectivity tuple), each cell carrying its cell data; width-relaxed for integer cell data).
- run_ops.py: propagate the per-op order_relaxed flag into the manifest.
- ops.py: op_cutter_linear, a large hex-UG plane cut (triangles ON) that sets FVTK_FAST=1 (stock VTK ignores it) and drives the threaded path at batch-splitting sizes; order_relaxed=True, f32/f64, sizes 30/40.
- test_smp_determinism.py: cutter_linear in THREADED_OPS (thread-count-invariance gate, order-relaxed).
- test_bitexact.py: defensive failure formatting for relaxed mode.

Contour deferred: vtkContour3DLinearGrid normal averaging is reduction-order-dependent, so threading perturbs normal VALUES (not just their order); that output is not order-relaxable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
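As an illustration of the gating under Mechanism, here is a minimal Python sketch. It is not fvtk's code: the real fvtk::FastModeEnabled() and fvtk::RunFastFilterParallel() are C++ in Common/Core, and the names `fast_mode_enabled`, `run_fast_filter_parallel`, `grain`, and `workers` are hypothetical. It assumes `FVTK_FAST=1` means "on", uses a thread pool in place of the SMP backend, and omits RunSafeFilterParallel's re-entrancy guard.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def fast_mode_enabled(var="FVTK_FAST"):
    # Read the environment on every call instead of caching it at import
    # time, so the mode can be toggled at runtime.
    # Assumption: the value "1" means "on"; anything else means "off".
    return os.environ.get(var) == "1"


def run_fast_filter_parallel(begin, end, body, grain=1024, workers=4):
    """Apply body(b, e) to [begin, end), threaded only in fast mode.

    Hypothetical Python analogue of fvtk::RunFastFilterParallel(); the
    re-entrancy guard of RunSafeFilterParallel is not modelled here.
    """
    if not fast_mode_enabled():
        body(begin, end)  # default path: one serial pass, byte-exact
        return
    # Fast path: split the range into chunks and hand them to worker threads;
    # chunks may finish in any order, which is where cell order comes from.
    chunks = [(b, min(b + grain, end)) for b in range(begin, end, grain)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every chunk and re-raises any worker exception.
        list(pool.map(lambda chunk: body(*chunk), chunks))
```

Because the flag is re-read on each call, flipping `FVTK_FAST` between two filter executions changes which path the second one takes; that is what "runtime-toggleable" buys over a value cached at startup.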
Branch updated from `e2551f3` to `11111a1`.
…rid (normals-off)

Extends the EnableFast() opt-in fast lane to the linear-grid isocontour. Its EXECUTE_SMPFOR sites are threaded via RunFastFilterParallel, but ONLY when ComputeNormals is OFF: the call-site gate ORs in GetComputeNormals(), so a contour that computes normals stays serial and byte-exact.

This is required because surface-normal averaging sums cell normals at shared points in cell order, so threaded (reordered) cells would perturb normal VALUES, not just their order; that is not order-relaxable. With normals off, the merge path produces thread-invariant points and point scalars; only triangle emission order varies (order-relaxed, like the cutter).

- op_contour_linear: large hex-UG isocontour, ComputeNormals OFF, sets FVTK_FAST=1; order_relaxed=True, f32/f64, sizes 30/40.
- contour_linear added to THREADED_OPS (thread-count-invariance gate).
- __all__ exports EnableFast/DisableFast/IsFastEnabled.

Profiling put 44% of vtkContourFilter self-time in this filter's ExtractEdges.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
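The normals-off gate, and the reason it is needed, fit in a few lines of Python. This is a hypothetical sketch, not the filter's C++: `contour_runs_threaded` and `accumulate_normals` are illustrative names, and the three contributions are made-up values chosen to expose floating-point non-associativity. Summing the same normal contributions at a shared point in two different orders yields different bits, which is why reordered cells change normal values rather than only their order.

```python
def contour_runs_threaded(fast_enabled, compute_normals):
    # Serial whenever fast mode is off OR normals are requested; only a
    # normals-off contour in fast mode takes the threaded path.
    return fast_enabled and not compute_normals


def accumulate_normals(contributions):
    """Sum per-cell normal contributions at one shared point, in order."""
    sx = sy = sz = 0.0
    for nx, ny, nz in contributions:
        sx += nx
        sy += ny
        sz += nz
    return (sx, sy, sz)


# The same three contributions summed in cell order and in reversed order,
# as a different thread schedule might deliver them.
cells = [(0.1, 0.0, 1.0), (0.2, 0.0, 1.0), (0.3, 0.0, 1.0)]
in_order = accumulate_normals(cells)
reordered = accumulate_normals(cells[::-1])
print(in_order[0], reordered[0])  # 0.6000000000000001 0.6
```

The points and scalars produced along each cut edge do not involve such a cross-cell sum, which is consistent with the commit's claim that only emission order varies once normals are off.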
Added contour ( Gated on Validated in-container:
PR now covers cutter + contour under one opt-in contract.
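Because `FVTK_FAST` is read live, a caller can scope the opt-in to a single call site. A minimal sketch, assuming `fvtk.EnableFast()` amounts to setting `FVTK_FAST=1` as the commits describe; the `fast_mode` context manager is a hypothetical helper, not part of fvtk's API, and it restores whatever value the variable had before.

```python
import os
from contextlib import contextmanager


@contextmanager
def fast_mode(enabled=True, var="FVTK_FAST"):
    """Hypothetical helper: scope the FVTK_FAST opt-in to one block."""
    previous = os.environ.get(var)  # None if the variable was unset
    if enabled:
        os.environ[var] = "1"
    else:
        os.environ.pop(var, None)
    try:
        yield
    finally:
        # Restore the prior state exactly, including "unset".
        if previous is None:
            os.environ.pop(var, None)
        else:
            os.environ[var] = previous
```

Wrapping only the cut or contour calls whose consumers tolerate reordered cells keeps the rest of a pipeline byte-exact. Environment variables are process-wide, so this does not isolate filters running concurrently on other threads.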
### Opt-in non-exact fast mode: default stays byte-exact

By default fvtk remains a byte-exact drop-in: this cutter runs serially and matches stock VTK exactly. Users opt in to the threaded fast path with `fvtk.EnableFast()` (or `FVTK_FAST=1`). This addresses the contract concern raised on the earlier revision: non-exactness is now strictly opt-in, not the default.

### Mechanism

- `fvtk::FastModeEnabled()` / `fvtk::RunFastFilterParallel()` (Common/Core): the latter threads via the re-entrancy-guarded `RunSafeFilterParallel` only when fast mode is on (env `FVTK_FAST`, read live, so it is runtime-toggleable); otherwise it runs serially (byte-exact).
- `vtk3DLinearGridPlaneCutter`'s `EXECUTE_SMPFOR` macro now uses `RunFastFilterParallel`.
- `fvtk.EnableFast()` / `DisableFast()` / `IsFastEnabled()` Python API: `EnableFast()` and `DisableFast()` set and clear `FVTK_FAST`.

### Why opt-in (the non-exactness)

Profiling (`py-spy --native`) put 35% of `vtkCutter` self-time in this filter's `ExtractEdges`. Threading it composites the per-thread triangle buffers in scheduling order, so output cell order differs from stock and varies from run to run. Points, interpolated point scalars, and the constant plane normal are thread-invariant. Fast mode therefore trades cell-order reproducibility for the speedup; whether that is acceptable is the caller's decision, per call site.

### Validation (in-container)

`op_cutter_linear` (f32/f64 × sizes 30/40) sets `FVTK_FAST=1`; fvtk threads while stock VTK stays serial, and the outputs are compared order-relaxed (points and point data strict; the same triangle multiset, each triangle carrying its cell data). Thread-count invariance holds at 1, 4, and 8 threads.

### Test infra

- `compare.py`: order-relaxed mesh-equality mode (per-op `order_relaxed` flag via the manifest).
- `ops.py`: `op_cutter_linear`, `order_relaxed=True`.
- `test_smp_determinism.py`: `cutter_linear` in `THREADED_OPS`.

### Deferred

Contour (`vtkContour3DLinearGrid`): surface-normal averaging is reduction-order-dependent, so threading perturbs normal values, not just their order; the output is not order-relaxable even with `EnableFast()`. Left serial.

🤖 Generated with Claude Code
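The order-relaxed gate in `compare.py` and the 1/4/8 thread-count check can be approximated in pure Python. This is a sketch under stated assumptions, not `compare.py` itself: meshes are plain dicts, each cell is a `(cell_type, connectivity, cell_data)` tuple, the hypothetical `emit_threaded` stands in for per-thread buffers composited in scheduling order, and the width relaxation for integer cell data is not modelled. The value 5 is VTK's `VTK_TRIANGLE` cell type id.

```python
import random
from collections import Counter


def order_relaxed_equal(a, b):
    """Order-relaxed mesh equality (hypothetical pure-Python sketch)."""
    # Points and point data: strict, including order.
    if a["points"] != b["points"] or a["point_data"] != b["point_data"]:
        return False
    # Cells: compared as a multiset. The key holds type + connectivity, and
    # each cell's data travels inside it, so only emission order may differ.
    return Counter(a["cells"]) == Counter(b["cells"])


def emit_threaded(cells, n_threads, seed):
    """Deal cells into per-thread buffers, then concatenate the buffers in a
    shuffled order that stands in for thread scheduling."""
    buffers = [cells[i::n_threads] for i in range(n_threads)]
    random.Random(seed).shuffle(buffers)
    return [cell for buf in buffers for cell in buf]


VTK_TRIANGLE = 5
serial = {
    "points": [(float(i), 0.0, 0.0) for i in range(12)],
    "point_data": [0.5 * i for i in range(12)],
    "cells": [(VTK_TRIANGLE, (i, i + 1, i + 2), (i,)) for i in range(10)],
}

# Thread-count invariance: 1, 4 and 8 "threads" must all pass the gate.
for n in (1, 4, 8):
    fast = dict(serial, cells=emit_threaded(serial["cells"], n, seed=n))
    assert order_relaxed_equal(serial, fast)
```

Keying each cell together with its cell data means a triangle that kept its connectivity but lost or swapped its data still fails, which a plain sort of the connectivity alone would miss.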