perf: opt-in (EnableFast) order-relaxed threading — cutter + contour #81

Open
akaszynski wants to merge 2 commits into main from feat/cutter-threading

@akaszynski (Member) commented Jun 20, 2026
Opt-in non-exact fast mode — default stays byte-exact

By default fvtk remains a byte-exact drop-in: this cutter runs serial and matches stock VTK exactly. Users opt in to the threaded fast path with fvtk.EnableFast() (or FVTK_FAST=1). This addresses the contract concern on the earlier revision — non-exactness is now strictly opt-in, not default.

import fvtk
fvtk.EnableFast()      # opt in to non-exact threaded fast paths
# ... vtkCutter on a linear UG now threads (order-relaxed) ...
fvtk.DisableFast()     # back to byte-exact (default)

Mechanism

  • fvtk::FastModeEnabled() / fvtk::RunFastFilterParallel() (Common/Core): the latter threads via the re-entrancy-guarded RunSafeFilterParallel only when fast mode is on (env FVTK_FAST, read live → runtime-toggleable); otherwise runs serially (byte-exact).
  • vtk3DLinearGridPlaneCutter's EXECUTE_SMPFOR macro now uses RunFastFilterParallel.
  • fvtk.EnableFast() / DisableFast() / IsFastEnabled() Python API set/clear FVTK_FAST.
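
The dispatch described in this list can be sketched in pure Python. This is an illustrative stand-in, not fvtk's implementation: `enable_fast`, `disable_fast`, `is_fast_enabled`, and `run_fast_filter_parallel` are hypothetical names that mirror the API above, and treating only the value `1` as "on" is an assumption. The point it shows is that the environment variable is read on every call, so the mode can be toggled at runtime.

```python
import os
from concurrent.futures import ThreadPoolExecutor

ENV_VAR = "FVTK_FAST"


def enable_fast():
    """Opt in to the threaded path by setting the environment variable."""
    os.environ[ENV_VAR] = "1"


def disable_fast():
    """Return to the serial default by clearing the variable."""
    os.environ.pop(ENV_VAR, None)


def is_fast_enabled():
    # Read live on every call (assumption: only "1" counts as on).
    return os.environ.get(ENV_VAR) == "1"


def run_fast_filter_parallel(body, batches, workers=4):
    """Run body(batch) for each batch; use threads only when fast mode is on."""
    if not is_fast_enabled():
        # Serial fallback: batches are processed strictly in input order.
        return [body(batch) for batch in batches]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(body, batches))
```

Note that `pool.map` returns results in input order, so this sketch only illustrates the live gate; it does not reproduce the order variation of the real per-thread buffers.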

Why opt-in (the non-exactness)

Profiling (py-spy --native) put 35% of vtkCutter self-time in this filter's ExtractEdges. Threading it reorders the per-thread triangle buffers, so output cell order differs from stock and varies run-to-run. Points, interpolated point scalars, and the constant plane normal are thread-invariant. So fast mode trades cell-order reproducibility for the speedup — your call, per call site.
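
A small simulation of why the cell order changes while the cell set does not. It assumes each thread fills its own buffer from a contiguous batch and that buffers are concatenated in finish order; the function names and the `finish_order` parameter are hypothetical, chosen only for illustration.

```python
from collections import Counter


def extract_serial(cells):
    # Reference path: one buffer, triangles emitted in input order.
    return [tuple(c) for c in cells]


def extract_threaded(cells, n_threads, finish_order):
    # Each simulated thread fills its own buffer from a contiguous batch.
    size = -(-len(cells) // n_threads)  # ceiling division
    buffers = [
        [tuple(c) for c in cells[t * size:(t + 1) * size]]
        for t in range(n_threads)
    ]
    # Buffers are concatenated in the order the threads happened to finish.
    return [tri for t in finish_order for tri in buffers[t]]


triangles = [(i, i + 1, i + 2) for i in range(12)]
serial = extract_serial(triangles)
threaded = extract_threaded(triangles, n_threads=4, finish_order=[2, 0, 3, 1])

same_cells = Counter(serial) == Counter(threaded)  # same multiset of triangles
same_order = serial == threaded                    # emission order differs
```

Here `same_cells` is true and `same_order` is false, which is the property the order-relaxed comparison relies on.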

Validation (in-container)

  • Default (no EnableFast): byte-exact — cutter runs serial, identical to stock.
  • EnableFast: order-relaxed gate passes. op_cutter_linear (f32/f64 × sizes 30/40) sets FVTK_FAST=1; fvtk threads while stock stays serial; outputs are compared order-relaxed (points/point-data strict; same triangle multiset carrying cell-data). Thread-count invariance (1/4/8) holds.
  • strict compare (seq vs threaded) fails — confirms threading actually engages.
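
A thread-count-invariance check along these lines could look like the sketch below. It is a stand-in using Python threads rather than the actual test: `cut_batch` and `threaded_cut` are hypothetical, and the "cut" simply fabricates one triangle per input index.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed


def cut_batch(batch):
    # Stand-in for per-batch triangle extraction.
    return [(i, i + 1, i + 2) for i in batch]


def threaded_cut(n_cells, n_threads):
    # Strided batches, one per worker.
    batches = [range(start, n_cells, n_threads) for start in range(n_threads)]
    output = []
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(cut_batch, batch) for batch in batches]
        # Buffers are appended as workers finish, so order may vary run-to-run.
        for future in as_completed(futures):
            output.extend(future.result())
    return output


reference = Counter(threaded_cut(1000, 1))
invariant = all(Counter(threaded_cut(1000, n)) == reference for n in (4, 8))
```

The check compares multisets rather than lists, so it holds regardless of which worker finishes first.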

Test infra

  • compare.py: order-relaxed mesh-equality mode (per-op order_relaxed flag via the manifest).
  • ops.py: op_cutter_linear, order_relaxed=True.
  • test_smp_determinism.py: cutter_linear in THREADED_OPS.
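
For orientation, an order-relaxed mesh comparison can be sketched as follows. This is not the code in compare.py: the dict-based mesh layout and both function names are assumptions made for the example. It keeps points and point data strict and compares cells as a multiset keyed by cell type and connectivity, with each cell's data carried along.

```python
from collections import Counter

VTK_TRIANGLE = 5  # VTK's cell-type id for a triangle


def cell_multiset(mesh):
    """Count cells keyed by (cell type, connectivity tuple, cell-data value)."""
    return Counter(
        (ctype, tuple(conn), value)
        for ctype, conn, value in zip(
            mesh["cell_types"], mesh["cells"], mesh["cell_data"]
        )
    )


def meshes_equal_order_relaxed(a, b):
    # Points and point data must match exactly, in order.
    if a["points"] != b["points"] or a["point_data"] != b["point_data"]:
        return False
    # Cells are compared as a multiset, so emission order is ignored.
    return cell_multiset(a) == cell_multiset(b)


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
reference = {
    "points": points,
    "point_data": [0.0, 0.5, 0.5, 1.0],
    "cell_types": [VTK_TRIANGLE, VTK_TRIANGLE],
    "cells": [(0, 1, 2), (1, 3, 2)],
    "cell_data": [10, 20],
}
# Same cells and cell data, emitted in the opposite order.
reordered = dict(
    reference,
    cells=[(1, 3, 2), (0, 1, 2)],
    cell_data=[20, 10],
)

relaxed_equal = meshes_equal_order_relaxed(reference, reordered)
strict_equal = reference["cells"] == reordered["cells"]
```

In this example `relaxed_equal` is true while `strict_equal` is false, matching the behavior the validation notes describe for the threaded path.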

Deferred

Contour (vtkContour3DLinearGrid): surface-normal averaging is reduction-order-dependent → threading perturbs normal values, not just order → not order-relaxable even with EnableFast. Left serial.

🤖 Generated with Claude Code

@akaszynski akaszynski force-pushed the feat/cutter-threading branch from f07f271 to e2551f3 Compare June 20, 2026 03:34
…Cutter

Adds an OPT-IN non-exact fast mode. By default fvtk stays a byte-exact
drop-in: the linear-grid plane cutter runs serial and matches stock VTK
exactly. Calling fvtk.EnableFast() (or setting FVTK_FAST=1) opts in to the
multithreaded fast path, whose output is order-relaxed (same cells/points,
cell ORDER depends on thread scheduling).

Mechanism:
- fvtk::FastModeEnabled() / fvtk::RunFastFilterParallel() (Common/Core):
  RunFastFilterParallel threads via the re-entrancy-guarded
  RunSafeFilterParallel ONLY when FastModeEnabled() (env FVTK_FAST, read
  live so it is runtime-toggleable); otherwise runs the body serially
  (byte-exact).
- vtk3DLinearGridPlaneCutter: its EXECUTE_SMPFOR macro now uses
  RunFastFilterParallel. Profiling (py-spy --native) put 35% of vtkCutter
  self-time in this filter's ExtractEdges; under fvtk's Sequential SMP
  backend it ran serial.
- fvtk.EnableFast()/DisableFast()/IsFastEnabled() Python API (package init)
  set/clear FVTK_FAST.

Why opt-in: the threaded triangle emission composites per-thread buffers in
thread order, so output CELL ORDER differs from the sequential reference AND
varies run-to-run. Points, interpolated point scalars, and the constant
plane normal are thread-INVARIANT. Default-off keeps byte-exactness for
users who depend on it; EnableFast() trades cell-order reproducibility for
the threaded speedup.

Test infra (order-relaxed mesh-equality gate):
- compare.py: order-relaxed mode (points + point-data strict; cells compared
  as a multiset keyed by (group/celltype, connectivity-tuple) carrying their
  cell-data; width-relaxed for int cell-data).
- run_ops.py: propagate per-op order_relaxed flag into the manifest.
- ops.py: op_cutter_linear -- large hex-UG plane cut (triangles ON) that sets
  FVTK_FAST=1 (stock ignores it) and drives the threaded path at batch-
  splitting sizes; order_relaxed=True, f32/f64, sizes 30/40.
- test_smp_determinism.py: cutter_linear in THREADED_OPS (thread-count-
  invariance gate, order-relaxed).
- test_bitexact.py: defensive failure formatting for relaxed mode.

Contour deferred: vtkContour3DLinearGrid normal averaging is reduction-
order-dependent, so threading perturbs normal VALUES (not just order) --
not order-relaxable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@akaszynski akaszynski changed the title perf(cutter): order-relaxed default-on threading for vtk3DLinearGridPlaneCutter perf(cutter): opt-in (EnableFast) order-relaxed threading for vtk3DLinearGridPlaneCutter Jun 20, 2026
@akaszynski akaszynski force-pushed the feat/cutter-threading branch from e2551f3 to 11111a1 Compare June 20, 2026 04:13
…rid (normals-off)

Extends the EnableFast() opt-in fast lane to the linear-grid isocontour.
Threads its EXECUTE_SMPFOR sites via RunFastFilterParallel, but ONLY when
ComputeNormals is OFF: the call-site gate ORs in GetComputeNormals(), so a
contour that computes normals stays serial / byte-exact. This is required
because the surface-normal averaging sums cell-normals at shared points in
cell order, so threaded (reordered) cells would perturb normal VALUES, not
just order -- not order-relaxable.

With normals off the merge path produces thread-invariant points + point
scalars; only triangle emission order varies (order-relaxed, like the cutter).

- op_contour_linear: large hex-UG isocontour, ComputeNormals OFF, sets
  FVTK_FAST=1; order_relaxed=True, f32/f64, sizes 30/40.
- contour_linear added to THREADED_OPS (thread-count-invariance gate).
- __all__ exports EnableFast/DisableFast/IsFastEnabled.

Profiling put 44% of vtkContourFilter self-time in this filter's ExtractEdges.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@akaszynski akaszynski changed the title perf(cutter): opt-in (EnableFast) order-relaxed threading for vtk3DLinearGridPlaneCutter perf: opt-in (EnableFast) order-relaxed threading — cutter + contour Jun 20, 2026
@akaszynski (Member, Author) commented:

Added contour (vtkContour3DLinearGrid, 44% of vtkContourFilter self-time) to the same EnableFast lane.

Gated on !ComputeNormals: a contour that computes normals stays serial / byte-exact (its normal averaging sums cell-normals at shared points in cell order, so reordered threaded cells would perturb normal values, not just order). Normals-off threads order-relaxed like the cutter.
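
The reason reordering changes values, not just order, is that floating-point addition is not associative. The sketch below shows this with scalar stand-ins for per-cell normal contributions at one shared point, followed by the gate condition as described above. `accumulate` and `should_thread` are hypothetical helpers, not fvtk functions.

```python
def accumulate(contributions):
    # Sum per-cell contributions at one shared point, in the order given.
    total = 0.0
    for value in contributions:
        total += value
    return total


cell_order = [0.1, 0.2, 0.3]   # contributions in sequential cell order
reordered = [0.3, 0.2, 0.1]    # same contributions, threaded (reordered) cells

sum_a = accumulate(cell_order)
sum_b = accumulate(reordered)
values_differ = sum_a != sum_b  # rounding depends on summation order


def should_thread(fast_enabled, compute_normals):
    # Thread only when fast mode is on AND normals are not computed.
    return fast_enabled and not compute_normals
```

Because `values_differ` is true even for three terms, a multiset comparison of cells cannot absorb the difference, which is why the normals-on path stays serial.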

Validated in-container:

  • bitexact: 284 passed (+6 contour_linear: 4 order-relaxed + 2 thread-count-invariance).
  • Gate proof: contour with normals + EnableFast → deterministic run-to-run + byte-exact; without normals → threads (cell connectivity order varies run-to-run).

PR now covers cutter + contour under one opt-in contract.
