
perf(geometry): opt-in fast UnstructuredGrid surface (vendored OpenMP extract_surface) #82

Open
akaszynski wants to merge 1 commit into feat/cutter-threading from feat/fast-surface-port

@akaszynski (Member)

Summary

Adds an opt-in fast boundary-surface path to vtkDataSetSurfaceFilter for unstructured grids, by vendoring pyvista-algorithms' extract_surface OpenMP kernel (MIT) and wiring a thin VTK adapter into UnstructuredGridExecute.

Stacked on #81 (cutter+contour EnableFast). Base is feat/cutter-threading; review/merge #81 first. The net-new diff here is the FiltersGeometry surface path + the points-relaxed comparator.

What it does

  • Filters/Geometry/pvaExtractSurface.h — vendored MIT kernel (namespace fse), self-contained (depends only on the C++ standard library and <omp.h>), excluded from the unity build.
  • Filters/Geometry/fvtkFastSurface.{h,cxx} — VTK adapter: validates a concrete vtkUnstructuredGrid of supported linear 3D cells (tetra/hex/voxel/wedge/pyramid), float/double points, <2³¹ points; zero-copy int32 connectivity (fvtk's width-relaxed default); calls fse::extract_surface + compact_points; builds output points/polys, copies point & cell data, optional OriginalPointIds/CellIds.
  • Hook in vtkDataSetSurfaceFilter::UnstructuredGridExecute, gated on fvtk::FastModeEnabled(). Default off → byte-exact. When the grid isn't eligible, the adapter returns false and the standard path runs.
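For readers unfamiliar with the kernel, the core idea behind boundary-surface extraction is that a face shared by two cells is interior, while a face referenced by exactly one cell lies on the boundary. The following is a minimal, serial, tetra-only sketch of that idea plus a compact_points-style renumbering. It is not the vendored fse kernel (which is OpenMP-parallel and also handles hex/voxel/wedge/pyramid); the names extract_boundary_faces and compact_points here are hypothetical, and the face table assumes VTK's vtkTetra point ordering.

```python
from collections import Counter

# Local faces of a linear tetrahedron, assuming VTK's vtkTetra point ordering.
TET_FACES = ((0, 1, 3), (1, 2, 3), (2, 0, 3), (0, 2, 1))


def extract_boundary_faces(tets):
    """Return triangles referenced by exactly one tet (the boundary surface).

    `tets` is a sequence of 4-tuples of global point ids. Serial sketch only.
    """
    counts = Counter()
    first_seen = {}
    for cell in tets:
        for local in TET_FACES:
            face = tuple(cell[i] for i in local)
            key = tuple(sorted(face))  # orientation-independent face identity
            counts[key] += 1
            first_seen.setdefault(key, face)  # keep the first cell's winding
    # Counter preserves first-insertion order, so this serial output is deterministic.
    return [first_seen[key] for key, n in counts.items() if n == 1]


def compact_points(faces):
    """Renumber surface point ids densely from 0.

    Returns (new_faces, original_ids) where original_ids[new_id] is the input
    point id, analogous to an OriginalPointIds array.
    """
    old_to_new = {}
    for face in faces:
        for pid in face:
            if pid not in old_to_new:
                old_to_new[pid] = len(old_to_new)
    new_faces = [tuple(old_to_new[pid] for pid in face) for face in faces]
    original_ids = list(old_to_new)  # dict keys keep insertion order
    return new_faces, original_ids


# Two tets sharing the face {1, 2, 3}: that face is interior and is dropped.
surface = extract_boundary_faces([(0, 1, 2, 3), (1, 2, 3, 4)])
faces, original_ids = compact_points(surface)
print(len(faces), sorted(original_ids))  # prints 6 [0, 1, 2, 3, 4]
```

When the face table and point renumbering are filled from multiple threads, as in the real kernel, the emission order of faces and points is no longer fixed, which is why the comparator needs the order- and points-relaxed gates.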

Why OpenMP directly (not vtkSMPTools)

The kernel is called directly with n_threads=0 (→ OMP_NUM_THREADS). Routing it through vtkSMPTools would add dispatch overhead and mutate the process-global LocalScope singleton, and the vtkSMPTools threads would oversubscribe the machine alongside the kernel's own OpenMP region. The TU is gated on FVTK_HAVE_OPENMP; without OpenMP, the adapter compiles to a stub.

Correctness / gating

Output is order-relaxed AND point-order-relaxed (thread/hash-dependent cell + surface-point emission order). The bit-exact comparator gains point-order canonicalization (compare.py: _point_canonicalization/_remap_conn, relax_points), threaded through run_ops.py's points_relaxed manifest flag. New op_datasetsurface_fast (order+points relaxed) covers the path; a fast_mode() context manager restores FVTK_FAST so opt-in can't leak into byte-exact ops sharing the run_ops process.
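A context manager of the shape described above might look like this minimal sketch. It assumes the fast path is toggled purely by the FVTK_FAST environment variable and that the native side rereads it on each call; if the real fast_mode() goes through fvtk.EnableFast() or the native code caches the value, the restore step would have to target that state instead. The default value "1" is also an assumption.

```python
import contextlib
import os


@contextlib.contextmanager
def fast_mode(value="1"):
    """Temporarily set FVTK_FAST, restoring the prior state even on error.

    Sketch only: assumes the env var is the sole switch and is read per call.
    """
    previous = os.environ.get("FVTK_FAST")  # None means "was unset"
    os.environ["FVTK_FAST"] = value
    try:
        yield
    finally:
        # Restore exactly: unset it again if it was unset before entry.
        if previous is None:
            os.environ.pop("FVTK_FAST", None)
        else:
            os.environ["FVTK_FAST"] = previous
```

Doing the restore in `finally` is what keeps a failing fast-path op from leaving opt-in switched on for the byte-exact ops that follow it in the same run_ops process.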

Validation

  • Built cp312-abi3 manylinux_2_28 wheel on the executor; confirmed libgomp is NEEDED by vtkFiltersGeometry.abi3.so (OpenMP genuinely linked, not the stub).
  • Bit-exact gate: 288 passed. The 4 new datasetsurface_fast cases match stock under the points-relaxed gate, and the 4 standard datasetsurface_ugrid cases remain byte-exact.

🤖 Generated with Claude Code

…nMP kernel

Vendor pyvista-algorithms' extract_surface kernel (MIT) as
Filters/Geometry/pvaExtractSurface.h and add a thin VTK adapter
(fvtkFastSurface.{h,cxx}) wired into vtkDataSetSurfaceFilter's
UnstructuredGridExecute. The fast path is OPT-IN: it activates only when
fvtk::FastModeEnabled() (env FVTK_FAST / fvtk.EnableFast()) and the grid is a
concrete UnstructuredGrid of supported linear 3D cells (tetra/hex/voxel/wedge/
pyramid), float/double points, <2^31 points; otherwise FastUnstructuredSurface
returns false and the standard path runs (default byte-exact).
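The gating above amounts to a two-stage predicate: the opt-in switch first, then grid eligibility, with any failure meaning "use the standard path". Below is a minimal Python sketch, assuming the cell types and point dtype have already been read off the grid (the concrete-vtkUnstructuredGrid class check is omitted). The function names are hypothetical, and treating any FVTK_FAST value other than empty or "0" as enabled is an assumption, not necessarily how fvtk::FastModeEnabled() parses it. The numeric ids are VTK's cell type constants from vtkCellType.h.

```python
import os

# VTK cell type ids (vtkCellType.h) for the supported linear 3D cells.
VTK_TETRA = 10
VTK_VOXEL = 11
VTK_HEXAHEDRON = 12
VTK_WEDGE = 13
VTK_PYRAMID = 14
SUPPORTED_CELL_TYPES = frozenset(
    {VTK_TETRA, VTK_VOXEL, VTK_HEXAHEDRON, VTK_WEDGE, VTK_PYRAMID}
)
MAX_POINTS = 2**31  # point ids must fit in int32 connectivity


def fast_mode_enabled(env=None):
    """Assumed semantics: enabled when FVTK_FAST is set and not '0'."""
    env = os.environ if env is None else env
    return env.get("FVTK_FAST", "") not in ("", "0")


def use_fast_surface(cell_types, point_dtype, n_points, env=None):
    """Return True only if opt-in is on AND the grid is eligible."""
    if not fast_mode_enabled(env):
        return False  # default: byte-exact standard path
    return (
        set(cell_types) <= SUPPORTED_CELL_TYPES
        and point_dtype in ("float32", "float64")
        and n_points < MAX_POINTS
    )
```

Checking the switch before touching the grid keeps the default path free of any extra work, so opt-out runs stay byte-exact by construction.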

The kernel uses OpenMP directly (n_threads=0 -> OMP_NUM_THREADS), avoiding
vtkSMPTools dispatch + LocalScope oversubscription. The TU is excluded from the
unity build and gated on FVTK_HAVE_OPENMP; without OpenMP the adapter is a stub.

Output is order-relaxed AND point-order-relaxed (thread/hash-dependent cell and
surface-point emission order). Extend the bit-exact comparator with point-order
canonicalization (compare.py: _point_canonicalization/_remap_conn, relax_points)
threaded through run_ops.py's points_relaxed manifest flag, and add
op_datasetsurface_fast (order_relaxed + points_relaxed) covering the new path.
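To make the points-relaxed comparison concrete: if both outputs contain the same point set, sorting points lexicographically gives a shared canonical numbering, and remapping each output's connectivity through its own permutation makes the cells directly comparable. This is a hypothetical pure-Python stand-in for _point_canonicalization/_remap_conn, not their actual code. It assumes no two surface points have identical coordinates (coincident points would make the sort ambiguous), treats cell order as relaxed while keeping each polygon's winding, and compares coordinates exactly with no tolerance. Point and cell data arrays would need the same permutations applied before they are compared.

```python
def point_canonicalization(points):
    """Sort points lexicographically.

    Returns (canonical_points, old_to_new) where old_to_new[i] is the new
    index of input point i.
    """
    order = sorted(range(len(points)), key=lambda i: tuple(points[i]))
    old_to_new = [0] * len(points)
    for new_id, old_id in enumerate(order):
        old_to_new[old_id] = new_id
    return [tuple(points[i]) for i in order], old_to_new


def remap_conn(cells, old_to_new):
    """Rewrite each cell's point ids into the canonical numbering."""
    return [tuple(old_to_new[pid] for pid in cell) for cell in cells]


def rotate_to_min(cell):
    """Cyclically rotate a polygon so its smallest id comes first (keeps winding)."""
    k = cell.index(min(cell))
    return cell[k:] + cell[:k]


def surfaces_match(points_a, cells_a, points_b, cells_b):
    """Exact comparison, relaxed over point order and cell order."""
    canon_a, map_a = point_canonicalization(points_a)
    canon_b, map_b = point_canonicalization(points_b)
    if canon_a != canon_b:  # exact coordinate equality, no tolerance
        return False
    key_a = sorted(rotate_to_min(c) for c in remap_conn(cells_a, map_a))
    key_b = sorted(rotate_to_min(c) for c in remap_conn(cells_b, map_b))
    return key_a == key_b
```

Rotating rather than sorting each cell's ids keeps a flipped face (reversed winding) detectable as a mismatch.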

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
akaszynski force-pushed the feat/fast-surface-port branch from 486cb74 to e66d79b on June 20, 2026 08:42
@akaszynski (Member, Author)

Force-pushed a critical correctness fix (486cb74 → e66d79b).

While validating the follow-up clean port, I discovered that the fast path here was silently falling back to the standard path for the common homogeneous-grid case, so the points-relaxed tests were passing on the byte-exact fallback rather than on the kernel.

Root cause: vtkUnstructuredGrid::GetCellTypesArray() returns vtkUnsignedCharArray::FastDownCast(this->Types), which is null for homogeneous grids whose types are stored as an implicit vtkConstantArray (e.g. anything built via SetCells(int type, cells) — all-tet/all-hex meshes). The adapter treated a null types array as "bail", so it never engaged.

Fix: acquire per-cell types robustly — zero-copy AOS pointer when available, else a per-cell GetCellType(i) copy. Verified the kernel now genuinely engages (output point order is reordered vs the standard path, same point set) and still matches stock under the points-relaxed gate (292/292 bit-exact).
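The fixed lookup can be sketched in Python against any object exposing vtkUnstructuredGrid's GetCellTypesArray(), GetNumberOfCells() and GetCellType(i). This illustrates the fallback logic only: the real adapter is C++ and takes a zero-copy AOS pointer where this sketch copies values out with GetValue(i), and cell_types_of is a hypothetical name.

```python
def cell_types_of(grid):
    """Return one VTK cell type id per cell, never bailing on a missing array.

    GetCellTypesArray() can return None (nullptr in C++) when the types are
    stored as an implicit constant array, e.g. an all-tet grid built with
    SetCells(type, cells). In that case fall back to per-cell queries.
    """
    n_cells = grid.GetNumberOfCells()
    types = grid.GetCellTypesArray()
    if types is not None:
        # Explicit unsigned-char array: read it directly.
        return [types.GetValue(i) for i in range(n_cells)]
    # Implicit/constant storage: query each cell (works for any storage).
    return [grid.GetCellType(i) for i in range(n_cells)]
```

The per-cell path costs one virtual call per cell, but it only runs when the explicit array is absent, and it turns the previous silent bail into a working fast path for homogeneous meshes.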
