Update: GPU knn BVH search for irregular grid #56

Merged
tan2 merged 20 commits into master from update/GPU-knn-bvh-search
Apr 22, 2026

Member

@chaseshyu chaseshyu commented Apr 15, 2026

This pull request introduces improvements to the KNN (k-nearest neighbors) search infrastructure, especially for GPU/OpenACC builds, and enhances the efficiency and flexibility of marker and node interpolation routines. The changes include adding the knn-bvh submodule for GPU-accelerated KNN, refactoring the KNN class API, updating the build system to support the new backend, and improving logging and batching for large-scale searches. These updates aim to improve performance, support larger datasets, and provide clearer diagnostics during execution.

KNN backend integration and build system updates:

  • The previous hash-grid GPU knn search becomes invalid when elements differ in size, as they do in an irregular grid. The knn BVH search uses a Bounding Volume Hierarchy, a tree-structured acceleration technique, to gather the k nearest neighbors efficiently. This change reduces remeshing time on the GPU by about 61% (from 92.12 s to 35.67 s per remesh in the test below).
  • Tests: Regular grid | 200k elements | GPU

    This change:
      Execute : 000:01:18.300342
      Initiate: 000:00:33.291851 (42.52%)
      Compute : 000:00:01.207063 ( 1.54%) / 40 = 0.030177 s/step
      Remesh  : 000:00:35.669887 (45.56%) /  1 = 35.67 s/remesh
      Output  : 000:00:08.131541 (10.39%) /  5 = 1.63 s/output

    Commit 5b41399:
      Execute : 000:02:28.117907
      Initiate: 000:00:44.912479 (30.32%)
      Compute : 000:00:01.300568 ( 0.88%) / 40 = 0.032514 s/step
      Remesh  : 000:01:32.118626 (62.19%) /  1 = 92.12 s/remesh
      Output  : 000:00:09.786234 ( 6.61%) /  5 = 1.96 s/output

  • Added the knn-bvh submodule and integrated its build process into the Makefile for OpenACC-enabled builds. The build now ensures the submodule is initialized and built as needed, and cleans up appropriately. (.gitmodules, Makefile, knn-bvh) [1] [2] [3] [4] [5]
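
To make the BVH idea from the first bullet above concrete, here is a minimal CPU-only sketch of a k-nearest-neighbor query over a bounding volume hierarchy. It is not the knn-bvh submodule's code: every name in it (`Bvh`, `BvhNode`, `make_bvh`, `knn`) is hypothetical, the build uses a simple median split, and the search is a plain recursive traversal rather than a GPU kernel. It only illustrates why a tree of boxes copes with uneven point spacing: a whole subtree is skipped once its box is farther away than the current k-th best candidate.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

using Point = std::array<float, 3>;

// Hypothetical node layout: a bounding box plus either two children or a leaf range.
struct BvhNode {
    float lo[3], hi[3];
    int left = -1, right = -1;  // child node indices; -1 marks a leaf
    int begin = 0, end = 0;     // range of point indices covered by this node
};

struct Bvh {
    std::vector<Point> pts;
    std::vector<int> idx;        // point indices, reordered during the build
    std::vector<BvhNode> nodes;  // nodes[0] is the root
};

inline float dist2(const Point& a, const Point& b) {
    float d2 = 0.0f;
    for (int a_ = 0; a_ < 3; ++a_) d2 += (a[a_] - b[a_]) * (a[a_] - b[a_]);
    return d2;
}

// Squared distance from q to the node's box (zero when q lies inside the box).
inline float box_dist2(const BvhNode& n, const Point& q) {
    float d2 = 0.0f;
    for (int a = 0; a < 3; ++a) {
        const float d = std::max({n.lo[a] - q[a], 0.0f, q[a] - n.hi[a]});
        d2 += d * d;
    }
    return d2;
}

// Builds the subtree over idx[begin, end) with a median split on the widest axis.
inline int build(Bvh& t, int begin, int end, int leaf_size) {
    BvhNode n;
    n.begin = begin; n.end = end;
    for (int a = 0; a < 3; ++a) n.lo[a] = n.hi[a] = t.pts[t.idx[begin]][a];
    for (int i = begin + 1; i < end; ++i)
        for (int a = 0; a < 3; ++a) {
            n.lo[a] = std::min(n.lo[a], t.pts[t.idx[i]][a]);
            n.hi[a] = std::max(n.hi[a], t.pts[t.idx[i]][a]);
        }
    const int self = static_cast<int>(t.nodes.size());
    t.nodes.push_back(n);
    if (end - begin > leaf_size) {
        int axis = 0;
        for (int a = 1; a < 3; ++a)
            if (n.hi[a] - n.lo[a] > n.hi[axis] - n.lo[axis]) axis = a;
        const int mid = begin + (end - begin) / 2;
        std::nth_element(t.idx.begin() + begin, t.idx.begin() + mid, t.idx.begin() + end,
                         [&](int i, int j) { return t.pts[i][axis] < t.pts[j][axis]; });
        const int l = build(t, begin, mid, leaf_size);
        const int r = build(t, mid, end, leaf_size);
        t.nodes[self].left = l;  // access by index: the vector may have reallocated
        t.nodes[self].right = r;
    }
    return self;
}

inline Bvh make_bvh(std::vector<Point> pts, int leaf_size = 4) {
    Bvh t;
    t.pts = std::move(pts);
    t.idx.resize(t.pts.size());
    for (std::size_t i = 0; i < t.idx.size(); ++i) t.idx[i] = static_cast<int>(i);
    if (!t.idx.empty()) build(t, 0, static_cast<int>(t.idx.size()), std::max(leaf_size, 1));
    return t;
}

using Heap = std::priority_queue<std::pair<float, int>>;  // max-heap: worst candidate on top

inline void search(const Bvh& t, int node, const Point& q, std::size_t k, Heap& best) {
    const BvhNode& n = t.nodes[node];
    // Prune: once k candidates exist, a box no closer than the worst one cannot improve them.
    if (best.size() == k && box_dist2(n, q) >= best.top().first) return;
    if (n.left < 0) {
        for (int i = n.begin; i < n.end; ++i) {
            const float d2 = dist2(t.pts[t.idx[i]], q);
            if (best.size() < k) best.emplace(d2, t.idx[i]);
            else if (d2 < best.top().first) { best.pop(); best.emplace(d2, t.idx[i]); }
        }
        return;
    }
    int closer = n.left, other = n.right;  // visit the nearer child first to tighten the bound
    if (box_dist2(t.nodes[other], q) < box_dist2(t.nodes[closer], q)) std::swap(closer, other);
    search(t, closer, q, k, best);
    search(t, other, q, k, best);
}

// Returns the indices of the k nearest points to q, nearest first.
inline std::vector<int> knn(const Bvh& t, const Point& q, std::size_t k) {
    if (k == 0 || t.nodes.empty()) return {};
    Heap best;
    search(t, 0, q, k, best);
    std::vector<int> out(best.size());
    for (std::size_t i = out.size(); i-- > 0;) { out[i] = best.top().second; best.pop(); }
    return out;
}
```

A caller would build the tree once with `make_bvh(points)` and then call `knn(tree, query, k)` per query. Because the boxes adapt to where the points actually are, no global cell size has to be chosen, which is the property a fixed hash grid lacks when element sizes vary.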

KNN class and API refactoring:

  • Refactored the KNN class to support both CPU and GPU backends, added batching support, memory usage reporting, and a new search API that returns a pointer to the result buffer. The constructor now accepts a maximum capacity to minimize reallocations. (knn.hpp)

Performance and batching improvements for interpolation and marker routines:

  • Updated node and marker interpolation routines to use batched KNN queries, improving performance for large numbers of queries. The routines now print backend information, memory usage, and batch/block sizes for diagnostics. (brc-interpolation.cxx, markerset.cxx, nn-interpolation.cxx) [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12]
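
The batching pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's `KNN` class: `BatchedNearest`, `Neighbor`, and `search_all` are hypothetical names, the backend is a 1-D brute-force search standing in for the GPU BVH, and the only point is the control flow, namely a result buffer allocated once for a maximum capacity, a `search` call that returns a pointer into that buffer, and a driver that feeds queries in capacity-sized batches.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Neighbor { int idx; double dist2; };

// Hypothetical backend whose result buffer is sized once at construction.
class BatchedNearest {
public:
    BatchedNearest(std::vector<double> points, std::size_t capacity)
        : points_(std::move(points)), results_(capacity) {
        assert(capacity > 0 && !points_.empty());
    }

    std::size_t capacity() const { return results_.size(); }

    // Finds the nearest point for each of nquery queries (1-D brute force for brevity).
    // The returned pointer refers to an internal buffer that the next call overwrites.
    const Neighbor* search(const double* queries, std::size_t nquery) {
        assert(nquery <= results_.size());
        for (std::size_t q = 0; q < nquery; ++q) {
            Neighbor best{-1, 0.0};
            for (std::size_t i = 0; i < points_.size(); ++i) {
                const double d = points_[i] - queries[q];
                if (best.idx < 0 || d * d < best.dist2) best = Neighbor{static_cast<int>(i), d * d};
            }
            results_[q] = best;
        }
        return results_.data();
    }

private:
    std::vector<double> points_;
    std::vector<Neighbor> results_;
};

// Driver: splits the queries into batches no larger than the backend's capacity
// and copies each batch's results out before the buffer is reused.
inline std::vector<Neighbor> search_all(BatchedNearest& knn, const std::vector<double>& queries) {
    std::vector<Neighbor> out;
    out.reserve(queries.size());
    for (std::size_t start = 0; start < queries.size(); start += knn.capacity()) {
        const std::size_t n = std::min(knn.capacity(), queries.size() - start);
        const Neighbor* r = knn.search(queries.data() + start, n);
        out.insert(out.end(), r, r + n);
    }
    return out;
}
```

Returning a pointer avoids allocating a new result vector per call, at the cost that the caller must consume or copy the results before the next batch is issued.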

Auxiliary improvements and code cleanup:

  • Added a new method to Array2D for packing data into a GPU-friendly format (pack_to_xyz_float). (array2d.hpp)
  • Disabled the unused create_neighbor calls. [1] [2]

Documentation and CI:

  • Added a badge for the new nvc (NVIDIA HPC compiler) build workflow to the README.md. (README.md)
  • Added a Docker build-and-push workflow with a CUDA Dockerfile to stabilize the GitHub Actions CI build environment.

These changes collectively modernize and optimize the KNN infrastructure, especially for GPU-accelerated workflows, and improve the maintainability and scalability of the codebase.

Close #9

@chaseshyu chaseshyu requested review from Copilot, echoi and tan2 April 15, 2026 13:30
@chaseshyu chaseshyu added the enhancement (New feature or request) and bugfixes (Fix bugs) labels Apr 15, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR upgrades the KNN infrastructure for OpenACC/GPU builds by switching the GPU backend to a BVH-based implementation (via the new knn-bvh submodule), and refactors interpolation/marker routines to use batched KNN queries with improved runtime diagnostics.

Changes:

  • Add and wire up the knn-bvh submodule + Makefile targets for OpenACC builds.
  • Refactor KNN API to support GPU BVH search, batching limits, and returning a results-buffer pointer.
  • Update interpolation and marker remapping to use batched GPU-direct KNN searches and improved logging/memory reporting.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 8 comments.

File                    Description
.gitmodules             Adds knn-bvh submodule configuration.
Makefile                Integrates knn-bvh include/lib, adds prepare step, and adds clean/build rules for the submodule.
README.md               Adds CI badge for the new nvc build workflow.
array2d.hpp             Adds helper to pack coordinates into float3 for GPU KNN.
brc-interpolation.cxx   Switches node interpolation to batched KNN queries and new KNN API.
dynearthsol.cxx         Disables create_neighbor() calls during init/restart.
knn.hpp                 Refactors KNN interface to return a neighbor*, adds backend/memory reporting helpers.
knn.cxx                 Replaces hash-grid GPU KNN with BVH-backed search; adds batch sizing and result-buffer handling.
markerset.cxx           Switches marker remap/replenish KNN calls to new API and batching with diagnostics.
nn-interpolation.cxx    Switches NN/ACM interpolation KNN calls to new API and batching with diagnostics.
parameters.hpp          Adjusts neighbor type definition for ACC builds and includes BVH header under ACC.
remeshing.cxx           Adds GPU memory reporting and additional acc wait synchronization points; disables neighbor rebuild.
utils.hpp               Adds number formatting helper for log output.

Comment thread knn.hpp
Comment on lines 5 to +15
class KNN
{
public:
    KNN(const Param& param, const array_t& points_vec_, NANOKDTree& nano_kdtree_,
        double resoTimes = 3);
    KNN(const Param& param, const array_t& points_vec_, NANOKDTree& nano_kdtree_, bool is_msg_ = true,
        int capacity = -1);
    ~KNN();

    void search(const array_t& queries, neighbor_vec& neighbors, const int nquery,
                int k, double resoTimes = 3);
    // Search for k nearest neighbors.
    neighbor* search(const array_t& queries, const int nquery, const int k_neig,
                     bool is_sync_to_host = true, const float* d_guess_radii_sq = nullptr);

Copilot AI Apr 15, 2026


knn.hpp relies on types like Param, array_t, NANOKDTree, and neighbor without including the headers (or forward declarations) that define them. This makes the header non-self-contained and brittle to include-order changes; add the necessary includes (or forward declare where possible) so #include "knn.hpp" compiles on its own.

Comment thread utils.hpp
Comment thread knn.cxx
Comment thread array2d.hpp
Comment on lines +204 to +221
#ifdef ACC
    void pack_to_xyz_float(std::vector<float3>& buffer, std::size_t limit_size = 0) const {
        std::size_t count = (limit_size > 0 && limit_size <= n_) ? limit_size : n_;

        if (buffer.size() < count)
            buffer.resize(count);

        #pragma acc parallel loop gang vector
        for (std::size_t i = 0; i < count; ++i) {
            buffer[i].x = (float)(*this)[i][0];
            buffer[i].y = (float)(*this)[i][1];
#ifdef THREED
            buffer[i].z = (float)(*this)[i][2];
#else
            buffer[i].z = 0.0;
#endif
        }
    }
Copilot AI Apr 15, 2026


pack_to_xyz_float() uses the float3 type, but this header does not define or include a definition for float3. With current include order (array2d.hpp is included before knn_bvh.hpp), this will fail to compile in ACC builds. Consider defining a project-local POD type for 3 floats in a common header, or include the header that defines float3 before this method is compiled.
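
One way the "project-local POD type" option could look is sketched below. It is an assumption-laden illustration, not the fix that was merged: `Float3` and the free-function form of `pack_to_xyz_float` are hypothetical, the coordinates are held in a `std::vector<std::array<double, N>>` instead of `Array2D`, and the OpenACC pragma is left out so the snippet compiles with any C++11 compiler.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical project-local stand-in for float3, defined where every user can see it.
struct Float3 { float x, y, z; };
static_assert(std::is_trivially_copyable<Float3>::value, "Float3 must be safe to copy bytewise");
static_assert(sizeof(Float3) == 3 * sizeof(float), "Float3 must be three packed floats");

// Packs 2-D or 3-D double coordinates into Float3 entries; z is zero in 2-D.
// limit_size == 0 (or a value larger than the input) means "pack everything".
template <std::size_t NDIMS>
void pack_to_xyz_float(const std::vector<std::array<double, NDIMS>>& coords,
                       std::vector<Float3>& buffer, std::size_t limit_size = 0)
{
    static_assert(NDIMS == 2 || NDIMS == 3, "only 2-D and 3-D coordinates are supported");
    const std::size_t n = coords.size();
    const std::size_t count = (limit_size > 0 && limit_size <= n) ? limit_size : n;

    // Grow only: an already large buffer is reused across calls.
    if (buffer.size() < count)
        buffer.resize(count);

    for (std::size_t i = 0; i < count; ++i) {
        buffer[i].x = static_cast<float>(coords[i][0]);
        buffer[i].y = static_cast<float>(coords[i][1]);
        // NDIMS - 1 equals 2 in 3-D; in 2-D the branch is never taken.
        buffer[i].z = (NDIMS == 3) ? static_cast<float>(coords[i][NDIMS - 1]) : 0.0f;
    }
}
```

With a type like this in a common header, the packing helper no longer depends on which header happens to define `float3` first. Whether the layout is interchangeable with the `float3` expected by the BVH library would still need to be checked against that library.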

Comment thread Makefile
Comment on lines 491 to +493

$(KNN_BVH_LIB):
	$(MAKE) -C $(KNN_BVH_DIR) NDIM=$(ndims)
Copilot AI Apr 15, 2026


$(KNN_BVH_LIB) is only set when openacc=1, but this unconditional rule still expands to a target even when the variable is empty (effectively creating a rule for :). Wrap this rule in the same ifeq ($(openacc), 1) guard (or give KNN_BVH_LIB a safe default) so non-OpenACC builds don't get an invalid/unintended make rule.

Suggested change
- $(KNN_BVH_LIB):
- 	$(MAKE) -C $(KNN_BVH_DIR) NDIM=$(ndims)
+ ifeq ($(openacc), 1)
+ $(KNN_BVH_LIB):
+ 	$(MAKE) -C $(KNN_BVH_DIR) NDIM=$(ndims)
+ endif

Comment thread Makefile Outdated
Comment thread markerset.cxx Outdated
Comment thread nn-interpolation.cxx Outdated
@chaseshyu chaseshyu linked an issue Apr 16, 2026 that may be closed by this pull request
@chaseshyu chaseshyu force-pushed the update/GPU-knn-bvh-search branch 2 times, most recently from 1e7b1f9 to 8de695c Compare April 16, 2026 18:00
@chaseshyu chaseshyu force-pushed the update/GPU-knn-bvh-search branch 3 times, most recently from d9d918c to 78280c7 Compare April 16, 2026 19:28
…move redundant environment variable in nvc-build workflow
…cker build to improve build performance during PR
@chaseshyu chaseshyu force-pushed the update/GPU-knn-bvh-search branch from 78280c7 to de19b62 Compare April 16, 2026 19:44
@chaseshyu chaseshyu force-pushed the update/GPU-knn-bvh-search branch from a21485f to 468c6d0 Compare April 16, 2026 20:03
@chaseshyu chaseshyu force-pushed the update/GPU-knn-bvh-search branch from a08a12d to a6a8f78 Compare April 16, 2026 20:28
@tan2 tan2 merged commit cd1dac4 into master Apr 22, 2026
15 checks passed
@tan2 tan2 deleted the update/GPU-knn-bvh-search branch April 22, 2026 08:24
Labels

bugfixes Fix bugs enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Attributes initialization order OpenACC or acceleration on GPU

3 participants