Dev/uncertainty#2121
Conversation
Pull request overview
This PR replaces the previous external UncertaintyTE-based uncertainty computation with a new in-repo aliceVision_uncertainty CUDA/MAGMA implementation, wiring it into aliceVision_computeUncertainty and adding a Meshroom node.
Changes:
- Add new `aliceVision_uncertainty` library (CUDA kernels + MAGMA-based Hessian inversion) and a CUDA unit test for the kernels.
- Update `aliceVision_computeUncertainty` to build a BA Jacobian via Ceres (sketched below) and call the new uncertainty API.
- Remove UncertaintyTE build plumbing and add MAGMA discovery in top-level CMake.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| `src/software/utils/main_computeUncertainty.cpp` | Switched to the new uncertainty library + Jacobian generation; CLI pared down. |
| `src/software/utils/CMakeLists.txt` | Link `aliceVision_computeUncertainty` against the new `aliceVision_uncertainty`. |
| `src/cmake/FindUncertaintyTE.cmake` | Removed UncertaintyTE find-module. |
| `src/cmake/ConfigureOptionsMerger.cmake` | Removed UncertaintyTE configure-option forwarding. |
| `src/aliceVision/utils/scope.hpp` | New ScopeGuard helper used for cleanup in uncertainty code. |
| `src/aliceVision/uncertainty/uncertainty.hpp` | New public uncertainty API + Schur/Hessian construction templates. |
| `src/aliceVision/uncertainty/uncertainty.cpp` | MAGMA-based camera Hessian inverse via Neumann-series correction (see the sketch after this table). |
| `src/aliceVision/uncertainty/uncertainty_kernels.*` | CUDA reduction + conditional AXPY used for a GPU-side convergence guard. |
| `src/aliceVision/uncertainty/uncertainty_kernels_test.cpp` | New Boost test coverage for the CUDA kernels. |
| `src/aliceVision/uncertainty/magma.hpp` | RAII wrapper for MAGMA device allocations. |
| `src/aliceVision/uncertainty/CMakeLists.txt` | Build the new uncertainty library and unit test. |
| `src/aliceVision/sfm/bundle/BundleAdjustmentCeres.{hpp,cpp}` | Change Jacobian-creation API to return pose/landmark positioning info. |
| `src/aliceVision/CMakeLists.txt` | Add `uncertainty/` subdir when CUDA is available. |
| `src/CMakeLists.txt` | Add MAGMA dependency discovery; remove UncertaintyTE option & include dirs. |
| `meshroom/aliceVision/ComputeUncertainty.py` | Add Meshroom node wrapper for the CLI tool. |
| `INSTALL.md` | Update dependency/options documentation to remove UncertaintyTE references. |
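For context, reducing the bundle-adjustment Hessian by a Schur complement is the standard route to camera covariances; the formulation below is the textbook version and is only assumed to match what `uncertainty.hpp`/`uncertainty.cpp` implement:

```latex
% Gauss-Newton Hessian partitioned into camera (c) and point (p) blocks:
H = J^\top J =
\begin{pmatrix} H_{cc} & H_{cp} \\ H_{cp}^\top & H_{pp} \end{pmatrix},
\qquad
S = H_{cc} - H_{cp}\, H_{pp}^{-1}\, H_{cp}^\top .

% H_pp is block-diagonal (one small block per landmark) and cheap to invert;
% camera covariances come from a (pseudo-)inverse of the Schur complement S.
% Given an approximate inverse X_0 \approx S^{-1}, a Neumann-series correction
% refines it as
S^{-1} = \sum_{k \ge 0} \left( I - X_0 S \right)^{k} X_0 ,
% truncated once the residual norm of (I - X_0 S) is small enough.
```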
```cmake
SOURCES ${uncertainty_files_headers} ${uncertainty_files_sources}
PUBLIC_LINKS
    aliceVision_sfmData
    ${MAGMA_LIBRARIES}
PRIVATE_LINKS
    aliceVision_system
    CUDA::cudart
```
`aliceVision_uncertainty` compiles CUDA sources (`uncertainty_kernels.cu`), but the target isn't created with `USE_CUDA` and `CUDA::cudart` isn't a PUBLIC link. This can cause missing CUDA target properties and, for static libraries, unresolved cudart symbols in downstream targets. Consider adding `USE_CUDA` and making `CUDA::cudart` a PUBLIC link, consistent with the other CUDA libraries in this repo.
Suggested change:

```diff
+USE_CUDA
 SOURCES ${uncertainty_files_headers} ${uncertainty_files_sources}
 PUBLIC_LINKS
     aliceVision_sfmData
     ${MAGMA_LIBRARIES}
+    CUDA::cudart
 PRIVATE_LINKS
     aliceVision_system
-    CUDA::cudart
```
```cpp
// command-line parameters
std::string verboseLevel = system::EVerboseLevel_enumToString(system::Logger::getDefaultVerboseLevel());
std::string sfmDataFilename;
```
`verboseLevel` is initialized but no longer registered as a CLI option, yet the Meshroom node still passes it. Add the option back (and apply it to the logger) or remove it from the Meshroom node to prevent runtime failures.
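A minimal sketch of re-registering the option, following the usual boost::program_options pattern in AliceVision CLI tools (the `allParams` description and the `po::store`/`po::notify` parse step are assumed surrounding context, not shown in this hunk):

```cpp
namespace po = boost::program_options;

po::options_description logParams("Log parameters");
logParams.add_options()
    ("verboseLevel,v",
     po::value<std::string>(&verboseLevel)->default_value(verboseLevel),
     "Verbosity level (fatal, error, warning, info, debug, trace).");
allParams.add(logParams);

// after parsing, actually apply the requested level
system::Logger::get()->setLogLevel(verboseLevel);
```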
```cuda
// Conditional axpy kernel
// ---------------------------------------------------------------------------
// Thread 0 in block 0 decides whether to execute the axpy.
// The decision is broadcast via shared memory so all threads in the grid
// read it from L1 cache rather than global memory.
//
// Layout: one kernel launch with enough blocks to cover n elements.
// Block 0 additionally runs the comparison and update logic in thread 0.
__global__ void conditionalAxpyKernel(double k,
                                      const double* __restrict__ src,
                                      double* __restrict__ dst,
                                      const double* d_cur_maxabs,
                                      double* d_prev_norm,
                                      double lambda,
                                      int n)
{
    // Block 0, thread 0: compare and update d_prev_norm
    __shared__ int do_axpy;
    if (blockIdx.x == 0 && threadIdx.x == 0)
    {
        double cur = lambda * (*d_cur_maxabs);
        do_axpy = (cur <= *d_prev_norm) ? 1 : 0;
        if (do_axpy)
        {
            *d_prev_norm = cur;
        }
    }

    // All threads in block 0 wait for the decision; other blocks read it
    // from global memory via the shared flag after block 0 has written it.
    // Since blocks can execute in any order we use a device-side broadcast:
    // block 0 writes do_axpy to d_cur_maxabs[1] (we repurpose a spare slot)
    // — but that requires an extra device buffer.
    //
    // Simpler correct approach: re-read d_prev_norm comparison in every block.
    // (One extra global read per block, negligible vs the axpy bandwidth.)
    __syncthreads();  // sync within block 0

    // Non-block-0 threads recompute the flag from global memory
    int local_do_axpy;
    if (blockIdx.x == 0)
    {
        local_do_axpy = do_axpy;
    }
    else
    {
        // d_prev_norm was updated by block 0 only if do_axpy; check the
        // pre-update value by comparing directly (re-read is safe: block 0
        // has already written d_prev_norm before this block can read it
        // because grids execute sequentially per stream).
        local_do_axpy = (lambda * (*d_cur_maxabs) <= *d_prev_norm) ? 1 : 0;
    }

    if (!local_do_axpy) return;
```
Within a single kernel launch, blocks can execute in any order; using block 0/thread 0 to decide and update *d_prev_norm does not guarantee other blocks see the updated value before they compute local_do_axpy. This can make some blocks execute the axpy when it should be skipped. Consider a separate decision kernel that writes a device flag, then a second kernel for the axpy.
Suggested change (replacing the hunk above with a two-kernel split):

```cuda
// Conditional axpy kernels
// ---------------------------------------------------------------------------
// The decision whether to execute the axpy must be made in a separate kernel,
// because blocks within a single kernel launch cannot synchronize with each
// other. Launch decideConditionalAxpyKernel<<<1, 1, 0, stream>>> first, then
// launch conditionalAxpyKernel<<<grid, block, 0, stream>>> in the same stream.
__global__ void decideConditionalAxpyKernel(const double* d_cur_maxabs,
                                            double* d_prev_norm,
                                            double lambda,
                                            int* d_do_axpy)
{
    if (blockIdx.x == 0 && threadIdx.x == 0)
    {
        const double cur = lambda * (*d_cur_maxabs);
        const int do_axpy = (cur <= *d_prev_norm) ? 1 : 0;
        *d_do_axpy = do_axpy;
        if (do_axpy)
        {
            *d_prev_norm = cur;
        }
    }
}

__global__ void conditionalAxpyKernel(double k,
                                      const double* __restrict__ src,
                                      double* __restrict__ dst,
                                      const int* d_do_axpy,
                                      int n)
{
    if (*d_do_axpy == 0) return;
```
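A sketch of the host-side launch sequence this split implies (kernel names from the suggestion; grid/block sizes and buffer names are illustrative):

```cuda
// Same stream => the decision kernel completes before the axpy kernel starts,
// so every block of the axpy kernel observes the final d_do_axpy value.
const int block = 256;
const int grid  = (n + block - 1) / block;
decideConditionalAxpyKernel<<<1, 1, 0, stream>>>(d_cur_maxabs, d_prev_norm, lambda, d_do_axpy);
conditionalAxpyKernel<<<grid, block, 0, stream>>>(k, d_src, d_dst, d_do_axpy, n);
```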
```cpp
magma_queue_create(err, &queue);
if (err)
{
    ALICEVISION_LOG_ERROR("Error on magma_magma_queue_createinit: " << magma_strerror(err));
    return false;
```
magma_queue_create is called without capturing its return status, and the error check reuses 'err' from magma_init. Capture the return value from magma_queue_create (and use an explicit device id, typically 0) so failures are correctly detected and logged.
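A hedged sketch of the fix, assuming the MAGMA 2.x API where `magma_queue_create(device, &queue)` takes an explicit device id (exact failure semantics vary across MAGMA versions, so the null-queue check below is an assumption, not documented behavior):

```cpp
magma_int_t err = magma_init();
if (err != MAGMA_SUCCESS)
{
    ALICEVISION_LOG_ERROR("Error on magma_init: " << magma_strerror(err));
    return false;
}

magma_device_t device = 0;   // explicit device id instead of reusing 'err'
magma_queue_t queue = nullptr;
magma_queue_create(device, &queue);
if (queue == nullptr)
{
    ALICEVISION_LOG_ERROR("magma_queue_create failed on device " << device);
    return false;
}
```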



Description
Features list
Implementation remarks