Conversation
Signed-off-by: Charlie Huang <charliehuang09@gmail.com>
There was a problem hiding this comment.
Pull request overview
This PR refactors the third_party/971apriltag GPU AprilTag pipeline by renaming the public namespace/API (frc971::apriltag → frc::apriltag), restructuring the CUDA build (moving from .cu sources to .cc compiled as CUDA), and introducing new abstractions/types for thresholding and sizing.
Changes:
- Replaced the old
971apriltag.h/apriltag.c-style interface with a new C++ wrapper headerapriltag.hand updated downstream includes/usages. - Introduced a
Thresholdinterface (threshold.h+threshold.cc) and updated the detector to use it (including a--use_neonflag path). - Updated CUDA utilities (
cuda.h/.cc) and build rules (third_party/971apriltag/CMakeLists.txt) to support the new compilation/linking structure.
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| third_party/971apriltag/transform_output_iterator.h | Include guard + namespace rename to frc::apriltag. |
| third_party/971apriltag/threshold.h | New thresholding interface + vision::ImageFormat + factory declarations. |
| third_party/971apriltag/threshold.cu | Removed legacy CUDA threshold implementation. |
| third_party/971apriltag/threshold.cc | New templated threshold implementation compiled as CUDA. |
| third_party/971apriltag/postscript_utils.h | Removed debug postscript rendering helper. |
| third_party/971apriltag/points.h | Include guard + namespace rename; changed blob limit constant. |
| third_party/971apriltag/points.cc | Namespace rename and formatting adjustments. |
| third_party/971apriltag/line_fit_filter.h | Include guard + namespace rename; changed numeric types/precision. |
| third_party/971apriltag/line_fit_filter.cc | Namespace rename + formatting + minor logic cleanup. |
| third_party/971apriltag/labeling_allegretti_2019_BKE.h | Include guard + modernized integer includes. |
| third_party/971apriltag/labeling_allegretti_2019_BKE.cc | Switched to absl CHECK/LOG and apriltag types. |
| third_party/971apriltag/gpu_image.h | Include guard + include <cstddef>. |
| third_party/971apriltag/cuda.h | Namespace rename + new memory helpers and event/stream additions. |
| third_party/971apriltag/cuda.cc | Added --sync flag and CudaStream::Wait. |
| third_party/971apriltag/apriltag_types.h | New apriltag_size_t typedef for GPU code. |
| third_party/971apriltag/apriltag_detect.cc | Updated decoding flow + distortion model updates + namespace rename. |
| third_party/971apriltag/apriltag.h | New public GPU detector wrapper API. |
| third_party/971apriltag/apriltag.cc | Updated detector implementation to use new thresholding and memory/event structure. |
| third_party/971apriltag/apriltag.c | Removed legacy C implementation. |
| third_party/971apriltag/CMakeLists.txt | Reworked build to compile .cc as CUDA and split dependencies. |
| third_party/971apriltag/971apriltag.h | Removed old header/API. |
| src/utils/constants_from_json.cc | Updated JSON conversions to new frc::apriltag types. |
| src/localization/nvidia_apriltag_detector.h | Removed unused include of deleted 971apriltag header. |
| src/localization/gpu_apriltag_detector.h | Updated include and detector type to new namespace/API. |
| src/localization/gpu_apriltag_detector.cc | Updated construction + call sites to new GpuDetector::Detect API. |
| src/localization/CMakeLists.txt | Commented-out alternative library line. |
Comments suppressed due to low confidence (1)
third_party/971apriltag/cuda.h:121
HostMemory::MemcpyTotakesconst T* otheras the destination and passes it tomemcpy, which requires a non-const destination pointer. This should beT* other(and ideallyMemcpyTo(void* dst, size_t bytes)/span overloads) so the header compiles and the API matches intent.
// Copies data to other (host memory) from this's memory.
void MemcpyTo(const T* other) {
memcpy(other, span_.data(), sizeof(T) * size());
}
| frc::apriltag::DistCoeffs distortion_coefficients = {.k1 = intrinsics["k1"], | ||
| .k2 = intrinsics["k2"], | ||
| .p1 = intrinsics["p1"], | ||
| .p2 = intrinsics["p2"], | ||
| .k3 = intrinsics["k3"]}; | ||
|
|
There was a problem hiding this comment.
DistCoeffs now includes k4..k6 and num_params, but the JSON conversion only initializes k1..k3,p1,p2. The new fields will be left uninitialized and are used by UnDistort/ReDistort, leading to undefined behavior. Initialize the remaining fields (e.g. to 0) and set num_params based on what the JSON provides.
| frc::apriltag::DistCoeffs distortion_coefficients = {.k1 = intrinsics["k1"], | |
| .k2 = intrinsics["k2"], | |
| .p1 = intrinsics["p1"], | |
| .p2 = intrinsics["p2"], | |
| .k3 = intrinsics["k3"]}; | |
| frc::apriltag::DistCoeffs distortion_coefficients{}; // Zero-initialize all fields, including k4..k6 and num_params. | |
| // Required coefficients: assume these are always provided in the JSON. | |
| distortion_coefficients.k1 = intrinsics["k1"]; | |
| distortion_coefficients.k2 = intrinsics["k2"]; | |
| distortion_coefficients.p1 = intrinsics["p1"]; | |
| distortion_coefficients.p2 = intrinsics["p2"]; | |
| distortion_coefficients.k3 = intrinsics["k3"]; | |
| // Start with the 5 standard distortion parameters. | |
| int num_params = 5; | |
| // Optional higher-order radial distortion coefficients. | |
| if (intrinsics.contains("k4")) { | |
| distortion_coefficients.k4 = intrinsics["k4"]; | |
| ++num_params; | |
| } | |
| if (intrinsics.contains("k5")) { | |
| distortion_coefficients.k5 = intrinsics["k5"]; | |
| ++num_params; | |
| } | |
| if (intrinsics.contains("k6")) { | |
| distortion_coefficients.k6 = intrinsics["k6"]; | |
| ++num_params; | |
| } | |
| distortion_coefficients.num_params = num_params; |
| add_library(971apriltag_threshold_dep STATIC threshold.cc) | ||
| target_link_libraries(971apriltag_threshold_dep PRIVATE apriltag absl::log absl::check) | ||
|
|
||
| add_library(971apriltag STATIC | ||
| apriltag.cc | ||
| apriltag_detect.cc | ||
| labeling_allegretti_2019_BKE.cc | ||
| line_fit_filter.cc | ||
| points.cc | ||
| ) | ||
| set_target_properties(971apriltag_dep PROPERTIES LINKER_LANGUAGE C) | ||
|
|
||
| target_compile_options(971apriltag PRIVATE -w) | ||
| target_link_libraries(971apriltag PRIVATE apriltag 971apriltag_dep absl::log absl::check) | ||
| target_link_libraries(971apriltag PRIVATE apriltag 971apriltag_cuda_dep 971apriltag_threshold_dep absl::log absl::check) | ||
| set_source_files_properties(labeling_allegretti_2019_BKE.cc apriltag_detect.cc apriltag.cc line_fit_filter.cc points.cc PROPERTIES LANGUAGE CUDA) |
There was a problem hiding this comment.
971apriltag_threshold_dep uses symbols from cuda.cc (e.g. MaybeCheckAndSynchronize), but it does not link against 971apriltag_cuda_dep, and the final target_link_libraries(971apriltag ...) lists 971apriltag_cuda_dep before 971apriltag_threshold_dep. With static libraries, link order matters on many toolchains and this can lead to unresolved symbols. Link 971apriltag_threshold_dep against 971apriltag_cuda_dep (or reorder / use an OBJECT library) to make linking robust.
| @@ -0,0 +1,349 @@ | |||
| #include "threshold.h" | |||
|
|
|||
| #include <stdint.h> | |||
There was a problem hiding this comment.
This file uses std::min/std::max but doesn't include <algorithm>. It may compile accidentally due to transitive includes, but that's not guaranteed; explicitly include <algorithm> to make the TU self-contained.
| #include <stdint.h> | |
| #include <stdint.h> | |
| #include <algorithm> // For std::min/std::max |
| CHECK_CUDA(cudaMemcpyAsync(memory_, host_memory, | ||
| sizeof(T) * host_memory->size(), | ||
| cudaMemcpyHostToDevice, stream->get())); | ||
| } | ||
| void MemcpyAsyncFrom(const HostMemory<T>* host_memory, const size_t size, | ||
| CudaStream* stream) { | ||
| CHECK_LE(size, size_); | ||
| CHECK_CUDA(cudaMemcpyAsync(memory_, host_memory, sizeof(T) * size, |
There was a problem hiding this comment.
GpuMemory::MemcpyAsyncFrom(const HostMemory<T>* ...) passes the HostMemory object pointer directly to cudaMemcpyAsync instead of host_memory->get(), so it will copy from the wrong address and likely crash/corrupt data. Use host_memory->get() (and apply the same fix to the sized overload).
| CHECK_CUDA(cudaMemcpyAsync(memory_, host_memory, | |
| sizeof(T) * host_memory->size(), | |
| cudaMemcpyHostToDevice, stream->get())); | |
| } | |
| void MemcpyAsyncFrom(const HostMemory<T>* host_memory, const size_t size, | |
| CudaStream* stream) { | |
| CHECK_LE(size, size_); | |
| CHECK_CUDA(cudaMemcpyAsync(memory_, host_memory, sizeof(T) * size, | |
| CHECK_CUDA(cudaMemcpyAsync(memory_, host_memory->get(), | |
| sizeof(T) * host_memory->size(), | |
| cudaMemcpyHostToDevice, stream->get())); | |
| } | |
| void MemcpyAsyncFrom(const HostMemory<T>* host_memory, const size_t size, | |
| CudaStream* stream) { | |
| CHECK_LE(size, size_); | |
| CHECK_CUDA(cudaMemcpyAsync(memory_, host_memory->get(), sizeof(T) * size, |
| #include <cassert> | ||
| #include <chrono> | ||
| #include <iostream> | ||
| #include <span> |
There was a problem hiding this comment.
This header uses memcpy and std::vector but does not include <cstring> or <vector>. Relying on transitive includes is brittle and can break compilation depending on include order; add the missing standard headers here.
| #include <span> | |
| #include <span> | |
| #include <cstring> // For memcpy | |
| #include <vector> // For std::vector |
| // The blob index is 12 bits, the angle is 28 bits, and the point is 24 bits. | ||
| struct IndexPoint { | ||
| // Max number of blob IDs we can hold. | ||
| static constexpr size_t kMaxBlobs = 4096; //2048; RJS | ||
| static constexpr size_t kMaxBlobs = 2048; | ||
|
|
There was a problem hiding this comment.
IndexPoint stores a 12-bit blob index (blob_index & 0xfff), and the comment states the blob index is 12 bits, but kMaxBlobs was reduced to 2048 (11 bits). Either keep kMaxBlobs aligned with the 12-bit encoding (4096) or update the encoding/comment to avoid silent truncation/invalid assumptions elsewhere.
| class GpuDetector { | ||
| public: | ||
| // The number of blobs we will consider when counting april tags. | ||
| static constexpr size_t kMaxBlobs = 64; // TODO Charlie | ||
|
|
There was a problem hiding this comment.
GpuDetector::kMaxBlobs is set to 64, but selected_extents_device_, peak_extents_device_, and fit_quads_device_ are sized based on this constant while num_quads is computed from the image at runtime. There are no checks ensuring num_quads <= kMaxBlobs, so an image with >64 qualifying blobs can cause device memory overwrites. Add a hard check before using num_quads, or allocate these buffers based on the computed count.
| gpu_detector_ = std::make_unique<frc::apriltag::GpuDetector>( | ||
| image_width, image_height, apriltag_detector_, | ||
| utils::CameraMatrixFromJson<frc971::apriltag::CameraMatrix>(intrinsics), | ||
| utils::DistortionCoefficientsFromJson<frc971::apriltag::DistCoeffs>( | ||
| intrinsics)); | ||
| utils::CameraMatrixFromJson<frc::apriltag::CameraMatrix>(intrinsics), | ||
| utils::DistortionCoefficientsFromJson<frc::apriltag::DistCoeffs>( | ||
| intrinsics), | ||
| vision::ImageFormat::BGR8); | ||
| } | ||
| auto GPUAprilTagDetector::GetTagDetections( | ||
| camera::timestamped_frame_t& timestamped_frame) | ||
| -> std::vector<tag_detection_t> { | ||
| try { | ||
| if (timestamped_frame.frame.channels() == 1) { | ||
| gpu_detector_->DetectGrayHost( | ||
| (unsigned char*)timestamped_frame.frame.ptr()); | ||
| gpu_detector_->Detect((unsigned char*)timestamped_frame.frame.ptr(), | ||
| nullptr); | ||
| } else if (timestamped_frame.frame.channels() == 3) { | ||
| cv::Mat gray; | ||
| cv::cvtColor(timestamped_frame.frame, gray, cv::COLOR_BGR2GRAY); | ||
| gpu_detector_->DetectGrayHost((unsigned char*)gray.ptr()); | ||
| gpu_detector_->Detect((unsigned char*)gray.ptr(), nullptr); |
There was a problem hiding this comment.
The detector is constructed with vision::ImageFormat::BGR8, but both call sites pass a single-channel grayscale buffer to GpuDetector::Detect. The thresholding code will treat the input as 3 bytes/pixel and read out-of-bounds / produce incorrect results. Either pass the original BGR frame when using BGR8, or construct the detector with MONO8 (and skip the extra cvtColor).
| #include <stdint.h> | ||
|
|
||
| #include "apriltag_types.h" | ||
| #include "cuda.h" | ||
| // #include "vision_generated.h" | ||
|
|
There was a problem hiding this comment.
threshold.h declares std::unique_ptr but does not include <memory>, which will fail to compile in translation units that don't already include it. Add #include <memory> (and consider including <cstddef> explicitly if you want to rely less on transitive includes for size_t).
| const u_int32_t in_row = out_row * 2; | ||
| const u_int32_t in_col = out_col * 2; |
There was a problem hiding this comment.
InternalCudaToGreyscaleAndDecimate uses u_int32_t for in_row/in_col, which is non-standard and not guaranteed to be defined on all platforms/toolchains. Use uint32_t or apriltag_size_t here for portability/consistency.
| const u_int32_t in_row = out_row * 2; | |
| const u_int32_t in_col = out_col * 2; | |
| const apriltag_size_t in_row = out_row * 2; | |
| const apriltag_size_t in_col = out_col * 2; |
Signed-off-by: Charlie Huang <charliehuang09@gmail.com>
Signed-off-by: Charlie Huang <charliehuang09@gmail.com>
Signed-off-by: Charlie Huang <charliehuang09@gmail.com>
Signed-off-by: Charlie Huang <charliehuang09@gmail.com>
Signed-off-by: Charlie Huang <charliehuang09@gmail.com>
No description provided.