Fix NEON intrinsics build on aarch64 with GCC #70
Varun-Nair wants to merge 1 commit into facebookresearch:main
Hi @Varun-Nair! Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged accordingly.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
Ocean's ARM NEON code compiles under Clang (Android/iOS) but fails under GCC on aarch64 Linux servers (e.g., NVIDIA GH200 Grace Hopper). Three categories of fix across 6 files:

1. constexpr on NEON vector types (47 occurrences, 5 files): GCC does not support constexpr on NEON types like uint8x16_t (Clang extension). Replace with const.
2. Signed/unsigned shift mismatch in FrameConverter.cpp (6 occurrences): vrshrq_n_s16() called on unsigned data from vrhaddq_u16(). Replace with vrshrq_n_u16().
3. Wrong lane accessor types (6 occurrences, 2 files): vget_low_u8/vget_high_u8 called on uint16x8_t, int8x16_t, and int16x8_t arguments. Replace with type-matched accessors.

All changes are mechanical and do not alter runtime behavior.

Tested on NVIDIA GH200 (aarch64, GCC 11, Ubuntu 22.04). Successfully builds projectaria_tools _core_pybinds.so and extracts VRS data end-to-end.
Force-pushed from dd1f9a1 to 638e76d.
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
Summary
Fix Ocean's ARM NEON SIMD code to compile under GCC on aarch64 Linux servers (e.g., NVIDIA GH200 Grace Hopper). It currently compiles only under Clang, which blocks building `projectaria_tools` on ARM servers. No existing issues document this.

Changes

Three categories of fix across 7 files in `impl/ocean/cv/`:

| Issue | Fix |
|---|---|
| `constexpr` on NEON types | `constexpr` → `const` |
| `vrshrq_n_s16()` on unsigned data | `_s16` → `_u16` |
| `vget_low_u8()` on wrong input types | Type-matched accessors |

Total: +67 -67 across 7 files.
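To illustrate the `const` and type-matched accessor patterns from the table, here is a minimal sketch. It is not code from Ocean: the function `foldHalves` and the bias constant are hypothetical, and the scalar branch exists only so the snippet also builds on machines without NEON.

```cpp
#include <cstdint>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper: out[i] = in[i] + in[i + 4] + 1 for i in 0..3 (wrapping at 16 bits).
inline void foldHalves(const uint16_t in[8], uint16_t out[4])
{
#if defined(__ARM_NEON)
    // `const` rather than `constexpr`: per the PR, GCC rejects constexpr NEON vectors.
    const uint16x4_t bias = vdup_n_u16(1u);

    const uint16x8_t values = vld1q_u16(in);

    // Type-matched accessors: a uint16x8_t is split with vget_low_u16/vget_high_u16,
    // not with vget_low_u8/vget_high_u8.
    const uint16x4_t sum = vadd_u16(vget_low_u16(values), vget_high_u16(values));

    vst1_u16(out, vadd_u16(sum, bias));
#else
    // Scalar fallback with the same result, used when NEON is unavailable.
    for (int i = 0; i < 4; ++i)
    {
        out[i] = uint16_t(in[i] + in[i + 4] + 1u);
    }
#endif
}
```

Calling `foldHalves` on `{1, 2, 3, 4, 10, 20, 30, 40}` yields `{12, 23, 34, 45}` on either code path.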
Motivation
GH200 and similar ARM server GPUs are entering ML research clusters. Any egocentric vision pipeline depending on `projectaria_tools` (EgoLifter, HaWoR, EgoGaussian) fails to build on these servers. x86 servers skip NEON paths; Clang on ARM accepts these constructs; GCC on ARM servers is the untested combination.

Testing
Verified on NVIDIA GH200 (aarch64, GCC 11, Ubuntu 22.04):

- `projectaria_tools` `_core_pybinds.so` builds successfully

Checklist
Additional Comments
All fixes are local to `impl/ocean/cv/`. The `constexpr` → `const` change is safe because GCC still optimizes these into immediate loads. The signed/unsigned shift fix is also a latent correctness improvement (unsigned data should use unsigned shift).
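The "latent correctness" point can be seen in a portable scalar model of one 16-bit lane. This is a sketch, not Ocean code: the helper names are hypothetical, and the functions only mimic the rounding right shift that `vrshrq_n_u16` and `vrshrq_n_s16` apply per lane. The two agree while the top bit is clear and diverge once it is set.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one lane of vrshrq_n_u16:
// unsigned rounding shift right, computed in 32 bits to avoid overflow.
inline uint16_t roundingShiftRightU16(uint16_t value, int shift)
{
    assert(shift >= 1 && shift <= 16);
    const uint32_t rounded = uint32_t(value) + (uint32_t(1) << (shift - 1));
    return uint16_t(rounded >> shift);
}

// Hypothetical scalar model of one lane of vrshrq_n_s16:
// the same 16 bits are interpreted as a signed value before shifting.
inline uint16_t roundingShiftRightS16(uint16_t bits, int shift)
{
    assert(shift >= 1 && shift <= 16);

    // Reinterpret the bit pattern as two's-complement without relying on
    // implementation-defined conversions.
    const int32_t value = bits >= 0x8000u ? int32_t(bits) - 65536 : int32_t(bits);
    const int32_t rounded = value + (int32_t(1) << (shift - 1));

    // Arithmetic shift right expressed as floor division.
    const int32_t divisor = int32_t(1) << shift;
    const int32_t shifted = rounded >= 0
        ? rounded / divisor
        : -((-rounded + divisor - 1) / divisor);

    // Return the low 16 bits of the signed result.
    return uint16_t(uint32_t(shifted));
}
```

For the input `257` with a shift of 1, both models return `129`. For `0xFFFE`, the unsigned model returns `0x7FFF`, while the signed model treats the input as -2 and returns `0xFFFF`. Inputs that never set the top bit therefore give identical results under either intrinsic, which is consistent with the statement that the change does not alter runtime behavior.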