Convert binary index Hamming callers to dynamic dispatch (#5071)#5071
Open
algoriddle wants to merge 2 commits intofacebookresearch:mainfrom
Open
Convert binary index Hamming callers to dynamic dispatch (#5071)#5071algoriddle wants to merge 2 commits intofacebookresearch:mainfrom
algoriddle wants to merge 2 commits intofacebookresearch:mainfrom
Conversation
Contributor
|
@algoriddle has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100020358. |
algoriddle
added a commit
to algoriddle/faiss
that referenced
this pull request
Apr 10, 2026
…earch#5071) Summary: Phase 2 of the Hamming DD conversion. Converts 5 external callers of dispatch_HammingComputer to use per-ISA TUs so the correct HammingComputer structs (AVX2/NEON/generic) are used at runtime. Previously, these files compiled as common TUs with no ISA flags, causing hamdis-inl.h to fall through to generic-inl.h — scalar HammingComputer structs even on AVX2-capable machines in DD mode. Converted files: - IndexBinaryHNSW.cpp: get_distance_computer() factory - IndexBinaryIVF.cpp: get_InvertedListScanner() + search_preassigned() - IndexBinaryHash.cpp: 4 search entry points (knn+range × 2 index types) - IndexIVFSpectralHash.cpp: get_InvertedListScanner() factory - IndexPQ.cpp: polysemous_inner_loop dispatch NOT converted: IndexIVFPQ.cpp polysemous path — deeply interleaved with PQ distance computation inside IVFPQScannerT, niche (off by default, polysemous_ht=0). See ~/ivfpq_polysemous_dd.md for analysis. Differential Revision: D100020358
cdd97db to
79d54ee
Compare
algoriddle
added a commit
to algoriddle/faiss
that referenced
this pull request
Apr 21, 2026
…earch#5071) Summary: Pull Request resolved: facebookresearch#5071 Phase 2 of the Hamming DD conversion. Converts 5 external callers of dispatch_HammingComputer to use per-ISA TUs so the correct HammingComputer structs (AVX2/NEON/generic) are used at runtime. Previously, these files compiled as common TUs with no ISA flags, causing hamdis-inl.h to fall through to generic-inl.h — scalar HammingComputer structs even on AVX2-capable machines in DD mode. Converted files: - IndexBinaryHNSW.cpp: get_distance_computer() factory - IndexBinaryIVF.cpp: get_InvertedListScanner() + search_preassigned() - IndexBinaryHash.cpp: 4 search entry points (knn+range × 2 index types) - IndexIVFSpectralHash.cpp: get_InvertedListScanner() factory - IndexPQ.cpp: polysemous_inner_loop dispatch NOT yet converted: IndexIVFPQ.cpp polysemous path Differential Revision: D100020358
79d54ee to
198fe0d
Compare
algoriddle
added a commit
to algoriddle/faiss
that referenced
this pull request
Apr 21, 2026
…earch#5071) Summary: Pull Request resolved: facebookresearch#5071 Phase 2 of the Hamming DD conversion. Converts 5 external callers of dispatch_HammingComputer to use per-ISA TUs so the correct HammingComputer structs (AVX2/NEON/generic) are used at runtime. Previously, these files compiled as common TUs with no ISA flags, causing hamdis-inl.h to fall through to generic-inl.h — scalar HammingComputer structs even on AVX2-capable machines in DD mode. Converted files: - IndexBinaryHNSW.cpp: get_distance_computer() factory - IndexBinaryIVF.cpp: get_InvertedListScanner() + search_preassigned() - IndexBinaryHash.cpp: 4 search entry points (knn+range × 2 index types) - IndexIVFSpectralHash.cpp: get_InvertedListScanner() factory - IndexPQ.cpp: polysemous_inner_loop dispatch NOT yet converted: IndexIVFPQ.cpp polysemous path Differential Revision: D100020358
198fe0d to
276c6bb
Compare
algoriddle
added a commit
to algoriddle/faiss
that referenced
this pull request
Apr 21, 2026
…earch#5071) Summary: Pull Request resolved: facebookresearch#5071 Phase 2 of the Hamming DD conversion. Converts 5 external callers of dispatch_HammingComputer to use per-ISA TUs so the correct HammingComputer structs (AVX2/NEON/generic) are used at runtime. Previously, these files compiled as common TUs with no ISA flags, causing hamdis-inl.h to fall through to generic-inl.h — scalar HammingComputer structs even on AVX2-capable machines in DD mode. Converted files: - IndexBinaryHNSW.cpp: get_distance_computer() factory - IndexBinaryIVF.cpp: get_InvertedListScanner() + search_preassigned() - IndexBinaryHash.cpp: 4 search entry points (knn+range × 2 index types) - IndexIVFSpectralHash.cpp: get_InvertedListScanner() factory - IndexPQ.cpp: polysemous_inner_loop dispatch NOT yet converted: IndexIVFPQ.cpp polysemous path Differential Revision: D100020358
276c6bb to
e7d2ddb
Compare
algoriddle
added a commit
to algoriddle/faiss
that referenced
this pull request
Apr 21, 2026
…earch#5071) Summary: Pull Request resolved: facebookresearch#5071 Phase 2 of the Hamming DD conversion. Converts 5 external callers of dispatch_HammingComputer to use per-ISA TUs so the correct HammingComputer structs (AVX2/NEON/generic) are used at runtime. Previously, these files compiled as common TUs with no ISA flags, causing hamdis-inl.h to fall through to generic-inl.h — scalar HammingComputer structs even on AVX2-capable machines in DD mode. Converted files: - IndexBinaryHNSW.cpp: get_distance_computer() factory - IndexBinaryIVF.cpp: get_InvertedListScanner() + search_preassigned() - IndexBinaryHash.cpp: 4 search entry points (knn+range × 2 index types) - IndexIVFSpectralHash.cpp: get_InvertedListScanner() factory - IndexPQ.cpp: polysemous_inner_loop dispatch NOT yet converted: IndexIVFPQ.cpp polysemous path Differential Revision: D100020358
e7d2ddb to
5b3e230
Compare
5b3e230 to
9058596
Compare
algoriddle
added a commit
to algoriddle/faiss
that referenced
this pull request
Apr 24, 2026
…earch#5071) Summary: Pull Request resolved: facebookresearch#5071 Phase 2 of the Hamming DD conversion. Converts 5 external callers of dispatch_HammingComputer to use per-ISA TUs so the correct HammingComputer structs (AVX2/NEON/generic) are used at runtime. Previously, these files compiled as common TUs with no ISA flags, causing hamdis-inl.h to fall through to generic-inl.h — scalar HammingComputer structs even on AVX2-capable machines in DD mode. Converted files: - IndexBinaryHNSW.cpp: get_distance_computer() factory - IndexBinaryIVF.cpp: get_InvertedListScanner() + search_preassigned() - IndexBinaryHash.cpp: 4 search entry points (knn+range × 2 index types) - IndexIVFSpectralHash.cpp: get_InvertedListScanner() factory - IndexPQ.cpp: polysemous_inner_loop dispatch NOT converted: IndexIVFPQ.cpp polysemous path — deeply interleaved with PQ distance computation inside IVFPQScannerT, niche (off by default, polysemous_ht=0). See ~/ivfpq_polysemous_dd.md for analysis. Differential Revision: D100020358
9058596 to
2e892c6
Compare
algoriddle
added a commit
to algoriddle/faiss
that referenced
this pull request
Apr 24, 2026
…earch#5071) Summary: Pull Request resolved: facebookresearch#5071 Phase 2 of the Hamming DD conversion. Converts 5 external callers of dispatch_HammingComputer to use per-ISA TUs so the correct HammingComputer structs (AVX2/NEON/generic) are used at runtime. Previously, these files compiled as common TUs with no ISA flags, causing hamdis-inl.h to fall through to generic-inl.h — scalar HammingComputer structs even on AVX2-capable machines in DD mode. Converted files: - IndexBinaryHNSW.cpp: get_distance_computer() factory - IndexBinaryIVF.cpp: get_InvertedListScanner() + search_preassigned() - IndexBinaryHash.cpp: 4 search entry points (knn+range × 2 index types) - IndexIVFSpectralHash.cpp: get_InvertedListScanner() factory - IndexPQ.cpp: polysemous_inner_loop dispatch NOT converted: IndexIVFPQ.cpp polysemous path — deeply interleaved with PQ distance computation inside IVFPQScannerT, niche (off by default, polysemous_ht=0). See ~/ivfpq_polysemous_dd.md for analysis. Differential Revision: D100020358
2e892c6 to
e5a1656
Compare
Summary: Convert all 8 public Hamming distance functions in `hamming.cpp` to dynamic dispatch so they use the best available SIMD implementation at runtime instead of always running at scalar speed in DD mode. This integrates Matthijs's refactor (D100004339): deduplicates the HammingComputer code across the 4 per-ISA backend files into a shared `common.h`. It is Phase 1 of the Hamming DD conversion; Phase 2 (future diff) will migrate external callers that use `dispatch_HammingComputer` directly. Key changes: - **DD dispatch**: the 8 public hamming.cpp functions now dispatch via `with_simd_level` (mask A0: NONE/AVX2/AVX512/ARM_NEON) to SL-templated `*_fixSL<SL>` entry points. - **`_fixSL` suffix** on SL-templated entry points (per Matthijs's D100020358 convention), distinguishing them from the non-templated DD entry points that share the same base name. Avoids silent-dispatch-vs- pinned-specialization confusion at call sites. - **New `hamming_avx512.cpp` TU** — makes the pre-existing AVX-512 HammingComputer specializations in `avx512-inl.h` reachable from Buck DD builds (previously reachable only from OSS CMake `faiss_avx512` / `faiss_avx512_spr` targets). The AVX-512 TU gives SKX/CLX/Ice Lake clients ZMM auto-vectorization of the hamming kernels; the `_mm512_popcnt_epi64` fast path remains gated on VPOPCNTDQ (active in OSS `faiss_avx512_spr` CMake target only, per pre-existing design). - **ODR fix via SIMDLevel templatization**: the 9 ISA-varying `HammingComputer*` / `GenHammingComputer*` structs are now primary templates `*_tpl<SIMDLevel>` with scalar bodies in `common.h`, and per-ISA specializations in the respective `-inl.h` files. Each TU picks its specialization via a `using`-alias in hamdis-inl.h keyed on `FAISS_HAMMING_SL`. Distinct mangled names per SL eliminate the latent ODR hazard where multiple DD TUs with different flags saw differently-laid-out structs sharing the `faiss::HammingComputer*` name. Verified by disassembly: distinct `<SIMDLevel)0|1|2>` suffixes per TU. - **Silent OOB fix in `HammingComputer20::set`** (inherited from D100004339): pre-diff `uint64_t a[2]` read 8 bytes on a 20-byte code (past-end read, tripped by ASan); post-diff `uint32_t a32[4]` reads exactly 4 bytes. - **`faiss_hamming_sl_collapse()` constexpr** in hamdis-inl.h makes the AVX512_SPR→AVX512 / ARM_SVE→ARM_NEON collapse explicit at the alias level, rather than relying on the raw arch-macro ladder happening to match. Applies only to HammingComputer aliases; does not change the dispatcher. - **popcount.h extraction**: moved `popcount32` / `popcount64` out of `common.h` into a standalone `faiss/utils/popcount.h` so callers outside the hamming subtree (rabitq_simd.h, rabitq_avx2/512.cpp, partitioning_simdlib256.h, PolysemousTraining.cpp) have a clean include. Migrated all `__builtin_popcount*` call sites. Silent Windows correctness improvement: pre-diff `__builtin_popcountl` on `uint64_t` truncated to 32 bits on MSVC (LLP64, `long` is 32-bit); post-diff `popcount64` always dispatches to `__popcnt64`. The pre-diff `__builtin_popcount → __popcnt` macro shim in `platform_macros.h` is removed — no longer needed now that all call sites use the helpers. - **`FAISS_MAYBE_UNUSED` on Windows**: previously defined only in the Linux/OSX branch of platform_macros.h; added to the MSVC branch so the templated HammingComputer `set(..., int code_size)` ctors compile cleanly on Windows under `-Wunused-parameter` + NDEBUG. Differential Revision: D99994593
algoriddle
added a commit
to algoriddle/faiss
that referenced
this pull request
Apr 24, 2026
…earch#5071) Summary: Pull Request resolved: facebookresearch#5071 Phase 2 of the Hamming DD conversion. Converts 5 external callers of dispatch_HammingComputer to use per-ISA TUs so the correct HammingComputer structs (AVX2/NEON/generic) are used at runtime. Previously, these files compiled as common TUs with no ISA flags, causing hamdis-inl.h to fall through to generic-inl.h — scalar HammingComputer structs even on AVX2-capable machines in DD mode. Converted files: - IndexBinaryHNSW.cpp: get_distance_computer() factory - IndexBinaryIVF.cpp: get_InvertedListScanner() + search_preassigned() - IndexBinaryHash.cpp: 4 search entry points (knn+range × 2 index types) - IndexIVFSpectralHash.cpp: get_InvertedListScanner() factory - IndexPQ.cpp: polysemous_inner_loop dispatch NOT converted: IndexIVFPQ.cpp polysemous path — deeply interleaved with PQ distance computation inside IVFPQScannerT, niche (off by default, polysemous_ht=0). See ~/ivfpq_polysemous_dd.md for analysis. Differential Revision: D100020358
e5a1656 to
2243dc1
Compare
algoriddle
added a commit
to algoriddle/faiss
that referenced
this pull request
Apr 24, 2026
…earch#5071) Summary: Pull Request resolved: facebookresearch#5071 Phase 2 of the Hamming DD conversion. Converts 5 external callers of dispatch_HammingComputer to use per-ISA TUs so the correct HammingComputer structs (AVX2/NEON/generic) are used at runtime. Previously, these files compiled as common TUs with no ISA flags, causing hamdis-inl.h to fall through to generic-inl.h — scalar HammingComputer structs even on AVX2-capable machines in DD mode. Converted files: - IndexBinaryHNSW.cpp: get_distance_computer() factory - IndexBinaryIVF.cpp: get_InvertedListScanner() + search_preassigned() - IndexBinaryHash.cpp: 4 search entry points (knn+range × 2 index types) - IndexIVFSpectralHash.cpp: get_InvertedListScanner() factory - IndexPQ.cpp: polysemous_inner_loop dispatch NOT converted: IndexIVFPQ.cpp polysemous path — deeply interleaved with PQ distance computation inside IVFPQScannerT, niche (off by default, polysemous_ht=0). See ~/ivfpq_polysemous_dd.md for analysis. Differential Revision: D100020358
2243dc1 to
7740f1a
Compare
…earch#5071) Summary: Pull Request resolved: facebookresearch#5071 Phase 2 of the Hamming DD conversion. Converts 5 external callers of dispatch_HammingComputer to use per-ISA TUs so the correct HammingComputer structs (AVX2/NEON/generic) are used at runtime. Previously, these files compiled as common TUs with no ISA flags, causing hamdis-inl.h to fall through to generic-inl.h — scalar HammingComputer structs even on AVX2-capable machines in DD mode. Converted files: - IndexBinaryHNSW.cpp: get_distance_computer() factory - IndexBinaryIVF.cpp: get_InvertedListScanner() + search_preassigned() - IndexBinaryHash.cpp: 4 search entry points (knn+range × 2 index types) - IndexIVFSpectralHash.cpp: get_InvertedListScanner() factory - IndexPQ.cpp: polysemous_inner_loop dispatch NOT converted: IndexIVFPQ.cpp polysemous path — deeply interleaved with PQ distance computation inside IVFPQScannerT, niche (off by default, polysemous_ht=0). See ~/ivfpq_polysemous_dd.md for analysis. Differential Revision: D100020358
7740f1a to
ffb142f
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary:
Phase 2 of the Hamming DD conversion. Converts 5 external callers of
dispatch_HammingComputer to use per-ISA TUs so the correct
HammingComputer structs (AVX2/NEON/generic) are used at runtime.
Previously, these files compiled as common TUs with no ISA flags,
causing hamdis-inl.h to fall through to generic-inl.h — scalar
HammingComputer structs even on AVX2-capable machines in DD mode.
Converted files:
NOT converted: IndexIVFPQ.cpp polysemous path — deeply interleaved
with PQ distance computation inside IVFPQScannerT, niche (off by
default, polysemous_ht=0). See ~/ivfpq_polysemous_dd.md for analysis.
Differential Revision: D100020358