Skip to content

Convert binary index Hamming callers to dynamic dispatch (#5071)#5071

Open
algoriddle wants to merge 2 commits intofacebookresearch:mainfrom
algoriddle:export-D100020358
Open

Convert binary index Hamming callers to dynamic dispatch (#5071)#5071
algoriddle wants to merge 2 commits intofacebookresearch:mainfrom
algoriddle:export-D100020358

Conversation

@algoriddle
Copy link
Copy Markdown
Contributor

@algoriddle algoriddle commented Apr 9, 2026

Summary:

Phase 2 of the Hamming DD conversion. Converts 5 external callers of
dispatch_HammingComputer to use per-ISA TUs so the correct
HammingComputer structs (AVX2/NEON/generic) are used at runtime.

Previously, these files compiled as common TUs with no ISA flags,
causing hamdis-inl.h to fall through to generic-inl.h — scalar
HammingComputer structs even on AVX2-capable machines in DD mode.

Converted files:

  • IndexBinaryHNSW.cpp: get_distance_computer() factory
  • IndexBinaryIVF.cpp: get_InvertedListScanner() + search_preassigned()
  • IndexBinaryHash.cpp: 4 search entry points (knn+range × 2 index types)
  • IndexIVFSpectralHash.cpp: get_InvertedListScanner() factory
  • IndexPQ.cpp: polysemous_inner_loop dispatch

NOT converted: IndexIVFPQ.cpp polysemous path — deeply interleaved
with PQ distance computation inside IVFPQScannerT, niche (off by
default, polysemous_ht=0). See ~/ivfpq_polysemous_dd.md for analysis.

Differential Revision: D100020358

@meta-cla meta-cla Bot added the CLA Signed label Apr 9, 2026
@meta-codesync
Copy link
Copy Markdown
Contributor

meta-codesync Bot commented Apr 9, 2026

@algoriddle has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100020358.

algoriddle added a commit to algoriddle/faiss that referenced this pull request Apr 10, 2026
…earch#5071)

Summary:

Phase 2 of the Hamming DD conversion. Converts 5 external callers of
dispatch_HammingComputer to use per-ISA TUs so the correct
HammingComputer structs (AVX2/NEON/generic) are used at runtime.

Previously, these files compiled as common TUs with no ISA flags,
causing hamdis-inl.h to fall through to generic-inl.h — scalar
HammingComputer structs even on AVX2-capable machines in DD mode.

Converted files:
- IndexBinaryHNSW.cpp: get_distance_computer() factory
- IndexBinaryIVF.cpp: get_InvertedListScanner() + search_preassigned()
- IndexBinaryHash.cpp: 4 search entry points (knn+range × 2 index types)
- IndexIVFSpectralHash.cpp: get_InvertedListScanner() factory
- IndexPQ.cpp: polysemous_inner_loop dispatch

NOT converted: IndexIVFPQ.cpp polysemous path — deeply interleaved
with PQ distance computation inside IVFPQScannerT, niche (off by
default, polysemous_ht=0). See ~/ivfpq_polysemous_dd.md for analysis.

Differential Revision: D100020358
@meta-codesync meta-codesync Bot changed the title Convert binary index Hamming callers to dynamic dispatch Convert binary index Hamming callers to dynamic dispatch (#5071) Apr 10, 2026
algoriddle added a commit to algoriddle/faiss that referenced this pull request Apr 21, 2026
…earch#5071)

Summary:
Pull Request resolved: facebookresearch#5071

Phase 2 of the Hamming DD conversion. Converts 5 external callers of
dispatch_HammingComputer to use per-ISA TUs so the correct
HammingComputer structs (AVX2/NEON/generic) are used at runtime.

Previously, these files compiled as common TUs with no ISA flags,
causing hamdis-inl.h to fall through to generic-inl.h — scalar
HammingComputer structs even on AVX2-capable machines in DD mode.

Converted files:
- IndexBinaryHNSW.cpp: get_distance_computer() factory
- IndexBinaryIVF.cpp: get_InvertedListScanner() + search_preassigned()
- IndexBinaryHash.cpp: 4 search entry points (knn+range × 2 index types)
- IndexIVFSpectralHash.cpp: get_InvertedListScanner() factory
- IndexPQ.cpp: polysemous_inner_loop dispatch

NOT yet converted: IndexIVFPQ.cpp polysemous path

Differential Revision: D100020358
algoriddle added a commit to algoriddle/faiss that referenced this pull request Apr 21, 2026
…earch#5071)

Summary:
Pull Request resolved: facebookresearch#5071

Phase 2 of the Hamming DD conversion. Converts 5 external callers of
dispatch_HammingComputer to use per-ISA TUs so the correct
HammingComputer structs (AVX2/NEON/generic) are used at runtime.

Previously, these files compiled as common TUs with no ISA flags,
causing hamdis-inl.h to fall through to generic-inl.h — scalar
HammingComputer structs even on AVX2-capable machines in DD mode.

Converted files:
- IndexBinaryHNSW.cpp: get_distance_computer() factory
- IndexBinaryIVF.cpp: get_InvertedListScanner() + search_preassigned()
- IndexBinaryHash.cpp: 4 search entry points (knn+range × 2 index types)
- IndexIVFSpectralHash.cpp: get_InvertedListScanner() factory
- IndexPQ.cpp: polysemous_inner_loop dispatch

NOT yet converted: IndexIVFPQ.cpp polysemous path

Differential Revision: D100020358
algoriddle added a commit to algoriddle/faiss that referenced this pull request Apr 21, 2026
…earch#5071)

Summary:
Pull Request resolved: facebookresearch#5071

Phase 2 of the Hamming DD conversion. Converts 5 external callers of
dispatch_HammingComputer to use per-ISA TUs so the correct
HammingComputer structs (AVX2/NEON/generic) are used at runtime.

Previously, these files compiled as common TUs with no ISA flags,
causing hamdis-inl.h to fall through to generic-inl.h — scalar
HammingComputer structs even on AVX2-capable machines in DD mode.

Converted files:
- IndexBinaryHNSW.cpp: get_distance_computer() factory
- IndexBinaryIVF.cpp: get_InvertedListScanner() + search_preassigned()
- IndexBinaryHash.cpp: 4 search entry points (knn+range × 2 index types)
- IndexIVFSpectralHash.cpp: get_InvertedListScanner() factory
- IndexPQ.cpp: polysemous_inner_loop dispatch

NOT yet converted: IndexIVFPQ.cpp polysemous path

Differential Revision: D100020358
algoriddle added a commit to algoriddle/faiss that referenced this pull request Apr 21, 2026
…earch#5071)

Summary:
Pull Request resolved: facebookresearch#5071

Phase 2 of the Hamming DD conversion. Converts 5 external callers of
dispatch_HammingComputer to use per-ISA TUs so the correct
HammingComputer structs (AVX2/NEON/generic) are used at runtime.

Previously, these files compiled as common TUs with no ISA flags,
causing hamdis-inl.h to fall through to generic-inl.h — scalar
HammingComputer structs even on AVX2-capable machines in DD mode.

Converted files:
- IndexBinaryHNSW.cpp: get_distance_computer() factory
- IndexBinaryIVF.cpp: get_InvertedListScanner() + search_preassigned()
- IndexBinaryHash.cpp: 4 search entry points (knn+range × 2 index types)
- IndexIVFSpectralHash.cpp: get_InvertedListScanner() factory
- IndexPQ.cpp: polysemous_inner_loop dispatch

NOT yet converted: IndexIVFPQ.cpp polysemous path

Differential Revision: D100020358
@meta-codesync meta-codesync Bot changed the title Convert binary index Hamming callers to dynamic dispatch (#5071) Convert binary index Hamming callers to dynamic dispatch Apr 23, 2026
algoriddle added a commit to algoriddle/faiss that referenced this pull request Apr 24, 2026
…earch#5071)

Summary:
Pull Request resolved: facebookresearch#5071

Phase 2 of the Hamming DD conversion. Converts 5 external callers of
dispatch_HammingComputer to use per-ISA TUs so the correct
HammingComputer structs (AVX2/NEON/generic) are used at runtime.

Previously, these files compiled as common TUs with no ISA flags,
causing hamdis-inl.h to fall through to generic-inl.h — scalar
HammingComputer structs even on AVX2-capable machines in DD mode.

Converted files:
- IndexBinaryHNSW.cpp: get_distance_computer() factory
- IndexBinaryIVF.cpp: get_InvertedListScanner() + search_preassigned()
- IndexBinaryHash.cpp: 4 search entry points (knn+range × 2 index types)
- IndexIVFSpectralHash.cpp: get_InvertedListScanner() factory
- IndexPQ.cpp: polysemous_inner_loop dispatch

NOT converted: IndexIVFPQ.cpp polysemous path — deeply interleaved
with PQ distance computation inside IVFPQScannerT, niche (off by
default, polysemous_ht=0). See ~/ivfpq_polysemous_dd.md for analysis.

Differential Revision: D100020358
@meta-codesync meta-codesync Bot changed the title Convert binary index Hamming callers to dynamic dispatch Convert binary index Hamming callers to dynamic dispatch (#5071) Apr 24, 2026
algoriddle added a commit to algoriddle/faiss that referenced this pull request Apr 24, 2026
…earch#5071)

Summary:
Pull Request resolved: facebookresearch#5071

Phase 2 of the Hamming DD conversion. Converts 5 external callers of
dispatch_HammingComputer to use per-ISA TUs so the correct
HammingComputer structs (AVX2/NEON/generic) are used at runtime.

Previously, these files compiled as common TUs with no ISA flags,
causing hamdis-inl.h to fall through to generic-inl.h — scalar
HammingComputer structs even on AVX2-capable machines in DD mode.

Converted files:
- IndexBinaryHNSW.cpp: get_distance_computer() factory
- IndexBinaryIVF.cpp: get_InvertedListScanner() + search_preassigned()
- IndexBinaryHash.cpp: 4 search entry points (knn+range × 2 index types)
- IndexIVFSpectralHash.cpp: get_InvertedListScanner() factory
- IndexPQ.cpp: polysemous_inner_loop dispatch

NOT converted: IndexIVFPQ.cpp polysemous path — deeply interleaved
with PQ distance computation inside IVFPQScannerT, niche (off by
default, polysemous_ht=0). See ~/ivfpq_polysemous_dd.md for analysis.

Differential Revision: D100020358
Summary:
Convert all 8 public Hamming distance functions in `hamming.cpp` to dynamic
dispatch so they use the best available SIMD implementation at runtime
instead of always running at scalar speed in DD mode.

This integrates Matthijs's refactor (D100004339): deduplicates the
HammingComputer code across the 4 per-ISA backend files into a shared
`common.h`. It is Phase 1 of the Hamming DD conversion; Phase 2 (future
diff) will migrate external callers that use `dispatch_HammingComputer`
directly.

Key changes:

- **DD dispatch**: the 8 public hamming.cpp functions now dispatch via
  `with_simd_level` (mask A0: NONE/AVX2/AVX512/ARM_NEON) to SL-templated
  `*_fixSL<SL>` entry points.

- **`_fixSL` suffix** on SL-templated entry points (per Matthijs's
  D100020358 convention), distinguishing them from the non-templated DD
  entry points that share the same base name. Avoids silent-dispatch-vs-
  pinned-specialization confusion at call sites.

- **New `hamming_avx512.cpp` TU** — makes the pre-existing AVX-512
  HammingComputer specializations in `avx512-inl.h` reachable from Buck
  DD builds (previously reachable only from OSS CMake `faiss_avx512` /
  `faiss_avx512_spr` targets). The AVX-512 TU gives SKX/CLX/Ice Lake
  clients ZMM auto-vectorization of the hamming kernels; the
  `_mm512_popcnt_epi64` fast path remains gated on VPOPCNTDQ (active
  in OSS `faiss_avx512_spr` CMake target only, per pre-existing design).

- **ODR fix via SIMDLevel templatization**: the 9 ISA-varying
  `HammingComputer*` / `GenHammingComputer*` structs are now primary
  templates `*_tpl<SIMDLevel>` with scalar bodies in `common.h`, and
  per-ISA specializations in the respective `-inl.h` files. Each TU
  picks its specialization via a `using`-alias in hamdis-inl.h keyed
  on `FAISS_HAMMING_SL`. Distinct mangled names per SL eliminate the
  latent ODR hazard where multiple DD TUs with different flags saw
  differently-laid-out structs sharing the `faiss::HammingComputer*`
  name. Verified by disassembly: distinct `<SIMDLevel)0|1|2>` suffixes
  per TU.

- **Silent OOB fix in `HammingComputer20::set`** (inherited from
  D100004339): pre-diff `uint64_t a[2]` read 8 bytes on a 20-byte code
  (past-end read, tripped by ASan); post-diff `uint32_t a32[4]` reads
  exactly 4 bytes.

- **`faiss_hamming_sl_collapse()` constexpr** in hamdis-inl.h makes the
  AVX512_SPR→AVX512 / ARM_SVE→ARM_NEON collapse explicit at the alias
  level, rather than relying on the raw arch-macro ladder happening to
  match. Applies only to HammingComputer aliases; does not change the
  dispatcher.

- **popcount.h extraction**: moved `popcount32` / `popcount64` out of
  `common.h` into a standalone `faiss/utils/popcount.h` so callers outside
  the hamming subtree (rabitq_simd.h, rabitq_avx2/512.cpp,
  partitioning_simdlib256.h, PolysemousTraining.cpp) have a clean
  include. Migrated all `__builtin_popcount*` call sites. Silent Windows
  correctness improvement: pre-diff `__builtin_popcountl` on `uint64_t`
  truncated to 32 bits on MSVC (LLP64, `long` is 32-bit); post-diff
  `popcount64` always dispatches to `__popcnt64`. The pre-diff
  `__builtin_popcount → __popcnt` macro shim in `platform_macros.h` is
  removed — no longer needed now that all call sites use the helpers.

- **`FAISS_MAYBE_UNUSED` on Windows**: previously defined only in the
  Linux/OSX branch of platform_macros.h; added to the MSVC branch so the
  templated HammingComputer `set(..., int code_size)` ctors compile
  cleanly on Windows under `-Wunused-parameter` + NDEBUG.

Differential Revision: D99994593
algoriddle added a commit to algoriddle/faiss that referenced this pull request Apr 24, 2026
…earch#5071)

Summary:
Pull Request resolved: facebookresearch#5071

Phase 2 of the Hamming DD conversion. Converts 5 external callers of
dispatch_HammingComputer to use per-ISA TUs so the correct
HammingComputer structs (AVX2/NEON/generic) are used at runtime.

Previously, these files compiled as common TUs with no ISA flags,
causing hamdis-inl.h to fall through to generic-inl.h — scalar
HammingComputer structs even on AVX2-capable machines in DD mode.

Converted files:
- IndexBinaryHNSW.cpp: get_distance_computer() factory
- IndexBinaryIVF.cpp: get_InvertedListScanner() + search_preassigned()
- IndexBinaryHash.cpp: 4 search entry points (knn+range × 2 index types)
- IndexIVFSpectralHash.cpp: get_InvertedListScanner() factory
- IndexPQ.cpp: polysemous_inner_loop dispatch

NOT converted: IndexIVFPQ.cpp polysemous path — deeply interleaved
with PQ distance computation inside IVFPQScannerT, niche (off by
default, polysemous_ht=0). See ~/ivfpq_polysemous_dd.md for analysis.

Differential Revision: D100020358
algoriddle added a commit to algoriddle/faiss that referenced this pull request Apr 24, 2026
…earch#5071)

Summary:
Pull Request resolved: facebookresearch#5071

Phase 2 of the Hamming DD conversion. Converts 5 external callers of
dispatch_HammingComputer to use per-ISA TUs so the correct
HammingComputer structs (AVX2/NEON/generic) are used at runtime.

Previously, these files compiled as common TUs with no ISA flags,
causing hamdis-inl.h to fall through to generic-inl.h — scalar
HammingComputer structs even on AVX2-capable machines in DD mode.

Converted files:
- IndexBinaryHNSW.cpp: get_distance_computer() factory
- IndexBinaryIVF.cpp: get_InvertedListScanner() + search_preassigned()
- IndexBinaryHash.cpp: 4 search entry points (knn+range × 2 index types)
- IndexIVFSpectralHash.cpp: get_InvertedListScanner() factory
- IndexPQ.cpp: polysemous_inner_loop dispatch

NOT converted: IndexIVFPQ.cpp polysemous path — deeply interleaved
with PQ distance computation inside IVFPQScannerT, niche (off by
default, polysemous_ht=0). See ~/ivfpq_polysemous_dd.md for analysis.

Differential Revision: D100020358
…earch#5071)

Summary:
Pull Request resolved: facebookresearch#5071

Phase 2 of the Hamming DD conversion. Converts 5 external callers of
dispatch_HammingComputer to use per-ISA TUs so the correct
HammingComputer structs (AVX2/NEON/generic) are used at runtime.

Previously, these files compiled as common TUs with no ISA flags,
causing hamdis-inl.h to fall through to generic-inl.h — scalar
HammingComputer structs even on AVX2-capable machines in DD mode.

Converted files:
- IndexBinaryHNSW.cpp: get_distance_computer() factory
- IndexBinaryIVF.cpp: get_InvertedListScanner() + search_preassigned()
- IndexBinaryHash.cpp: 4 search entry points (knn+range × 2 index types)
- IndexIVFSpectralHash.cpp: get_InvertedListScanner() factory
- IndexPQ.cpp: polysemous_inner_loop dispatch

NOT converted: IndexIVFPQ.cpp polysemous path — deeply interleaved
with PQ distance computation inside IVFPQScannerT, niche (off by
default, polysemous_ht=0). See ~/ivfpq_polysemous_dd.md for analysis.

Differential Revision: D100020358
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants