JIT LTO Cagra Search #1807
Open
divyegala wants to merge 245 commits into rapidsai:main from divyegala:cagra-search-jit-lto
245 commits
a024f61 jit lto interleaved scan (divyegala)
45da4aa fix dependencies.yaml (divyegala)
a7c8621 generate files at build time, use tags to avoid compilation of types (divyegala)
eb2d74b passing tests (divyegala)
d2318e8 update gitignore (divyegala)
5e6afcd separate out distance function from main kernel (divyegala)
6eee4da fix deps (divyegala)
1de8f28 add filters as jit device functions, rework caching logic (divyegala)
84c6020 lto post lambda, cleanup files, generate cmake in build dir (divyegala)
22680c8 don't read hardcoded kernels, use generator properly (divyegala)
37f1163 random cmake changes carried over from 25.10 (divyegala)
0ae5383 cmake format (divyegala)
fe56aec remove dep on kernel list (divyegala)
40c8fd6 attempt to solve overlinking problem (divyegala)
e87a8c7 reorder if-else in compiler check (divyegala)
179d733 Merge branch 'branch-25.12' into jit-lto-ivf-flat-interleaved (divyegala)
32a67bd use cudart apis (divyegala)
c27612e merge (divyegala)
a4b48b1 attempt to link cudart (divyegala)
d5d692e revert cudart link, try all arch build of jit lto fatbin sources (divyegala)
1c6dd94 cmake format (divyegala)
30f5ab6 missing shared mem setting (divyegala)
9674969 separate cuda 12 and 13 compilation (divyegala)
24fc47d merge upstream (divyegala)
db9a487 remove bench (divyegala)
aa9294f c include directory (divyegala)
2eb77fe style check (divyegala)
6c685fa merge upstream (divyegala)
3e35b99 guard cuda calls and use shared_ptr (divyegala)
d0ff62c add AlgorithmPlanner to main target (divyegala)
eb87577 merge upstream (divyegala)
445a6c4 remove nvjitlink as cuda 12 dep (divyegala)
92a27d4 address review (divyegala)
8549172 merge upstream (divyegala)
67579f4 add include guard (divyegala)
7ad8774 add and remove couple of comments (divyegala)
816a480 merge upstream (divyegala)
ab35ef3 delete readme (divyegala)
cdd4c85 increase warmup time (divyegala)
87334b2 merge upstream (divyegala)
c1eff9f use new copyright (divyegala)
ece09b8 new copyright (divyegala)
4dacc6e remove one more straggling comment (divyegala)
1fd95cd use raft expects (divyegala)
64cde0d Merge branch 'main' into jit-lto-ivf-flat-interleaved (divyegala)
5ac127b merge upstream (divyegala)
78002c6 address review (divyegala)
9ad6a0b pre-commit (divyegala)
bf4c4ad address review (divyegala)
18b2af9 Generate kernel files in CMake instead of Python (KyleFromNVIDIA)
ece5cad Merge remote-tracking branch 'refs/remotes/github/divyegala/jit-lto-i… (KyleFromNVIDIA)
8ce70c2 Style (KyleFromNVIDIA)
fdc4239 Style (KyleFromNVIDIA)
be3cf0d Style (KyleFromNVIDIA)
7e644c3 Lint (KyleFromNVIDIA)
235938a Style, lint (KyleFromNVIDIA)
e3b749d Fix nvjitlink_checker (KyleFromNVIDIA)
f42ae3f Style (KyleFromNVIDIA)
b606df9 Merge branch 'main' into jit-lto-ivf-flat-interleaved (KyleFromNVIDIA)
5ce7aab Refactor JIT LTO kernel compilation (KyleFromNVIDIA)
eaad347 Style (KyleFromNVIDIA)
eb3b468 pic (KyleFromNVIDIA)
912279c style (KyleFromNVIDIA)
19f1af3 Verbose build (KyleFromNVIDIA)
087b943 static (KyleFromNVIDIA)
c16e109 style (KyleFromNVIDIA)
323b79f TARGET_OBJECTS (KyleFromNVIDIA)
9f13e73 Disable sccache (KyleFromNVIDIA)
eaf9d39 Recache (KyleFromNVIDIA)
ce40c51 Revert CI debugging (KyleFromNVIDIA)
0d0abb9 Install and link object library (KyleFromNVIDIA)
84bfa92 Style (KyleFromNVIDIA)
21241eb Alias (KyleFromNVIDIA)
7c0ac13 Make cuvs_jit_lto_kernels a static library (KyleFromNVIDIA)
880dbf2 Style (KyleFromNVIDIA)
d04d7c1 rapids_cuda_init_architectures() for C tests (KyleFromNVIDIA)
19581f9 Be more specific about where we search for libclang (KyleFromNVIDIA)
a61f019 More libclang updates (KyleFromNVIDIA)
2eeb913 Revert "Fix libclang download for Rust, CUDA initialization for C tests" (KyleFromNVIDIA)
55ec26c Merge branch 'main' into jit-lto-ivf-flat-interleaved (KyleFromNVIDIA)
10228c5 Merge branch 'main' into jit-lto-ivf-flat-interleaved (KyleFromNVIDIA)
031ce21 Merge branch 'main' into jit-lto-ivf-flat-interleaved (KyleFromNVIDIA)
088c21e Copyright (KyleFromNVIDIA)
8ca1062 Apply suggestions from code review (divyegala)
d5ab5bf merge upstream (divyegala)
b8c0d42 address some review comments (divyegala)
17d34ae remove too many underscores (divyegala)
45a5146 FEA Add initial commit of prototype/pseudo-code for proposed UDF APIs… (dantegd)
447532e stitch together (divyegala)
e1627d1 add udf to cmakelists (divyegala)
f7ea581 udfs working e2e (divyegala)
8b2775c run benchmarks (divyegala)
e9c77d9 working through (divyegala)
adcfb8f fixed overhead (divyegala)
282b376 Simplify (KyleFromNVIDIA)
609a4d6 Merge branch 'main' into jit-lto-ivf-flat-interleaved (KyleFromNVIDIA)
3115d07 address reviews (divyegala)
bb524ae Merge remote-tracking branch 'origin/main' into jit-lto-ivf-flat-inte… (divyegala)
30a8a9f Merge branch 'jit-lto-ivf-flat-interleaved' of github.com:divyegala/c… (divyegala)
72ddb36 Merge branch 'main' into jit-lto-ivf-flat-interleaved (divyegala)
4bd2102 add to docs and log about jit (divyegala)
fb722f0 Merge branch 'jit-lto-ivf-flat-interleaved' of github.com:divyegala/c… (divyegala)
3523b96 Merge remote-tracking branch 'origin/main' into jit-lto-ivf-flat-inte… (divyegala)
ba758a2 address review (divyegala)
42b78ae rename inner_product to inner_prod (divyegala)
2e3a471 Merge remote-tracking branch 'origin/main' into jit-lto-ivf-flat-inte… (divyegala)
bfc6c09 fix merge conflict (divyegala)
f6377fa include header and form better log (divyegala)
26abc7b Merge branch 'jit-lto-ivf-flat-interleaved' into ivf-flat-search-udf (divyegala)
fb7f105 merge (divyegala)
432bb32 working through (divyegala)
533b770 address review and move (divyegala)
af23585 Merge remote-tracking branch 'origin/main' into jit-lto-ivf-flat-inte… (divyegala)
78c59d9 one more fix (divyegala)
9274868 Merge branch 'jit-lto-ivf-flat-interleaved' into cagra-search-jit-lto (divyegala)
7f8802b correct path (divyegala)
f432aad Merge branch 'jit-lto-ivf-flat-interleaved' into cagra-search-jit-lto (divyegala)
39ce9e3 in the middle of stuff (divyegala)
27acbb6 merge upstream (divyegala)
d11edfd Merge branch 'jit-lto-ivf-flat-interleaved' into ivf-flat-search-udf (divyegala)
dd23671 multi-cta still failing (divyegala)
4f287c1 attempting to solve 2 kernel issue (divyegala)
64f6ad8 merge upstream (divyegala)
f1888a2 more cleaning (divyegala)
b596e79 merge cleanly (divyegala)
9c4980f add nvrtc as a dependency (divyegala)
f27eeb2 fix build errors (divyegala)
bc5c90e guard udf use (divyegala)
09dc56c analyzing cubins (divyegala)
55c32f4 compiler definition on headers (divyegala)
1866475 guard udf test (divyegala)
c419173 remove (divyegala)
04cc166 missing include (divyegala)
1113afc cleaning up (divyegala)
e372917 merge upstream (divyegala)
d8341ac Merge remote-tracking branch 'divye/unneeded-cccl-includes' into cagr… (divyegala)
6feecce most errors resolved (divyegala)
3e9f5f3 Merge branch 'main' into ivf-flat-search-udf (divyegala)
52e05c2 debug filter fragment (divyegala)
caf8d03 Merge branch 'main' into ivf-flat-search-udf (divyegala)
b65f599 occassional failure on dgx spark (divyegala)
5239a1a fix compile (divyegala)
736dc75 Ignore cache-host run exports (bdice)
f83f595 Merge branch 'main' into ivf-flat-search-udf (divyegala)
a7a4ef7 pull out metric (divyegala)
5390c4c Merge branch 'cagra-search-jit-lto' of github.com:divyegala/cuvs into… (divyegala)
07a158c use void* for desc and create more fragments (divyegala)
0e201e8 attempt to fix cuda 12 builds (divyegala)
88a4b6e respond to reviews (divyegala)
101c5ee Merge remote-tracking branch 'origin/main' into ivf-flat-search-udf (divyegala)
5d3a9df Merge branch 'ivf-flat-search-udf' of github.com:divyegala/cuvs into … (divyegala)
63c7300 pin cupy to <14.0 for cuda 12 wheels (divyegala)
0c0b6b5 fix cuda 12 (divyegala)
faa9339 add includes (divyegala)
73e8fa0 fix logging (divyegala)
fef68d3 fix macro (divyegala)
05cc149 major refactor to reduce # of fragments (divyegala)
b6c9031 merge upstream udf pr (divyegala)
995f998 Merge branch 'main' into ivf-flat-search-udf (divyegala)
75e2616 Account for different QueryT (divyegala)
387d9ea Merge remote-tracking branch 'origin/main' into cagra-search-jit-lto (divyegala)
1ccb01c cleanup some stuff (divyegala)
3256a8e attempt to fix devcontainer error (divyegala)
32a5d9f Merge remote-tracking branch 'origin/main' into ivf-flat-search-udf (divyegala)
592af70 Merge branch 'ivf-flat-search-udf' of github.com:divyegala/cuvs into … (divyegala)
43501b7 address review comments (divyegala)
b5342d6 Merge branch 'ivf-flat-search-udf' into cagra-search-jit-lto (divyegala)
b85f16b Add matrix JSON files (KyleFromNVIDIA)
e79de08 Merge branch 'main' into cagra-search-jit-lto (KyleFromNVIDIA)
de0a2b5 Fix (KyleFromNVIDIA)
c7909c3 more refactors and fix stream serialization bug (divyegala)
bbbfb25 launch correctly (divyegala)
22c40fd Use new kernel matrix system (KyleFromNVIDIA)
d404869 remove debug prints (divyegala)
53ce0aa Merge remote-tracking branch 'github/divyegala/cagra-search-jit-lto' … (KyleFromNVIDIA)
9fc9185 Merge remote-tracking branch 'github/divyegala/cagra-search-jit-lto' … (KyleFromNVIDIA)
1eef8c5 Remove preprocessor branch (KyleFromNVIDIA)
0af09e2 Merge branch 'main' into cagra-search-jit-lto (KyleFromNVIDIA)
b2e418b reconcile pr 1807 and add nvjitlink/nvrtc to jit target (divyegala)
53195ef Fix ivf flat (KyleFromNVIDIA)
f589b26 Fix kernel names and matrices (KyleFromNVIDIA)
6b8d175 Fix query (KyleFromNVIDIA)
426625e Fix another query (KyleFromNVIDIA)
97dfa18 More (KyleFromNVIDIA)
29881c8 Make naming and matrices more consistent (KyleFromNVIDIA)
bb01ec6 add func specialization for smem launcher (divyegala)
6b32331 Merge branch 'cagra-search-jit-lto' of github.com:divyegala/cuvs into… (divyegala)
6516f78 fix ivf flat udf key (divyegala)
d737706 remove debug (divyegala)
a809041 Remove comments and debug statement, fix query, copyright (KyleFromNVIDIA)
0d48be2 Merge remote-tracking branch 'github/divyegala/cagra-search-jit-lto' … (KyleFromNVIDIA)
49f999f missing query tag (divyegala)
d66edf0 Merge branch 'cagra-search-jit-lto' of github.com:divyegala/cuvs into… (divyegala)
b52f8c2 Refactor and make thread-safe (KyleFromNVIDIA)
0349746 remove prints (divyegala)
6e07abb remove unnecessary includes (divyegala)
e9e2ff0 Don't build fatbins with debug symbols (KyleFromNVIDIA)
9bd6100 Merge branch 'main' into cagra-search-jit-lto (KyleFromNVIDIA)
34ed3e2 Merge branch 'main' into cagra-search-jit-lto (divyegala)
e6f06fc Merge branch 'main' into cagra-search-jit-lto (KyleFromNVIDIA)
5552b2f Merge remote-tracking branch 'github/divyegala/cagra-search-jit-lto' … (KyleFromNVIDIA)
582d6a0 unpin raft (divyegala)
fb13ea5 Merge remote-tracking branch 'origin/main' into cagra-search-jit-lto (divyegala)
98a1dce Update cpp/cmake/thirdparty/get_raft.cmake (divyegala)
a39c150 Update cpp/cmake/thirdparty/get_raft.cmake (divyegala)
c3a8d73 Merge branch 'main' into cagra-search-jit-lto (KyleFromNVIDIA)
8dfb354 Merge branch 'main' into cagra-search-jit-lto (KyleFromNVIDIA)
33e1bc5 Add L1 dist op (KyleFromNVIDIA)
f050b77 Fix L1 distance (KyleFromNVIDIA)
d6eec0a Explicitly install cudart (KyleFromNVIDIA)
1f3b75b use function ptr indirection (divyegala)
832eaf2 Merge remote-tracking branch 'origin/main' into cagra-search-jit-lto (divyegala)
9243390 const (KyleFromNVIDIA)
dca579a extern (KyleFromNVIDIA)
f11daf5 Merge branch 'main' into cagra-search-jit-lto (KyleFromNVIDIA)
1c2da37 Re-run CI (KyleFromNVIDIA)
ff3527b fix bug and simplify json (divyegala)
671e8a7 Merge branch 'cagra-search-jit-lto' of github.com:divyegala/cuvs into… (divyegala)
e14a119 simply function ptr usage (divyegala)
39e67f3 call functions directly (divyegala)
59f8911 Merge branch 'main' into cagra-search-jit-lto (KyleFromNVIDIA)
be21da4 Merge branch 'main' into cagra-search-jit-lto (KyleFromNVIDIA)
fb5cf1e Merge branch 'main' into cagra-search-jit-lto (KyleFromNVIDIA)
8e6797c merge upstream, make tests pass (divyegala)
ca67dea delete extra files (divyegala)
3fb3df0 reconcile jit and non jit paths (divyegala)
c01ec16 merge 1780 (divyegala)
a6e31c9 remove unneeded wrappers (divyegala)
bcf0470 specialize jit cache to reduce contention (divyegala)
a0636f3 rework standard/impl functions and fix recipe (divyegala)
04769c7 keep the fragments separate (divyegala)
9c981e1 more review (divyegala)
5eadc1e Merge remote-tracking branch 'upstream/main' into cagra-search-jit-lto (divyegala)
84a93d4 fix recipe (divyegala)
f52d8ce code simplification and ai reviews (divyegala)
d7fa7f3 use whole compilation for cagra TUs; ai reviews (divyegala)
0053171 address reviews (divyegala)
44d0ea2 ai review (divyegala)
11711c0 attempt to fix smem launch (divyegala)
1d58136 attempt to fix smem launch (divyegala)
dc29e56 dante review (divyegala)
f329a82 kyle review step 1 (divyegala)
7f2fa39 fix ci error (divyegala)
6598f62 Merge remote-tracking branch 'upstream/main' into cagra-search-jit-lto (divyegala)
2f93c6e kyle review step 2 (divyegala)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,93 @@ |
| | | /* |
| | | * SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION. |
| | | * SPDX-License-Identifier: Apache-2.0 |
| | | */ |
| | | |
| | | #pragma once |
| | | |
| | | #include <cstdint> |
| | | |
| | | namespace cuvs::neighbors::cagra::detail { |
| | | |
| | | struct tag_dist_f {}; |
| | | struct tag_metric_l2 {}; |
| | | struct tag_metric_inner_product {}; |
| | | struct tag_metric_cosine {}; |
| | | struct tag_metric_hamming {}; |
| | | struct tag_codebook_none {}; |
| | | struct tag_codebook_half {}; |
| | | struct tag_metric_l1 {}; |
| | | struct tag_norm_noop {}; |
| | | struct tag_norm_cosine {}; |
| | | |
| | | /// Multi-kernel planners that do not link `sample_filter` into the JIT link (e.g. |
| | | /// `random_pickup`). Real filters use `cuvs::neighbors::detail::tag_filter_*` on |
| | | /// `CagraPlannerBase`. |
| | | struct tag_cagra_jit_sample_filter_link_absent {}; |
| | | |
| | | template <typename DataTag, |
| | | typename IndexTag, |
| | | typename DistanceTag, |
| | | typename QueryTag, |
| | | typename CodebookTag, |
| | | uint32_t TeamSize, |
| | | uint32_t DatasetBlockDim, |
| | | uint32_t PqBits, |
| | | uint32_t PqLen> |
| | | struct fragment_tag_setup_workspace {}; |
| | | |
| | | template <typename DataTag, |
| | | typename IndexTag, |
| | | typename DistanceTag, |
| | | typename QueryTag, |
| | | typename CodebookTag, |
| | | uint32_t TeamSize, |
| | | uint32_t DatasetBlockDim, |
| | | uint32_t PqBits, |
| | | uint32_t PqLen> |
| | | struct fragment_tag_compute_distance {}; |
| | | |
| | | template <typename QueryTag, typename DistanceTag, typename MetricTag> |
| | | struct fragment_tag_dist_op {}; |
| | | |
| | | template <typename DataTag, |
| | | typename IndexTag, |
| | | typename DistanceTag, |
| | | typename QueryTag, |
| | | uint32_t TeamSize, |
| | | uint32_t DatasetBlockDim, |
| | | typename NormTag> |
| | | struct fragment_tag_apply_normalization_standard {}; |
| | | |
| | | template <typename DataTag, |
| | | typename SourceIndexTag, |
| | | typename IndexTag, |
| | | typename DistanceTag, |
| | | bool TopkByBitonicSort, |
| | | bool BitonicSortAndMergeMultiWarps> |
| | | struct fragment_tag_search_single_cta {}; |
| | | |
| | | template <typename DataTag, |
| | | typename SourceIndexTag, |
| | | typename IndexTag, |
| | | typename DistanceTag, |
| | | bool TopkByBitonicSort, |
| | | bool BitonicSortAndMergeMultiWarps> |
| | | struct fragment_tag_search_single_cta_p {}; |
| | | |
| | | template <typename DataTag, typename SourceIndexTag, typename IndexTag, typename DistanceTag> |
| | | struct fragment_tag_search_multi_cta {}; |
| | | |
| | | template <typename DataTag, typename IndexTag, typename DistanceTag> |
| | | struct fragment_tag_random_pickup {}; |
| | | |
| | | template <typename DataTag, typename IndexTag, typename DistanceTag, typename SourceIndexTag> |
| | | struct fragment_tag_compute_distance_to_child_nodes {}; |
| | | |
| | | template <typename IndexTag, typename DistanceTag, typename SourceIndexTag> |
| | | struct fragment_tag_apply_filter_kernel {}; |
| | | |
| | | template <typename BitsetTag, typename SourceIndexTag, typename FilterTag> |
| | | struct fragment_tag_sample_filter {}; |
| | | |
| | | } // namespace cuvs::neighbors::cagra::detail |
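
The header in this diff declares only empty tag types, so it does not show how they are consumed. One plausible use, sketched here under stated assumptions, is as compile-time keys that identify a precompiled fragment without instantiating the kernel types themselves. The `fragment_registry` class, the `tag_query_f` tag, and the blob name strings are hypothetical and are not taken from the PR; only the tag and `fragment_tag_dist_op` shapes mirror the header.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <utility>

namespace sketch {

// Empty tags with the same shape as those in the header.
struct tag_dist_f {};
struct tag_metric_l2 {};
struct tag_metric_inner_product {};
struct tag_query_f {};  // hypothetical: the header shown does not define query tags

template <typename QueryTag, typename DistanceTag, typename MetricTag>
struct fragment_tag_dist_op {};

// Hypothetical registry: maps a fragment tag type to the name of a precompiled blob.
class fragment_registry {
 public:
  template <typename FragmentTag>
  void add(std::string blob_name)
  {
    // Tags exist only to carry type information, so they must hold no state.
    static_assert(std::is_empty_v<FragmentTag>, "fragment tags carry no state");
    assert(!blob_name.empty());
    blobs_[std::type_index(typeid(FragmentTag))] = std::move(blob_name);
  }

  // Returns nullptr when no blob was registered for this tag combination.
  template <typename FragmentTag>
  const std::string* find() const
  {
    auto it = blobs_.find(std::type_index(typeid(FragmentTag)));
    return it == blobs_.end() ? nullptr : &it->second;
  }

 private:
  std::unordered_map<std::type_index, std::string> blobs_;
};

}  // namespace sketch
```

Because each distinct combination of template arguments is a distinct type, two `fragment_tag_dist_op` instantiations that differ only in the metric tag resolve to different registry entries. The actual lookup mechanism in the PR may differ.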