Multi segment cagra search #133
Draft
jamxia155 wants to merge 5 commits into rapidsai:main
Overrides rewrite() to write the results of all per-segment searches into a shared device buffer and merge them with cuvsSelectK entirely on the GPU, eliminating per-segment D2H copies and the CPU-side TopDocs.merge(). Falls back to the standard Lucene per-segment path when any segment lacks a CAGRA index, an explicit filter is set, or k > 1024.

Also adds ordToDoc() and getCagraIndexForField() helpers to CuVS2510GPUVectorsReader to support result decoding.

Fixes for Lucene 10.2 API changes:
- CodecReader moved to org.apache.lucene.index.
- createRewrittenQuery() was removed; it is replaced with an inline docAndScoreQuery() implementation using the public Weight/ScorerSupplier API.
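The fallback rule and the merge semantics described in this commit can be sketched on the host as follows. This is only an illustration: the class, method, and record names are hypothetical and do not come from cuvs-lucene, the merge runs on the CPU rather than through cuvsSelectK, and it assumes Lucene-style scores where higher is better.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical host-side illustration; none of these names exist in cuvs-lucene. */
final class MultiSegmentPathSketch {
  /** Upper bound on k for the GPU merge path, as stated in the commit message. */
  static final int MAX_GPU_MERGE_K = 1024;

  /** A (doc id, score) pair, standing in for Lucene's ScoreDoc. */
  record Hit(int doc, float score) {}

  /** True only when none of the three fallback conditions from the commit message applies. */
  static boolean canUseGpuMerge(boolean[] segmentHasCagraIndex, boolean hasExplicitFilter, int k) {
    if (hasExplicitFilter || k > MAX_GPU_MERGE_K) {
      return false;
    }
    for (boolean hasIndex : segmentHasCagraIndex) {
      if (!hasIndex) {
        return false;
      }
    }
    return true;
  }

  /**
   * CPU reference for what the merge computes: concatenate all per-segment
   * candidates and keep the k best. Assumes a higher score is better and
   * breaks ties by the smaller doc id.
   */
  static List<Hit> mergeTopK(List<List<Hit>> perSegment, int k) {
    List<Hit> all = new ArrayList<>();
    for (List<Hit> segment : perSegment) {
      all.addAll(segment);
    }
    all.sort(
        Comparator.comparingDouble((Hit h) -> h.score())
            .reversed()
            .thenComparingInt(Hit::doc));
    return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
  }
}
```

In the PR itself the concatenated candidates stay in device memory and the selection is done by cuvsSelectK; the sketch only pins down the expected result of that step.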
- CuVS2510GPUVectorsFormat: call CuVSProvider.provider().enableRMMAsyncMemory() in the static initializer so that cuda_async_memory_resource is active for the lifetime of the codec. This makes CAGRA workspace deallocations stream-ordered and non-blocking, which is required for the CudaStreamPool to provide any parallelism benefit.
- GPUKnnFloatVectorQuery: upload the query vector to device once before the per-segment loop and share the resulting CuVSMatrix across all CagraQuery instances, reducing host-to-device copies from O(numSegments) to 1 per query. Wrap the shared device matrix in try-with-resources to close the RMM allocation promptly after MultiSegmentCagraSearch.search() returns.
- FilterCuVSProvider: delegate enableRMMAsyncMemory() to the wrapped provider.
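The upload-once pattern from the GPUKnnFloatVectorQuery item can be illustrated with a small stand-in. Everything here is hypothetical: DeviceQuery merely models a device-resident matrix (the PR uses CuVSMatrix), the "upload" is an array copy, and the per-segment "search" is a dot product, so the sketch shows only the resource lifetime, not the real cuVS calls.

```java
import java.util.List;

/** Hypothetical sketch of sharing one uploaded query across all segments. */
final class SharedQueryUploadSketch {
  /** Stand-in for a device-resident query matrix; not a cuVS type. */
  static final class DeviceQuery implements AutoCloseable {
    private final float[] copy;
    private boolean closed;

    DeviceQuery(float[] hostVector) {
      this.copy = hostVector.clone(); // models a single host-to-device copy
    }

    float[] data() {
      if (closed) {
        throw new IllegalStateException("device query already closed");
      }
      return copy;
    }

    boolean isClosed() {
      return closed;
    }

    @Override
    public void close() {
      closed = true; // models releasing the device allocation
    }
  }

  /** Number of uploads performed, so the O(numSegments) to 1 reduction is observable. */
  int uploadCount = 0;

  /** Last uploaded query, kept only so callers can check that it was closed. */
  DeviceQuery lastUploaded;

  DeviceQuery upload(float[] hostVector) {
    uploadCount++;
    lastUploaded = new DeviceQuery(hostVector);
    return lastUploaded;
  }

  /** Uploads the query once, reuses it for every segment, and closes it on exit. */
  float[] searchAllSegments(float[] query, List<float[]> segmentVectors) {
    float[] scores = new float[segmentVectors.size()];
    try (DeviceQuery deviceQuery = upload(query)) {
      for (int i = 0; i < scores.length; i++) {
        scores[i] = dot(deviceQuery.data(), segmentVectors.get(i));
      }
    }
    return scores;
  }

  private static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }
}
```

Scoping the shared buffer with try-with-resources means it is released as soon as the multi-segment search returns, including when a segment search throws.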
GPUKnnFloatVectorQuery / GPUPerLeafCuVSKnnCollector:
- Add persistent, persistentLifetime, and persistentDeviceUsage parameters, threaded through all constructor overloads and forwarded to CagraSearchParams.Builder in both the multi-segment rewrite() path and the per-segment approximateSearch() fallback path.
- Add threadBlockSize parameter (0 = auto) to allow tuning of the persistent kernel's worker_queue_size, which determines how many concurrent query threads can run without latency increase.

Fix persistent-runner hash instability across segments (rewrite() path):
- When max_iterations is 0 (auto), CAGRA computes it from each segment's dataset size. Different-sized segments produce different values, causing a distinct runner hash per segment and a destroy/recreate cycle on every search call.
- Add computeMaxIterations(), which mirrors adjust_search_params() from search_plan.cuh, and call it once using the largest segment's graph size and degree. All segments then share the same max_iterations, producing a stable runner hash across the full multi-segment query.

CuVS2510GPUVectorsReader:
- Forward threadBlockSize, persistent, persistentLifetime, and persistentDeviceUsage from GPUPerLeafCuVSKnnCollector to CagraSearchParams.Builder in the per-segment fallback path.
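The "compute once from the largest segment" idea can be sketched as below. The names are hypothetical, and the actual formula in adjust_search_params() is deliberately not reproduced: the rule is passed in as a parameter so that the sketch shows only how a single value is chosen and shared.

```java
import java.util.List;

/** Hypothetical sketch: derive one max_iterations value for all segments. */
final class UniformMaxIterationsSketch {
  /** Graph size and degree of one segment's CAGRA index. */
  record SegmentGraph(long size, int degree) {}

  /** Placeholder for the real auto-tuning rule, which is not reproduced here. */
  @FunctionalInterface
  interface IterationRule {
    int apply(long graphSize, int graphDegree);
  }

  /**
   * Returns the caller's value when it is non-zero; otherwise evaluates the
   * rule once, using the segment with the largest graph, so that every
   * segment searches with the same max_iterations.
   */
  static int uniformMaxIterations(List<SegmentGraph> segments, int requested, IterationRule rule) {
    if (requested != 0) {
      return requested; // explicit setting: nothing to harmonize
    }
    if (segments.isEmpty()) {
      throw new IllegalArgumentException("at least one segment is required");
    }
    SegmentGraph largest = segments.get(0);
    for (SegmentGraph segment : segments) {
      if (segment.size() > largest.size()) {
        largest = segment;
      }
    }
    return rule.apply(largest.size(), largest.degree());
  }
}
```

A caller would evaluate this once per query and pass the result to every segment's search parameters instead of leaving the value at 0 (auto).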
Remove persistent kernel mode:
- Drop persistent, persistentLifetime, and persistentDeviceUsage fields and parameters from GPUKnnFloatVectorQuery, GPUPerLeafCuVSKnnCollector, and CuVS2510GPUVectorsReader. The persistent kernel is superseded by the native multi-segment kernel (cuvsCagraSearchMultiSegment), which achieves better concurrency without the per-runner lifecycle overhead.
- Collapse the 11-argument GPUKnnFloatVectorQuery constructor (which only existed to accept persistent parameters) into the standard 8-argument form.
- Remove stale comments that described max_iterations uniformity in terms of persistent-runner hash stability; replace them with an accurate explanation (consistent search quality across segments of different sizes).

Add workspace pool configuration:
- Add WORKSPACE_POOL_SIZE_PROPERTY constant to ThreadLocalCuVSResourcesProvider.
- On resources creation, read the com.nvidia.cuvs.workspacePoolSize system property and call setWorkspacePool() if it is set, so callers can pre-warm the per-thread RMM pool without modifying cuvs-lucene source.
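A minimal sketch of the property-driven configuration follows. The property name is taken from the commit message; the class and method names, the parsing as a plain positive long, and the callback standing in for setWorkspacePool() are assumptions made for illustration, since the commit does not state the accepted format or units.

```java
import java.util.OptionalLong;
import java.util.function.LongConsumer;

/** Hypothetical sketch of reading the workspace pool size from a system property. */
final class WorkspacePoolConfigSketch {
  /** Property name as given in the commit message. */
  static final String WORKSPACE_POOL_SIZE_PROPERTY = "com.nvidia.cuvs.workspacePoolSize";

  /**
   * Parses the raw property value. Assumption: the value is a positive
   * integer; an unset or blank value means "do not configure a pool".
   */
  static OptionalLong parsePoolSize(String raw) {
    if (raw == null || raw.isBlank()) {
      return OptionalLong.empty();
    }
    long size = Long.parseLong(raw.trim());
    if (size <= 0) {
      throw new IllegalArgumentException(
          WORKSPACE_POOL_SIZE_PROPERTY + " must be positive, got " + size);
    }
    return OptionalLong.of(size);
  }

  /**
   * Invokes the given setter only when the property is set. The setter is a
   * stand-in for the setWorkspacePool() call on newly created resources.
   * Returns true when the setter was called.
   */
  static boolean configure(LongConsumer setWorkspacePool) {
    OptionalLong size = parsePoolSize(System.getProperty(WORKSPACE_POOL_SIZE_PROPERTY));
    size.ifPresent(setWorkspacePool);
    return size.isPresent();
  }
}
```

With this shape a caller could pass something like -Dcom.nvidia.cuvs.workspacePoolSize=1073741824 on the JVM command line; whether the real implementation interprets the value as bytes is not stated in the commit.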
Companion PR to cuvs!2035, addresses #124.
Existing CAGRA search code path for each search query:
Proposed change: