Update README.md by aamijar · Pull Request #136 · rapidsai/cuvs-lucene

aamijar · 2026-04-25T04:21:12Z

This PR improves the README.md in the following ways:

Add Installing cuvs-lucene section
Add What is cuvs-lucene section
Add Table of Contents section
Add Getting started section
Add References section

cjnolet · 2026-04-28T18:09:06Z

-## Building
+## What is cuvs-lucene?
+
+`cuvs-lucene` provides a pluggable [KnnVectorsFormat](https://lucene.apache.org/core/10_3_1/core/org/apache/lucene/codecs/KnnVectorsFormat.html) that uses cuVS to offload vector index build — and optionally search — to NVIDIA GPUs. Because it plugs in through a standard Lucene codec, existing Lucene applications can take advantage of GPU acceleration with minimal code changes and gracefully fall back to the default CPU codec when no GPU is present.


cc @narangvivek10 for review on correctness.

The link points to the 10.3.1 docs. Since we currently use Lucene 10.2.0, we can link to https://lucene.apache.org/core/10_2_0/core/org/apache/lucene/codecs/KnnVectorsFormat.html instead.

cjnolet · 2026-04-28T18:09:58Z

+- `Lucene101AcceleratedHNSWCodec` — GPU-accelerated HNSW build with CPU HNSW search. The on-disk format is standard Lucene HNSW, so indexes built on the GPU can be read by any stock Lucene 10.x reader.
+  - `LuceneAcceleratedHNSWScalarQuantizedCodec` — scalar-quantized vectors for a smaller index footprint.
+  - `LuceneAcceleratedHNSWBinaryQuantizedCodec` — binary-quantized vectors for an even smaller index footprint.
+- `CuVS2510GPUSearchCodec` — GPU-accelerated HNSW build and GPU search


I don't think we have a codec for CPU build HNSW search. We have codecs for GPU build CPU search and for GPU build GPU search. Can you verify? @narangvivek10

> CPU build HNSW search

I guess you mean indexing on the CPU and searching on the CPU for HNSW. No, we do not have that, because Lucene itself already provides codecs and formats for it. We do, however, have a fallback mechanism in some formats, for example `Lucene99AcceleratedHNSWVectorsFormat`, which internally delegates to Lucene's CPU indexing and search logic. This helps people who use our codec/format but do not have a GPU or cuVS available.

cjnolet · 2026-04-28T18:12:14Z

+- [CUDA 12.0+](https://developer.nvidia.com/cuda-toolkit-archive)
 - [JDK 22](https://jdk.java.net/archive/)
+- [Maven 3.9.6+](https://maven.apache.org/download.cgi)
+- The native `libcuvs_c.so` on the runtime library path. Please see the cuVS [Build and Install Guide](https://docs.rapids.ai/api/cuvs/nightly/build/) for install options (conda, pip, tarball, or build from source).


I think a better way to word this for the uninformed user is that they should just have a compatible version of cuVS installed. I think this also warrants adding a page to the cuVS docs on what a compatible version means (spoiler alert: we change the ABI twice a year, at 2x.02 and 2x.08, so 02, 04, and 06 are compatible with each other, and 08, 10, and 12 are compatible with each other). You should just be able to say "cuVS 24.04 - 24.06" for the compatible version. We'll bump this every version.
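To make the compatibility rule above concrete, here is a minimal Java sketch of it. The class and method names (`CuvsAbiCompat`, `abiWindow`, `compatible`) are hypothetical and not part of cuVS or cuvs-lucene. It assumes versions are written as `YY.MM`, that releases land on even months only, and that the ABI window resets at `.02` and `.08` as described in the comment.

```java
// Hypothetical helper, not a cuVS API: encodes the "ABI changes at .02 and .08" rule.
final class CuvsAbiCompat {
    private CuvsAbiCompat() {}

    /** Maps a "YY.MM" version to the release that opened its ABI window, e.g. "24.06" -> "24.02". */
    static String abiWindow(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected YY.MM, got: " + version);
        }
        int year = Integer.parseInt(parts[0]);
        int month = Integer.parseInt(parts[1]);
        // Assumption: releases land on even months (02, 04, ..., 12).
        if (month < 2 || month > 12 || month % 2 != 0) {
            throw new IllegalArgumentException("Unexpected release month in: " + version);
        }
        // 02, 04, 06 share one window; 08, 10, 12 share the other.
        String windowStart = month < 8 ? "02" : "08";
        return year + "." + windowStart;
    }

    /** Two versions are treated as ABI-compatible when they fall in the same window. */
    static boolean compatible(String a, String b) {
        return abiWindow(a).equals(abiWindow(b));
    }

    public static void main(String[] args) {
        System.out.println(compatible("24.04", "24.06")); // true
        System.out.println(compatible("24.06", "24.08")); // false
    }
}
```

Under these assumptions, "cuVS 24.04 - 24.06" in the prerequisites would name versions from a single ABI window.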

cjnolet · 2026-04-28T18:13:30Z

-The artifacts would be built and available in the target / folder.

-### Running Tests
+The resulting artifacts are written to `target/`. To run the tests, point `LD_LIBRARY_PATH` at a local `libcuvs_c.so`:


I think we should try to avoid using `libcuvs_c.so` explicitly where possible and tell folks to install the cuVS tarball. (You can point to the tarball install instructions, same as above in the prereqs.)

cjnolet · 2026-04-28T18:14:06Z


+## Getting Started
+
+The snippet below plugs the GPU-accelerated HNSW codec into a standard Lucene `IndexWriter`. Once the codec is set on the `IndexWriterConfig`, indexing proceeds exactly as it would with the default Lucene codec, and search uses the stock `KnnFloatVectorQuery`:


Did you verify this locally by copying into a file and building?

cjnolet · 2026-04-28T18:15:08Z

+}
+```
+
+For fully runnable versions of this example, including one that indexes and searches entirely on the GPU using `CuVS2510GPUSearchCodec`, please refer to the [`examples/`](examples) directory.


Can you verify this link renders properly? Maybe github has updated it, but I've never had luck trying to get highlighted code snippets to render as links.

Yep, it renders properly. You can check the rendered README on my branch:
https://github.com/aamijar/cuvs-lucene/tree/update-readme

narangvivek10 · 2026-05-01T18:37:47Z

+try (Directory dir = FSDirectory.open(indexPath);
+    IndexWriter writer = new IndexWriter(dir, config)) {
+  Document doc = new Document();
+  doc.add(new KnnFloatVectorField("vector_field", embedding, EUCLIDEAN));


I notice embedding here, which I guess isn't pointing to anything. Am I missing something? Ideally, we should have code snippets that folks would just want to copy and see in action without the need to make changes for it to work.
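One way the snippet could become copy-and-paste friendly is to define `embedding` right before it is used. The sketch below is only an illustration using the Java standard library: the class name `EmbeddingSketch`, the helper `randomEmbedding`, the dimension of 128, and the seed are all assumptions rather than values taken from the README or the `examples/` directory.

```java
import java.util.Random;

// Hypothetical helper for illustration; a real application would use vectors from its own model.
final class EmbeddingSketch {
    private EmbeddingSketch() {}

    /** Builds a reproducible pseudo-random vector with the given number of dimensions. */
    static float[] randomEmbedding(int dimension, long seed) {
        Random random = new Random(seed);
        float[] embedding = new float[dimension];
        for (int i = 0; i < dimension; i++) {
            embedding[i] = random.nextFloat(); // value in [0.0, 1.0)
        }
        return embedding;
    }

    public static void main(String[] args) {
        // Assumed dimension; pick whatever the target index expects.
        float[] embedding = randomEmbedding(128, 42L);
        System.out.println(embedding.length); // prints 128
    }
}
```

The resulting `float[]` is the kind of value the README snippet passes as `embedding` to `KnnFloatVectorField`.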

update-readme

1fcd24e

aamijar requested a review from a team as a code owner April 25, 2026 04:21

aamijar added 3 commits April 25, 2026 04:33

update references

f793f8b

update references

8d81c45

em dash

6ca8e66

cjnolet reviewed Apr 28, 2026

narangvivek10 reviewed May 1, 2026

aamijar wants to merge 4 commits into rapidsai:main from aamijar:update-readme
