
Update README.md #136

Open
aamijar wants to merge 4 commits into rapidsai:main from aamijar:update-readme


@aamijar (Member) commented Apr 25, 2026

This PR improves the README.md in the following ways:

  1. Add Installing cuvs-lucene section
  2. Add What is cuvs-lucene section
  3. Add Table of Contents section
  4. Add Getting started section
  5. Add References section

@aamijar aamijar requested a review from a team as a code owner April 25, 2026 04:21
Comment thread README.md
## Building
## What is cuvs-lucene?

`cuvs-lucene` provides a pluggable [KnnVectorsFormat](https://lucene.apache.org/core/10_3_1/core/org/apache/lucene/codecs/KnnVectorsFormat.html) that uses cuVS to offload vector index build — and optionally search — to NVIDIA GPUs. Because it plugs in through a standard Lucene codec, existing Lucene applications can take advantage of GPU acceleration with minimal code changes and gracefully fall back to the default CPU codec when no GPU is present.
Member

cc @narangvivek10 for review on correctness.

@narangvivek10 (Collaborator) commented May 1, 2026

The link points to the 10.3.1 docs. Since we currently use Lucene 10.2.0, we should link to https://lucene.apache.org/core/10_2_0/core/org/apache/lucene/codecs/KnnVectorsFormat.html instead.

Comment thread README.md
- `Lucene101AcceleratedHNSWCodec` — GPU-accelerated HNSW build with CPU HNSW search. The on-disk format is standard Lucene HNSW, so indexes built on the GPU can be read by any stock Lucene 10.x reader.
- `LuceneAcceleratedHNSWScalarQuantizedCodec` — scalar-quantized vectors for a smaller index footprint.
- `LuceneAcceleratedHNSWBinaryQuantizedCodec` — binary-quantized vectors for an even smaller index footprint.
- `CuVS2510GPUSearchCodec` — GPU-accelerated HNSW build and GPU search
Member

I don't think we have a codec for CPU build HNSW search. We have codecs for GPU build CPU search and for GPU build GPU search. Can you verify? @narangvivek10

Collaborator

> CPU build HNSW search

I assume you mean indexing and searching HNSW on the CPU. No, we do not provide a codec for that, because Lucene itself already ships codecs and formats for it. We do, however, have a fallback mechanism in some of our formats, for example `Lucene99AcceleratedHNSWVectorsFormat`, which internally delegates to Lucene's CPU indexing and search logic. This helps users of our codecs/formats who do not have a GPU or cuVS available.
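The fallback described in this comment can be pictured with a small, stdlib-only sketch: probe once for GPU/cuVS availability and otherwise take the CPU path. All names here (`VectorIndexBackend`, `GpuBackend`, `CpuBackend`, `FallbackSelector`, and the probe itself) are hypothetical; this is not how `Lucene99AcceleratedHNSWVectorsFormat` is actually implemented, only an illustration of the idea.

```java
import java.util.function.BooleanSupplier;

// Hypothetical abstraction over "where the HNSW work happens"; not a cuvs-lucene type.
interface VectorIndexBackend {
    String name();
}

// Stand-in for the cuVS/GPU build path.
record GpuBackend() implements VectorIndexBackend {
    public String name() { return "gpu-cuvs"; }
}

// Stand-in for delegating to Lucene's stock CPU HNSW logic.
record CpuBackend() implements VectorIndexBackend {
    public String name() { return "cpu-lucene-hnsw"; }
}

final class FallbackSelector {
    private FallbackSelector() {}

    /** Picks the GPU backend when the probe reports cuVS is usable, else the CPU backend. */
    static VectorIndexBackend select(BooleanSupplier gpuAvailable) {
        try {
            if (gpuAvailable.getAsBoolean()) {
                return new GpuBackend();
            }
        } catch (RuntimeException | LinkageError e) {
            // A failing probe (e.g. UnsatisfiedLinkError when the native cuVS
            // library cannot be loaded) is treated the same as "no GPU".
        }
        return new CpuBackend();
    }
}
```

Hiding the availability check behind a `BooleanSupplier` keeps the decision easy to test without a GPU; the real formats may detect cuVS and delegate to Lucene quite differently.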

Comment thread README.md
- [CUDA 12.0+](https://developer.nvidia.com/cuda-toolkit-archive)
- [JDK 22](https://jdk.java.net/archive/)
- [Maven 3.9.6+](https://maven.apache.org/download.cgi)
- The native `libcuvs_c.so` on the runtime library path. Please see the cuVS [Build and Install Guide](https://docs.rapids.ai/api/cuvs/nightly/build/) for install options (conda, pip, tarball, or build from source).
Member

I think a better way to word this for the uninformed user is simply that they should have a compatible version of cuVS installed. I think this also warrants adding a page to the cuVS docs on what a compatible version means (spoiler alert: we change the ABI twice a year, in 2x.02 and 2x.08, so 02, 04, and 06 are compatible with each other, and 08, 10, and 12 are compatible with each other). You should just be able to say "cuVS 24.04 - 24.06" for the compatible version. We'll bump this every version.
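The compatibility rule in this comment can be expressed as a tiny helper. This is a sketch of the rule as stated here, assuming cuVS versions are written `YY.MM` with even release months and that the ABI changes only at `.02` and `.08`; `CuvsAbiSketch` is hypothetical and not a cuVS API.

```java
import java.util.Locale;

// Hypothetical helper encoding the "ABI changes at 2x.02 and 2x.08" rule.
final class CuvsAbiSketch {
    private CuvsAbiSketch() {}

    /** Returns the release that opened the ABI window containing YY.MM, e.g. "24.02" for 24.06. */
    static String abiWindow(int year, int month) {
        if (month < 2 || month > 12 || month % 2 != 0) {
            throw new IllegalArgumentException("expected an even release month 02-12, got " + month);
        }
        int start = month < 8 ? 2 : 8; // 02/04/06 -> .02 window, 08/10/12 -> .08 window
        return String.format(Locale.ROOT, "%02d.%02d", year, start);
    }

    /** Two "YY.MM" versions are treated as ABI-compatible if they share a window. */
    static boolean compatible(String a, String b) {
        return window(a).equals(window(b));
    }

    private static String window(String version) {
        String[] parts = version.split("\\.");
        return abiWindow(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
    }
}
```

Under this rule, `compatible("24.04", "24.06")` is true (both fall in the window opened by 24.02), while `compatible("24.06", "24.08")` is false because 24.08 starts a new window.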

Comment thread README.md
The artifacts would be built and available in the target / folder.

### Running Tests
The resulting artifacts are written to `target/`. To run the tests, point `LD_LIBRARY_PATH` at a local `libcuvs_c.so`:
Member

I think we should avoid naming `libcuvs_c.so` explicitly where possible and instead tell folks to install the cuVS tarball. (You can point to the tarball install instructions, same as above in the prerequisites.)

Comment thread README.md

## Getting Started

The snippet below plugs the GPU-accelerated HNSW codec into a standard Lucene `IndexWriter`. Once the codec is set on the `IndexWriterConfig`, indexing proceeds exactly as it would with the default Lucene codec, and search uses the stock `KnnFloatVectorQuery`:
Member

Did you verify this locally by copying into a file and building?
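Beyond building the snippet, one way to sanity-check its search results locally is to compare the hits from `KnnFloatVectorQuery` (an approximate HNSW search) against an exact brute-force search over the same vectors. `ExactKnn` below is a hypothetical test utility, not part of cuvs-lucene; it assumes you can map each hit back to the position of its vector in your input (for example via a stored field), since Lucene doc IDs are not guaranteed to match insertion order.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical test helper: exact k-nearest neighbours under squared Euclidean distance.
final class ExactKnn {
    private ExactKnn() {}

    static float squaredL2(float[] a, float[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Indices of the k vectors closest to query, nearest first; ties broken by index. */
    static int[] topK(float[][] vectors, float[] query, int k) {
        float[] dist = new float[vectors.length];
        Integer[] order = new Integer[vectors.length];
        for (int i = 0; i < vectors.length; i++) {
            dist[i] = squaredL2(vectors[i], query);
            order[i] = i;
        }
        Arrays.sort(order, Comparator.<Integer>comparingDouble(i -> dist[i]).thenComparingInt(i -> i));
        int n = Math.min(k, order.length);
        int[] result = new int[n];
        for (int i = 0; i < n; i++) result[i] = order[i];
        return result;
    }

    /** Fraction of the exact top-k that also appears in the approximate result (recall@k). */
    static double recall(int[] exact, int[] approx) {
        if (exact.length == 0) return 1.0;
        int hits = 0;
        for (int e : exact) {
            for (int a : approx) {
                if (a == e) { hits++; break; }
            }
        }
        return (double) hits / exact.length;
    }
}
```

Ranking by squared distance gives the same order as Lucene's `EUCLIDEAN` similarity, whose score decreases monotonically as squared distance grows.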

Comment thread README.md
}
```

For fully runnable versions of this example, including one that indexes and searches entirely on the GPU using `CuVS2510GPUSearchCodec`, please refer to the [`examples/`](examples) directory.
Member

Can you verify this link renders properly? Maybe GitHub has updated it, but I've never had luck getting code-formatted text to render as a link.

Member Author

Yep, it renders properly. You can check the rendered README on my branch:
https://github.com/aamijar/cuvs-lucene/tree/update-readme

Comment thread README.md
try (Directory dir = FSDirectory.open(indexPath);
IndexWriter writer = new IndexWriter(dir, config)) {
Document doc = new Document();
doc.add(new KnnFloatVectorField("vector_field", embedding, EUCLIDEAN));
Collaborator

I notice `embedding` here, which doesn't seem to be defined anywhere. Am I missing something? Ideally, our code snippets should work as-is when copied, so folks can see them in action without having to change anything.
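A placeholder along these lines could give the snippet a concrete `embedding` without depending on a real embedding model. `DemoEmbeddings.randomEmbedding` is hypothetical; the dimension and seed are arbitrary, and real applications would use vectors produced by their own model.

```java
import java.util.Random;

// Hypothetical stand-in for a real embedding model, for copy-paste-runnable examples.
final class DemoEmbeddings {
    private DemoEmbeddings() {}

    /** Deterministic pseudo-random vector of the given dimension, scaled to unit length. */
    static float[] randomEmbedding(int dim, long seed) {
        if (dim <= 0) throw new IllegalArgumentException("dim must be positive");
        Random rng = new Random(seed); // same seed -> same vector on every run
        float[] v = new float[dim];
        double norm = 0;
        for (int i = 0; i < dim; i++) {
            v[i] = (float) rng.nextGaussian();
            norm += (double) v[i] * v[i];
        }
        if (norm == 0) return v; // degenerate all-zero draw: nothing to normalize
        float inv = (float) (1.0 / Math.sqrt(norm));
        for (int i = 0; i < dim; i++) v[i] *= inv;
        return v;
    }
}
```

With it, the snippet could start with something like `float[] embedding = DemoEmbeddings.randomEmbedding(128, 42L);` before `doc.add(...)`. Unit-normalizing is not required for `EUCLIDEAN`, but it keeps the same vectors usable with Lucene's `DOT_PRODUCT` similarity, which expects unit-length vectors.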


3 participants