Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libraft and pylibraft API for CAGRA build and HNSW search #2022

Merged
merged 42 commits into from
Jan 26, 2024

Conversation

divyegala
Copy link
Member

@divyegala divyegala commented Nov 23, 2023

Closes #1772

@divyegala divyegala added feature request New feature or request non-breaking Non-breaking change labels Nov 23, 2023
@divyegala divyegala self-assigned this Nov 23, 2023
@divyegala divyegala marked this pull request as ready for review November 28, 2023 04:28
@divyegala divyegala requested review from a team as code owners November 28, 2023 04:28
cpp/CMakeLists.txt Outdated Show resolved Hide resolved
cpp/include/raft/neighbors/cagra_hnswlib.hpp Outdated Show resolved Hide resolved
cpp/include/raft/neighbors/cagra_hnswlib_types.hpp Outdated Show resolved Hide resolved
cpp/include/raft/neighbors/detail/cagra_hnswlib.hpp Outdated Show resolved Hide resolved
cpp/include/raft/neighbors/detail/cagra_hnswlib.hpp Outdated Show resolved Hide resolved
cpp/include/raft/neighbors/cagra_hnswlib_types.hpp Outdated Show resolved Hide resolved
python/pylibraft/pylibraft/neighbors/cagra_hnswlib.pyx Outdated Show resolved Hide resolved
cpp/include/raft/neighbors/detail/cagra_hnswlib.hpp Outdated Show resolved Hide resolved
@divyegala divyegala requested a review from lowener December 14, 2023 13:10
Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the general idea is getting there, but there's still a lot of api leakage and we have an opportunity here for a dedicated HNSW API (independent from hnswlib).

cpp/cmake/thirdparty/get_hnswlib.cmake Outdated Show resolved Hide resolved
cpp/cmake/thirdparty/get_hnswlib.cmake Outdated Show resolved Hide resolved
cpp/include/raft/neighbors/cagra_hnswlib_types.hpp Outdated Show resolved Hide resolved
cpp/include/raft/neighbors/detail/cagra_hnswlib.hpp Outdated Show resolved Hide resolved
cpp/include/raft/neighbors/hnswlib_types.hpp Outdated Show resolved Hide resolved
cpp/include/raft_runtime/neighbors/cagra.hpp Show resolved Hide resolved
cpp/include/raft_runtime/neighbors/cagra_hnswlib.hpp Outdated Show resolved Hide resolved
cpp/src/raft_runtime/neighbors/cagra_hnswlib.cpp Outdated Show resolved Hide resolved
cpp/include/raft_runtime/neighbors/cagra_hnswlib.hpp Outdated Show resolved Hide resolved
@divyegala
Copy link
Member Author

divyegala commented Dec 15, 2023

@cjnolet I addressed your suggestions and reworked the filenames/namespaces/structure to a pure hnsw namespace. There's one thing that I'd like your opinion on and something I would consider a nice-to-have. As you requested, I moved the serialization and de-serialization functions into hnsw_serialize.cuh, however, here's the problem:

  1. serialize accepts a CAGRA index and needs to be CUDA compatible
  2. deserialize returns an hnsw index and does not need to be CUDA compatible

As a result of putting these two functions in the same header, they need to be compiled in the same .cu translation unit because hnswlib is not true header-only: it does not have inline functions hence all headers that contain an hnswlib included header must be compiled in the same TU.

By splitting deserialize and serialize into two different headers and thus, two TUs, we will be able to:

  1. Compile serialize in hnswlib_serialize.cu
  2. Compile deserialize and search in hnsw.cpp

Another benefit of this split is that users will be able to build CPU-only RAFT and still be able to load hnsw indexes and search them. If this change is acceptable to you, then I will go ahead and make it.

@divyegala divyegala changed the title pylibraft API for CAGRA build and HNSW search libraft and pylibraft API for CAGRA build and HNSW search Dec 19, 2023
@divyegala
Copy link
Member Author

@lowener I filed an issue for filtering functions as a follow-on, this PR is pretty big already

raft::host_matrix_view<const T, int64_t, row_major> dataset) \
->raft::neighbors::cagra::index<T, IdxT>; \
\
void build_device(raft::resources const& handle, \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just overload?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, this change predates this PR

Copy link

copy-pr-bot bot commented Jan 22, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@vyasr vyasr force-pushed the cagra_hnswlib_pylibraft branch from c6b896f to ae543e7 Compare January 23, 2024 00:13
@cjnolet
Copy link
Member

cjnolet commented Jan 26, 2024

/merge

@rapids-bot rapids-bot bot merged commit 2822532 into rapidsai:branch-24.02 Jan 26, 2024
61 checks passed
raydouglass pushed a commit that referenced this pull request Jan 29, 2024
This PR conditionally includes `hnsw` sources, to prevent build errors like those seen in cuGraph after #2022 was merged. See also: rapidsai/cugraph#4121, rapidsai/cugraph#4122

Authors:
   - Divye Gala (https://github.com/divyegala)

Approvers:
   - Corey J. Nolet (https://github.com/cjnolet)
   - Bradley Dice (https://github.com/bdice)
   - Robert Maynard (https://github.com/robertmaynard)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake cpp feature request New feature or request non-breaking Non-breaking change python
Projects
Development

Successfully merging this pull request may close these issues.

[FEA] Explore potential for a CAGRA trained model on GPU to be inferenced on CPU
4 participants