-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Diskann Benchmarking Wrapper #260
base: branch-25.02
Are you sure you want to change the base?
Conversation
…diskann-wrapper
…diskann-wrapper
…diskann-wrapper
…into diskann-wrapper
…into diskann-wrapper
…diskann-wrapper
…diskann-wrapper
…diskann-wrapper
/ok to test |
@@ -32,6 +32,12 @@ option(CUVS_ANN_BENCH_USE_CUVS_BRUTE_FORCE "Include cuVS brute force knn in benc | |||
option(CUVS_ANN_BENCH_USE_CUVS_CAGRA_HNSWLIB "Include cuVS CAGRA with HNSW search in benchmark" ON) | |||
option(CUVS_ANN_BENCH_USE_HNSWLIB "Include hnsw algorithm in benchmark" ON) | |||
option(CUVS_ANN_BENCH_USE_GGNN "Include ggnn algorithm in benchmark" OFF) | |||
option(CUVS_ANN_BENCH_USE_DISKANN "Include DISKANN search in benchmark" ON) | |||
option(CUVS_ANN_BENCH_USE_CUVS_VAMANA "Include cuVS Vamana with DiskANN search in benchmark" ON) | |||
if(CMAKE_SYSTEM_PROCESSOR MATCHES "(ARM|arm|aarch64)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does MSFT DiskANN repo not support ARM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they have mkl-devel
as a dependency, which is not meant to be installed in aarch64.
template <typename T> | ||
void diskann_ssd<T>::build(const T* dataset, size_t nrow) | ||
{ | ||
diskann::build_disk_index<float>(base_file_.c_str(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this defined? Is it using the CPU DiskANN repo to build the ssd verson of the index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is right. It is just to benchmark the CPU build
/ok to test |
/ok to test |
…into diskann-wrapper
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please avoid algorithm-specific changes to cpp/bench/ann/src/common/benchmark.hpp
as per our offline discussion.
I'm not opposed to exposing dataset/index filepaths to algorithms in general though.
But maybe it would make sense to hide (filter out) them when dumping all build parameters to avoid cluttering the console/reports and exposing private user filepaths?
@achirkin are you suggesting that we expose the dataset / index filepaths to all algorithms, but only use them for diskann ssd indexes? |
@@ -144,7 +150,8 @@ void bench_build(::benchmark::State& state, | |||
|
|||
const auto algo_property = parse_algo_property(algo->get_preference(), index.build_param); | |||
|
|||
const T* base_set = dataset->base_set(algo_property.dataset_memory_type); | |||
const T* base_set = nullptr; | |||
if (index.algo != "diskann_ssd") base_set = dataset->base_set(algo_property.dataset_memory_type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@achirkin if we do not have this line, the entire dataset will be read into the base_set
variable needlessly for DiskANN ssd based indexes. For these indexes, only the path to the dataset needs to be known and the DiskANN index building APIs will read the data from the path. In fact, it is important we do it this way because those SSD indexes are designed to let the data remain on disk if there are extremely large datasets.
Brings DiskANN into cuvs-bench