Commit

Updating docs
cjnolet committed Oct 27, 2023
1 parent aec2664 commit b813d19
Showing 3 changed files with 36 additions and 2 deletions.
3 changes: 2 additions & 1 deletion cpp/bench/ann/src/common/benchmark.hpp
@@ -155,7 +155,7 @@ void bench_build(::benchmark::State& state,
}
}
state.counters.insert(
- {{"GPU Time", gpu_timer.total_time() / state.iterations()}, {"index_size", index_size}});
+ {{"GPU", gpu_timer.total_time() / state.iterations()}, {"index_size", index_size}});

if (state.skipped()) { return; }
make_sure_parent_dir_exists(index.file);
@@ -367,6 +367,7 @@ void register_build(std::shared_ptr<const Dataset<T>> dataset,
auto* b = ::benchmark::RegisterBenchmark(
index.name + suf, bench_build<T>, dataset, index, force_overwrite);
b->Unit(benchmark::kSecond);
b->MeasureProcessCPUTime();
b->UseRealTime();
}
}
33 changes: 33 additions & 0 deletions docs/source/raft_ann_benchmarks.md
@@ -16,6 +16,7 @@ This project provides a benchmark program for various ANN search implementations
- [End to end: small-scale (<1M to 10M)](#end-to-end-small-scale-benchmarks-1m-to-10m)
- [End to end: large-scale (>10M)](#end-to-end-large-scale-benchmarks-10m-vectors)
- [Running with Docker containers](#running-with-docker-containers)
- [Evaluating the results](#evaluating-the-results)
- [Creating and customizing dataset configurations](#creating-and-customizing-dataset-configurations)
- [Adding a new ANN algorithm](#adding-a-new-ann-algorithm)
- [Parameter tuning guide](https://docs.rapids.ai/api/raft/nightly/ann_benchmarks_param_tuning/)
@@ -359,6 +360,38 @@ This will drop you into a command line in the container, with the `raft-ann-benc
Additionally, the containers can be run in detached mode without any issue.

### Evaluating the results

The benchmarks capture several different measurements. The table below describes each of the measurements for the index build benchmarks:

| Name       | Description                                            |
|------------|--------------------------------------------------------|
| Benchmark  | A name that uniquely identifies the benchmark instance |
| Time       | Wall-time spent training the index                     |
| CPU        | CPU time spent training the index                      |
| Iterations | Number of iterations (this is usually 1)               |
| GPU        | GPU time spent building the index                      |
| index_size | Number of vectors used to train the index              |

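
As a rough illustration of reading the build measurements back, the sketch below assumes the report was written in Google Benchmark's JSON format (the runner passes `--benchmark_out_format=json`), where each entry under `benchmarks` carries `name`, `real_time`, `cpu_time`, `iterations`, and one key per counter. The `summarize_build` helper and the sample values are hypothetical and exist only for illustration.

```python
import json


def summarize_build(report):
    """Map each benchmark entry to the columns described in the build table.

    Assumes Google Benchmark's JSON layout; counters such as "GPU" and
    "index_size" are looked up with .get() because they may be absent.
    """
    rows = []
    for bench in report.get("benchmarks", []):
        rows.append(
            {
                "Benchmark": bench["name"],
                "Time": bench["real_time"],
                "CPU": bench["cpu_time"],
                "Iterations": bench["iterations"],
                "GPU": bench.get("GPU"),
                "index_size": bench.get("index_size"),
                "unit": bench.get("time_unit"),
            }
        )
    return rows


# Hypothetical report contents; real files come from --benchmark_out=<path>.
sample = json.loads(
    """
    {
      "context": {},
      "benchmarks": [
        {
          "name": "example_algo.build/process_time/real_time",
          "iterations": 1,
          "real_time": 12.5,
          "cpu_time": 11.0,
          "time_unit": "s",
          "GPU": 10.5,
          "index_size": 1000000
        }
      ]
    }
    """
)

rows = summarize_build(sample)
for row in rows:
    print(row)
```

For a real run, `sample` would be replaced by `json.load(open(path))`, where `path` points at the file passed to `--benchmark_out`.
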

The table below describes each of the measurements for the index search benchmarks:

| Name             | Description |
|------------------|-------------|
| Benchmark        | A name that uniquely identifies the benchmark instance |
| Time             | The average runtime for each batch. This is approximately `end_to_end` / `Iterations` |
| CPU              | The average `wall-time`. In `throughput` mode, this is the average `wall-time` spent in each thread. |
| Iterations       | Total number of batches. This equals `total_queries` / `n_queries` |
| Recall           | Proportion of correct neighbors to ground truth neighbors. Note that this column is only present if a ground truth file is specified in the dataset configuration |
| items_per_second | Total throughput. This is approximately `total_queries` / `end_to_end`. |
| k                | Number of neighbors being queried in each iteration |
| end_to_end       | Total time taken to run all batches for all iterations |
| n_queries        | Total number of query vectors in each batch |
| total_queries    | Total number of query vectors across all iterations |

Note that the actual table displayed on the screen may differ slightly, as the hyper-parameters will also be displayed for each combination being benchmarked.

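
The relationships between the search columns can be sketched with a little arithmetic. The snippet below is only an illustration of the approximate formulas given in the table; the `derive_search_metrics` helper and the input numbers are hypothetical, not part of the benchmark tooling.

```python
def derive_search_metrics(total_queries, n_queries, end_to_end):
    """Recompute derived search columns from the raw counters.

    total_queries: query vectors processed across all iterations
    n_queries:     query vectors per batch
    end_to_end:    total time (seconds) to run all batches
    """
    iterations = total_queries / n_queries          # number of batches
    time_per_batch = end_to_end / iterations        # approximates "Time"
    items_per_second = total_queries / end_to_end   # approximates throughput
    return {
        "Iterations": iterations,
        "Time": time_per_batch,
        "items_per_second": items_per_second,
    }


# Made-up numbers: 10,000 queries in batches of 100, taking 2 seconds overall.
metrics = derive_search_metrics(total_queries=10_000, n_queries=100, end_to_end=2.0)
print(metrics)
```

The reported values are averages gathered by the benchmark framework, so figures recomputed this way should be expected to match only approximately.
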
## Creating and customizing dataset configurations
A single configuration file will often define a set of algorithms, with associated index and search parameters, for a specific dataset. A configuration file uses json format with 4 major parts:
2 changes: 1 addition & 1 deletion python/raft-ann-bench/src/raft-ann-bench/run/__main__.py
@@ -105,6 +105,7 @@ def run_build_and_search(
"--build",
"--data_prefix=" + dataset_path,
"--benchmark_out_format=json",
"--benchmark_counters_tabular=true",
"--benchmark_out="
+ f"{os.path.join(build_folder, f'{algo}.json')}",
]
@@ -121,7 +122,6 @@ def run_build_and_search(
"--search",
"--data_prefix=" + dataset_path,
"--benchmark_counters_tabular",
# "--benchmark_min_time=1x",
"--override_kv=k:%s" % k,
"--override_kv=n_queries:%s" % batch_size,
"--benchmark_min_warmup_time=0.01",