Skip to content

Commit

Permalink
Merge branch 'branch-23.12' into split_groundtruth_filepath
Browse files Browse the repository at this point in the history
  • Loading branch information
divyegala authored Oct 18, 2023
2 parents 07385df + eb96fc6 commit d6f6417
Show file tree
Hide file tree
Showing 6 changed files with 47 additions and 53 deletions.
16 changes: 8 additions & 8 deletions .github/workflows/build.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ concurrency:
jobs:
cpp-build:
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: ${{ inputs.build_type || 'branch' }}
branch: ${{ inputs.branch }}
Expand All @@ -37,7 +37,7 @@ jobs:
python-build:
needs: [cpp-build]
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: ${{ inputs.build_type || 'branch' }}
branch: ${{ inputs.branch }}
Expand All @@ -46,7 +46,7 @@ jobs:
upload-conda:
needs: [cpp-build, python-build]
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: ${{ inputs.build_type || 'branch' }}
branch: ${{ inputs.branch }}
Expand All @@ -57,7 +57,7 @@ jobs:
if: github.ref_type == 'branch'
needs: python-build
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
arch: "amd64"
branch: ${{ inputs.branch }}
Expand All @@ -69,7 +69,7 @@ jobs:
sha: ${{ inputs.sha }}
wheel-build-pylibraft:
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: ${{ inputs.build_type || 'branch' }}
branch: ${{ inputs.branch }}
Expand All @@ -79,7 +79,7 @@ jobs:
wheel-publish-pylibraft:
needs: wheel-build-pylibraft
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: ${{ inputs.build_type || 'branch' }}
branch: ${{ inputs.branch }}
Expand All @@ -89,7 +89,7 @@ jobs:
wheel-build-raft-dask:
needs: wheel-publish-pylibraft
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: ${{ inputs.build_type || 'branch' }}
branch: ${{ inputs.branch }}
Expand All @@ -99,7 +99,7 @@ jobs:
wheel-publish-raft-dask:
needs: wheel-build-raft-dask
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: ${{ inputs.build_type || 'branch' }}
branch: ${{ inputs.branch }}
Expand Down
24 changes: 12 additions & 12 deletions .github/workflows/pr.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -24,41 +24,41 @@ jobs:
- wheel-tests-raft-dask
- devcontainer
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
checks:
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
enable_check_generated_files: false
conda-cpp-build:
needs: checks
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: pull-request
node_type: cpu16
conda-cpp-tests:
needs: conda-cpp-build
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: pull-request
conda-python-build:
needs: conda-cpp-build
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: pull-request
conda-python-tests:
needs: conda-python-build
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: pull-request
docs-build:
needs: conda-python-build
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: pull-request
node_type: "gpu-v100-latest-1"
Expand All @@ -68,34 +68,34 @@ jobs:
wheel-build-pylibraft:
needs: checks
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: pull-request
script: ci/build_wheel_pylibraft.sh
wheel-tests-pylibraft:
needs: wheel-build-pylibraft
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: pull-request
script: ci/test_wheel_pylibraft.sh
wheel-build-raft-dask:
needs: wheel-tests-pylibraft
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: pull-request
script: "ci/build_wheel_raft_dask.sh"
wheel-tests-raft-dask:
needs: wheel-build-raft-dask
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: pull-request
script: ci/test_wheel_raft_dask.sh
devcontainer:
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_command: |
sccache -z;
Expand Down
8 changes: 4 additions & 4 deletions .github/workflows/test.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -16,23 +16,23 @@ on:
jobs:
conda-cpp-tests:
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: nightly
branch: ${{ inputs.branch }}
date: ${{ inputs.date }}
sha: ${{ inputs.sha }}
conda-python-tests:
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: nightly
branch: ${{ inputs.branch }}
date: ${{ inputs.date }}
sha: ${{ inputs.sha }}
wheel-tests-pylibraft:
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: nightly
branch: ${{ inputs.branch }}
Expand All @@ -41,7 +41,7 @@ jobs:
script: ci/test_wheel_pylibraft.sh
wheel-tests-raft-dask:
secrets: inherit
uses: rapidsai/shared-action-workflows/.github/workflows/[email protected]
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: nightly
branch: ${{ inputs.branch }}
Expand Down
2 changes: 1 addition & 1 deletion ci/release/update-version.sh
Original file line number Diff line number Diff line change
Expand Up @@ -82,7 +82,7 @@ for FILE in .github/workflows/*.yaml; do
done

for FILE in .github/workflows/*.yaml; do
sed_runner "/shared-action-workflows/ s/@.*/@branch-${NEXT_SHORT_TAG}/g" "${FILE}"
sed_runner "/shared-workflows/ s/@.*/@branch-${NEXT_SHORT_TAG}/g" "${FILE}"
done
sed_runner "s/RAPIDS_VERSION_NUMBER=\".*/RAPIDS_VERSION_NUMBER=\"${NEXT_SHORT_TAG}\"/g" ci/build_docs.sh

Expand Down
48 changes: 22 additions & 26 deletions docs/source/ann_benchmarks_low_level.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,57 +2,53 @@
#### End-to-end Example
An end-to-end example (run from the RAFT source code root directory):
```bash
# (1) prepare a dataset
pushd
# (0) get raft sources
git clone https://github.com/rapidsai/raft.git
cd raft

cd cpp/bench/ann
mkdir data && cd data
wget http://ann-benchmarks.com/glove-100-angular.hdf5
# (1) prepare a dataset
export PYTHONPATH=python/raft-ann-bench/src:$PYTHONPATH
python -m raft-ann-bench.get_dataset --dataset glove-100-angular --normalize

# option -n is used here to normalize vectors so cosine distance is converted
# option --normalize is used here to normalize vectors so cosine distance is converted
# to inner product; don't use -n for l2 distance
python scripts/hdf5_to_fbin.py -n glove-100-angular.hdf5

mkdir glove-100-inner
mv glove-100-angular.base.fbin glove-100-inner/base.fbin
mv glove-100-angular.query.fbin glove-100-inner/query.fbin
mv glove-100-angular.groundtruth.neighbors.ibin glove-100-inner/groundtruth.neighbors.ibin
mv glove-100-angular.groundtruth.distances.fbin glove-100-inner/groundtruth.distances.fbin
popd

# (2) build index
./cpp/build/RAFT_IVF_FLAT_ANN_BENCH \
--data_prefix=cpp/bench/ann/data \
$CONDA_PREFIX/bin/ann/RAFT_IVF_FLAT_ANN_BENCH \
--data_prefix=datasets \
--build \
--benchmark_filter="raft_ivf_flat\..*" \
cpp/bench/ann/conf/glove-100-inner.json
python/raft-ann-bench/src/raft-ann-bench/run/conf/glove-100-inner.json

# (3) search
./cpp/build/RAFT_IVF_FLAT_ANN_BENCH \
--data_prefix=cpp/bench/ann/data \
$CONDA_PREFIX/bin/ann/RAFT_IVF_FLAT_ANN_BENCH\
--data_prefix=datasets \
--benchmark_min_time=2s \
--benchmark_out=ivf_flat_search.csv \
--benchmark_out_format=csv \
--benchmark_counters_tabular \
--search \
--benchmark_filter="raft_ivf_flat\..*"
cpp/bench/ann/conf/glove-100-inner.json
--benchmark_filter="raft_ivf_flat\..*" \
python/raft-ann-bench/src/raft-ann-bench/run/conf/glove-100-inner.json


# optional step: plot QPS-Recall figure using data in ivf_flat_search.csv with your favorite tool
```

##### Step 1: Prepare Dataset
Note: the preferred way to download and process smaller (million scale) datasets is to use the `get_dataset` script as demonstrated in the example above.

A dataset usually has 4 binary files containing database vectors, query vectors, ground truth neighbors and their corresponding distances. For example, Glove-100 dataset has files `base.fbin` (database vectors), `query.fbin` (query vectors), `groundtruth.neighbors.ibin` (ground truth neighbors), and `groundtruth.distances.fbin` (ground truth distances). The first two files are for index building and searching, while the other two are associated with a particular distance and are used for evaluation.

The file suffixes `.fbin`, `.f16bin`, `.ibin`, `.u8bin`, and `.i8bin` denote that the data type of vectors stored in the file are `float32`, `float16`(a.k.a `half`), `int`, `uint8`, and `int8`, respectively.
These binary files are little-endian and the format is: the first 8 bytes are `num_vectors` (`uint32_t`) and `num_dimensions` (`uint32_t`), and the following `num_vectors * num_dimensions * sizeof(type)` bytes are vectors stored in row-major order.

Some implementation can take `float16` database and query vectors as inputs and will have better performance. Use `script/fbin_to_f16bin.py` to transform dataset from `float32` to `float16` type.
Some implementation can take `float16` database and query vectors as inputs and will have better performance. Use `python/raft-ann-bench/src/raft-ann-bench/get_dataset/fbin_to_f16bin.py` to transform dataset from `float32` to `float16` type.

Commonly used datasets can be downloaded from two websites:
1. Million-scale datasets can be found at the [Data sets](https://github.com/erikbern/ann-benchmarks#data-sets) section of [`ann-benchmarks`](https://github.com/erikbern/ann-benchmarks).

However, these datasets are in HDF5 format. Use `cpp/bench/ann/scripts/hdf5_to_fbin.py` to transform the format. A few Python packages are required to run it:
However, these datasets are in HDF5 format. Use `python/raft-ann-bench/src/raft-ann-bench/get_dataset/fbin_to_f16bin.py/hdf5_to_fbin.py` to transform the format. A few Python packages are required to run it:
```bash
pip3 install numpy h5py
```
Expand All @@ -72,8 +68,8 @@ Commonly used datasets can be downloaded from two websites:
2. Billion-scale datasets can be found at [`big-ann-benchmarks`](http://big-ann-benchmarks.com). The ground truth file contains both neighbors and distances, thus should be split. A script is provided for this:
```bash
$ cpp/bench/ann/scripts/split_groundtruth.pl
usage: script/split_groundtruth.pl input output_prefix
$ python/raft-ann-bench/src/raft-ann-bench/split_groundtruth/split_groundtruth.pl
usage: split_groundtruth.pl input output_prefix
```
Take Deep-1B dataset as an example:
```bash
Expand All @@ -82,7 +78,7 @@ Commonly used datasets can be downloaded from two websites:
mkdir -p data/deep-1B && cd data/deep-1B
# download manually "Ground Truth" file of "Yandex DEEP"
# suppose the file name is deep_new_groundtruth.public.10K.bin
../../scripts/split_groundtruth.pl deep_new_groundtruth.public.10K.bin groundtruth
/path/to/raft/python/raft-ann-bench/src/raft-ann-bench/split_groundtruth/split_groundtruth.pl deep_new_groundtruth.public.10K.bin groundtruth
# two files 'groundtruth.neighbors.ibin' and 'groundtruth.distances.fbin' should be produced
popd
```
Expand Down
2 changes: 0 additions & 2 deletions python/raft-ann-bench/src/raft-ann-bench/run/__main__.py
Original file line number Diff line number Diff line change
Expand Up @@ -243,8 +243,6 @@ def main():
dataset_path = args.dataset_path
if not os.path.exists(conf_filepath):
raise FileNotFoundError(conf_filename)
if not os.path.exists(os.path.join(args.dataset_path, dataset_name)):
raise FileNotFoundError(os.path.join(args.dataset_path, dataset_name))

with open(conf_filepath, "r") as f:
conf_file = json.load(f)
Expand Down

0 comments on commit d6f6417

Please sign in to comment.