From 226745ccefa6076f5be7349f73d2adf80f2266b8 Mon Sep 17 00:00:00 2001
From: Tamas Bela Feher <tfeher@nvidia.com>
Date: Thu, 23 Nov 2023 16:34:14 +0100
Subject: [PATCH] Edit benchmark guide

---
 docs/source/ann_benchmarks_dataset.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/docs/source/ann_benchmarks_dataset.md b/docs/source/ann_benchmarks_dataset.md
index 821345b07c..fd950843fe 100644
--- a/docs/source/ann_benchmarks_dataset.md
+++ b/docs/source/ann_benchmarks_dataset.md
@@ -46,6 +46,14 @@ Commonly used datasets can be downloaded from two websites:
     ```
     Besides ground truth files for the whole billion-scale datasets, this site also provides ground truth files for the first 10M or 100M vectors of the base sets. This mean we can use these billion-scale datasets as million-scale datasets. To facilitate this, an optional parameter `subset_size` for dataset can be used. See the next step for further explanation.
 
+3. Synthetic dataset
+
+    To generate a synthetic dataset filled with random data, use the following command:
+    ```bash
+    python -m raft-ann-bench.generate_dataset --rows 1000000 --cols 128 --dtype float32 dataset/base.fbin
+    ```
+    Here `rows` sets the number of dataset vectors, `cols` sets the number of features (dimensions) of each vector, and `dtype` sets the element type. By default, the vectors are drawn as random blobs using [make_blobs](https://docs.rapids.ai/api/cuml/latest/api/#cuml.datasets.make_blobs); alternatively, uniformly distributed random values can be used. Keep in mind that a large number of dimensions combined with uniformly random values leads to a dataset that is hard to search accurately with ANN methods, because such data has no cluster structure and pairwise distances become increasingly similar as the dimension grows.
+
 ## Generate ground truth
 
 If you have a dataset, but no corresponding ground truth file, then you can generate ground trunth using the `generate_groundtruth` utility. Example usage:
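
For reference, the synthetic dataset command added in this patch writes its output to `dataset/base.fbin`. The Python sketch below shows, on the CPU with NumPy, what such a file is assumed to contain. It assumes the file follows the `.fbin` layout used by the billion-scale datasets: two little-endian 4-byte unsigned integers (row count, column count), followed by the row-major float32 vectors. The helpers `write_fbin`, `read_fbin`, `make_blobs_np`, and `make_uniform_np` are hypothetical names. `make_blobs_np` is a simplified stand-in for cuML's `make_blobs` (Gaussian clusters around randomly placed centers), not the code that `generate_dataset` actually runs.

```python
import os
import tempfile

import numpy as np


def write_fbin(path, data):
    """Write a 2-D array as an .fbin file (assumed layout): uint32 rows, uint32 cols, float32 data."""
    data = np.ascontiguousarray(data, dtype="<f4")  # row-major, little-endian float32
    rows, cols = data.shape
    header = np.array([rows, cols], dtype="<u4")  # 8-byte header
    with open(path, "wb") as f:
        f.write(header.tobytes())
        f.write(data.tobytes())


def read_fbin(path):
    """Read a file written by write_fbin back into a (rows, cols) float32 array."""
    with open(path, "rb") as f:
        raw = f.read()
    rows, cols = (int(v) for v in np.frombuffer(raw, dtype="<u4", count=2))
    data = np.frombuffer(raw, dtype="<f4", count=rows * cols, offset=8)
    return data.reshape(rows, cols).copy()


def make_blobs_np(rows, cols, n_centers=10, std=1.0, seed=0):
    """Hypothetical CPU stand-in for cuml.datasets.make_blobs.

    Picks n_centers cluster centers uniformly in [-10, 10)^cols, then draws each
    vector from an isotropic Gaussian around a randomly chosen center.
    """
    rng = np.random.default_rng(seed)
    centers = rng.uniform(-10.0, 10.0, size=(n_centers, cols))
    labels = rng.integers(0, n_centers, size=rows)
    noise = rng.normal(0.0, std, size=(rows, cols))
    return (centers[labels] + noise).astype(np.float32)


def make_uniform_np(rows, cols, seed=0):
    """Uniform random vectors in [0, 1): no cluster structure at all."""
    return np.random.default_rng(seed).random((rows, cols), dtype=np.float32)


if __name__ == "__main__":
    # Small demo: 1,000 clustered vectors with 128 features each.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "base.fbin")
        base = make_blobs_np(1000, 128)
        write_fbin(path, base)
        print(os.path.getsize(path))  # 8-byte header + 1000 * 128 * 4 bytes of data
        print(read_fbin(path).shape)
```

With this layout, 1,000 vectors of 128 float32 features occupy 8 + 1000 · 128 · 4 = 512,008 bytes on disk. Writing the bytes explicitly as little-endian keeps the file identical regardless of the host's native byte order.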