From 7392f8855c9193c294860e36d5f20a67de7ef966 Mon Sep 17 00:00:00 2001
From: applecuckoo
Date: Fri, 19 Jan 2024 16:01:28 +1300
Subject: [PATCH 1/2] fix dead link to Caltech256 dataset

---
 scripts/4_optional_download_neutral_.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/4_optional_download_neutral_.sh b/scripts/4_optional_download_neutral_.sh
index b57a25e..cb3c3c4 100755
--- a/scripts/4_optional_download_neutral_.sh
+++ b/scripts/4_optional_download_neutral_.sh
@@ -5,5 +5,5 @@ base_dir="$(dirname "$scripts_dir")"
 raw_data_dir="$base_dir/raw_data"

 mkdir -p "$raw_data_dir/neutral"
-wget http://www.vision.caltech.edu/Image_Datasets/Caltech256/256_ObjectCategories.tar -P "$raw_data_dir/neutral"
+wget https://data.caltech.edu/records/nyy15-4j048/files/256_ObjectCategories.tar -P "$raw_data_dir/neutral"
 tar -xf "$raw_data_dir/neutral/256_ObjectCategories.tar" -C "$raw_data_dir/neutral"

From e41aa3a71dcec297fb6697bd6673dd84da2d0fd4 Mon Sep 17 00:00:00 2001
From: applecuckoo
Date: Mon, 22 Jan 2024 12:35:48 +1300
Subject: [PATCH 2/2] fix Caltech256 link in README

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 9359f0a..f527e68 100644
--- a/README.md
+++ b/README.md
@@ -16,7 +16,7 @@ Here is what each script (located under `scripts` directory) does:
 *Note*: I already ran this script for you, and its outputs are located in `raw_data` directory. No need to rerun unless you edit files under `scripts/source_urls`.
 - `2_download_from_urls_.sh` - downloads actual images for urls found in text files in `raw_data` directory.
 - `3_optional_download_drawings_.sh` - (optional) script that downloads SFW anime images from the [Danbooru2018](https://www.gwern.net/Danbooru2018) database.
-- `4_optional_download_neutral_.sh` - (optional) script that downloads SFW neutral images from the [Caltech256](http://www.vision.caltech.edu/Image_Datasets/Caltech256/) dataset
+- `4_optional_download_neutral_.sh` - (optional) script that downloads SFW neutral images from the [Caltech256](https://data.caltech.edu/records/nyy15-4j048) dataset
 - `5_create_train_.sh` - creates `data/train` directory and copy all `*.jpg` and `*.jpeg` files into it from `raw_data`. Also removes corrupted images.
 - `6_create_test_.sh` - creates `data/test` directory and moves `N=2000` random files for each class from `data/train` to `data/test` (change this number inside the script if you need a different train/test split). Alternatively, you can run it multiple times, each time it will move `N` images for each class from `data/train` to `data/test`.