Merge pull request #17 from baldassarreFe/feature/typos

Feature/typos
baldassarreFe · May 16, 2020 · af99f75 · af99f75
2 parents 1c2df17 + 4a2a92d
commit af99f75
Showing 13 changed files with 112 additions and 55 deletions.
diff --git a/.gitignore b/.gitignore
@@ -101,4 +101,4 @@ resized
 
 py36
 py37
-imagenet
+imagenet
diff --git a/DATASET.md b/DATASET.md
@@ -10,6 +10,17 @@ Storing the images and the embedding as separate files on disk (`jpeg/png` and `
 the performances during training, so all the image-embedding pairs are stored in binary format in large 
 continuous [TFRecords](https://www.tensorflow.org/programmers_guide/datasets).
 
+### Table of contents
+
+- [Pipeline](#pipeline)
+    - [Imagenet images](#imagenet-images)
+    - [Resizing for training](#resizing-for-training)
+    - [Converting to TFRecords](#converting-to-tfrecords)
+    - [Validation set](#validation-set)
+- [Space on disk notes](#space-on-disk-notes)
+    - [The images](#the-images)
+    - [The TFRecords](#the-tfrecords)
+
 ## Pipeline
 
 All the data preparation steps are independent and persisted on the disk, the default (and recommended) folder structure is:

diff --git a/INSTRUCTIONS.md b/INSTRUCTIONS.md
@@ -1,5 +1,11 @@
 # Instructions
 
+### Table of contents
+- [Environment](#environment)
+- [Dataset](#dataset)
+- [Training and evaluation](#training-and-evaluation)
+- [Development and test the code](#development-and-test-the-code)
+
 ## Environment
 The project is based on Python 3.6, to manage the dependencies contained in 
 [`requirements.txt`](requirements.txt) a virtual environment is recommended.
@@ -20,9 +26,12 @@ pip install -e .
 For GPU-support, run:
 
 ```
-$ pip install -e .[gpu]
+pip install -e .[gpu]
 ```
 
+> If it does not work, manually install the `tensorflow-gpu` library (it must be compatible with your OS, Python version and
+> `tensorflow` library). More info [here](https://www.tensorflow.org/install/gpu).
+
 ## Dataset
 Prior to training, the images from ImageNet need to be downloaded, resized and processed.
 
@@ -64,3 +73,7 @@ python -m koalarization.evaluate \
   --run-id 'run1' \
   'data/tfrecords' 'runs/'
 ```
+
+## Development and test the code
+
+See [here](tests/README.md).
diff --git a/README.md b/README.md
@@ -9,24 +9,12 @@
 </p>
 
 <p align="center">
-  <a href="https://www.python.org/downloads/release/python-360/">
-    <img alt="Python 3.6" src="https://img.shields.io/badge/python-3.6-blue.svg">
-  </a>
-  <a href="https://github.com/baldassarreFe/deep-koalarization/blob/master/LICENSE">
-    <img alt="GitHub License" src="https://img.shields.io/github/license/baldassarreFe/deep-koalarization.svg">
-  </a>
-  <a href="https://github.com/baldassarreFe/deep-koalarization/stargazers">
-    <img alt="GitHub stars" src="https://img.shields.io/github/stars/baldassarreFe/deep-koalarization.svg">
-  </a>
-  <a href="https://github.com/baldassarreFe/deep-koalarization/network">
-    <img alt="GitHub forks" src="https://img.shields.io/github/forks/baldassarreFe/deep-koalarization.svg">
-  </a>
-  <a href="https://arxiv.org/abs/1712.03400">
-    <img alt="arXiv" src="https://img.shields.io/badge/paper-arXiv-_.svg?&color=B31B1B">
-  </a>
-  <a href="https://twitter.com/intent/tweet?text=Wow:&url=https%3A%2F%2Fgithub.com%2FbaldassarreFe%2Fdeep-koalarization">
-    <img alt="Twitter" src="https://img.shields.io/twitter/url/https/github.com/baldassarreFe/deep-koalarization.svg?style=social">
-  </a>
+  <a href="https://www.python.org/downloads/release/python-360/"><img alt="Python 3.6" src="https://img.shields.io/badge/python-3.6-blue.svg"></a>
+  <a href="https://github.com/baldassarreFe/deep-koalarization/blob/master/LICENSE"><img alt="GitHub License" src="https://img.shields.io/github/license/baldassarreFe/deep-koalarization.svg"></a>
+  <a href="https://github.com/baldassarreFe/deep-koalarization/stargazers"><img alt="GitHub stars" src="https://img.shields.io/github/stars/baldassarreFe/deep-koalarization.svg"></a>
+  <a href="https://github.com/baldassarreFe/deep-koalarization/network"><img alt="GitHub forks" src="https://img.shields.io/github/forks/baldassarreFe/deep-koalarization.svg"></a>
+  <a href="https://arxiv.org/abs/1712.03400"><img alt="arXiv" src="https://img.shields.io/badge/paper-arXiv-_.svg?&color=B31B1B"></a>
+  <a href="https://twitter.com/intent/tweet?text=Wow:&url=https%3A%2F%2Fgithub.com%2FbaldassarreFe%2Fdeep-koalarization"><img alt="Twitter" src="https://img.shields.io/twitter/url/https/github.com/baldassarreFe/deep-koalarization.svg?style=social"></a>
 </p>
 
 
@@ -113,20 +101,7 @@ The Training data for this experiment could come from any source. We decided to
 
 ## Use the code
 
-### Installation
-
-Refer to [INSTRUCTIONS](INSTRUCTIONS.md) to install and use the code in this repo. But in short, to install the package,
-simply run:
-
-```
-pip install -e .
-```
-
-
-### Getting started
-
-You need to generate the [necessary data](DATASET.md). Once this is done, you can inspect the [koalarization code](src/koalarization/).
-
+Refer to [INSTRUCTIONS](INSTRUCTIONS.md) to install and use the code in this repo.
 
 ## Community
 

diff --git a/tests/README.md b/tests/README.md
@@ -1,28 +1,50 @@
 # Testing
 
 <!-- **Note:** Running the tests together fails due to some internal TensorFlow problem, avoid running: -->
+All commands must be run from the repository root folder.
 
-### Run all tests
+Before running any test, generate a sample dataset (see below).
+
+### Table of contents
+
+- [Prepare sample dataset](#prepare-sample-dataset)
+- [Run tests](#run-tests)
+    - [Run all tests](#run-all-tests)
+    - [Run specific module tests](#run-specific-module-tests)
+
+## Prepare sample dataset
+
+The sample dataset is generated using the [unsplash.txt](../data/unsplash.txt) file. Run
+
+```bash
+bash tests/prepare-sample-dataset.sh
+```
+
+which places the sample dataset in the folder [tests/data](data).
+
+## Run tests
+
+### Run all tests
 
 You can run all tests by using
 
 ```bash
-python3.6 -m unittest discover tests -v
+python -m unittest discover tests -v
 ```
 
 > **Note:** `test_colorization.py` will train a simple model for 20 epochs, which might take a few seconds.
 
-### Run specific module tests
+### Run specific module tests
 
 You can run single test cases
 
 ```bash
-python3.6 -m unittest -v \
+python -m unittest -v \
     tests/batching/test_write_read_base.py \
     tests/batching/test_write_read_variable.py
 ```
 
 Or run them individually:
 ```bash
-python3.6 -m tests.batching.test_write_read_single_image -v
+python -m tests.batching.test_write_read_single_image -v
 ```
diff --git a/tests/batching/test_filename_queues.py b/tests/batching/test_filename_queues.py
@@ -3,10 +3,12 @@
 
 import tensorflow as tf
 
-from koalarization.dataset.shared import DIR_RESIZED
 from koalarization.dataset.tfrecords import queue_single_images_from_folder
 
 
+DIR_RESIZED = './tests/data/resized/'
+
+
 class TestFilenameQueues(unittest.TestCase):
     def test_one(self):
         """Load all images from a folder once and print the result."""

diff --git a/tests/batching/test_write_read_base.py b/tests/batching/test_write_read_base.py
@@ -9,10 +9,12 @@
 
 import tensorflow as tf
 
-from koalarization.dataset.shared import DIR_TFRECORD
 from koalarization.dataset.tfrecords import RecordWriter, BatchableRecordReader
 
 
+DIR_TFRECORDS = './tests/data/tfrecords'
+
+
 class BaseTypesRecordWriter(RecordWriter):
     def write_test(self, i):
         example = tf.train.Example(
@@ -58,7 +60,7 @@ def test_base_records(self):
         # WRITING
         for i in range(self.number_of_records):
             record_name = "base_type_{}.tfrecord".format(i)
-            with BaseTypesRecordWriter(record_name, DIR_TFRECORD) as writer:
+            with BaseTypesRecordWriter(record_name, DIR_TFRECORDS) as writer:
                 for j in range(self.samples_per_record):
                     writer.write_test(i * self.number_of_records + j)
 
@@ -67,7 +69,7 @@ def test_base_records(self):
         # otherwise the internal shuffle queue gets created but its
         # threads won't start
 
-        reader = BaseTypesRecordReader("base_type_*.tfrecord", DIR_TFRECORD)
+        reader = BaseTypesRecordReader("base_type_*.tfrecord", DIR_TFRECORDS)
         read_one_example = reader.read_operation
         read_batched_examples = reader.read_batch(50)
 

diff --git a/tests/batching/test_write_read_fixed.py b/tests/batching/test_write_read_fixed.py
@@ -10,10 +10,12 @@
 import numpy as np
 import tensorflow as tf
 
-from koalarization.dataset.shared import DIR_TFRECORD
 from koalarization.dataset.tfrecords import RecordWriter, BatchableRecordReader
 
 
+DIR_TFRECORDS = './tests/data/tfrecords'
+
+
 class FixedSizeTypesRecordWriter(RecordWriter):
     def write_test(self):
         # Fixed size lists
@@ -65,15 +67,15 @@ def _create_read_operation(self):
 class TestFixedSizeRecords(unittest.TestCase):
     def test_fixed_size_record(self):
         # WRITING
-        with FixedSizeTypesRecordWriter("fixed_size.tfrecord", DIR_TFRECORD) as writer:
+        with FixedSizeTypesRecordWriter("fixed_size.tfrecord", DIR_TFRECORDS) as writer:
             writer.write_test()
             writer.write_test()
 
         # READING
         # Important: read_batch MUST be called before start_queue_runners,
         # otherwise the internal shuffle queue gets created but its
         # threads won't start
-        reader = FixedSizeTypesRecordReader("fixed_size.tfrecord", DIR_TFRECORD)
+        reader = FixedSizeTypesRecordReader("fixed_size.tfrecord", DIR_TFRECORDS)
         read_one_example = reader.read_operation
         read_batched_examples = reader.read_batch(4)
 

diff --git a/tests/batching/test_write_read_lab_image.py b/tests/batching/test_write_read_lab_image.py
@@ -13,12 +13,15 @@
 
 from koalarization import l_to_rgb
 from koalarization import lab_to_rgb
-from koalarization.dataset.shared import DIR_RESIZED, DIR_TFRECORD
 from koalarization.dataset.tfrecords import LabImageRecordReader
 from koalarization.dataset.tfrecords import LabImageRecordWriter
 from koalarization.dataset.tfrecords import queue_single_images_from_folder
 
 
+DIR_RESIZED = './tests/data/resized/'
+DIR_TFRECORDS = './tests/data/tfrecords'
+
+
 class TestLabImageWriteRead(unittest.TestCase):
     def test_lab_image_write_read(self):
         self._lab_image_write()
@@ -30,7 +33,7 @@ def _lab_image_write(self):
         img_emb = tf.truncated_normal(shape=[1001])
 
         # Create a writer to write_image the images
-        lab_writer = LabImageRecordWriter("test_lab_images.tfrecord", DIR_TFRECORD)
+        lab_writer = LabImageRecordWriter("test_lab_images.tfrecord", DIR_TFRECORDS)
 
         # Start a new session to run the operations
         with tf.Session() as sess:
@@ -74,7 +77,7 @@ def _lab_image_read(self):
         # Important: read_batch MUST be called before start_queue_runners,
         # otherwise the internal shuffle queue gets created but its
         # threads won't start
-        irr = LabImageRecordReader("test_lab_images.tfrecord", DIR_TFRECORD)
+        irr = LabImageRecordReader("test_lab_images.tfrecord", DIR_TFRECORDS)
         read_one_example = irr.read_operation
         read_batched_examples = irr.read_batch(20)
 

diff --git a/tests/batching/test_write_read_single_image.py b/tests/batching/test_write_read_single_image.py
@@ -11,14 +11,17 @@
 import matplotlib.pyplot as plt
 import tensorflow as tf
 
-from koalarization.dataset.shared import DIR_RESIZED, DIR_TFRECORD
 from koalarization.dataset.tfrecords import (
     SingleImageRecordWriter,
     SingleImageRecordReader,
 )
 from koalarization.dataset.tfrecords import queue_single_images_from_folder
 
 
+DIR_RESIZED = './tests/data/resized'
+DIR_TFRECORDS = './tests/data/tfrecords'
+
+
 class TestSingleImageWriteRead(unittest.TestCase):
     def test_single_image_write_read(self):
         self._single_image_write()
@@ -29,7 +32,7 @@ def _single_image_write(self):
         img_key, img_tensor, _ = queue_single_images_from_folder(DIR_RESIZED)
 
         # Create a writer to write_image the images
-        single_writer = SingleImageRecordWriter("single_images.tfrecord", DIR_TFRECORD)
+        single_writer = SingleImageRecordWriter("single_images.tfrecord", DIR_TFRECORDS)
 
         # Start a new session to run the operations
         with tf.Session() as sess:
@@ -70,7 +73,7 @@ def _single_image_read(self):
         # Important: read_batch MUST be called before start_queue_runners,
         # otherwise the internal shuffle queue gets created but its
         # threads won't start
-        irr = SingleImageRecordReader("single_images.tfrecord", DIR_TFRECORD)
+        irr = SingleImageRecordReader("single_images.tfrecord", DIR_TFRECORDS)
         read_one_example = irr.read_operation
         read_batched_examples = irr.read_batch(10)
 

diff --git a/tests/batching/test_write_read_variable.py b/tests/batching/test_write_read_variable.py
@@ -10,10 +10,12 @@
 import numpy as np
 import tensorflow as tf
 
-from koalarization.dataset.shared import DIR_TFRECORD
 from koalarization.dataset.tfrecords import RecordWriter, RecordReader
 
 
+DIR_TFRECORDS = './tests/data/tfrecords'
+
+
 class VariableSizeTypesRecordWriter(RecordWriter):
     """
     The tensors returned don't have the same shape, so the read operations
@@ -72,12 +74,12 @@ def _create_read_operation(self):
 class TestVariableSizeRecords(unittest.TestCase):
     def test_variable_size_record(self):
         # WRITING
-        with VariableSizeTypesRecordWriter("variable.tfrecord", DIR_TFRECORD) as writer:
+        with VariableSizeTypesRecordWriter("variable.tfrecord", DIR_TFRECORDS) as writer:
             for i in range(2):
                 writer.write_test()
 
         # READING
-        reader = VariableSizeTypesRecordReader("variable.tfrecord", DIR_TFRECORD)
+        reader = VariableSizeTypesRecordReader("variable.tfrecord", DIR_TFRECORDS)
         read_one_example = reader.read_operation
 
         with tf.Session() as sess:

diff --git a/tests/data/.gitignore b/tests/data/.gitignore
@@ -0,0 +1,8 @@
+# Ignore everything in this directory
+original/*
+resized/*
+tfrecords/*
+inception_resnet_v2_2016_08_30.ckpt
+
+# Except these files
+!resized
diff --git a/tests/prepare-sample-dataset.sh b/tests/prepare-sample-dataset.sh
@@ -0,0 +1,14 @@
+# If the following paths are changed, make sure to change the scripts in ./tests/batching/ accordingly
+UNSPASH_URLS='./data/unsplash.txt' 
+DIR_ORIGINAL='./tests/data/original/'
+DIR_RESIZED='./tests/data/resized/'
+DIR_TFRECORDS='./tests/data/tfrecords'
+CHECKPOINT_INCEPTION='./data/inception_resnet_v2_2016_08_30.ckpt'
+
+echo "1/3 Downloading unsplash images"
+python -m koalarization.dataset.download ${UNSPASH_URLS} ${DIR_ORIGINAL}
+echo "2/3 Resizing images"
+python -m koalarization.dataset.resize ${DIR_ORIGINAL} ${DIR_RESIZED}
+echo "3/3 Generating TF Records"  # Assumes inception checkpoint is placed in ./data
+python -m koalarization.dataset.lab_batch -c ${CHECKPOINT_INCEPTION}\
+    ${DIR_RESIZED} ${DIR_TFRECORDS}
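
Review note: the test modules touched by this change each define `DIR_RESIZED` and `DIR_TFRECORDS` locally, and the shell script asks that they be kept in sync by hand. A minimal sketch of one way to centralize them follows. It assumes a hypothetical shared module (for example `tests/paths.py`) that is not part of this PR; the helper names `sample_dataset_ready` and `require_sample_dataset` are likewise illustrative, not existing project code.

```python
import unittest
from pathlib import Path

# Hypothetical shared module (e.g. tests/paths.py); NOT part of this PR.
# The values mirror the ones set in tests/prepare-sample-dataset.sh and
# are relative to the repository root, where the tests are run from.
TESTS_DATA = Path("tests") / "data"
DIR_RESIZED = TESTS_DATA / "resized"
DIR_TFRECORDS = TESTS_DATA / "tfrecords"


def sample_dataset_ready(resized_dir=DIR_RESIZED):
    """Return True if the folder exists and contains at least one file."""
    resized_dir = Path(resized_dir)
    return resized_dir.is_dir() and any(p.is_file() for p in resized_dir.iterdir())


def require_sample_dataset(test_item):
    """Skip a test class or method when the sample dataset has not been generated."""
    return unittest.skipUnless(
        sample_dataset_ready(),
        "sample dataset missing; run: bash tests/prepare-sample-dataset.sh",
    )(test_item)
```

A test module could then import the constants instead of redefining them and decorate its test case with `@require_sample_dataset`. Because the existing tests pass plain strings to the reader and writer classes, converting with `str(DIR_TFRECORDS)` at the call site would be the conservative choice.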