Merge pull request #17 from baldassarreFe/feature/typos
Feature/typos
lucasrodes authored May 16, 2020
2 parents 1c2df17 + 4a2a92d commit af99f75
Showing 13 changed files with 112 additions and 55 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -101,4 +101,4 @@ resized

py36
py37
imagenet
imagenet
11 changes: 11 additions & 0 deletions DATASET.md
@@ -10,6 +10,17 @@ Storing the images and the embedding as separate files on disk (`jpeg/png` and `
the performances during training, so all the image-embedding pairs are stored in binary format in large
continuous [TFRecords](https://www.tensorflow.org/programmers_guide/datasets).

### Table of contents

- [Pipeline](#pipeline)
- [Imagenet images](#imagenet-images)
- [Resizing for training](#resizing-for-training)
- [Converting to TFRecords](#converting-to-tfrecords)
- [Validation set](#validation-set)
- [Space on disk notes](#space-on-disk-notes)
- [The images](#the-images)
- [The TFRecords](#the-tfrecords)

## Pipeline

All the data preparation steps are independent and persisted on the disk, the default (and recommended) folder structure is:
15 changes: 14 additions & 1 deletion INSTRUCTIONS.md
@@ -1,5 +1,11 @@
# Instructions

### Table of contents
- [Environment](#environment)
- [Dataset](#dataset)
- [Training and evaluation](#training-and-evaluation)
- [Development and test the code](#development-and-test-the-code)

## Environment
The project is based on Python 3.6, to manage the dependencies contained in
[`requirements.txt`](requirements.txt) a virtual environment is recommended.
@@ -20,9 +26,12 @@ pip install -e .
For GPU-support, run:

```
$ pip install -e .[gpu]
pip install -e .[gpu]
```

> If it does not work, manually install the `tensorflow-gpu` library (it must be compatible with your OS, Python version and
> `tensorflow` library). More info [here](https://www.tensorflow.org/install/gpu).
## Dataset
Prior to training, the images from ImageNet need to be downloaded, resized and processed.

@@ -64,3 +73,7 @@ python -m koalarization.evaluate \
--run-id 'run1' \
'data/tfrecords' 'runs/'
```

## Development and test the code

See [here](tests/README.md).
39 changes: 7 additions & 32 deletions README.md
@@ -9,24 +9,12 @@
</p>

<p align="center">
<a href="https://www.python.org/downloads/release/python-360/">
<img alt="Python 3.6" src="https://img.shields.io/badge/python-3.6-blue.svg">
</a>
<a href="https://github.com/baldassarreFe/deep-koalarization/blob/master/LICENSE">
<img alt="GitHub License" src="https://img.shields.io/github/license/baldassarreFe/deep-koalarization.svg">
</a>
<a href="https://github.com/baldassarreFe/deep-koalarization/stargazers">
<img alt="GitHub stars" src="https://img.shields.io/github/stars/baldassarreFe/deep-koalarization.svg">
</a>
<a href="https://github.com/baldassarreFe/deep-koalarization/network">
<img alt="GitHub forks" src="https://img.shields.io/github/forks/baldassarreFe/deep-koalarization.svg">
</a>
<a href="https://arxiv.org/abs/1712.03400">
<img alt="arXiv" src="https://img.shields.io/badge/paper-arXiv-_.svg?&color=B31B1B">
</a>
<a href="https://twitter.com/intent/tweet?text=Wow:&url=https%3A%2F%2Fgithub.com%2FbaldassarreFe%2Fdeep-koalarization">
<img alt="Twitter" src="https://img.shields.io/twitter/url/https/github.com/baldassarreFe/deep-koalarization.svg?style=social">
</a>
<a href="https://www.python.org/downloads/release/python-360/"><img alt="Python 3.6" src="https://img.shields.io/badge/python-3.6-blue.svg"></a>
<a href="https://github.com/baldassarreFe/deep-koalarization/blob/master/LICENSE"><img alt="GitHub License" src="https://img.shields.io/github/license/baldassarreFe/deep-koalarization.svg"></a>
<a href="https://github.com/baldassarreFe/deep-koalarization/stargazers"><img alt="GitHub stars" src="https://img.shields.io/github/stars/baldassarreFe/deep-koalarization.svg"></a>
<a href="https://github.com/baldassarreFe/deep-koalarization/network"><img alt="GitHub forks" src="https://img.shields.io/github/forks/baldassarreFe/deep-koalarization.svg"></a>
<a href="https://arxiv.org/abs/1712.03400"><img alt="arXiv" src="https://img.shields.io/badge/paper-arXiv-_.svg?&color=B31B1B"></a>
<a href="https://twitter.com/intent/tweet?text=Wow:&url=https%3A%2F%2Fgithub.com%2FbaldassarreFe%2Fdeep-koalarization"><img alt="Twitter" src="https://img.shields.io/twitter/url/https/github.com/baldassarreFe/deep-koalarization.svg?style=social"></a>
</p>


@@ -113,20 +101,7 @@ The Training data for this experiment could come from any source. We decuded to

## Use the code

### Installation

Refer to [INSTRUCTIONS](INSTRUCTIONS.md) to install and use the code in this repo. But in short, to install the package,
simply run:

```
pip install -e .
```


### Getting started

You need to generate the [necessary data](DATASET.md). Once this is done, you can inspect the [koalarization code](src/koalarization/).

Refer to [INSTRUCTIONS](INSTRUCTIONS.md) to install and use the code in this repo.

## Community

32 changes: 27 additions & 5 deletions tests/README.md
@@ -1,28 +1,50 @@
# Testing

<!-- **Note:** Running the tests together fails due to some internal TensorFlow problem, avoid running: -->
All commands must be run from the repository root folder.

### Run all tests
Before running any test, generate a sample dataset (see below).

### Table of contents

- [Prepare sample dataset](#prepare-sample-dataset)
- [Run tests](#run-tests)
- [Run all tests](#run-all-tests)
- [Run specific module tests](#run-specific-module-tests)

## Prepare sample dataset

The sample dataset is generated using the [unsplash.txt](../data/unsplash.txt) file. Run

```bash
bash tests/prepare-sample-dataset.sh
```

which places the sample dataset in the folder [tests/data](data).
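
Because the tests expect the sample dataset to be present, a quick pre-flight check can avoid a confusing failure later. The following is only a minimal sketch: the folder names are the ones used by the test modules in this commit, while `missing_sample_dirs` is a hypothetical helper that is not part of the repository.

```python
from pathlib import Path

# Folders the batching tests point at (paths taken from the test modules).
SAMPLE_DIRS = ("tests/data/resized", "tests/data/tfrecords")


def missing_sample_dirs(repo_root="."):
    """Return the sample-dataset folders that are absent or empty.

    Hypothetical helper, shown for illustration only.
    """
    missing = []
    for rel in SAMPLE_DIRS:
        folder = Path(repo_root) / rel
        # A folder counts as missing if it does not exist or has no entries.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(rel)
    return missing


# Usage (from the repository root):
#     for rel in missing_sample_dirs():
#         print("Missing or empty: {}".format(rel))
```

If the returned list is not empty, running `bash tests/prepare-sample-dataset.sh` again is the likely fix.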

## Run tests

### Run all tests

You can run all tests by using

```bash
python3.6 -m unittest discover tests -v
python -m unittest discover tests -v
```

> **Note:** `test_colorization.py` will train a simple model for 20 epochs, which might take a few seconds.
### Run specific module tests
### Run specific module tests

You can run single test cases

```bash
python3.6 -m unittest -v \
python -m unittest -v \
tests/batching/test_write_read_base.py \
tests/batching/test_write_read_variable.py
```

Or run them individually:
```bash
python3.6 -m tests.batching.test_write_read_single_image -v
python -m tests.batching.test_write_read_single_image -v
```
4 changes: 3 additions & 1 deletion tests/batching/test_filename_queues.py
@@ -3,10 +3,12 @@

import tensorflow as tf

from koalarization.dataset.shared import DIR_RESIZED
from koalarization.dataset.tfrecords import queue_single_images_from_folder


DIR_RESIZED = './tests/data/resized/'


class TestFilenameQueues(unittest.TestCase):
    def test_one(self):
        """Load all images from a folder once and print the result."""
8 changes: 5 additions & 3 deletions tests/batching/test_write_read_base.py
@@ -9,10 +9,12 @@

import tensorflow as tf

from koalarization.dataset.shared import DIR_TFRECORD
from koalarization.dataset.tfrecords import RecordWriter, BatchableRecordReader


DIR_TFRECORDS = './tests/data/tfrecords'


class BaseTypesRecordWriter(RecordWriter):
    def write_test(self, i):
        example = tf.train.Example(
@@ -58,7 +60,7 @@ def test_base_records(self):
        # WRITING
        for i in range(self.number_of_records):
            record_name = "base_type_{}.tfrecord".format(i)
            with BaseTypesRecordWriter(record_name, DIR_TFRECORD) as writer:
            with BaseTypesRecordWriter(record_name, DIR_TFRECORDS) as writer:
                for j in range(self.samples_per_record):
                    writer.write_test(i * self.number_of_records + j)

@@ -67,7 +69,7 @@ def test_base_records(self):
        # otherwise the internal shuffle queue gets created but its
        # threads won't start

        reader = BaseTypesRecordReader("base_type_*.tfrecord", DIR_TFRECORD)
        reader = BaseTypesRecordReader("base_type_*.tfrecord", DIR_TFRECORDS)
        read_one_example = reader.read_operation
        read_batched_examples = reader.read_batch(50)

8 changes: 5 additions & 3 deletions tests/batching/test_write_read_fixed.py
@@ -10,10 +10,12 @@
import numpy as np
import tensorflow as tf

from koalarization.dataset.shared import DIR_TFRECORD
from koalarization.dataset.tfrecords import RecordWriter, BatchableRecordReader


DIR_TFRECORDS = './tests/data/tfrecords'


class FixedSizeTypesRecordWriter(RecordWriter):
    def write_test(self):
        # Fixed size lists
@@ -65,15 +67,15 @@ def _create_read_operation(self):
class TestFixedSizeRecords(unittest.TestCase):
    def test_fixed_size_record(self):
        # WRITING
        with FixedSizeTypesRecordWriter("fixed_size.tfrecord", DIR_TFRECORD) as writer:
        with FixedSizeTypesRecordWriter("fixed_size.tfrecord", DIR_TFRECORDS) as writer:
            writer.write_test()
            writer.write_test()

        # READING
        # Important: read_batch MUST be called before start_queue_runners,
        # otherwise the internal shuffle queue gets created but its
        # threads won't start
        reader = FixedSizeTypesRecordReader("fixed_size.tfrecord", DIR_TFRECORD)
        reader = FixedSizeTypesRecordReader("fixed_size.tfrecord", DIR_TFRECORDS)
        read_one_example = reader.read_operation
        read_batched_examples = reader.read_batch(4)

9 changes: 6 additions & 3 deletions tests/batching/test_write_read_lab_image.py
@@ -13,12 +13,15 @@

from koalarization import l_to_rgb
from koalarization import lab_to_rgb
from koalarization.dataset.shared import DIR_RESIZED, DIR_TFRECORD
from koalarization.dataset.tfrecords import LabImageRecordReader
from koalarization.dataset.tfrecords import LabImageRecordWriter
from koalarization.dataset.tfrecords import queue_single_images_from_folder


DIR_RESIZED = './tests/data/resized/'
DIR_TFRECORDS = './tests/data/tfrecords'


class TestLabImageWriteRead(unittest.TestCase):
    def test_lab_image_write_read(self):
        self._lab_image_write()
@@ -30,7 +33,7 @@ def _lab_image_write(self):
        img_emb = tf.truncated_normal(shape=[1001])

        # Create a writer to write_image the images
        lab_writer = LabImageRecordWriter("test_lab_images.tfrecord", DIR_TFRECORD)
        lab_writer = LabImageRecordWriter("test_lab_images.tfrecord", DIR_TFRECORDS)

        # Start a new session to run the operations
        with tf.Session() as sess:
@@ -74,7 +77,7 @@ def _lab_image_read(self):
        # Important: read_batch MUST be called before start_queue_runners,
        # otherwise the internal shuffle queue gets created but its
        # threads won't start
        irr = LabImageRecordReader("test_lab_images.tfrecord", DIR_TFRECORD)
        irr = LabImageRecordReader("test_lab_images.tfrecord", DIR_TFRECORDS)
        read_one_example = irr.read_operation
        read_batched_examples = irr.read_batch(20)

9 changes: 6 additions & 3 deletions tests/batching/test_write_read_single_image.py
@@ -11,14 +11,17 @@
import matplotlib.pyplot as plt
import tensorflow as tf

from koalarization.dataset.shared import DIR_RESIZED, DIR_TFRECORD
from koalarization.dataset.tfrecords import (
    SingleImageRecordWriter,
    SingleImageRecordReader,
)
from koalarization.dataset.tfrecords import queue_single_images_from_folder


DIR_RESIZED = './tests/data/resized'
DIR_TFRECORDS = './tests/data/tfrecords'


class TestSingleImageWriteRead(unittest.TestCase):
    def test_single_image_write_read(self):
        self._single_image_write()
@@ -29,7 +32,7 @@ def _single_image_write(self):
        img_key, img_tensor, _ = queue_single_images_from_folder(DIR_RESIZED)

        # Create a writer to write_image the images
        single_writer = SingleImageRecordWriter("single_images.tfrecord", DIR_TFRECORD)
        single_writer = SingleImageRecordWriter("single_images.tfrecord", DIR_TFRECORDS)

        # Start a new session to run the operations
        with tf.Session() as sess:
@@ -70,7 +73,7 @@ def _single_image_read(self):
        # Important: read_batch MUST be called before start_queue_runners,
        # otherwise the internal shuffle queue gets created but its
        # threads won't start
        irr = SingleImageRecordReader("single_images.tfrecord", DIR_TFRECORD)
        irr = SingleImageRecordReader("single_images.tfrecord", DIR_TFRECORDS)
        read_one_example = irr.read_operation
        read_batched_examples = irr.read_batch(10)

8 changes: 5 additions & 3 deletions tests/batching/test_write_read_variable.py
@@ -10,10 +10,12 @@
import numpy as np
import tensorflow as tf

from koalarization.dataset.shared import DIR_TFRECORD
from koalarization.dataset.tfrecords import RecordWriter, RecordReader


DIR_TFRECORDS = './tests/data/tfrecords'


class VariableSizeTypesRecordWriter(RecordWriter):
    """
    The tensors returned don't have the same shape, so the read operations
@@ -72,12 +74,12 @@ def _create_read_operation(self):
class TestVariableSizeRecords(unittest.TestCase):
    def test_variable_size_record(self):
        # WRITING
        with VariableSizeTypesRecordWriter("variable.tfrecord", DIR_TFRECORD) as writer:
        with VariableSizeTypesRecordWriter("variable.tfrecord", DIR_TFRECORDS) as writer:
            for i in range(2):
                writer.write_test()

        # READING
        reader = VariableSizeTypesRecordReader("variable.tfrecord", DIR_TFRECORD)
        reader = VariableSizeTypesRecordReader("variable.tfrecord", DIR_TFRECORDS)
        read_one_example = reader.read_operation

        with tf.Session() as sess:
8 changes: 8 additions & 0 deletions tests/data/.gitignore
@@ -0,0 +1,8 @@
# Ignore everything in this directory
original/*
resized/*
tfrecords/*
inception_resnet_v2_2016_08_30.ckpt

# Except these files
!resized
14 changes: 14 additions & 0 deletions tests/prepare-sample-dataset.sh
@@ -0,0 +1,14 @@
# If following paths are changed, make sure to change scripts in ./tests/batching/ accordingly
UNSPASH_URLS='./data/unsplash.txt'
DIR_ORIGINAL='./tests/data/original/'
DIR_RESIZED='./tests/data/resized/'
DIR_TFRECORDS='./tests/data/tfrecords'
CHECKPOINT_INCEPTION='./data/inception_resnet_v2_2016_08_30.ckpt'

echo "1/3 Downloading unsplash images"
python -m koalarization.dataset.download ${UNSPASH_URLS} ${DIR_ORIGINAL}
echo "2/3 Resizing images"
python -m koalarization.dataset.resize ${DIR_ORIGINAL} ${DIR_RESIZED}
echo "3/3 Generating TF Records"  # Assumes inception checkpoint is placed in ./data
python -m koalarization.dataset.lab_batch -c ${CHECKPOINT_INCEPTION} \
    ${DIR_RESIZED} ${DIR_TFRECORDS}
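
For readers who prefer to drive the same three steps from Python, the script above could be mirrored roughly as follows. This is only a sketch: the module names, the `-c` flag and the paths are copied from the shell script, while `build_commands` and `prepare_sample_dataset` are hypothetical helpers that do not exist in the repository.

```python
import subprocess
import sys

# Paths copied from tests/prepare-sample-dataset.sh.
UNSPLASH_URLS = "./data/unsplash.txt"
DIR_ORIGINAL = "./tests/data/original/"
DIR_RESIZED = "./tests/data/resized/"
DIR_TFRECORDS = "./tests/data/tfrecords"
CHECKPOINT_INCEPTION = "./data/inception_resnet_v2_2016_08_30.ckpt"


def build_commands(python=sys.executable):
    """Return the three pipeline commands as argument lists (hypothetical helper)."""
    return [
        [python, "-m", "koalarization.dataset.download", UNSPLASH_URLS, DIR_ORIGINAL],
        [python, "-m", "koalarization.dataset.resize", DIR_ORIGINAL, DIR_RESIZED],
        [python, "-m", "koalarization.dataset.lab_batch",
         "-c", CHECKPOINT_INCEPTION, DIR_RESIZED, DIR_TFRECORDS],
    ]


def prepare_sample_dataset(run=subprocess.check_call):
    """Run download, resize and TFRecord generation in order.

    `run` is injectable so the sequence can be inspected without executing it.
    """
    for step, cmd in enumerate(build_commands(), start=1):
        print("{}/3 {}".format(step, cmd[2]))
        run(cmd)  # check_call raises CalledProcessError on a non-zero exit


# Usage (from the repository root, with the Inception checkpoint in ./data):
#     prepare_sample_dataset()
```

Passing argument lists instead of a single shell string avoids quoting problems with the paths, and `check_call` stops the pipeline as soon as one step fails.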
