Data cleaning & preprocessing

Viet edited this page May 24, 2019 · 12 revisions

Basic scheme: https://drive.google.com/file/d/1ooGtgptBMmHt6cuFXs1TAkPeOpHFMQOL/view?usp=sharing

Augmentation: artificially increase the amount of training data

Possible approaches to augment the data

There are two options: transform the existing images, or use GANs to produce new images.

  • Different styles of transformation:

Light augmentation: only flipping and similar simple operations. Heavier augmentation: see "light and heavier augmentation" in the paper list below.

  • GAN:

Use a GAN to produce new images similar to the existing ones and feed them into the model.

Two possible ways to apply augmentation:

  • offline augmentation: Extend the existing data set. The images are transformed (for example with numpy) and stored.
  • online augmentation (augmentation on the fly): Transform the images of each mini-batch during training. The transformed images are not stored on disk and are only used within that mini-batch. Set a random seed to make the results reproducible.
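
The two modes can be sketched with numpy as follows. This is a minimal illustration, not the project's actual pipeline: the function names `offline_augment` and `online_batches`, the `flip_prob` parameter, and the assumed array layout `(N, height, width)` are hypothetical.

```python
import numpy as np

def offline_augment(images):
    """Offline: return the originals plus a horizontally flipped copy of each.

    The result could then be written to disk, e.g. with np.save.
    """
    flipped = images[:, :, ::-1]  # reverse the width axis
    return np.concatenate([images, flipped], axis=0)

def online_batches(images, batch_size, seed=0, flip_prob=0.5):
    """Online: yield shuffled mini-batches, flipping each image at random.

    Nothing is stored; the seed makes the sequence of batches reproducible.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    for start in range(0, len(images), batch_size):
        # Fancy indexing returns a copy, so the input array is never modified.
        batch = images[order[start:start + batch_size]]
        flip = rng.random(len(batch)) < flip_prob
        batch[flip] = batch[flip][:, :, ::-1]
        yield batch
```

Calling `online_batches` twice with the same seed yields the same batches, which is what makes an on-the-fly experiment repeatable.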

Approaches to this project:

  1. Use traditional transformations such as flipping. Don't shift (translate) the images, because that can cut off parts of the hand and produce a hand that no longer looks normal.
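
A toy example of why flipping is safe while shifting is not. The 4x6 array standing in for an x-ray and the helper `shift_right` are illustrative assumptions, not part of the project code.

```python
import numpy as np

def shift_right(image, pixels):
    """Shift an image to the right, filling the vacated columns with zeros."""
    shifted = np.zeros_like(image)
    width = image.shape[1]
    if pixels < width:
        shifted[:, pixels:] = image[:, :width - pixels]
    return shifted

# Toy "hand": a block of foreground pixels touching the right border.
hand = np.zeros((4, 6), dtype=np.uint8)
hand[1:3, 3:6] = 1

flipped = hand[:, ::-1]          # horizontal flip
shifted = shift_right(hand, 2)   # translation by two pixels

print(hand.sum(), flipped.sum(), shifted.sum())  # prints: 6 6 2
```

The flip keeps all six foreground pixels, while the shift pushes four of them out of the frame.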

Papers about data augmentation:

Data augmentation (medium) introduction

light and heavier augmentation

Using GANs to generate new data for x-ray

Using GANs to improve CNN classification

Fast classification

Data augmentation techniques

Data augmentation techniques II

Preprocessing x-ray data

DOPING: Generative Data Augmentation for Unsupervised Anomaly Detection with GAN:

  • "Show that our oversampling pipeline is a unified one: it is generally applicable to datasets with different complex data distributions. To the best of our knowledge, our method is the first data augmentation technique focused on improving performance in unsupervised anomaly detection."

Practical implementation:

Image Augmentation Examples in Python: Medium. Numpy

Types of Data Augmentation: MXNet

DATA AUGMENTATION TECHNIQUES AND PITFALLS FOR SMALL DATASETS

Building powerful image classification models using very little data: Keras

Data augmentation in PyTorch: Forum

Data augmentation : boost your image dataset with few lines of Python: skimage

Data Augmentation for Computer Vision with PyTorch (Part 1: Image Classification)

Data Augmentation and Sampling for Pytorch

Cropping

To crop the X-ray images, preprocessing is done with the OpenCV library.

Rectangular shapes are found with the OpenCV function findContours.

Properties

  • Skewed (rotated) rectangles are also detected and accepted by the algorithm.
  • If a rectangle is skewed or part of the shape lies outside the image, the minimum-area rectangle around the shape is used for cropping.

Limitations

  • When a large part of a rectangle lies outside the image frame, the rectangle is difficult to detect.

References

Shape detection tutorial
Cropping found shape