Investigate the application of GANs to medical images. The scope of the project includes:
- Generate artificial images of vertebra units (VUs) conditioned on anatomical region.
- Conduct an extensive evaluation of the dataset's behavior and of the trade-off between image quality/dataset faithfulness and privacy.
- Link to the data: https://zenodo.org/record/5031881
- The synthetic dataset (10,000 image/region pairs, 2.95 GB, HDF5 format) is shared with the code.
With some minor tweaking, the synthetic dataset can be used to run the training and analysis code to validate it. (The analysis results themselves are far less meaningful, because comparing privacy between two synthetic datasets is not very useful.)
Because the original data is not anonymized, it is not shared with the code. The preprocessing code is not shared either, to avoid exposing sensitive system information.
This code therefore cannot be run end-to-end out of the box.
The analysis notebooks still hold their latest state, including figures.
- Training and generating synthetic VUs and corresponding regions
  - Training: model training code
  - Inference: generating synthetic samples
- Fidelity - Analysis
  - Fetching and plotting real images from different regions, plotting synthetic samples, interpolating between classes
- Diversity - Analysis
  - Preprocessing: preparing the dataset and training UMAP
  - UMAP diversity visualization
  - Classification analysis: quantitative diversity evaluation
- Privacy - Analysis
  - Preprocessing: computing features and similarity
  - Pairwise and density attack robustness
  - Embedding space density visualization
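The privacy analyses rely on distances between real and synthetic samples in a pixel or embedding space. As an illustration only, here is a minimal, generic nearest-neighbour ("distance to closest record") check: a synthetic sample lying unusually close to a real training sample may be a near copy of it. The function names, the Euclidean metric and the threshold are assumptions made for this sketch, not the repository's implementation, which lives in the Privacy notebooks.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical feature vectors, e.g. flattened 64x64 pixels or UMAP embeddings.
Vector = Sequence[float]


def distance_to_closest_record(
    synthetic: Sequence[Vector], real: Sequence[Vector]
) -> List[Tuple[int, float]]:
    """For each synthetic sample, return (index of nearest real sample, Euclidean distance)."""
    if not real:
        raise ValueError("need at least one real sample")
    results = []
    for s in synthetic:
        # Brute-force search over all real samples; ties keep the first index.
        best_idx, best_dist = min(
            ((i, math.dist(s, r)) for i, r in enumerate(real)),
            key=lambda pair: pair[1],
        )
        results.append((best_idx, best_dist))
    return results


def flag_potential_copies(
    synthetic: Sequence[Vector], real: Sequence[Vector], threshold: float
) -> List[int]:
    """Indices of synthetic samples lying closer than `threshold` to some real sample."""
    return [
        j
        for j, (_, dist) in enumerate(distance_to_closest_record(synthetic, real))
        if dist < threshold
    ]


if __name__ == "__main__":
    real = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    synthetic = [[0.05, 0.0], [3.2, 3.2]]
    print(distance_to_closest_record(synthetic, real))
    print(flag_potential_copies(synthetic, real, threshold=0.1))  # [0]
```

The threshold here is arbitrary; in practice it would need calibrating, for example against distances among real samples themselves. The quadratic brute-force search is only meant to keep the sketch dependency-free.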
```
.
├── README.md
├── environment.yml            # pgan-env
├── synthetic_dataset.h5
└── src
    ├── helper.py              # utility functions (current date-time for mlflow, grid for image visualization)
    └── manuscript
        ├── Diversity
        │   ├── 1_generate_synth_112_224.ipynb
        │   ├── 2_train_umap_112_224.ipynb
        │   ├── 3_plot_umap_diversity.ipynb
        │   ├── 4_classifier_analysis.ipynb
        │   ├── classifier_logs    # restricted data (classifier on train might not be private)
        │   │   └── ...
        │   ├── diversity_saves    # restricted data (post-processed real dataset included)
        │   │   └── ...
        │   ├── images
        │   │   └── ...
        │   └── train_classifier.py
        ├── Fidelity
        │   ├── 1_images_qualitative_inspection.ipynb
        │   ├── helpers
        │   │   └── utils.py       # code for interpolation between regions
        │   └── images
        │       └── ...
        ├── Privacy
        │   ├── 1_prepare_9_64_64_pixel_space.ipynb
        │   ├── 2_UMAP_64_64.ipynb
        │   ├── 3_compute_distances.ipynb
        │   ├── 4_plot_pairwise_attacks.ipynb
        │   ├── 5_plot_density_attacks.ipynb
        │   ├── 6_density_plot.ipynb
        │   ├── images
        │   │   ├── ...
        │   │   └── supp
        │   │       └── ...
        │   └── privacy_saves      # restricted data (post-processed real dataset included, UMAP object might
        │       └── ...            # not be private)
        └── Train
            ├── batchers.py
            ├── fixed_architecture.py
            ├── Generate_image.ipynb
            ├── training_parser.py
            ├── train_region.py
            ├── restricted         # restricted data (GAN weights, local machine preprocessing, indexes
            │   └── ...            # of train/val split)
            └── transforms
                ├── augmentations.py
                └── transforms.py
```
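`Fidelity/helpers/utils.py` holds the code for interpolating between regions. A common way to do this with a class-conditional GAN is to keep the latent vector fixed and move the condition linearly from one region to another. The sketch below assumes the generator accepts a continuous, one-hot-like region vector; the region names, `one_hot`, `interpolate_conditions` and the `generator(z, cond)` call are hypothetical and not taken from the repository.

```python
from typing import List, Sequence

# Hypothetical region labels; the conditioning vocabulary used in the project may differ.
REGIONS = ("cervical", "thoracic", "lumbar")


def one_hot(region: str, regions: Sequence[str] = REGIONS) -> List[float]:
    """Encode a region name as a one-hot condition vector."""
    if region not in regions:
        raise ValueError(f"unknown region: {region!r}")
    return [1.0 if r == region else 0.0 for r in regions]


def interpolate_conditions(
    start: str, end: str, steps: int, regions: Sequence[str] = REGIONS
) -> List[List[float]]:
    """Linearly interpolate between two one-hot region vectors, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be >= 2 to include both endpoints")
    a, b = one_hot(start, regions), one_hot(end, regions)
    path = []
    for k in range(steps):
        t = k / (steps - 1)  # t goes from 0.0 (start region) to 1.0 (end region)
        path.append([(1.0 - t) * x + t * y for x, y in zip(a, b)])
    return path


if __name__ == "__main__":
    for cond in interpolate_conditions("cervical", "lumbar", steps=3):
        # Each vector would be paired with the same latent z and passed to a
        # (hypothetical) conditional generator, e.g. generator(z, cond).
        print(cond)
    # [1.0, 0.0, 0.0]
    # [0.5, 0.0, 0.5]
    # [0.0, 0.0, 1.0]
```

Keeping z fixed isolates the effect of the region condition, so differences along the path reflect what the generator has learned about each anatomical region.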
Authors:
- Hanxi Sun*, Purdue University, Department of Statistics
- Jason Plawinski*, Novartis
- Sajanth Subramaniam, Novartis
- Amir Jamaludin, Oxford Big Data Institute
- Timor Kadir, Oxford Big Data Institute
- Aimee Readie, Novartis
- Gregory Ligozio, Novartis
- David Ohlssen, Novartis
- Mark Baillie#, Novartis
- Thibaud Coroller#@, Novartis
*co-first authors; #co-last authors; @ corresponding author