A Deep Learning Approach to Private Data Sharing of Medical Images Using Conditional GANs

Publications

Project

Investigate the application of GANs to medical images. The scope of the project includes:

  1. Generate artificial images of vertebra units (VUs) conditioned on anatomical region.
  2. Conduct an extensive evaluation of the dataset behavior and of the trade-off between image quality/dataset faithfulness and privacy.

Related dataset:

  • Link to the data: https://zenodo.org/record/5031881
  • The synthetic dataset (10000 pairs of images and regions, 2.95 GB) is shared with the code in HDF5 format.
    With some minor tweaking, the synthetic dataset can be used to run the training and analysis code in order to validate it. (The analysis itself will be far less meaningful, because comparing privacy between two synthetic datasets is not very useful.)
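Since the internal layout of the HDF5 file is not documented here, a first step before any tweaking is to inspect it. The following is a minimal sketch, not part of the repository: the helper names `looks_like_hdf5` and `list_datasets` are hypothetical, and it assumes the third-party `h5py` package is installed for the listing step. No dataset names are assumed; the file is simply walked and every dataset it contains is reported.

```python
from pathlib import Path

# Fixed 8-byte signature that opens an HDF5 file (when no user block is present).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file starts with the HDF5 signature (offset 0 only)."""
    with open(path, "rb") as fh:
        return fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


def list_datasets(path):
    """Return {dataset name: (shape, dtype)} for every dataset in the file."""
    # Third-party dependency (assumption); imported lazily so the
    # signature check above works with the standard library alone.
    import h5py

    found = {}

    def visit(name, obj):
        # visititems calls this for every group and dataset; keep datasets only.
        if isinstance(obj, h5py.Dataset):
            found[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found


if __name__ == "__main__":
    path = Path("synthetic_dataset.h5")  # file name taken from the repository tree
    if path.exists() and looks_like_hdf5(path):
        for name, (shape, dtype) in list_datasets(path).items():
            print(name, shape, dtype)
```

The printed names and shapes show which keys hold the images and which hold the region labels, which is what the training and analysis code would need to be pointed at.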

Code Lifting

Because the original data is not anonymized, it is not shared with the code. The preprocessing code is not shared here either, to avoid exposing sensitive system information.
As a result, this code cannot be run end to end out of the box.
The analysis notebooks still hold their latest state, including figures.

Scripts, Notebooks and Demos

  1. Training and generating synthetic VUs and corresponding regions
  2. Fidelity - Analysis
  3. Diversity - Analysis
  4. Privacy - Analysis

Structure

.
├── README.md
├── environment.yml                  # pgan-env
├── synthetic_dataset.h5
└── src
    ├── helper.py                         # utility functions (current date-time for mlflow, grid for image visualization)
    └── manuscript                        
       ├── Diversity
       │   ├── 1_generate_synth_112_224.ipynb
       │   ├── 2_train_umap_112_224.ipynb
       │   ├── 3_plot_umap_diversity.ipynb
       │   ├── 4_classifier_analysis.ipynb
       │   ├── classifier_logs                 # restricted data (classifier on train might not be private)
       │   │   └── ... 
       │   ├── diversity_saves                 # restricted data (post processed real dataset included)
       │   │   └── ... 
       │   ├── images
       │   │   └── ... 
       │   └── train_classifier.py
       ├── Fidelity
       │   ├── 1_images_qualitative_inspection.ipynb
       │   ├── helpers
       │   │   └── utils.py                    # code for interpolation between regions
       │   └── images
       │       └── ... 
       ├── Privacy
       │   ├── 1_prepare_9_64_64_pixel_space.ipynb
       │   ├── 2_UMAP_64_64.ipynb
       │   ├── 3_compute_distances.ipynb
       │   ├── 4_plot_pairwise_attacks.ipynb
       │   ├── 5_plot_density_attacks.ipynb
       │   ├── 6_density_plot.ipynb
       │   ├── images
       │   │   ├── ...
       │   │   └── supp
       │   │       └── ... 
       │   └── privacy_saves                  # restricted data (post processed real dataset included, UMAP object might
       │       └── ...                        #  not be private)
       └── Train
            ├── batchers.py
            ├── fixed_architecture.py
            ├── Generate_image.ipynb
            ├── training_parser.py
            ├── train_region.py
            ├── restricted                    # restricted data (GAN weights, local machine preprocessing, indexes
            │   └── ...                       # of train/val split)
            └── transforms
                ├── augmentations.py
                └── transforms.py
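The tree above lists `Fidelity/helpers/utils.py` as holding the code for interpolation between regions. That file's actual approach is not described in this README, so the following is only an illustrative sketch of one common way to do it: linearly blending two one-hot condition vectors and feeding each blend to a conditional generator. The function names `one_hot` and `interpolate_conditions`, and the use of one-hot region encodings, are assumptions made for illustration.

```python
def one_hot(index, num_regions):
    """Encode a region index as a one-hot list of floats (assumed encoding)."""
    vec = [0.0] * num_regions
    vec[index] = 1.0
    return vec


def interpolate_conditions(start, end, steps):
    """Return `steps` condition vectors blending linearly from start to end."""
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    if len(start) != len(end):
        raise ValueError("condition vectors must have the same length")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the start region, 1.0 at the end region
        path.append([(1.0 - t) * a + t * b for a, b in zip(start, end)])
    return path


# Example: blend from region 0 to region 2 out of 3 hypothetical regions.
conditions = interpolate_conditions(one_hot(0, 3), one_hot(2, 3), steps=5)
for cond in conditions:
    print(cond)
```

With the latent noise held fixed, passing each intermediate vector to the generator would show how the generated vertebra unit changes as the condition moves from one region to the other.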

Team

Authors:

  • Hanxi Sun*, Purdue University, Department of Statistics
  • Jason Plawinski*, Novartis
  • Sajanth Subramaniam, Novartis
  • Amir Jamaludin, Oxford Big Data Institute
  • Timor Kadir, Oxford Big Data Institute
  • Aimee Readie, Novartis
  • Gregory Ligozio, Novartis
  • David Ohlssen, Novartis
  • Mark Baillie#, Novartis
  • Thibaud Coroller#@, Novartis

*co-first authors; #co-last authors; @ corresponding author