A Deep Learning Approach to Private Data Sharing of Medical Images Using Conditional GANs

Publications

ArXiv: https://arxiv.org/abs/2106.13199

Project

Investigate the application of GANs to medical images. The scope of the project includes:

Generate artificial images of vertebra units (VUs) conditioned on anatomical region.
Conduct an extensive evaluation of the dataset's behavior and of the trade-off between image quality/dataset faithfulness and privacy.
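As a rough illustration of what "conditioned on anatomical region" can mean in practice, the sketch below assumes the simplest scheme: the integer region label is one-hot encoded and concatenated to the generator's Gaussian latent vector. The function names, the latent size and the number of regions are illustrative assumptions. The conditioning actually used in fixed_architecture.py may differ (for example, a learned label embedding).

```python
import numpy as np


def one_hot(labels, n_classes):
    """Encode integer region labels as one-hot rows (shape: [batch, n_classes])."""
    labels = np.asarray(labels)
    out = np.zeros((labels.shape[0], n_classes), dtype=np.float32)
    out[np.arange(labels.shape[0]), labels] = 1.0
    return out


def conditioned_latents(labels, n_classes, latent_dim=128, rng=None):
    """Sample Gaussian noise and append the one-hot region code to each row.

    Assumption: the generator takes [batch, latent_dim + n_classes] inputs,
    i.e. conditioning by simple concatenation.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal((len(labels), latent_dim)).astype(np.float32)
    return np.concatenate([z, one_hot(labels, n_classes)], axis=1)
```

For example, conditioned_latents([0, 2, 1], n_classes=3) yields a (3, 131) array whose last three columns identify the requested region of each sample.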

Related dataset:

Link to the data: https://zenodo.org/record/5031881
The synthetic dataset (10,000 pairs of images and regions, 2.95 GB) is shared with the code in HDF5 format.
With some minor tweaking, the synthetic dataset can be used to run the training and analysis code and check that it works. (The analysis results themselves will be far less meaningful, because comparing privacy between two synthetic datasets is of little use.)
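Since the internal dataset names of the HDF5 file are not listed here, a first step when adapting the code is to inspect them. The helper below is a minimal sketch (the function names are hypothetical, not part of the repository) that walks any h5py-style group, or a plain dict of NumPy arrays, and reports the shape and dtype of every dataset it finds.

```python
import numpy as np


def describe_hdf5_contents(group, prefix=""):
    """Map every array-like entry to (shape, dtype), recursing into sub-groups.

    Works on an open h5py.File/Group or on a plain dict of NumPy arrays.
    """
    summary = {}
    for name, obj in group.items():
        path = f"{prefix}/{name}" if prefix else name
        if hasattr(obj, "shape") and hasattr(obj, "dtype"):
            summary[path] = (tuple(obj.shape), str(obj.dtype))
        else:  # assume a (sub-)group exposing .items()
            summary.update(describe_hdf5_contents(obj, path))
    return summary


def load_first(group, key, n):
    """Read only the first n records of one dataset into memory."""
    return np.asarray(group[key][:n])
```

With h5py installed, both functions can be called on the object returned by h5py.File("synthetic_dataset.h5", "r"); the keys to use afterwards are whatever the summary reports, not names assumed here.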

Code Lifting

Because the original data is not anonymized, it is not shared with the code. The preprocessing code is not shared either, to avoid disclosing sensitive system information.
This code therefore cannot be run end to end out of the box.
The analysis notebooks still hold their latest executed state, including figures.

Scripts, Notebooks and Demos

Training and generating synthetic VUs and corresponding regions
- Training: Model training code can be found here:
  - src/manuscript/Train/train_region.py
- Inference: Generating synthetic samples:
  - src/manuscript/Train/Generate_image.ipynb
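Generating a synthetic dataset of image/region pairs boils down to sampling latents for a chosen set of region labels and running the trained generator in batches. The sketch below assumes a hypothetical callable generator(z, labels) that returns images as arrays, and draws the same number of samples for every region; the actual notebook may choose a different region mix.

```python
import numpy as np


def generate_balanced(generator, n_per_region, n_regions, latent_dim, batch_size=64, seed=0):
    """Draw n_per_region samples for every region from a conditional generator.

    `generator(z, labels)` is a hypothetical callable mapping latent vectors
    [b, latent_dim] and integer labels [b] to images [b, ...].
    """
    rng = np.random.default_rng(seed)  # fixed seed -> reproducible sample set
    labels = np.repeat(np.arange(n_regions), n_per_region)
    images = []
    for start in range(0, len(labels), batch_size):
        batch_labels = labels[start:start + batch_size]
        z = rng.standard_normal((len(batch_labels), latent_dim)).astype(np.float32)
        images.append(np.asarray(generator(z, batch_labels)))
    return np.concatenate(images, axis=0), labels
```

With a PyTorch model, the callable would convert its inputs to tensors, run the network under torch.no_grad() and return .cpu().numpy(); that wrapper is not shown because the model interface is not documented here.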
Fidelity - Analysis
- Fetching and plotting real images from different regions, plotting synthetic samples, interpolating between classes:
  - src/manuscript/Fidelity/1_images_qualitative_inspection.ipynb
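Interpolating between classes typically means holding the latent vector fixed and blending only the conditioning information, so that any change along the path is attributable to the region. The sketch below assumes concatenation-style one-hot conditioning and linear blending; the function name is hypothetical, and the code in Fidelity/helpers/utils.py may interpolate differently (for example, between learned embeddings).

```python
import numpy as np


def interpolate_regions(z, region_a, region_b, n_classes, n_steps=8):
    """Build generator inputs that morph from region_a to region_b.

    The latent vector z is held fixed while the one-hot conditioning code is
    linearly blended, so only the region information changes along the path.
    """
    code_a = np.eye(n_classes, dtype=np.float32)[region_a]
    code_b = np.eye(n_classes, dtype=np.float32)[region_b]
    alphas = np.linspace(0.0, 1.0, n_steps, dtype=np.float32)[:, None]
    codes = (1.0 - alphas) * code_a + alphas * code_b  # [n_steps, n_classes]
    zs = np.repeat(np.asarray(z, dtype=np.float32)[None, :], n_steps, axis=0)
    return np.concatenate([zs, codes], axis=1)
```

Each row can then be fed to the generator and the outputs tiled side by side to visualize the transition.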
Diversity - Analysis
- Preprocessing, preparing dataset and training UMAP:
  - 1_generate_synth_112_224.ipynb
  - 2_train_umap_112_224.ipynb
- UMAP diversity visualization:
  - 3_plot_umap_diversity.ipynb
- Classification analysis, quantitative diversity evaluation:
  - 4_classifier_analysis.ipynb
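One way to turn a region classifier into a quantitative check is to compare the region it predicts for each synthetic VU with the region the GAN was conditioned on. The sketch below covers only that scoring step and assumes integer region labels; whether train_classifier.py and the notebook score exactly this way is an assumption, and the function names are hypothetical.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """counts[i, j] = number of samples with true class i predicted as j."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def per_region_accuracy(y_true, y_pred, n_classes):
    """Per-region recall; NaN for regions that have no samples."""
    cm = confusion_matrix(y_true, y_pred, n_classes)
    support = cm.sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.diag(cm) / support
```

Here y_true would be the conditioning labels of the synthetic samples and y_pred the classifier's predictions; a region with low recall suggests its synthetic samples are hard to tell apart from another region's.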
Privacy - Analysis
- Preprocessing, computing features and similarity:
- Pairwise and density attack robustness:
  - 4_plot_pairwise_attacks.ipynb
  - 5_plot_density_attacks.ipynb
- Embedding space density visualization:
  - 6_density_plot.ipynb
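Both attack families rest on distances between synthetic and real samples in some feature space. The sketch below assumes flattened feature vectors and Euclidean distance: a pairwise check (distance from each synthetic sample to its nearest real sample, where very small values hint at memorization) and a simple density check (number of real samples within a fixed radius), the latter used here as a stand-in for whatever density estimate the notebooks compute. Feature space, radius and function names are assumptions.

```python
import numpy as np


def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between rows of a [n, d] and b [m, d]."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    d = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.maximum(d, 0.0)  # clip tiny negatives caused by round-off


def nearest_real_distance(synthetic, real):
    """Pairwise check: distance from each synthetic sample to its closest real sample."""
    return np.sqrt(pairwise_sq_dists(synthetic, real).min(axis=1))


def real_density_around(synthetic, real, radius):
    """Density check: how many real samples fall within `radius` of each synthetic one."""
    return (pairwise_sq_dists(synthetic, real) <= radius ** 2).sum(axis=1)
```

The full distance matrix has n_synthetic × n_real entries, so for 10,000 synthetic samples against a large real set it is worth processing the synthetic rows in chunks.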

Structure

.
├── README.md
├── environment.yml                  # pgan-env
├── synthetic_dataset.h5
└── src
    ├── helper.py                         # utility functions (current date-time for mlflow/grid for image visualization)
    └── manuscript                        
       ├── Diversity
       │   ├── 1_generate_synth_112_224.ipynb
       │   ├── 2_train_umap_112_224.ipynb
       │   ├── 3_plot_umap_diversity.ipynb
       │   ├── 4_classifier_analysis.ipynb
       │   ├── classifier_logs                 # restricted data (classifier on train might not be private)
       │   │   └── ... 
       │   ├── diversity_saves                 # restricted data (post processed real dataset included)
       │   │   └── ... 
       │   ├── images
       │   │   └── ... 
       │   └── train_classifier.py
       ├── Fidelity
       │   ├── 1_images_qualitative_inspection.ipynb
       │   ├── helpers
       │   │   └── utils.py                    # code for interpolation between regions
       │   └── images
       │       └── ... 
       ├── Privacy
       │   ├── 1_prepare_9_64_64_pixel_space.ipynb
       │   ├── 2_UMAP_64_64.ipynb
       │   ├── 3_compute_distances.ipynb
       │   ├── 4_plot_pairwise_attacks.ipynb
       │   ├── 5_plot_density_attacks.ipynb
       │   ├── 6_density_plot.ipynb
       │   ├── images
       │   │   ├── ...
       │   │   └── supp
       │   │       └── ... 
       │   └── privacy_saves                  # restricted data (post processed real dataset included, UMAP object might
       │       └── ...                        #  not be private)
       └── Train
           ├── batchers.py
           ├── fixed_architecture.py
           ├── Generate_image.ipynb
           ├── training_parser.py
           ├── train_region.py
           ├── restricted                    # restricted data (GAN weights, local machine preprocessing, indexes
           │   └── ...                       # of train/val split)
           └── transforms
               ├── augmentations.py
               └── transforms.py
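helper.py is described above as providing a grid for image visualization. As a hedged illustration of what such a utility can look like (the name make_grid and its parameters are hypothetical, not the repository's function), the sketch below tiles grayscale images into a single padded 2-D array.

```python
import numpy as np


def make_grid(images, n_cols, pad=2, pad_value=0.0):
    """Tile [n, h, w] grayscale images into one 2-D array, row by row.

    Empty cells in the last row are left at pad_value.
    """
    images = np.asarray(images, dtype=np.float32)
    n, h, w = images.shape
    n_rows = -(-n // n_cols)  # ceiling division
    grid = np.full((n_rows * (h + pad) + pad, n_cols * (w + pad) + pad), pad_value, dtype=np.float32)
    for idx, img in enumerate(images):
        r, c = divmod(idx, n_cols)
        top, left = pad + r * (h + pad), pad + c * (w + pad)
        grid[top:top + h, left:left + w] = img
    return grid
```

The resulting array can be passed directly to an image-display call such as matplotlib's imshow with a gray colormap.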

Team

Authors:

Hanxi Sun^*, Purdue University, Department of Statistics
Jason Plawinski^*, Novartis
Sajanth Subramaniam, Novartis
Amir Jamaludin, Oxford Big Data Institute
Timor Kadir, Oxford Big Data Institute
Aimee Readie, Novartis
Gregory Ligozio, Novartis
David Ohlssen, Novartis
Mark Baillie^#, Novartis
Thibaud Coroller^#^@, Novartis

^* co-first authors; ^# co-last authors; ^@ corresponding author

tcoroller/pGAN