Chest-X-ray-DL

Our DL 289G Project

Project Overview

Deep learning for computer vision plays an important role in medicine. This project focuses on predicting disease labels for patients from chest X-ray images using CNNs and RNNs. We present a deep learning model that takes a sequence of a patient's consecutive previous chest X-rays and analyzes the variation and differences across that sequence. For feature extraction, the model uses convolutional neural networks (CNNs) such as DenseNet, MobileNet, and ResNet. We also compare and analyze the impact of LSTMs applied to the feature maps extracted by these CNN models.

Overall, we aim to present a single deep learning framework that takes more than one X-ray per patient, treats these X-rays as an image sequence, and predicts the disease label from the differences observed across the follow-up X-rays. Our goal is to identify what role follow-up X-ray images play in predicting disease labels.

The dataset we used is found here: https://www.kaggle.com/nih-chest-xrays/data

Brief Script Description and Usage

Preprocessing Scripts

Scripts used for data preprocessing and data cleaning

Filter_and_Create_Sample_Sets.ipynb
1. Pick Patients who have at least 3 followups (indexing from 0)
2. Create two different sample datasets based on view position and store datasets into CSVs
3. Relevant CSV files for this script:
  1. Data_Entry_2017.csv
  2. df_updated_view_postion.csv
  3. df_updated_finding_labels.csv
  4. df_PA.csv
  5. df_AP.csv
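
The filtering in steps 1 and 2 above can be sketched as follows. This is a minimal illustration and not the notebook's code. It assumes the column names `Patient ID` and `View Position` from Data_Entry_2017.csv, and it reads "at least 3 followups" as at least three images per patient within one view position. The function name `filter_by_view` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def filter_by_view(rows, min_images=3):
    """Group rows by view position, keeping only patients that have
    at least min_images rows in that view.

    rows: iterable of dicts with 'Patient ID' and 'View Position' keys
    (assumed column names). Returns {view: [row, ...]}.
    """
    per_view_patient = defaultdict(list)
    for row in rows:
        key = (row["View Position"], row["Patient ID"])
        per_view_patient[key].append(row)

    kept = defaultdict(list)
    for (view, _patient), patient_rows in per_view_patient.items():
        if len(patient_rows) >= min_images:
            kept[view].extend(patient_rows)
    return dict(kept)


# Tiny in-memory stand-in for Data_Entry_2017.csv (made-up values).
sample_csv = """Image Index,Follow-up #,Patient ID,View Position
a_000.png,0,1,PA
a_001.png,1,1,PA
a_002.png,2,1,PA
b_000.png,0,2,PA
b_001.png,1,2,AP
c_000.png,0,3,AP
c_001.png,1,3,AP
c_002.png,2,3,AP
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
by_view = filter_by_view(rows)
for view, view_rows in sorted(by_view.items()):
    print(view, [r["Image Index"] for r in view_rows])
```

Patient 2 is dropped from both views here because it has fewer than three images in either one. Each value of `by_view` could then be written to its own CSV.
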
Preprocess_Analyze_Image_Datasets.ipynb
1. PA, AP images dataset processing
  1. Adding Full Paths and Some basic preprocessing
  2. Train, Test, Validation dataset creation
  3. Analyzing the samples for label distributions
2. Save the preprocessed images as arrays and store them in pickle files.
3. Relevant files for this script:
  1. df_PA.csv
  2. df_AP.csv
  3. added_paths_PA.csv
  4. added_paths_AP.csv
  5. PA_train.csv
  6. PA_test.csv
  7. PA_val.csv
  8. AP_train.csv
  9. AP_test.csv
  10. AP_val.csv
  11. AP_images.pkl
  12. PA_images.pkl
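
As a rough sketch of the train, test, and validation split, the snippet below divides patients rather than individual images, so that all images of one patient land in the same set. Splitting by patient is an assumption made for this example; the notebook may split differently. The function name `split_patients`, the 10% fractions, and the seed are illustrative as well.

```python
import random


def split_patients(patient_ids, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle the unique patient IDs and cut them into train/val/test lists."""
    unique_ids = sorted(set(patient_ids))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(unique_ids)

    n_test = int(len(unique_ids) * test_frac)
    n_val = int(len(unique_ids) * val_frac)
    test = unique_ids[:n_test]
    val = unique_ids[n_test:n_test + n_val]
    train = unique_ids[n_test + n_val:]
    return train, val, test


# 100 made-up patient IDs.
train_ids, val_ids, test_ids = split_patients(range(100))
print(len(train_ids), len(val_ids), len(test_ids))  # 80 10 10
```

Rows of the PA or AP dataframe can then be assigned to a set according to which list their patient ID falls in.
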
Process_NIH_Dataset_Details.ipynb
1. Process the NIH dataset details
2. Analyze the data using visualizations
3. Relevant files for this script:
  1. BBox_List_2017.csv
  2. Data_Entry_2017.csv
Sample_Set_Images.ipynb
1. PA, AP Position manual Feature Extraction
2. Relevant files for this script:
  1. df_AP.csv
verify_files.py
1. Check whether the files are merged correctly

single_image_models scripts

Scripts used for single-image input models

AP_X_ray_images_baseline_dataprocessing_v2.ipynb and PA_X_ray_images_baseline_dataprocessing_v2.ipynb
1. For single-image preprocessing, we loaded the AP and PA dataframes (from df_AP.csv and df_PA.csv), linked each entry to its image on Google Drive, and saved the results to added_paths_AP.csv and added_paths_PA.csv. We then split each dataset into train, validation, and test sets, resized the images, and saved them as pickle files.
2. Relevant files for this script:
  1. df_AP.csv
  2. added_paths_AP.csv
  3. train_AP.pkl
  4. val_AP.pkl
  5. test_AP.pkl
  6. df_PA.csv
  7. added_paths_PA.csv
  8. train_PA.pkl
  9. val_PA.pkl
  10. test_PA.pkl
Single_Xray_AP_results.ipynb and Single_Xray_PA_results.ipynb
1. Store and analyze results for single AP and PA X-ray images
2. Relevant files for this script:
  1. added_paths_PA.csv
  2. added_paths_AP.csv
  3. train_df_DenseNet.csv
  4. valid_df_DenseNet.csv
  5. test_df_DenseNet.csv
APmodelling.py and PAmodelling.py
1. As a baseline for comparison with DenseNet, ResNet, and MobileNet, we tested our datasets on a simple CNN model with 5 layers, 1000 units, and a kernel size of 7. The model uses a dropout rate of 40%, a softmax activation function, and the Adam optimizer, and it has 15 outputs. The loss function is categorical cross-entropy and the metric is accuracy. After running the CNN, we saved the results to pickle files.
2. Relevant files for this script:
  1. train.pkl
  2. val.pkl
  3. test.pkl
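
To make the output layer and loss named above concrete, here is a small worked example of softmax over 15 outputs followed by categorical cross-entropy for one sample. It is a plain-Python illustration of the two formulas, not the Keras code used in the scripts. The scores and the target class are made up.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Loss for a single example: -sum(y_i * log(p_i))."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))


NUM_CLASSES = 15  # number of model outputs stated above

logits = [0.0] * NUM_CLASSES
logits[3] = 2.0  # made-up scores: the model favours class 3
probs = softmax(logits)

target = [0.0] * NUM_CLASSES
target[3] = 1.0  # one-hot label for class 3

loss = categorical_cross_entropy(target, probs)
print(round(sum(probs), 6), round(loss, 4))
```

With a one-hot target, the loss reduces to the negative log of the probability assigned to the true class, so it shrinks toward 0 as that probability approaches 1.
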

three_image_models scripts

Scripts used for three-image input models

BaseModelScript.ipynb
1. Load images and get the outputs: X,y creation
2. For both PA and AP
  1. Train, test, validate X,Y sets
  2. DenseNet modeling experiment with LSTM/without LSTM
3. Relevant files for this script:
  1. PA_images.pkl
  2. AP_images.pkl
  3. PA_train.csv
  4. PA_test.csv
  5. PA_val.csv
  6. AP_train.csv
  7. AP_test.csv
  8. AP_val.csv
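
The X,y creation step can be pictured roughly as below: group the records by patient, order them by follow-up number, and stack three images into one sequence. This is a sketch under stated assumptions, not the notebook's implementation. The record field names, the function name `build_sequences`, the choice of the first three follow-ups, and the use of the last image's label as y are all hypothetical.

```python
from collections import defaultdict


def build_sequences(records, images, seq_len=3):
    """Build (X, y) where each X item holds seq_len images of one patient.

    records: dicts with 'patient', 'followup', 'filename', 'label' keys
    (hypothetical field names). images: {filename: image array}.
    The label of the last image in each sequence is used as y (an assumption).
    """
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient"]].append(rec)

    X, y = [], []
    for recs in by_patient.values():
        recs.sort(key=lambda r: r["followup"])
        if len(recs) < seq_len:
            continue  # not enough follow-ups for one sequence
        chosen = recs[:seq_len]
        X.append([images[r["filename"]] for r in chosen])
        y.append(chosen[-1]["label"])
    return X, y


# Made-up 2x2 "images" keyed by filename, standing in for the pickled dictionary.
images = {f"p1_{i}.png": [[i, i], [i, i]] for i in range(3)}
images["p2_0.png"] = [[9, 9], [9, 9]]

records = [
    {"patient": 1, "followup": 2, "filename": "p1_2.png", "label": "Effusion"},
    {"patient": 1, "followup": 0, "filename": "p1_0.png", "label": "No Finding"},
    {"patient": 1, "followup": 1, "filename": "p1_1.png", "label": "No Finding"},
    {"patient": 2, "followup": 0, "filename": "p2_0.png", "label": "No Finding"},
]

X, y = build_sequences(records, images)
print(len(X), len(X[0]), y)
```

Each element of X then has the shape (sequence length, height, width), which is the kind of input a CNN feature extractor followed by an LSTM would consume one image at a time.
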
DenseNetPAModellingFinal.ipynb and DenseNet_AP_Modeling.ipynb
1. DenseNet169 in-depth modeling experiment with LSTM/without LSTM on PA and AP
2. DenseNet169 with LSTM/without LSTM result ROC analysis
3. DenseNet169 with LSTM/without LSTM result Loss analysis
4. DenseNet169 with LSTM/without LSTM result Accuracy analysis
5. Relevant files for this script:
  1. PA_train.csv
  2. PA_test.csv
  3. PA_val.csv
  4. AP_train.csv
  5. AP_test.csv
  6. AP_val.csv
  7. PA_images.pkl
  8. AP_images.pkl
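
For readers unfamiliar with the ROC analysis listed above, the area under the ROC curve (AUC) for one class equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The snippet below computes it directly from that definition. It is a plain-Python illustration with made-up scores; the notebooks may compute ROC curves with a library instead, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 0/1 ground truth for one class. scores: predicted scores.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up predictions for a single disease label.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 0.75
```

For a multi-class model, the same computation can be repeated per class by treating that class as positive and all others as negative.
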
Modeling_MobileNetV2_AP_.ipynb and Modeling_MobileNetV2_PA_.ipynb
1. MobileNetV2 in-depth modeling experiment with LSTM/without LSTM on PA and AP
2. MobileNetV2 with LSTM/without LSTM result ROC analysis
3. MobileNetV2 with LSTM/without LSTM result Loss analysis
4. MobileNetV2 with LSTM/without LSTM result Accuracy analysis
5. Relevant files for this script:
  1. PA_train.csv
  2. PA_test.csv
  3. PA_val.csv
  4. AP_train.csv
  5. AP_test.csv
  6. AP_val.csv
  7. PA_images.pkl
  8. AP_images.pkl
Modeling_ResNetV2_AP_.ipynb and Modeling_ResNetV2_PA_.ipynb
1. ResNet50V2 in-depth modeling experiment with LSTM/without LSTM on PA and AP
2. ResNet50V2 with LSTM/without LSTM result ROC analysis
3. ResNet50V2 with LSTM/without LSTM result Loss analysis
4. ResNet50V2 with LSTM/without LSTM result Accuracy analysis
5. Relevant files for this script:
  1. PA_train.csv
  2. PA_test.csv
  3. PA_val.csv
  4. AP_train.csv
  5. AP_test.csv
  6. AP_val.csv
  7. PA_images.pkl
  8. AP_images.pkl
Loss_Acc_Plots.ipynb
1. A summary of the loss and accuracy plots for the DenseNet, MobileNetV2, and ResNetV2 experiments on the architecture with and without LSTM

Applied Dependencies

Pandas
Numpy
Keras
Tensorflow
OS
CSV
Pickle
tqdm
Sklearn
Collections
PIL
Matplotlib
Seaborn
glob
CV2
Time
Google.colab

File Dependencies

files stored in data_csv_files directory

added_paths_AP.csv contains the full file path on Google Drive of the X-ray image for each AP data point
added_paths_PA.csv contains the full file path on Google Drive of the X-ray image for each PA data point
AP_test.csv contains the test set for AP
PA_test.csv contains the test set for PA
AP_val.csv contains the validation set for AP
PA_val.csv contains the validation set for PA
AP_train.csv contains the training set for AP
PA_train.csv contains the training set for PA

Files available in the Google Drive folder for modelling: https://drive.google.com/drive/folders/1SezfLewxe0jiSGxc2m1yLnNzMFrwHotQ?usp=sharing

single_image_files: Files required to run the single-image baseline models for both the PA and AP datasets. This directory contains 2 sub-directories:
1. data: PA datasets stored as pickle files: train.pkl, test.pkl, val.pkl; AP datasets stored as pickle files: train_AP.pkl, test_AP.pkl, val_AP.pkl
2. pretrained-models: Pretrained model files taken from the Coursera model: pretrained_model.h5, densenet.hdf5
images_3_followup: Files required to run the three-follow-up image models for both PA and AP images. The CSV files are in the data_csv_files directory; the images are saved as a dictionary that maps each image filename to its image array, a 2D array of size (128,128).
1. PA_images.pkl: PA images stored as a dictionary mapping the image filename to the image array of size (128,128).
2. AP_images.pkl: AP images stored as a dictionary mapping the image filename to the image array of size (128,128).
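
The dictionary layout described above can be illustrated with a small round trip through pickle. This is only a sketch: the filenames and pixel values are made up, the file is written to a temporary directory, and nested lists stand in for the image arrays stored in the real files.

```python
import os
import pickle
import tempfile

SIZE = 128  # image side length stated above

# Placeholder images as nested lists; the real files hold image arrays.
images = {
    "example_000.png": [[0] * SIZE for _ in range(SIZE)],
    "example_001.png": [[255] * SIZE for _ in range(SIZE)],
}

# Write the dictionary to a temporary pickle file.
path = os.path.join(tempfile.mkdtemp(), "demo_images.pkl")
with open(path, "wb") as f:
    pickle.dump(images, f)

# Load it back and look up one image by filename.
with open(path, "rb") as f:
    loaded = pickle.load(f)

first = loaded["example_000.png"]
print(len(loaded), len(first), len(first[0]))
```

Keying the dictionary by filename lets the CSV files in data_csv_files act as the index: a row's image filename is enough to fetch its array.
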
