This is my implementation of a pneumonia detection model for chest X-ray scans. It is based on Kaggle's RSNA Pneumonia Detection Challenge and takes inspiration from the discussions among the winners of the competition here.
-
As a starting point, the PyTorch implementation of the RetinaNet detection model was used.
-
The following major modifications were made to the code (illustrative sketches follow the list):
- Training Data Transformations: The original code applied only a random horizontal flip to the input images. Rotation, translation, scaling, and shear transformations were added.
- Ensemble Model: The code was modified to train an ensemble of 3 models using scikit-learn's StratifiedKFold.
- Regularization: To mitigate overfitting, dropout was added to the classification subnet of the network.
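The added geometric augmentations can be sketched as below. This is only an illustration using torchvision's RandomAffine with hypothetical parameter ranges; in a detection setting the bounding boxes must also be transformed with the same affine parameters, which RandomAffine alone does not do.

```python
import torchvision.transforms as T

# Illustrative only: the rotation/translation/scale/shear ranges are assumptions,
# not the values used in training.
train_transforms = T.Compose([
    T.RandomHorizontalFlip(p=0.5),   # the only augmentation in the original code
    T.RandomAffine(
        degrees=10,                  # rotation
        translate=(0.1, 0.1),        # translation
        scale=(0.9, 1.1),            # scaling
        shear=5,                     # shear
    ),
])
```

The dropout added for regularization sits in the classification part of the network. The sketch below is a generic RetinaNet-style classification head with Dropout2d between the convolutional layers, not the exact head used in this repository.

```python
import torch.nn as nn

class ClassificationHeadWithDropout(nn.Module):
    """Illustrative classification subnet with dropout between conv layers."""

    def __init__(self, in_channels, num_anchors, num_classes, p=0.3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Dropout2d(p),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Dropout2d(p),
        )
        self.output = nn.Conv2d(256, num_anchors * num_classes, kernel_size=3, padding=1)

    def forward(self, x):
        return self.output(self.features(x))
```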
-
Training details:
- Batch size: 128
- Image size: 256
- Number of epochs: 13
- Optimiser: Adam, learning rate 1e-5
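The optimizer configuration corresponds to the following minimal sketch (the `nn.Linear` module is only a placeholder standing in for the RetinaNet network):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # placeholder for the RetinaNet network being trained
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
```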
-
Inference details:
- Classification score threshold (to decide whether an anchor box is a detection): 0.06
- Non-maximum suppression (NMS) IoU threshold: 0.1
- For ensembling, boxes detected by different models were merged (by taking a weighted average of their coordinates) when their IoU exceeded 0.1. For bounding boxes that did not overlap between models, a classification threshold of 0.1 was used to decide whether the solitary box should be retained (a simplified sketch of this merging follows the list).
- For reasons explained further below, the true detections in the test images were systematically smaller than those in the training images, so after computing the detection boxes from the ensemble, every box was shrunk to 87.5% of its original size.
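A simplified sketch of this ensembling and box-resizing logic is given below. It is not the exact code in csv_prediction_ensemble.py; in particular, weighting the averaged coordinates by the classification scores is an assumption, and the greedy grouping is only one way to pair up overlapping boxes.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes in [x1, y1, x2, y2] format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def merge_ensemble_boxes(per_model_boxes, per_model_scores,
                         iou_thr=0.1, solo_score_thr=0.1):
    """Merge detections from several models.

    Boxes from different models that overlap with IoU > iou_thr are replaced by
    a score-weighted average of their coordinates; a box found by only one
    model is kept only if its classification score exceeds solo_score_thr.
    """
    all_boxes = [(box, score, m)
                 for m, (boxes, scores) in enumerate(zip(per_model_boxes, per_model_scores))
                 for box, score in zip(boxes, scores)]
    used = [False] * len(all_boxes)
    merged = []
    for i, (box_i, score_i, model_i) in enumerate(all_boxes):
        if used[i]:
            continue
        used[i] = True
        group = [(box_i, score_i)]
        for j in range(i + 1, len(all_boxes)):
            box_j, score_j, model_j = all_boxes[j]
            if not used[j] and model_j != model_i and iou(box_i, box_j) > iou_thr:
                used[j] = True
                group.append((box_j, score_j))
        if len(group) == 1 and group[0][1] < solo_score_thr:
            continue  # solitary low-confidence box: drop it
        boxes = np.array([g[0] for g in group], dtype=float)
        scores = np.array([g[1] for g in group], dtype=float)
        merged.append((np.average(boxes, axis=0, weights=scores), float(scores.mean())))
    return merged

def shrink_box(box, factor=0.875):
    """Scale a box about its centre (used to counter the train/test size difference)."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = (x2 - x1) * factor, (y2 - y1) * factor
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]
```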
-
Create the CSV annotation file for training. Use the command:
python create_trainingcsv.py
-
Create the file containing the different 'folds' used to train the ensemble of models:
python create_folds.py --csv_file annots_file.csv --csv_classes class_list.csv
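The fold generation can be sketched as follows. The column names ('patientId', 'Target') and the exact contents of folds_train.npy are assumptions about the annotation CSV and the script's output, not a description of create_folds.py itself.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Column names are assumptions: 'patientId' identifies an image and
# 'Target' is 1 if the image contains a pneumonia box, 0 otherwise.
annots = pd.read_csv('annots_file.csv')
per_image = annots.groupby('patientId')['Target'].max()   # image-level label
image_ids = per_image.index.values
labels = per_image.values

# Stratified 3-fold split so each fold keeps the positive/negative ratio.
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
folds = [train_idx for train_idx, _ in skf.split(image_ids, labels)]
np.save('folds_train.npy', np.array(folds, dtype=object))
```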
-
Start training with the command:
python train.py --csv_file annots_file.csv --csv_classes class_list.csv --epochs 13 --batchsize 128 --fold_no 1 --folds_path folds_train.npy
The trained models will be saved in the current directory. Run the command once per fold (--fold_no 1, 2, and 3) to obtain the three models of the ensemble.
-
Create the CSV annotation file for the test images. Use the command:
python create_inferencecsv.py
-
For an ensemble of models, start inference with the command:
python csv_prediction_ensemble.py --csv_classes class_list.csv --csv_test test_annots.csv --model_epoch 13 --num_images_topredict 3000 --score_threshold 0.06
If predictions from only one model are required, use the command:
python csv_prediction.py --csv_classes class_list.csv --csv_test test_annots.csv --model <model_file.pt> --num_images_topredict 3000 --score_threshold 0.06
The CSV file containing the predictions will be created in the current directory.
You can visualize the model's output (saved in test_predictions.csv) by running:
python visualize.py --csv_predictions test_predictions.csv --images_path /path/to/test/images
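For reference, such a visualization could look like the sketch below. The CSV layout (Kaggle-style patientId / PredictionString columns with repeated "confidence x y width height" groups) and the use of DICOM test images are assumptions; the real visualize.py may read a different format.

```python
import os
import pandas as pd
import pydicom
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Assumptions: predictions are in Kaggle submission format and the test
# images are the challenge's DICOM files named <patientId>.dcm.
preds = pd.read_csv('test_predictions.csv')
images_path = '/path/to/test/images'

for _, row in preds.head(5).iterrows():
    dcm = pydicom.dcmread(os.path.join(images_path, row['patientId'] + '.dcm'))
    fig, ax = plt.subplots()
    ax.imshow(dcm.pixel_array, cmap='gray')
    if isinstance(row['PredictionString'], str):
        vals = list(map(float, row['PredictionString'].split()))
        # Each detection is a group of 5 numbers: confidence, x, y, width, height.
        for conf, x, y, w, h in zip(*[iter(vals)] * 5):
            ax.add_patch(patches.Rectangle((x, y), w, h, fill=False, edgecolor='red'))
            ax.text(x, y, f'{conf:.2f}', color='red')
    plt.show()
```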
Kaggle provided a test set of 3000 images with the correct outputs hidden. Based on the submitted predictions for these test images, Kaggle computes the Average Precision (AP), averaged over IoU thresholds ranging from 0.4 to 0.75. The mean Average Precision (mAP) for our final ensemble model was ~0.17.
This score is not particularly high, but neither are the scores of the top finishers in the competition. A major issue is that the distribution of positive/negative images differs between the training set and the test set; in general, the statistics of the two datasets seem to be different (see Inconsistencies between train dataset and test dataset). This is possibly because the method used for manually annotating the images differs between the two datasets, as explained here. It led to the annotations in the test dataset being systematically smaller and more frequent than those in the training set. When the training and test data differ like this, even a well-trained model may not perform well on the test data, and one is left to trial-and-error with different parameters in the hope of better test scores.
Possible improvements:
- To allow the detection of smaller boxes, we can include level 2 of the Feature Pyramid Network (FPN) in the calculation of classification and regression values for each anchor box at each level.
- A better method of ensembling might be to take the classification and regression outputs for all anchor boxes from all models in the ensemble, average each output across the models, and only then apply the detection threshold followed by NMS (sketched below).
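A minimal sketch of this idea, assuming all models share an identical anchor layout so that their outputs align anchor-for-anchor; the (cls, reg) return signature of the models is a hypothetical interface, not the actual one in this repository.

```python
import torch

def ensemble_anchor_outputs(models, image_batch):
    """Average per-anchor classification and regression outputs across models,
    then apply the score threshold and NMS once to the averaged outputs."""
    cls_outputs, reg_outputs = [], []
    with torch.no_grad():
        for model in models:
            model.eval()
            # Hypothetical interface: each model returns per-anchor class
            # scores and box-regression values for the batch.
            cls, reg = model(image_batch)
            cls_outputs.append(cls)
            reg_outputs.append(reg)
    mean_cls = torch.stack(cls_outputs).mean(dim=0)
    mean_reg = torch.stack(reg_outputs).mean(dim=0)
    return mean_cls, mean_reg  # feed into score threshold + NMS as usual
```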