Commit fd3f780: initial public commit

graham-calico committed Mar 17, 2022

Showing 25 changed files with 1,126 additions and 0 deletions.
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2020 Calico

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
35 changes: 35 additions & 0 deletions Makefile
@@ -0,0 +1,35 @@
HOST=127.0.0.1
TEST_PATH=./tests

PROJECT_NAME=MyProject
PROJECT_NEW_NAME=MyProject
FILES_WITH_PROJECT_NAME=Makefile .github/workflows/build-docs.yml README.md docs/index.md mkdocs.yml setup.py
FILES_WITH_PROJECT_NAME_LC=Makefile README.md tests/test_myproject.py
PROJECT_NAME_LC=$(shell echo ${PROJECT_NAME} | tr '[:upper:]' '[:lower:]')
PROJECT_NEW_NAME_LC=$(shell echo ${PROJECT_NEW_NAME} | tr '[:upper:]' '[:lower:]')

clean-pyc:
find . -name '*.pyc' -exec rm --force {} +
find . -name '*.pyo' -exec rm --force {} +
find . -name '*~' -exec rm --force {} +

clean-build:
rm --force --recursive build/
rm --force --recursive dist/
rm --force --recursive *.egg-info

rename-project:
@echo renaming "${PROJECT_NAME}" to "${PROJECT_NEW_NAME}"
@sed -i '' -e "s/${PROJECT_NAME}/${PROJECT_NEW_NAME}/g" ${FILES_WITH_PROJECT_NAME}
@echo renaming "${PROJECT_NAME_LC}" to "${PROJECT_NEW_NAME_LC}"
@sed -i '' -e "s/${PROJECT_NAME_LC}/${PROJECT_NEW_NAME_LC}/g" ${FILES_WITH_PROJECT_NAME_LC} && \
git mv src/${PROJECT_NAME_LC} src/${PROJECT_NEW_NAME_LC} && \
git mv tests/test_${PROJECT_NAME_LC}.py tests/test_${PROJECT_NEW_NAME_LC}.py
@echo Project renamed

lint:
flake8 --exclude=.tox

test:
pytest --verbose --color=yes $(TEST_PATH)

48 changes: 48 additions & 0 deletions README.md
@@ -0,0 +1,48 @@
<p>
<a href="https://docs.calicolabs.com/python-template"><img alt="docs: Calico Docs" src="https://img.shields.io/badge/docs-Calico%20Docs-28A049.svg"></a>
<a href="https://github.com/psf/black"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-000000.svg"></a>
</p>

# DISH Scoring


## Overview

This tool applies ML models to lateral DEXA images to measure bone changes
related to diffuse idiopathic skeletal hyperostosis (DISH).

## [DISH Analysis: Methods Description](docs/analysis.md)

## [Developer Documentation](docs/developer.md)

## Installation
The recommended build environment is Anaconda: install [Anaconda](https://docs.anaconda.com/anaconda/install/), then create a conda environment for Python 3 as shown below:

```
conda create -n dish python=3.7
```

Once created, activate the environment and install the required libraries:

```
conda activate dish
pip install -r requirements.txt
```

## Usage
An example of a recommended invocation of the code:

```
python scoreSpines.py -i <dir of imgs> -o <out file> --aug_flip --aug_one
```
### [Detailed Usage Instructions](docs/getstarted.md)


## License

See LICENSE

## Maintainers

See CODEOWNERS
1 change: 1 addition & 0 deletions VERSION
@@ -0,0 +1 @@
0.0.1
192 changes: 192 additions & 0 deletions docs/analysis.md
@@ -0,0 +1,192 @@
[Back to home.](../README.md)

# DISH Scoring: Methods Description

The pipeline described here scores the extent of hyperostosis that can be observed in a
lateral dual-energy X-ray absorptiometry (DEXA) scan image of a human torso. As described
by [Kuperus et al (2018)](https://www.jrheum.org/content/jrheum/45/8/1116.full.pdf), such
hyperostosis can create bridges between vertebrae that limit flexibility and ultimately
contribute to diffuse idiopathic skeletal hyperostosis (DISH).

The analysis occurs in three steps:
1. Identification of anterior intervertebral junctions;
2. Scoring the hyperostosis of each intervertebral junction;
3. Summing the bridge scores across the spine.

Details on each of those steps are given below, along with the final performance of the
system against hold-out test data generated by human annotators.

## Step 1: Identification of anterior intervertebral junctions.

Pathological DISH involves the linking of vertebrae by bony outgrowths that traverse
intervertebral gaps. Its pathology results from the summed effects of hyperostosis
between all adjacent pairs of vertebrae in the spine. The first step in the analysis of
DISH was therefore the identification of the anterior portions of the intervertebral
gaps along the entire spine. These are the loci where DISH-relevant bridges can form
that are visible in lateral DEXA images. An object-detection model was applied to this
task. It was trained by transfer learning from the
**ssd_mobilenet_v1** model, using annotations similar to those shown below:

<img src="imgs/objDetTrainExamples.jpeg" alt="examples of bridge score categories" height="100%" width="100%">

A set of 160 images was annotated by this author, comprising 2,271 boxes drawn
around vertebral junctions. The average number of boxes per image (14.2) was used
to define the threshold for junction annotation: for each image being evaluated,
the 14 highest-confidence annotations returned by the object detector are used.
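The thresholding rule above can be sketched as follows; the detection format and function names are illustrative, not the repository's actual API:

```python
def top_k_boxes(detections, k=14):
    """Keep the k highest-confidence detections for one image.

    `detections` is a list of (box, confidence) pairs; in this pipeline
    k=14, the average number of annotated junctions per image.
    """
    ranked = sorted(detections, key=lambda d: d[1], reverse=True)
    return ranked[:k]

# Six toy detections; with k=4 the two lowest-confidence boxes are dropped.
dets = [((0, 0, 1, 1), 0.9), ((1, 1, 2, 2), 0.2), ((2, 2, 3, 3), 0.8),
        ((3, 3, 4, 4), 0.5), ((4, 4, 5, 5), 0.1), ((5, 5, 6, 6), 0.7)]
kept = top_k_boxes(dets, k=4)
```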

The annotated images were separated into training and test sets
of 100 and 60 images, respectively. Training-set images were augmented by horizontal
flipping (all images in the study set are right-facing), inward adjustment of image borders,
and adjustments of brightness and contrast. In addition, to simulate artifacts observed at low
frequency across the study set, augmentation was performed by drawing large black or white
blocks randomly along the image edges. The final augmented training set included 1,200 images
and 10,244 boxes.

Performance of the object detector was evaluated in the 60-image test set using
intersection-over-union (IoU) for the 14 top-scoring predicted boxes versus all of the
annotated boxes, allowing each predicted box's intersection to only be counted for its
most-overlapping annotated counterpart. The average IoU across the 60 test images was
**68.9% (SD 5.9%)**.
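The matching scheme above (each predicted box scored against its most-overlapping annotation) can be sketched as below, with boxes represented as (x1, y1, x2, y2) tuples; this representation is an assumption of the sketch, not the repository's format:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def best_match_iou(predicted, annotated):
    """Average IoU of each predicted box against its most-overlapping annotation."""
    return sum(max(iou(p, a) for a in annotated) for p in predicted) / len(predicted)
```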

## Step 2: Scoring the hyperostosis of each intervertebral junction.

For each intervertebral junction, a numeric score was to be assigned according to the criteria
described by [Kuperus et al (2018)](https://www.jrheum.org/content/jrheum/45/8/1116.full.pdf)
in Figure 2 of that manuscript. Those authors provide examples and descriptions of hyperostosis
between adjacent vertebral bodies, scored on a 0-3 scale in terms of both "bridge" and "flow".
I automated that scoring, with greater attention paid to the "Bridge score" than the
"Flow score" scale, using an image-classification model. This model classified images of
individual bridges, i.e., sub-images extracted from the source image using the 14 top-scoring
boxes defined by the object-detection model described above. Four
categories were established and named numerically with reference to the bridge score
("br0", "br1", "br2", and "br3"), corresponding to the severity
of hyperostosis:

<img src="imgs/scoreExamples.jpeg" alt="examples of bridge score categories" height="50%" width="50%">

For the training and testing of this image classification model, the object detection model was
used to draw boxes (top-scoring 14 per image) across 893 DEXA spine images. Each of the resulting
12,502 box images was manually classified as described above. For the test set, 200 of the DEXA
images (comprising 2800 bridge images) were randomly selected; the remaining 693 DEXA images (9,702
bridge images) made up the pre-augmentation training set. The four categories were not
evenly balanced (counts shown for the total annotation set):

| Class | Count | % |
| ----- | ----: | --: |
| br0 | 10270 | 82.15 |
| br1 | 1740 | 13.63 |
| br2 | 356 | 2.85 |
| br3 | 172 | 1.38 |

For the training set, the full data set was augmented first using a horizontal flip.
In the following augmentation steps, imbalance between the classes was reduced by down-sampling
from the "br0" and "br1" classes (including in the selection of non-augmented boxes). For each
augmentation step, a separate randomly-selected subset of the available boxes (bridge images) was sampled, ensuring
maximum diversity of images but nonetheless consistent proportions of augmentation treatments across
the classes. The use of only 10% of "br0" boxes and 25% of "br1" boxes resulted in the following proportions:

| Class | Input % | Sampled % | Final %
| ----- | ------: | ------: | ------: |
| br0 | 82.15 | 10 | 51.8 |
| br1 | 13.63 | 25 | 21.5 |
| br2 | 2.85 | 100 | 18.0 |
| br3 | 1.38 | 100 | 8.7 |
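The Final % column follows from multiplying each class's input share by its sampling fraction and renormalizing; a quick check:

```python
input_pct = {"br0": 82.15, "br1": 13.63, "br2": 2.85, "br3": 1.38}
sampled = {"br0": 0.10, "br1": 0.25, "br2": 1.00, "br3": 1.00}

# Weight each class by its sampling fraction, then renormalize to 100%.
weighted = {c: input_pct[c] * sampled[c] for c in input_pct}
total = sum(weighted.values())
final_pct = {c: 100 * w / total for c, w in weighted.items()}
# final_pct rounds to {'br0': 51.8, 'br1': 21.5, 'br2': 18.0, 'br3': 8.7}
```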

Bridge images were extracted during the augmentation process, allowing the box itself to be randomly
modified. The following augmentation combinations were performed: 1) non-augmented; 2) random tilt up to 30 deg.;
3) random adjustment of the box edge positions by up to 20% of the box width or height; 4) tilt & edge; 5) tilt &
brightness; 6) edge & brightness; 7) tilt & contrast; 8) edge & contrast. Augmentation therefore increased the
training set size by 8-fold, resulting in the following counts for bridge images by class:

| Class | Count |
| ----- | ----: |
| br0 | 12752 |
| br1 | 5272 |
| br2 | 4496 |
| br3 | 2112 |

Training was performed using transfer learning from the **efficientnet/b1** model. Evaluated using the
test set described above, the Cohen's kappa value for the final model was 0.405 with the following
confusion matrix (rows=human, cols=model):

| | br0 | br1 | br2 | br3 | total |
| ---- | ----:| ---:| ---:| ---:| -----:|
| **br0** | 2102 | 194 | 31 | 65 | 2300 |
| **br1** | 195 | 171 | 31 | 40 | 385 |
| **br2** | 8 | 19 | 29 | 26 | 75 |
| **br3** | 1 | 6 | 5 | 33 | 40 |
| **total** | 2306 | 234 | 96 | 164 | |

**Cohen's kappa (test set) = 0.405**
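Cohen's kappa can be computed directly from a confusion matrix; a generic sketch, not the repository's evaluation code:

```python
def cohens_kappa(cm):
    """Cohen's kappa from a square confusion matrix (rows=rater 1, cols=rater 2)."""
    n = sum(sum(row) for row in cm)
    # Observed agreement: fraction of items on the diagonal.
    po = sum(cm[i][i] for i in range(len(cm))) / n
    # Expected agreement under independent marginals.
    row_tot = [sum(row) for row in cm]
    col_tot = [sum(cm[i][j] for i in range(len(cm))) for j in range(len(cm))]
    pe = sum(r * c for r, c in zip(row_tot, col_tot)) / (n * n)
    return (po - pe) / (1 - pe)
```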

Due to the numeric nature of the classes, the model was also evaluated against the test set using
Pearson correlation (treating each class "br0" through "br3" as its numeric value):

**Pearson correlation (test set) = 0.581**

## Step 3: Summing the bridge scores across the spine.

The final output value of the model evaluates overall DISH-like hyperostosis across the spine.
Final evaluation
was performed using a hold-out set of 200 DEXA images that were scored by three independent raters
(evaluation was performed using the mean rater score for each DEXA image).
Those raters used the same bridge-score scheme described above, with the appearance of DISH-related
bony outgrowth scored as either a 1, 2 or 3 (bridges without observable outgrowth implicitly received
a score of 0). For each DEXA image, those numeric scores were summed to produce the final DISH score.

In addition to the final hold-out test used for model evaluation, the independent raters also produced
a training set of 199 images (**Rater Training**) that was used to compare alternative ML models and
alternative strategies for interpreting the ML model output. The classification model's test-set
annotations were aggregated across each DEXA image for the same purpose (**Preliminary Training**).
In the case of Rater Training, performances of the
object-detection and classification models were being evaluated simultaneously. In the case of
Preliminary Training, only the performance of the classification model (and the interpretation of
its output) were being evaluated.

For each DEXA image, the top-scoring 14 boxes from the object-detection model were used to define
sub-images that were scored by the classification model, both described above. Initially, the numbers
associated with the class assigned to each of the 14 bridge images ("br0", "br1", "br2", "br3") were summed
to produce the model-derived DISH score. Two modifications were added to this process, described below.

First, bridges assigned
a score of 1 ("br1") were re-evaluated and assigned a decimal score in the interval \[0-1\]. That value
was calculated as the fraction of confidence scores assigned by the model to classes "br1", "br2", and "br3".
This had the general effect of down-weighting "br1" assignments, which frequently were made spuriously (see
the confusion matrix above), unless they looked more like "br2"/"br3" instances (which provide a rare source
for mis-classification) than they looked like "br0" instances (which provide an abundant source for
mis-classification). This modification is referred to below as the "augmentation of one" (**Aug.One**).
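A minimal sketch of the Aug.One rule, assuming per-class confidences that sum to one (the function and argument names are hypothetical):

```python
def bridge_score(confidences):
    """Numeric score for one bridge image from its class confidences.

    Aug.One rule: when the top class is "br1", return the total confidence
    mass on br1/br2/br3 (a value in [0, 1]) instead of the integer 1;
    otherwise return the integer score of the top class.
    """
    top = max(confidences, key=confidences.get)
    if top == "br1":
        return confidences["br1"] + confidences["br2"] + confidences["br3"]
    return {"br0": 0, "br1": 1, "br2": 2, "br3": 3}[top]
```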

Second, the training of both models on horizontally-flipped images, despite the invariance of right-facing
images in the study set for which this tool was being developed, allowed the implementation of a
horizontal-flip data augmentation strategy during evaluation. Each DEXA image was scored twice: once in
its original orientation, once in its horizontally-flipped orientation. The output score was taken as the
average of those two scores, minimizing the impact of both models' orientation-specific idiosyncrasies.
This modification is referred to below as "**Aug.Flip**".
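Aug.Flip amounts to averaging scores over the two orientations; a sketch with stand-in functions for the real pipeline (`score_fn` and `hflip_fn` are placeholders for the scoring model and a left-right mirror):

```python
def dish_score_with_flip(image, score_fn, hflip_fn):
    """Average the DISH score of an image and its horizontal mirror (Aug.Flip)."""
    return 0.5 * (score_fn(image) + score_fn(hflip_fn(image)))

# Toy example: the "score" is just the first element, so mirroring changes it.
score = dish_score_with_flip([1, 2, 3], lambda img: img[0], lambda img: img[::-1])
```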

Pearson correlation coefficients:

| Modification | Prelim. Tr. | Rater Tr. |
| ------------ | :---------- | :-------- |
| None | 0.832 | 0.821 |
| **Aug.One** | 0.824 | 0.834 |
| **Aug.Flip** | 0.839 | 0.838 |
| **Aug.One + Aug.Flip** | 0.828 | **0.850** |

Use of both **Aug.One** and **Aug.Flip** was the strategy selected for the final application of
the model. Here is a plot of performance versus the Rater Training set:

<img src="imgs/trainRegress.jpeg" alt="performance versus Rater Training set" height="45%" width="45%">

## Final performance evaluation.

The Rater Test set provided the basis for the final evaluation of the full DISH scoring tool,
as described above, and it was consulted only after the model
had been applied to all study images. The tool's performance is shown below:

<img src="imgs/testRegress.jpeg" alt="performance versus Rater Training set" height="45%" width="45%">

**Pearson correlation (Rater Test set) = 0.774**

## References

Kuperus et al (2018) "The natural course of diffuse idiopathic skeletal
hyperostosis in the thoracic spine of adult males." *The Journal of Rheumatology.* 45:1116-1123. doi:10.3899/jrheum.171091
66 changes: 66 additions & 0 deletions docs/developer.md
@@ -0,0 +1,66 @@
[Back to home.](../README.md)

# Developer Documentation

## Module Organization

This system is implemented using a data-centric abstraction model. There
is a set of classes responsible for the analysis of data,
a set of classes responsible for input & output of progress and data,
and a pair of classes responsible for managing the overall workflow. The following
classes are implemented, with the indicated dependencies on one another:

![Module Dependency Diagram](imgs/mdd_modules.jpeg)

### Data analysis:

**DishScorer** executes the scoring of DISH in given DEXA images. It applies ML models using stored instances of **TfObjectDetector** and **TfClassifier**,
and it interprets the results of those analyses, executing all augmentation options.

**TfObjectDetector** applies a stored tensorflow object-detection model to an image, returning a series of confidence-scored **Box** instances.

**Box** records the edges of a box on an x-y plane. Instances can also carry score values for themselves, as well as other arbitrary data ('labels').

**TfClassifier** applies a stored tensorflow image-classification model to an image, returning an instance of **TfClassResult**.

**TfClassResult** records the confidence scores assigned to each of a set of competing classes, as is output by an image-classification ML model.
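As an illustration of the abstraction, **Box** might look like the following dataclass; the field names and types are assumptions for the sketch, not the repository's actual definition:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Box:
    """Edges of a box on an x-y plane, with an optional confidence score
    and arbitrary extra data ('labels'). A hypothetical sketch."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    score: Optional[float] = None
    labels: dict = field(default_factory=dict)

    def area(self) -> float:
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)

b = Box(0.0, 0.0, 2.0, 3.0, score=0.9)
```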

### I/O Functions:

**ImgLister** defines an interface for reading from a set of images (in this case, DEXA spines) to be
analyzed. Two implementing classes are provided: **ImgDirLister** iterates through all image files
in a given directory, while **ImgFileLister** iterates through a set of image files listed in a text
file (allowing the image files to be distributed across many directories).

**ProgressWriter** defines a listener interface for reporting progress of the DISH scoring tool across a data
set. Two implementing classes are provided: **DotWriter** prints dots to the shell as progress is made, while
**NullDotWriter** does nothing (this allows results printed to stdout to be uncluttered by progress reports).

### Task management:

**ImgDirScorer** executes DISH scoring across a series of DEXA images, defined by its stored **ImgLister** instance.

**PerformanceAnalyzer** executes DISH scoring across a series of images stored in an annotation file (listing a
score for each image). Rather than output those scores, it prints the results of a statistical analysis of the
correlative relationship between the given & determined values.

Additional documentation is found within the code itself.

## Support data files: ML models

In addition to the source code, this pipeline requires two tensorflow
saved-model files (`.pb`) and accompanying
label files. These represent the ML models that are described in
the [methods](analysis.md) documentation.

In the table below, for each ML model, the model & corresponding label file
are indicated, with a brief description of the model's purpose:

| Model File | Label File | Purpose |
| ------------------------ | ---------- | ------------- |
| bridgeDetectorModel.pb | bridgeDetectorLabels.pbtxt | Object detector for the anterior side of gaps between adjacent vertebrae. |
| bridgeScoreModel.pb | bridgeScoreLabels.txt | Image classification model for identifying the extent of hyperostosis for a given gap between vertebrae. |

## I/O File Formats

See input/output descriptions in the [usage instructions](getstarted.md).