diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..3cee7fb
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2020 Calico
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/Makefile b/Makefile
new file mode 100644
index 0000000..e3320de
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,35 @@
+HOST=127.0.0.1
+TEST_PATH=./tests
+
+PROJECT_NAME=MyProject
+PROJECT_NEW_NAME=MyProject
+FILES_WITH_PROJECT_NAME=Makefile .github/workflows/build-docs.yml README.md docs/index.md mkdocs.yml setup.py
+FILES_WITH_PROJECT_NAME_LC=Makefile README.md tests/test_myproject.py
+PROJECT_NAME_LC=$(shell echo ${PROJECT_NAME} | tr '[:upper:]' '[:lower:]')
+PROJECT_NEW_NAME_LC=$(shell echo ${PROJECT_NEW_NAME} | tr '[:upper:]' '[:lower:]')
+
+clean-pyc:
+ find . -name '*.pyc' -exec rm --force {} +
+ find . -name '*.pyo' -exec rm --force {} +
+	find . -name '*~' -exec rm --force {} +
+
+clean-build:
+ rm --force --recursive build/
+ rm --force --recursive dist/
+ rm --force --recursive *.egg-info
+
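+# Override PROJECT_NEW_NAME on the command line, e.g. (hypothetical new name):
+#   make rename-project PROJECT_NEW_NAME=DishScoring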
+rename-project:
+ @echo renaming "${PROJECT_NAME}" to "${PROJECT_NEW_NAME}"
+ @sed -i '' -e "s/${PROJECT_NAME}/${PROJECT_NEW_NAME}/g" ${FILES_WITH_PROJECT_NAME}
+ @echo renaming "${PROJECT_NAME_LC}" to "${PROJECT_NEW_NAME_LC}"
+ @sed -i '' -e "s/${PROJECT_NAME_LC}/${PROJECT_NEW_NAME_LC}/g" ${FILES_WITH_PROJECT_NAME_LC} && \
+ git mv src/${PROJECT_NAME_LC} src/${PROJECT_NEW_NAME_LC} && \
+ git mv tests/test_${PROJECT_NAME_LC}.py tests/test_${PROJECT_NEW_NAME_LC}.py
+ @echo Project renamed
+
+lint:
+ flake8 --exclude=.tox
+
+test:
+ pytest --verbose --color=yes $(TEST_PATH)
+
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..524cf4c
--- /dev/null
+++ b/README.md
@@ -0,0 +1,48 @@
+# DISH Scoring
+
+## Overview
+
+This tool applies ML models to lateral DEXA images to measure bone changes
+associated with diffuse idiopathic skeletal hyperostosis (DISH).
+
+## [DISH Analysis: Methods Description](docs/analysis.md)
+
+## [Developer Documentation](docs/developer.md)
+
+## Installation
+The recommended build environment uses [Anaconda](https://docs.anaconda.com/anaconda/install/). With Anaconda installed, create a conda environment for Python 3 as shown below:
+
+```shell
+conda create -n dish python=3.7
+```
+
+Once created, activate the environment and install all the needed libraries as follows:
+
+```shell
+conda activate dish
+pip install -r requirements.txt
+```
+
+## Usage
+A recommended invocation of the code (input/output paths as placeholders):
+
+```shell
+python scoreSpines.py -i ${INPUT_DIR} -o ${RESULT_FILE} --aug_flip --aug_one
+```
+### [Detailed Usage Instructions](docs/getstarted.md)
+
+
+## License
+
+See LICENSE
+
+## Maintainers
+
+See CODEOWNERS
diff --git a/VERSION b/VERSION
new file mode 100644
index 0000000..8acdd82
--- /dev/null
+++ b/VERSION
@@ -0,0 +1 @@
+0.0.1
diff --git a/docs/analysis.md b/docs/analysis.md
new file mode 100644
index 0000000..e0d60a9
--- /dev/null
+++ b/docs/analysis.md
@@ -0,0 +1,192 @@
+[Back to home.](../README.md)
+
+# DISH Scoring: Methods Description
+
+The pipeline described here scores the extent of hyperostosis that can be observed in a
+lateral dual-energy X-ray absorptiometry (DEXA) scan image of a human torso. As described
+by [Kuperus et al (2018)](https://www.jrheum.org/content/jrheum/45/8/1116.full.pdf), such
+hyperostosis can create bridges between vertebrae that limit flexibility and ultimately
+contribute to diffuse idiopathic skeletal hyperostosis (DISH).
+
+The analysis occurs in three steps:
+1. Identification of anterior intervertebral junctions;
+2. Scoring the hyperostosis of each intervertebral junction;
+3. Summing the bridge scores across the spine.
+
+Details on each of those steps are given below, along with the final performance of the
+system against hold-out test data generated by human annotators.
+
+## Step 1: Identification of anterior intervertebral junctions.
+
+Pathological DISH involves the linking of vertebrae by bony outgrowths that traverse
+intervertebral gaps. Its pathology results from the summed effects of hyperostosis
+between all adjacent pairs of vertebrae in the spine. The first step in the analysis of
+DISH was therefore the identification of the anterior portions of the intervertebral
+gaps along the entire spine. These are the loci, visible in lateral DEXA images, where
+DISH-relevant bridges can form. An object-detection model was applied to this
+task. It was trained by transfer learning from the
+**ssd_mobilenet_v1** model, using annotations similar to these below:
+
+![Example annotations of anterior intervertebral junctions](imgs/objDetTrainExamples.jpeg)
+
+A set of 160 images was annotated by this author, comprising 2,271 boxes drawn
+around vertebral junctions. The average number of boxes per image (14.2) was used
+to define the threshold for junction annotation: for each image being evaluated,
+the 14 highest-confidence boxes returned by the object detector are used.
+
+The annotated images were separated into training and test sets
+of 100 and 60 images, respectively. Training-set images were augmented by horizontal
+flipping (all images in the study set are right-facing), inward adjustment of image borders,
+and adjustments of brightness and contrast. In addition, to simulate artifacts observed at low frequency
+across the study set, augmentation was performed by drawing large black or white blocks randomly
+along the image edges. The final augmented training set included 1,200 images and 10,244 boxes.
+
+Performance of the object detector was evaluated in the 60-image test set using
+intersection-over-union (IoU) for the 14 top-scoring predicted boxes versus all of the
+annotated boxes, allowing each predicted box's intersection to only be counted for its
+most-overlapping annotated counterpart. The average IoU across the 60 test images was
+**68.9% (SD 5.9%)**.
+
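+As a minimal sketch (not the evaluation code itself), the per-box IoU can be computed
+from axis-aligned box corners as follows:
+
+```python
+def iou(a, b):
+    """Intersection-over-union of two axis-aligned boxes,
+    each given as (xMin, yMin, xMax, yMax) in pixels."""
+    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
+    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
+    inter = ix * iy
+    union = ((a[2] - a[0]) * (a[3] - a[1])
+             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
+    return inter / union if union else 0.0
+
+# each predicted box is credited only against its most-overlapping annotation:
+# box_iou = max(iou(pred, annot) for annot in annotations)
+```
+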
+## Step 2: Scoring the hyperostosis of each intervertebral junction.
+
+For each intervertebral junction, a numeric score was to be assigned according to the criteria
+described by [Kuperus et al (2018)](https://www.jrheum.org/content/jrheum/45/8/1116.full.pdf)
+in Figure 2 of that manuscript. Those authors provide examples and descriptions of hyperostosis
+between adjacent vertebral bodies, scored on a 0-3 scale in terms of both "bridge" and "flow".
+I automated that scoring using an image-classification model, with greater attention paid
+to the "Bridge score" than to the "Flow score" scale. This model classified images of individual
+bridges, i.e. sub-images extracted from the source image
+using the 14 top-scoring boxes defined by the object-detection model described above. Four
+categories were established and named numerically with reference to the bridge score
+("br0", "br1", "br2", and "br3"), corresponding to the severity
+of hyperostosis:
+
+![Examples of the bridge-score classes br0 through br3](imgs/scoreExamples.jpeg)
+
+For the training and testing of this image classification model, the object detection model was
+used to draw boxes (top-scoring 14 per image) across 893 DEXA spine images. Each of the resulting
+12,502 box images was manually classified as described above. For the test set, 200 of the DEXA
+images (comprising 2,800 bridge images) were randomly selected; the remaining 693 DEXA images (9,702
+bridge images) made up the pre-augmentation training set. The categories (named "br0", "br1", "br2",
+and "br3", corresponding to the bridge scores) were not evenly balanced (shown for the total annotation set):
+
+| Class | Count | % |
+| ----- | ----: | --: |
+| br0 | 10270 | 82.15 |
+| br1 | 1740 | 13.63 |
+| br2 | 356 | 2.85 |
+| br3 | 172 | 1.38 |
+
+For the training set, the full data set was augmented first using a horizontal flip.
+In the following augmentation steps, imbalance between the classes was reduced by down-sampling
+from the "br0" and "br1" classes (including in the selection of non-augmented boxes). For each
+augmentation step, a separate randomly-selected subset of the available boxes (bridge images) was sampled, ensuring
+maximum diversity of images but nonetheless consistent proportions of augmentation treatments across
+the classes. The use of only 10% of "br0" boxes and 25% of "br1" boxes resulted in the following proportions:
+
+| Class | Input % | Sampled % | Final % |
+| ----- | ------: | ------: | ------: |
+| br0 | 82.15 | 10 | 51.8 |
+| br1 | 13.63 | 25 | 21.5 |
+| br2 | 2.85 | 100 | 18.0 |
+| br3 | 1.38 | 100 | 8.7 |
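+
+For example, br0's final share is (0.8215 × 0.10) / (0.8215 × 0.10 + 0.1363 × 0.25 + 0.0285 + 0.0138) ≈ 51.8%.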
+
+Bridge images were extracted during the augmentation process, allowing the box itself to be randomly
+modified. The following augmentation combinations were performed: 1) non-augmented; 2) random tilt up to 30 deg.;
+3) random adjustment of the box edge positions by up to 20% of the box width or height; 4) tilt & edge; 5) tilt &
+brightness; 6) edge & brightness; 7) tilt & contrast; 8) edge & contrast. Augmentation therefore increased the
+training set size by 8-fold, resulting in the following counts for bridge images by class:
+
+| Class | Count |
+| ----- | ----: |
+| br0 | 12752 |
+| br1 | 5272 |
+| br2 | 4496 |
+| br3 | 2112 |
+
+Training was performed using transfer learning from the **efficientnet/b1** model. Evaluated using the
+test set described above, the Cohen's kappa value for the final model was 0.405 with the following
+confusion matrix (rows=human, cols=model):
+
+| | br0 | br1 | br2 | br3 | total |
+| ---- | ----:| ---:| ---:| ---:| -----:|
+| **br0** | 2102 | 194 | 31 | 65 | 2300 |
+| **br1** | 195 | 171 | 31 | 40 | 385 |
+| **br2** | 8 | 19 | 29 | 26 | 75 |
+| **br3** | 1    | 6   | 5   | 33  | 40    |
+| **total** | 2306 | 234 | 96 | 164 | |
+
+**Cohen's kappa (test set) = 0.405**
+
+Due to the numeric nature of the classes, the model was also evaluated against the test set using
+Pearson correlation (using the numeric values of each class "br0", "br1", "br2", and "br3"):
+
+**Pearson correlation (test set) = 0.581**
+
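+As a sketch of how these two metrics can be computed from per-bridge labels (assuming
+scikit-learn, which is not among this project's pinned dependencies):
+
+```python
+import numpy as np
+import scipy.stats
+from sklearn.metrics import cohen_kappa_score
+
+# hypothetical per-bridge class indices (0-3) for human and model
+human = np.array([0, 0, 1, 2, 3, 0, 1])
+model = np.array([0, 1, 1, 2, 3, 0, 0])
+
+kappa = cohen_kappa_score(human, model)
+pearson_r, _ = scipy.stats.pearsonr(human, model)
+```
+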
+## Step 3: Summing the bridge scores across the spine.
+
+The final output value of the model evaluates overall DISH-like hyperostosis across the spine.
+Final evaluation was performed using a hold-out set of 200 DEXA images that were scored by three
+independent raters, using the mean rater score for each DEXA image.
+Those raters used the same bridge-score scheme described above, with the appearance of DISH-related
+bony outgrowth scored as either 1, 2, or 3 (bridges without observable outgrowth implicitly received
+a score of 0). For each DEXA image, those numeric scores were summed to produce the final DISH score.
+
+In addition to the final hold-out test used for model evaluation, the independent raters also produced
+a training set of 199 images (**Rater Training**) that was used to compare alternative ML models and
+alternative strategies for interpretation of the ML model output. The classification model's test-set
+annotations, aggregated across each DEXA image, were used for the same purpose (**Preliminary Training**).
+In the case of Rater Training, the performances of the
+object-detection and classification models were evaluated simultaneously. In the case of
+Preliminary Training, only the performance of the classification model (and the interpretation of
+its output) was evaluated.
+
+For each DEXA image, the top-scoring 14 boxes from the object-detection model were used to define
+sub-images that were scored by the classification model, both described above. Initially, the numbers
+associated with the class assigned to each of the 14 bridge images ("br0", "br1", "br2", "br3") were summed
+to produce the model-derived DISH score. Two modifications were added to this process, described below.
+
+First, bridges assigned
+a score of 1 ("br1") were re-evaluated and assigned a decimal score in the interval \[0-1\]. That value
+was calculated as the fraction of confidence scores assigned by the model to classes "br1", "br2", and "br3".
+This had the general effect of down-weighting "br1" assignments, which frequently were made spuriously (see
+the confusion matrix above), unless they looked more like "br2"/"br3" instances (which provide a rare source
+for mis-classification) than they looked like "br0" instances (which provide an abundant source for
+mis-classification). This modification is referred to below as the "augmentation of one" (**Aug.One**).
+
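+This mirrors the logic of `_getScrFromClRes` in `scoreSpines.py`; a condensed sketch:
+
+```python
+import numpy as np
+
+def adjust_one(class_scores):
+    """class_scores: model confidences, e.g. {"br0": 0.6, "br1": 0.3, ...}"""
+    p_less = class_scores["br0"]
+    p_more = np.mean([class_scores[c] for c in ("br1", "br2", "br3")])
+    # replace a hard score of 1 with a value in [0-1]
+    return p_more / (p_less + p_more) if (p_less + p_more) > 0 else 1
+```
+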
+Second, the training of both models on horizontally-flipped images, despite the invariance of right-facing
+images in the study set for which this tool was being developed, allowed the implementation of a
+horizontal-flip data augmentation strategy during evaluation. Each DEXA image was scored twice: once in
+its original orientation, once in its horizontally-flipped orientation. The output score was taken as the
+average of those two scores. This allowed the impact of both models' idiosyncrasies to be minimized.
+This modification is referred to below as "**Aug.Flip**".
+
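+Schematically (mirroring `_scoreImgHelp` in `scoreSpines.py`):
+
+```python
+import numpy as np
+
+def score_with_flip(score_img, img):
+    """score_img: a function mapping an image array to a DISH score"""
+    flipped = np.flip(img, 1)  # horizontal flip
+    return np.mean([score_img(img), score_img(flipped)])
+```
+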
+Pearson correlation coefficients:
+
+| Modification | Prelim. Tr. | Rater Tr. |
+| ------------ | :---------- | :-------- |
+| None | 0.832 | 0.821 |
+| **Aug.One** | 0.824 | 0.834 |
+| **Aug.Flip** | 0.839 | 0.838 |
+| **Aug.One + Aug.Flip** | 0.828 | **0.850** |
+
+Use of both **Aug.One** and **Aug.Flip** was the strategy selected for the final application of
+the model. Here is a plot of performance versus the Rater Training set:
+
+![Model DISH score vs. mean rater score, Rater Training set](imgs/trainRegress.jpeg)
+
+## Final performance evaluation.
+
+The Rater Test set provided the basis for the final evaluation of the full DISH scoring tool,
+as described above, and it was examined only after the model
+had been applied to all study images. Performance is shown below:
+
+![Model DISH score vs. mean rater score, Rater Test set](imgs/testRegress.jpeg)
+
+**Pearson correlation (Rater Test set) = 0.774**
+
+## References
+
+Kuperus et al (2018) "The natural course of diffuse idiopathic skeletal
+hyperostosis in the thoracic spine of adult males." *The Journal of Rheumatology.* 45:1116-1123. doi:10.3899/jrheum.171091
diff --git a/docs/developer.md b/docs/developer.md
new file mode 100644
index 0000000..62e7e80
--- /dev/null
+++ b/docs/developer.md
@@ -0,0 +1,66 @@
+[Back to home.](../README.md)
+
+# Developer Documentation
+
+## Module Organization
+
+This system is implemented using a data-centric abstraction model. There
+is a set of classes responsible for the analysis of data,
+a set of classes responsible for input & output of progress and data,
+and a pair of classes responsible for managing the overall workflow. The following
+classes are implemented, with the indicated dependencies on one another:
+
+![Module Dependency Diagram](imgs/mdd_modules.jpeg)
+
+### Data analysis:
+
+**DishScorer** executes the scoring of DISH in given DEXA images. It applies ML models using stored instances of **TfObjectDetector** and **TfClassifier**,
+and it interprets the results of those analyses, executing all augmentation options.
+
+**TfObjectDetector** applies a stored tensorflow object-detection model to an image, returning a series of confidence-scored **Box** instances.
+
+**Box** records the edges of a box on an x-y plane. Instances can also carry score values for themselves, as well as other arbitrary data ('labels').
+
+**TfClassifier** applies a stored tensorflow image-classification model to an image, returning an instance of **TfClassResult**.
+
+**TfClassResult** records the confidence scores assigned to each of a set of competing classes, as output by an image-classification ML model.
+
+### I/O Functions:
+
+**ImgLister** defines an interface for reading from a set of images (in this case, DEXA spines) to be
+analyzed. Two implementing classes are provided: **ImageDirLister** iterates through all image files
+in a given directory, while **ImageFileLister** iterates through a set of image files listed in a text
+file (allowing the image files to be distributed across many directories).
+
+**ProgressWriter** defines a listener interface for reporting progress of the DISH scoring tool across a data
+set. Two implementing classes are provided: **DotWriter** prints dots to the shell as progress is made, while
+**NullDotWriter** does nothing (this allows results printed to stdout to be uncluttered by progress reports).
+
+### Task management:
+
+**ImageDirScorer** executes DISH scoring across a series of DEXA images, defined by its stored **ImgLister** instance.
+
+**PerformanceAnalyzer** executes DISH scoring across a series of images stored in an annotation file (listing a
+score for each image). Rather than outputting those scores, it prints the results of a statistical analysis of the
+correlation between the given & determined values.
+
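+As a sketch, the wiring of these classes follows `main()` in `scoreSpines.py`:
+
+```python
+# assemble the workflow as scoreSpines.py does (paths relative to the repo root)
+boxMod = TfObjectDetector("models/bridgeDetectorModel.pb", "models/bridgeDetectorLabels.pbtxt")
+classMod = TfClassifier("models/bridgeScoreModel.pb", "models/bridgeScoreLabels.txt")
+scorer = DishScorer(boxMod, classMod, numBoxes=14, minScr=0.0)
+lister = ImageDirLister("path/to/dexa_images")  # or ImageFileLister("images.txt")
+ImageDirScorer(scorer, lister).scoreImages("stdout")
+```
+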
+Additional documentation is found within the code itself.
+
+## Support data files: ML models
+
+In addition to the source code, this pipeline requires two frozen
+TensorFlow model graphs (`.pb`) and accompanying
+label files. These represent the ML models that are described in
+the [methods](analysis.md) documentation.
+
+In the table below, for each ML model, the model & corresponding label file
+are indicated, with a brief description of the model's purpose:
+
+| Model File | Label File | Purpose |
+| ------------------------ | ---------- | ------------- |
+| bridgeDetectorModel.pb | bridgeDetectorLabels.pbtxt | Object detector for the anterior side of gaps between adjacent vertebrae. |
+| bridgeScoreModel.pb | bridgeScoreLabels.txt | Image classification model for identifying the extent of hyperostosis for a given gap between vertebrae. |
+
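+Both models are TF1 frozen graphs, loaded as in `TfObjectDetector.__init__`; a condensed sketch:
+
+```python
+import tensorflow as tf  # pinned at 1.15 in requirements.txt
+
+graph = tf.Graph()
+with graph.as_default():
+    graph_def = tf.GraphDef()
+    with tf.gfile.GFile("models/bridgeDetectorModel.pb", "rb") as fid:
+        graph_def.ParseFromString(fid.read())
+    tf.import_graph_def(graph_def, name="")
+sess = tf.Session(graph=graph)
+```
+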
+## I/O File Formats
+
+See input/output descriptions in the [usage instructions](getstarted.md).
diff --git a/docs/getstarted.md b/docs/getstarted.md
new file mode 100644
index 0000000..5669e10
--- /dev/null
+++ b/docs/getstarted.md
@@ -0,0 +1,55 @@
+[Back to home.](../README.md)
+
+# Get Started: Running the Analysis
+
+These Python scripts use locally-referenced data files that encode the ML
+models being applied, so they should be run in place
+from a clone of this repo. They rely on several Python
+dependencies, listed in [requirements.txt](../requirements.txt).
+
+## Scoring DEXA images
+
+Score DISH from either a directory of images or a text file that lists the
+image files to be scored, one per line:
+
+```shell
+python scoreSpines.py -i ${INPUT_DIR} -o ${RESULT_FILE} [--aug_flip/--aug_one]
+```
+
+**Input/output arguments:**
+- `-i INPUT_DIR`: a directory of source images. The file names, minus extensions, will be
+  used as the identifiers in the output file. Corrupted or non-image files and
+  sub-directories will be skipped over.
+- `-o RESULT_FILE`: a two-column, tab-delimited file with DEXA image IDs in the left
+  column and predicted DISH scores in the right. Specifying "stdout" will write the
+  results to standard out.
+- `--details DETAILS`: an optional addition to the main calling paradigm (`-i, -o`) that
+ will output a second file containing per-bridge scoring data. For each image, 14 lines
+ of five-column data will be output: image identity (as in `-o` output), bridge instance
+ number, per-bridge DISH score, y-axis position (how far down the image, as a fraction of
+ image height), x-axis position (how far to the right in the image, as a fraction of image
+  width). Specifying "stdout" will write these details to standard out.
+
+**Augmentation option arguments:**
+- `--aug_flip`: flips each image horizontally and repeats the analysis, outputting the
+  average of the flipped and non-flipped DISH scores. If `--details` was invoked, those
+  bridges will also be reported (28 total), with positions always reported in terms of the
+  original (non-flipped) image.
+- `--aug_one`: downgrades scores of 1, replacing them with a [0-1] confidence-based value
+  (see the [methods description](analysis.md)). This reduces noise from borderline bridges.
+
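+A minimal sketch for reading the `--details` output back into Python (the five-column,
+tab-delimited layout described above; "details.tsv" is a hypothetical file name):
+
+```python
+# parse the --details output: ID, bridge number, score, y position, x position
+with open("details.tsv") as f:
+    for row in f:
+        img_id, n, score, ypos, xpos = row.rstrip("\n").split("\t")
+        print(img_id, int(n), float(score), float(ypos), float(xpos))
+```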
+
+## Evaluating performance versus pre-scored DEXA images
+
+A quick way to get performance statistics against a small set of
+your own annotations:
+
+```shell
+python scoreSpines.py -a ${ANNOT_FILE} [--aug_flip/--aug_one]
+```
+**Invoking argument:**
+- `-a ANNOT_FILE`: an alternative calling method: input a text file listing each image by
+  its file path on one line, followed by one ">"-prefixed line per scored bridge, with the
+  bridge's category ("br1", "br2", "br3", or "FV_<float>" for a float value) in the last
+  tab-separated column. The script will apply the DISH scoring algorithm to those images
+  and print a statistical analysis of performance versus the annotations to standard out.
+- Augmentation arguments apply as before.
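+
+For example, based on the parser in `scoreSpines.py` (only the last tab-separated column
+of each ">" line is read; the file paths below are hypothetical):
+
+```
+scans/patient01.png
+>br2
+>br1
+scans/patient02.png
+>FV_0.5
+```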
diff --git a/docs/imgs/index.md b/docs/imgs/index.md
new file mode 100644
index 0000000..748f93f
--- /dev/null
+++ b/docs/imgs/index.md
@@ -0,0 +1 @@
+get rid of this later
diff --git a/docs/imgs/mdd_modules.jpeg b/docs/imgs/mdd_modules.jpeg
new file mode 100644
index 0000000..431dcec
Binary files /dev/null and b/docs/imgs/mdd_modules.jpeg differ
diff --git a/docs/imgs/objDetTrainExamples.jpeg b/docs/imgs/objDetTrainExamples.jpeg
new file mode 100644
index 0000000..2d88d3e
Binary files /dev/null and b/docs/imgs/objDetTrainExamples.jpeg differ
diff --git a/docs/imgs/scoreExamples.jpeg b/docs/imgs/scoreExamples.jpeg
new file mode 100644
index 0000000..1700f87
Binary files /dev/null and b/docs/imgs/scoreExamples.jpeg differ
diff --git a/docs/imgs/testRegress.jpeg b/docs/imgs/testRegress.jpeg
new file mode 100644
index 0000000..464a4ab
Binary files /dev/null and b/docs/imgs/testRegress.jpeg differ
diff --git a/docs/imgs/trainRegress.jpeg b/docs/imgs/trainRegress.jpeg
new file mode 100644
index 0000000..f73c09e
Binary files /dev/null and b/docs/imgs/trainRegress.jpeg differ
diff --git a/docs/index.md b/docs/index.md
new file mode 100644
index 0000000..2ede3c0
--- /dev/null
+++ b/docs/index.md
@@ -0,0 +1,13 @@
+# MyProject
+
+Welcome to MyProject!
+
+## Purpose
+
+
+## Installation
+
+(see README.md)
+
+## Usage
+
diff --git a/mkdocs.yml b/mkdocs.yml
new file mode 100644
index 0000000..c466c0c
--- /dev/null
+++ b/mkdocs.yml
@@ -0,0 +1,4 @@
+site_name: MyProject
+nav:
+ - Index: 'index.md'
+theme: sandstone
diff --git a/models/README.md b/models/README.md
new file mode 100644
index 0000000..3aa0039
--- /dev/null
+++ b/models/README.md
@@ -0,0 +1 @@
+See the [developer documentation](../docs/developer.md) for descriptions of these model files.
diff --git a/models/bridgeDetectorLabels.pbtxt b/models/bridgeDetectorLabels.pbtxt
new file mode 100644
index 0000000..69da834
--- /dev/null
+++ b/models/bridgeDetectorLabels.pbtxt
@@ -0,0 +1,4 @@
+item {
+ id: 1
+ name: 'bridge'
+}
diff --git a/models/bridgeDetectorModel.pb b/models/bridgeDetectorModel.pb
new file mode 100644
index 0000000..c27a55e
Binary files /dev/null and b/models/bridgeDetectorModel.pb differ
diff --git a/models/bridgeScoreLabels.txt b/models/bridgeScoreLabels.txt
new file mode 100644
index 0000000..eff2ad6
--- /dev/null
+++ b/models/bridgeScoreLabels.txt
@@ -0,0 +1,4 @@
+br0
+br3
+br1
+br2
diff --git a/models/bridgeScoreModel.pb b/models/bridgeScoreModel.pb
new file mode 100644
index 0000000..563af97
Binary files /dev/null and b/models/bridgeScoreModel.pb differ
diff --git a/package-lock.json b/package-lock.json
new file mode 100644
index 0000000..e1ce6b3
--- /dev/null
+++ b/package-lock.json
@@ -0,0 +1,32 @@
+{
+ "name": "github-template-python-generic",
+ "lockfileVersion": 2,
+ "requires": true,
+ "packages": {
+ "": {
+ "devDependencies": {
+ "prettier": "2.2.1"
+ }
+ },
+ "node_modules/prettier": {
+ "version": "2.2.1",
+ "resolved": "https://registry.npmjs.org/prettier/-/prettier-2.2.1.tgz",
+ "integrity": "sha512-PqyhM2yCjg/oKkFPtTGUojv7gnZAoG80ttl45O6x2Ug/rMJw4wcc9k6aaf2hibP7BGVCCM33gZoGjyvt9mm16Q==",
+ "dev": true,
+ "bin": {
+ "prettier": "bin-prettier.js"
+ },
+ "engines": {
+ "node": ">=10.13.0"
+ }
+ }
+ },
+ "dependencies": {
+ "prettier": {
+ "version": "2.2.1",
+ "resolved": "https://registry.npmjs.org/prettier/-/prettier-2.2.1.tgz",
+ "integrity": "sha512-PqyhM2yCjg/oKkFPtTGUojv7gnZAoG80ttl45O6x2Ug/rMJw4wcc9k6aaf2hibP7BGVCCM33gZoGjyvt9mm16Q==",
+ "dev": true
+ }
+ }
+}
diff --git a/package.json b/package.json
new file mode 100644
index 0000000..0a4c1b2
--- /dev/null
+++ b/package.json
@@ -0,0 +1,5 @@
+{
+ "devDependencies": {
+ "prettier": "2.2.1"
+ }
+}
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 0000000..6a46c27
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,4 @@
+numpy==1.16.5
+opencv-python==4.1.0.25
+scipy==1.2.1
+tensorflow==1.15.0
diff --git a/scoreSpines.py b/scoreSpines.py
new file mode 100644
index 0000000..66e85eb
--- /dev/null
+++ b/scoreSpines.py
@@ -0,0 +1,605 @@
+import argparse
+import os
+import sys
+
+import cv2
+import numpy as np
+import scipy.stats
+import tensorflow as tf
+
+
+# scores DISH for a DEXA spine image using the object-detection and classification models
+class DishScorer:
+ def __init__(self, objDetMod, classifyMod, numBoxes, minScr):
+ self._boxMod = objDetMod
+ self._classMod = classifyMod
+ self._minScr = minScr
+ self._nBox = numBoxes
+ # augmentations
+ self._aug_spineFlip = False
+ self._aug_adjOne = False
+
+ # functions for modifying behavior with augmentations
+ def addSpineFlip(self):
+ self._aug_spineFlip = True
+
+ def addAdjustOne(self):
+ self._aug_adjOne = True
+
+ def scoreImgDetails(self, img):
+ """also returns the y,x coords and score for each bridge"""
+ detailL = []
+ score = self._scoreImgHelp(img,detailL)
+ return score,detailL
+
+ def scoreImg(self, img):
+ """the public-facing interface for the useful function.
+ It applies all specified augmentations"""
+ # I won't do anything with the individual box details
+ return self._scoreImgHelp(img,[])
+
+ def _scoreImgHelp(self, img, boxDetailL, initialCall=True):
+ """the useful function. "initialCall" allows this function
+ to call itself recursively with an augmented
+ image as input. boxDetailL will be filled with tuples of
+ box positions & scores: (y,x,score)
+ """
+ # for the details
+ imgH,imgW = img.shape[:2]
+ # get the boxes
+ boxL = self._getOkBoxL(img)
+ # if no boxes were found (shouldn't happen),
+ # default null score is zero
+ if len(boxL) == 0:
+ score = 0.0
+ else:
+ # score each box
+ bScoreL = []
+ for b in boxL:
+ # for the details output
+ boxY = (b.yMin() + b.yMax()) / 2.0 / imgH
+ boxX = (b.xMin() + b.xMax()) / 2.0 / imgW
+ # legacy code from when I was augmenting each box
+            # (current behavior: single item in augBoxL)
+ augBoxL = self._getAugBoxes(b, img)
+ boxImgL = []
+ for ab in augBoxL:
+ # extract the box sub-image (one vertebral bridge)
+ bImg = img[ab.yMin() : ab.yMax(), ab.xMin() : ab.xMax(), :]
+ boxImgL.append(bImg)
+ # classes are numeric (correspond to amount of DISH-like growth)
+ bClassResL = list(map(self._classMod.getClasses, boxImgL))
+ bScore = np.mean(list(map(self._getScrFromClRes, bClassResL)))
+ bScoreL.append(bScore)
+ boxDetailL.append( (boxY,boxX,bScore) )
+ # the final score is just the sum of all the bridge scores
+ # across the spine
+ score = sum(bScoreL)
+ # creates a flipped version of the image to score, then
+ # averages the two. uses this function recursively,
+ # applying initialCall==False to set recursion limit.
+
+ if initialCall and self._aug_spineFlip:
+ flipImg = np.flip(np.copy(img), 1)
+ flipDetailL = []
+ flipScore = self._scoreImgHelp(flipImg, flipDetailL, False)
+ # I need to flip the details' positions on the x-axis
+ for ypf,xpf,fsc in flipDetailL:
+ boxDetailL.append( (ypf,1.0-xpf,fsc) )
+ score = np.mean([score, flipScore])
+ return score
+
+ def _getAugBoxes(self, box, img):
+ """This function allowed me to do box-specific
+        data augmentation by shifting the boxes around.
+ That functionality was explored during the
+ development phase but abandoned prior to
+ deployment. But I want to leave this here
+ so that I don't have to re-engineer the rest
+ of the logic (dealing with a box vs list of boxes).
+ """
+ imgH, imgW = img.shape[:2]
+ boxL = [box]
+ return boxL
+
+ def _getScrFromClRes(self, classRes):
+ """converts the bridge class names (strings) to
+ their corresponding numbers.
+ categories are "brN", where N = 0, 1, 2, or 3"""
+ score = int(classRes.best()[-1])
+ # if specified, adjusts the scores of 1 to a decimal
+ # value in the range of [0-1], depending on the distribution
+        # of zero versus non-zero scores. this heuristic reduced
+ # the contribution of noise from 1-scoring bridges (IMO the
+ # hardest to classify versus 0, and that was also reflected
+ # in classification model performance).
+ if self._aug_adjOne and score == 1:
+ pLess = classRes.score("br0")
+ pMore = np.mean(list(map(classRes.score, ["br1", "br2", "br3"])))
+ if pMore + pLess > 0:
+ score = pMore / (pLess + pMore)
+ return score
+
+ def _getOkBoxL(self, img):
+ """calls the obj-detect model and gets the most-
+ confidently-identified boxes, using the instance-
+ defined max number of boxes.
+ """
+ boxL = self._boxMod.getBoxes(img)
+ boxL = list(filter(lambda b: b.score() >= self._minScr, boxL))
+ if len(boxL) > self._nBox:
+ # the n is the tiebreaker
+ boxL = [(boxL[n].score(), n, boxL[n]) for n in range(len(boxL))]
+ boxL.sort(reverse=True)
+ boxL = boxL[: self._nBox]
+ boxL = [b for (s, n, b) in boxL]
+ return boxL
+
+
+class ImageDirLister:
+ """allows iteration through images in a
+ directory, using the ImageLister interface
+ """
+
+ def __init__(self, hostDir, append=".png"):
+ # check that the host dir exists
+ if not (os.path.isdir(hostDir)):
+ raise ValueError("host dir doesn't exist")
+ self._hostD = os.path.abspath(hostDir)
+ self._append = append
+
+ def getImgFiles(self):
+ imgFL = os.listdir(self._hostD)
+ imgFL.sort()
+ aLen = len(self._append)
+ imgFL = list(filter(lambda i: i[-aLen:] == self._append, imgFL))
+ imgFL = list(map(lambda i: os.path.join(self._hostD, i), imgFL))
+ return imgFL
+
+
+class ImageFileLister:
+ """allows iteration through images files that are
+ listed in a text file, using the ImageLister interface
+ """
+
+ def __init__(self, fileOfFiles):
+ # check that the host dir exists
+ if not (os.path.isfile(fileOfFiles)):
+ raise ValueError("file-of-files doesn't exist")
+ self._fofName = fileOfFiles
+
+ def getImgFiles(self):
+ with open(self._fofName) as f:
+ imgFL = f.readlines()
+ imgFL = list(map(lambda i: i.rstrip(), imgFL))
+ return imgFL
+
+
+class ImageDirScorer:
+ """applies scores to a set of images and records
+ the results to a file (or stdout)
+ """
+
+ def __init__(self, scorer, fileLister):
+ self._scorer = scorer
+ self._fileLister = fileLister
+
+ def scoreImages(self, outfileName, outfileDetails=""):
+ """outfileName is for the standard scores.
+ outfileDetails is for extra by-box details"""
+ imgFL = self._fileLister.getImgFiles()
+ print("Analyzing " + str(len(imgFL)) + " images.")
+ # if I'm writing to stdout, the output will be
+ # the progress marker, so no need for dots
+ if outfileName == "stdout":
+ outf = sys.stdout
+ outfDetails = sys.stdout
+ progress = NullDotWriter()
+ else:
+ outf = open(outfileName, "w")
+ progress = DotWriter(5, 50, 250)
+ writeDetails = False
+ if outfileDetails:
+ writeDetails = True
+ if outfileDetails == "stdout":
+ outfDetails = sys.stdout
+ else:
+ outfDetails = open(outfileDetails,'w')
+ count = 0
+ for imgF in imgFL:
+ progress.tick()
+ if not (os.path.isfile(imgF)):
+ raise ValueError("Image file not found: " + imgF)
+ if len(imgF.split(".")) < 2:
+ aLen = 0
+ else:
+ aLen = len(imgF.split(".")[-1]) + 1
+ imgName = os.path.basename(imgF)[:-aLen]
+ img = cv2.imread(imgF)
+ if not(writeDetails):
+ score = self._scorer.scoreImg(img)
+ else:
+ score,detailL = self._scorer.scoreImgDetails(img)
+ outf.write(imgName + "\t" + str(score) + "\n")
+ outf.flush()
+ if writeDetails:
+ detailL.sort()
+ for n in range(len(detailL)):
+ ypos,xpos,scr = detailL[n]
+ deetStr = '\t'.join(list(map(str,[imgName,n+1,scr,ypos,xpos])))
+ outfDetails.write(deetStr + "\n")
+ outfDetails.flush()
+ if outf != sys.stdout:
+ outf.close()
+ if writeDetails and outfDetails != sys.stdout:
+ outfDetails.close()
+
+
+class DotWriter:
+ """progress tracker for UI that prints a dot
+ after scoring the specified number of images,
+ with intermediate bar markings & dot-per-line option
+ """
+
+ def __init__(self, perDot, perBar, perLine):
+ self._pDot = perDot
+ self._pBar = perBar
+ self._pLine = perLine
+ self._count = 0
+
+ def tick(self):
+ self._count += 1
+ if self._count % self._pBar == 0:
+ sys.stdout.write("|")
+ elif self._count % self._pDot == 0:
+ sys.stdout.write(".")
+ if self._count % self._pLine == 0:
+ sys.stdout.write("\n")
+ sys.stdout.flush()
+
+
+class NullDotWriter:
+ """null progress tracker for UI"""
+
+ def __init__(self):
+ pass
+
+ def tick(self):
+ pass
+
+
+class PerformanceAnalyzer:
+ """a special class to run analysis on pre-annotated
+ images and perform statistics on new vs old results.
+ implemented within this script to accelerate model
+ development & prototype scoring functions
+ """
+
+ def __init__(self, annotFile):
+ self._imgfToScore = {}
+ with open(annotFile) as f:
+ line = f.readline()
+ while line:
+ if line[0] != ">":
+ imgF = line.strip()
+ self._imgfToScore[imgF] = 0.0
+ else:
+                    cols = line[1:].rstrip().split("\t")
+ # the categories will be in the last column, either "brN" or "Br_N"
+ # where N is 0, 1, 2, or 3;
+ # "FV_" provides the option to give a float value
+ if cols[-1].find("FV_") == 0:
+ scr = float(cols[-1].split("_")[1])
+ else:
+ scr = int(cols[-1][-1])
+ self._imgfToScore[imgF] += scr
+ line = f.readline()
+
+ def scoreImages(self, scorer):
+ print("Analyzing " + str(len(self._imgfToScore)) + " images.")
+ annotL, modelL = [], []
+ progress = DotWriter(5, 50, 250)
+ for imgF in self._imgfToScore.keys():
+ if not (os.path.isfile(imgF)):
+ raise ValueError("Image file not found: " + imgF)
+ progress.tick()
+ annotL.append(self._imgfToScore[imgF])
+ img = cv2.imread(imgF)
+ score = scorer.scoreImg(img)
+ modelL.append(score)
+ sys.stdout.write("\n")
+ print(str(scipy.stats.linregress(annotL, modelL)))
+
+
+class TfClassifier:
+ """applies the specified TF image-classification model"""
+
+ def __init__(self, existingModelFile, categoryFile):
+ self._modFile = existingModelFile
+ self._catFile = categoryFile
+ proto_as_ascii_lines = tf.gfile.GFile(categoryFile).readlines()
+ self._labels = list(map(lambda i: i.rstrip(), proto_as_ascii_lines))
+ # ## Load a (frozen) Tensorflow model into memory.
+ self._detection_graph = tf.Graph()
+ with self._detection_graph.as_default():
+ od_graph_def = tf.GraphDef()
+ with tf.gfile.GFile(self._modFile, "rb") as fid:
+ serialized_graph = fid.read()
+ print(self._modFile)
+ od_graph_def.ParseFromString(serialized_graph)
+ tf.import_graph_def(od_graph_def, name="")
+ self._sess = tf.Session(graph=self._detection_graph)
+
+ def getClasses(self, image, spCl=None):
+ # get the image tensor so I can re-size the image appropriately
+ image_tensor = self._detection_graph.get_tensor_by_name("Placeholder:0")
+ h, w = image.shape[:2]
+ if h * w == 0:
+ image = np.zeros(image_tensor.shape[1:])
+ image_resized = cv2.resize(image, dsize=tuple(image_tensor.shape[1:3]))
+ image_np_expanded = np.expand_dims(image_resized, axis=0)
+ image_np_expanded = image_np_expanded.astype(np.float32)
+ image_np_expanded /= 255
+ answer_tensor = self._detection_graph.get_tensor_by_name("final_result:0")
+ # Actual detection.
+ (answer_tensor) = self._sess.run(
+ [answer_tensor], feed_dict={image_tensor: image_np_expanded}
+ )
+ results = np.squeeze(answer_tensor)
+ results = [(results[n], self._labels[n]) for n in range(len(self._labels))]
+ return TfClassResult(results)
+
+ def labels(self):
+ return self._labels
+
+
+class TfClassResult:
+ """wraps a classification result
+ into a convenient interface
+ """
+
+ # results: a list of score,label tuples
+ def __init__(self, results):
+ self._rD = {}
+ for s, lb in results:
+ self._rD[lb] = s
+ self._lbmx = max(results)[1]
+
+ def best(self):
+ return self._lbmx
+
+ def score(self, lb):
+ return self._rD[lb]
+
+ def labels(self):
+ return self._rD.keys()
+
+
+# separate out the box-drawing
+class TfObjectDetector:
+ """applies the specified TF object-detection model to images"""
+
+ def __init__(self, existingModelFile, categoryFile):
+ self._modFile = existingModelFile
+ self._catFile = categoryFile
+ # this graph
+ self._detection_graph = tf.Graph()
+ with self._detection_graph.as_default():
+ od_graph_def = tf.GraphDef()
+ with tf.gfile.GFile(self._modFile, "rb") as fid:
+ serialized_graph = fid.read()
+ print(self._modFile)
+ od_graph_def.ParseFromString(serialized_graph)
+ tf.import_graph_def(od_graph_def, name="")
+ with open(self._catFile) as f:
+ catText = f.read()
+ self._category_index = {}
+ for entry in catText.split("item {")[1:]:
+ idNum = int(entry.split("id:")[1].split("\n")[0].strip())
+ idName = entry.split("name:")[1].split("\n")[0].strip()[1:-1]
+ self._category_index[idNum] = {"id": idNum, "name": idName}
+ self._sess = tf.Session(graph=self._detection_graph)
+ # for my own convenience
+ self._numToName = {}
+ for d in self._category_index.values():
+ self._numToName[d["id"]] = d["name"]
+
+ def getClassIds(self):
+ outD = {}
+ for d in self._category_index.values():
+ outD[d["name"]] = d["id"]
+ return outD
+
+ def getBoxes(self, image):
+ # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
+ image_np_expanded = np.expand_dims(image, axis=0)
+ image_tensor = self._detection_graph.get_tensor_by_name("image_tensor:0")
+ # Each box represents a part of the image where a particular object was detected.
+ boxes = self._detection_graph.get_tensor_by_name("detection_boxes:0")
+ # Each score represent how level of confidence for each of the objects.
+ # Score is shown on the result image, together with the class label.
+ scores = self._detection_graph.get_tensor_by_name("detection_scores:0")
+ classes = self._detection_graph.get_tensor_by_name("detection_classes:0")
+ num_detections = self._detection_graph.get_tensor_by_name("num_detections:0")
+ # Actual detection.
+ (boxes, scores, classes, num_detections) = self._sess.run(
+ [boxes, scores, classes, num_detections],
+ feed_dict={image_tensor: image_np_expanded},
+ )
+ h, w, ch = image.shape
+ bL, scL, numB = boxes[0], scores[0], num_detections[0]
+ classL = classes[0]
+ boxL = []
+ for n in range(int(numB)):
+ yA, yB = int(bL[n][0] * h), int(bL[n][2] * h)
+ xA, xB = int(bL[n][1] * w), int(bL[n][3] * w)
+ clName = self._numToName[classL[n]]
+ boxL.append(Box(xA, yA, xB, yB, scL[n], clName))
+ return boxL
+
+
+class Box:
+ """A box, defined by two corners at points (x0,y0) and
+ (x1,y1) on the plane of the DEXA image. Units for
+ point positions are 0-indexed pixels. Score is the
+ confidence score given to that box by the Object
+ Detector model. clName gives the label for the
+ detected object's class. Not used here since there
+ is only one class of objects being detected, but
+ useful for debugging so I kept it around.
+ """
+
+ def __init__(self, x0, y0, x1, y1, score, clName):
+ self._x0, self._y0 = x0, y0
+ self._x1, self._y1 = x1, y1
+ self._score = score
+ self._clName = clName
+
+ # recover coords with min/max values
+ def xMin(self):
+ return min([self._x0, self._x1])
+
+ def yMin(self):
+ return min([self._y0, self._y1])
+
+ def xMax(self):
+ return max([self._x0, self._x1])
+
+ def yMax(self):
+ return max([self._y0, self._y1])
+
+ def score(self):
+ return self._score
+
+ def name(self):
+ return self._clName
+
+ def exists(self):
+ return self._x0 != self._x1 and self._y0 != self._y1
+
+ # to allow for modifications
+ def copy(self):
+ return Box(self._x0, self._y0, self._x1, self._y1, self._score, self._clName)
+
+ def translate(self, xTrans, yTrans):
+ """slides the box along each axis (pixels)"""
+ self._x0, self._x1 = self._x0 + xTrans, self._x1 + xTrans
+ self._y0, self._y1 = self._y0 + yTrans, self._y1 + yTrans
+
+ def constrain(self, imgW, imgH):
+ """limits the box to the confines of the image"""
+ if self.xMin() < 0:
+ if self.xMax() < 0:
+ self._x0, self._x1 = 0, 0
+ else:
+ self._x0, self._x1 = 0, self.xMax()
+ if self.yMin() < 0:
+ if self.yMax() < 0:
+ self._y0, self._y1 = 0, 0
+ else:
+ self._y0, self._y1 = 0, self.yMax()
+ if self.xMax() > imgW:
+ if self.xMin() > imgW:
+ self._x0, self._x1 = imgW, imgW
+ else:
+ self._x0, self._x1 = self.xMin(), imgW
+ if self.yMax() > imgH:
+ if self.yMin() > imgH:
+ self._y0, self._y1 = imgH, imgH
+ else:
+ self._y0, self._y1 = self.yMin(), imgH
+
+
+# constants defining source files and application
+# variables for the ML models
+WORKDIR = os.path.abspath(os.getcwd())
+
+BOX_MODEL_FILE = WORKDIR + "/models/bridgeDetectorModel.pb"
+BOX_MODEL_LABEL = WORKDIR + "/models/bridgeDetectorLabels.pbtxt"
+BOX_NUMBER = "14"
+BOX_MIN_SCORE = "0"
+CLASS_MODEL_FILE = WORKDIR + "/models/bridgeScoreModel.pb"
+CLASS_MODEL_LABEL = WORKDIR + "/models/bridgeScoreLabels.txt"
+
+
+def main():
+ # start the app
+ ap = argparse.ArgumentParser()
+ ap.add_argument(
+ "-i",
+ "--input_dir",
+ help="input directory of images to be scored (or .txt file listing images)",
+ )
+ ap.add_argument("-o", "--output_file", help='output file of box locations ("stdout" is an option)')
+ ap.add_argument(
+ "-a",
+ "--annot_file",
+ help="a file of annotated images for performance comparison",
+ )
+ # data augmentation
+ ap.add_argument(
+ "--aug_flip",
+ help="score each image twice, with a horizontal flip",
+ action="store_true",
+ )
+ ap.add_argument(
+ "--aug_one",
+ help="downgrades scores of 1 by replacing with the ratio of scores <1 vs >1",
+ action="store_true",
+ )
+ # extra output
+ ap.add_argument(
+ "--details",
+ help="""an extra output file with bridge-by-bridge scoring details
+ ("stdout" is an option), cols are [ID,n,score,ypos,xpos], where
+ positions are fractions of the image height/width.""",
+ default = "",
+ )
+ args = vars(ap.parse_args())
+
+ # set things up
+ boxMod = TfObjectDetector(BOX_MODEL_FILE, BOX_MODEL_LABEL)
+ classMod = TfClassifier(CLASS_MODEL_FILE, CLASS_MODEL_LABEL)
+ minScr = float(BOX_MIN_SCORE)
+ numBoxes = int(BOX_NUMBER)
+ scorer = DishScorer(boxMod, classMod, numBoxes, minScr)
+ if args["aug_flip"]:
+ scorer.addSpineFlip()
+ if args["aug_one"]:
+ scorer.addAdjustOne()
+
+ # score new images
+ if args["input_dir"]:
+ if os.path.isdir(args["input_dir"]):
+ imgLister = ImageDirLister(args["input_dir"])
+ elif os.path.isfile(args["input_dir"]):
+ imgLister = ImageFileLister(args["input_dir"])
+ else:
+            raise ValueError("input is neither a directory nor a file")
+ imgMang = ImageDirScorer(scorer, imgLister)
+ if args["output_file"]:
+ outfName = args["output_file"]
+ else:
+ outfName = "stdout"
+
+ writeDetails = False
+ if args["details"]:
+ writeDetails = True
+ outfDetailsName = args["details"]
+
+ if writeDetails:
+ imgMang.scoreImages(outfName,outfileDetails=outfDetailsName)
+ else:
+ imgMang.scoreImages(outfName)
+
+ # stats on pre-annotated images; ability to do both allows
+ # a quick performance check to be added to every run of the
+ # script, if you want for QC
+ if args["annot_file"]:
+ perfMang = PerformanceAnalyzer(args["annot_file"])
+ perfMang.scoreImages(scorer)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/setup.py b/setup.py
new file mode 100644
index 0000000..15e25b2
--- /dev/null
+++ b/setup.py
@@ -0,0 +1,35 @@
+import sys
+import os
+from pathlib import Path
+from setuptools import setup, find_packages
+from glob import glob
+
+if sys.version_info < (3, 6):
+ sys.exit("This software requires Python >= 3.6")
+
+
+def get_requirements(reqs_file: str):
+    return [line.strip() for line in Path(reqs_file).read_text("utf-8").splitlines()]
+
+
+setup(
+ name="MyProject",
+ version="0.1",
+ description="MyProject description",
+ long_description=Path("README.md").read_text("utf-8"),
+ long_description_content_type="text/markdown",
+ url="https://github.com/calico/MyProject",
+ author="MyProjectAuthor",
+ author_email="MyProjectAuthor@calicolabs.com",
+ python_requires=">=3.6",
+ install_requires=get_requirements("requirements.txt"),
+ packages=find_packages("src"),
+ package_dir={"": "src"},
+ py_modules=[
+ os.path.splitext(os.path.basename(path))[0] for path in glob("src/*.py")
+ ],
+ classifiers=[
+ "Environment :: Console",
+ "Intended Audience :: Science/Research",
+ ],
+)