After exploring the segmentation code under `prediction/src/algorithms/segment/`, we have identified a few outstanding issues related to the segmentation functionality and volume calculations. These issues are all interrelated, but we've tried to divide them into two general categories (whose code paths start in `segment/trained_model.py`):
1. Model architecture / complexity (`trained_model.predict`)
- Model output shape is (512, 512, 1024): the `.npy` mask saved to `segment_path` should not have 1024 slices. Most slices after index 200 are uniform; in `LIDC-IDRI-0003`, for example, they take the single value 0.45197698, with an overall value range of roughly -0.35 to 0.8 (see the inspection sketch after this list).
- `simple_3d_model.py` and `unet_3d_model.py` both use the same `best_model_Simple3DModel` and make identical predictions. Meanwhile, the full U-Net can only process some full-size test images without throwing a `MemoryError`.
- It may be too much to try to retrain a new model this late, but it is desirable to have at least one model that accepts any appropriately sized input and outputs the correct shape.
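To reproduce the uniform-slice observation, a quick inspection along these lines can be run in the notebook (the path is a placeholder, and treating the last axis as the slice axis is an assumption):

```python
import numpy as np

# Placeholder path: point this at wherever predict() saved the mask.
segment_path = "lung-mask.npy"

mask = np.load(segment_path)
print(mask.shape)  # observed: (512, 512, 1024) -- far too many slices

# Find slices (along the last axis) that contain only a single value.
uniform = [i for i in range(mask.shape[-1])
           if mask[..., i].min() == mask[..., i].max()]
print(len(uniform), "uniform slices; overall range:", mask.min(), mask.max())
```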
2. Nodule volume calculation (`trained_model.calculate_volume`)

- The naive approach using `numpy.bincount` ignores centroid information entirely: it merely sums the non-zero values in the binary mask saved as `lung-mask.npy`, yielding a (poor) total volume rather than the distinct volume of each centroid in `centroids`. One negative consequence is that for `n` centroids, the predicted volumes are just this single total volume repeated `n` times.
- More advanced brute-force approaches using convex hulls (`scipy.spatial.ConvexHull`, `skimage.morphology.convex_hull_image`) are either too memory-intensive or only work on 2D arrays. Moreover, it's not clear that a standard convex hull approach would be best anyway, since we aren't interested in the entire lungs but in subsets of them (perhaps something like `skimage.morphology.convex_hull_object`, but this also only works on 2D arrays).
- The ideal function (as specified in the docstring for `trained_model.calculate_volume`) takes a list of centroids as input and computes, e.g., 3D connected components given those centroids; a sketch follows after this list.
- Note that with the current `Simple3DModel`, masking of nodules does not perform well, and it's possible that there is essentially one large connected component spanning ~200 slices.
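As a starting point, here is a minimal sketch of what such a function might look like, using `scipy.ndimage.label` for the 3D connected components and `numpy.bincount` for per-component voxel counts. The centroid dict format (`'x'`/`'y'`/`'z'` keys) follows the snippet in the comments below; the binarisation threshold is an assumption:

```python
import numpy as np
import scipy.ndimage


def calculate_volume(segment_path, centroids, threshold=0.5):
    """Return one volume (in voxels) per centroid via 3D connected components.

    The 0.5 threshold for binarising the float mask is an assumption; the
    real model output may need a different cut-off.
    """
    mask = np.load(segment_path)
    binary = mask > threshold

    # Default structure = squared (face) connectivity; num_components == 1
    # would reproduce the single-component problem noted above.
    labeled, num_components = scipy.ndimage.label(binary)

    # Component label under each centroid; 0 means background.
    labels = [labeled[c['x'], c['y'], c['z']] for c in centroids]

    # Voxel count of every component, indexed by its label.
    counts = np.bincount(labeled.ravel())
    return [int(counts[label]) if label > 0 else 0 for label in labels]
```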
The approach to exploring these issues has been to use an interactive Jupyter notebook rooted in the `prediction` directory of the application. From there, one can use `from src.algorithms.segment.trained_model import predict` to start playing with the outputs directly and testing changes on the fly. (Pro tip: use the `%load_ext autoreload` magic to reload the functions with your changes every time you call them.)
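Concretely, the first cells of such a notebook might look like this (the `predict` call is illustrative only; check the docstring for the real signature):

```python
# Cells at the top of a notebook rooted in the prediction/ directory:
%load_ext autoreload
%autoreload 2   # re-import edited modules automatically before each call

from src.algorithms.segment.trained_model import predict

# Illustrative call -- substitute your local DICOM path and centroids:
# mask_path = predict(dicom_path, centroids)
```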
And as always, please update the documentation with any new changes for easy points! (The segment `predict` docs are pretty weak right now.)
Ah, thanks @vessemer! I thought the functionality was clear to me, but I must have been confused by the fact that `labels = [mask[centroid['x'], centroid['y'], centroid['z']] for centroid in centroids]` was returning `[1 1 1 1 1 1]` for the six centroids I was passing it (for `LIDC-0003`). I didn't realize that `scipy.ndimage.label` has a default `structure` parameter representing squared connectivity, which should be sufficient for this stage of the project.
The problem, then, seems to be that the image has only one connected component, yes? If so, then point 2 in the issue statement above should be good to go for now (in which case I'll edit the issue), and the immediate problems are just those in point 1.
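For the record, a quick way to check that hypothesis directly (path and threshold are placeholders, as above):

```python
import numpy as np
import scipy.ndimage

# Threshold is an assumption, as before.
binary = np.load("lung-mask.npy") > 0.5

# With no `structure` argument, scipy.ndimage.label uses squared (face)
# connectivity, i.e. scipy.ndimage.generate_binary_structure(3, 1) in 3D.
labeled, num_components = scipy.ndimage.label(binary)
print(num_components)  # 1 would explain every centroid mapping to label 1
```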
@caseyfitz Are you planning to merge the changes you made to the code base in your branch into master at some point? And by the way: nice notebook! :)