Home
This page summarizes the steps and code needed to generate a training dataset for skeleton prediction.
The SegEM dataset is used as an example; we only use its ground-truth skeletons for training and evaluation. For Harvard users, a simplified, processed version of the SegEM skeleton data is stored at /n/pfister_lab2/Lab/vcg_connectomics/EM/segEM/skeletonData/processed/. A script to visualize the data in Neuroglancer is also provided there.
As described in my master's thesis, we encode skeletons into a flux field and train a U-Net to predict the flux. Skeletons are decoded from the predicted flux field by thresholding its divergence.
The steps below describe the complete process of training data generation, model training, prediction, and skeleton decoding. It is recommended to use a virtual environment with Python 3 installed.
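For intuition, decoding a skeleton mask from a predicted flux volume by thresholding its divergence amounts to something like the following minimal numpy sketch. This is an illustration only, not the repository's implementation; the sign convention and threshold scale may differ.
import numpy as np

def decode_skeleton(flux, threshold=0.6):
    # flux: float array of shape (3, z, y, x) holding the predicted flux vectors.
    # Divergence = sum of the partial derivatives of each flux component along its own axis.
    div = (np.gradient(flux[0], axis=0)
           + np.gradient(flux[1], axis=1)
           + np.gradient(flux[2], axis=2))
    # Skeleton voxels act as strong sinks of the flux field, i.e. strongly negative divergence.
    return -div > threshold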
1.1 Clone the repository github.com/al093/instance-skeletonization-scripts. It contains the scripts needed for pre- and post-processing of skeleton data.
1.2 Clone the python3 branch of the repository github.com/donglaiw/ibexHelper. It contains code for generating skeletons from segmentations. Build and install it as described in its README.
1.3 Run createSkeletonsTrainData.py. The ## MAIN SCRIPT ## section contains the paths of the training and testing volumes for different datasets; for this example, use the SegEM paths. Specify an output path of your choice in the variable output_path_base.
The skeleton file path points to an h5 file that contains the skeleton nodes and edges in a predefined format. The script interpolates the skeletons using splines and then creates the flux field volumes needed for training, along with other files such as the interpolated skeletons and context fields. It also creates skeleton data at multiple scales; for this demonstration, I suggest generating data only at the 1.0 scale by setting scaling_dict = {1.0:'1_0x'}. Also, to make the script easier to debug, disable parallel processing of the skeletons by setting num_proc = 0, as in the sketch below.
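The relevant settings in the ## MAIN SCRIPT ## section would then look roughly like this (variable names as described above; the exact layout in createSkeletonsTrainData.py may differ):
output_path_base = '/path/to/your/output/'   # directory where all generated h5 files are written
scaling_dict = {1.0: '1_0x'}                 # generate data only at the 1.0 scale
num_proc = 0                                 # disable parallel processing for easier debugging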
1.4 Visualize the context and flux fields in Neuroglancer (NG) using the helper script visualize.py I created. If you don't have NG, install it first: pip install neuroglancer
1.4.1 Run the script with python -i visualize.py and visualize the output files by calling the appropriate show() methods, as below:
>>> res = [11, 11, 28]
>>> im = read_hf('/n/pfister_lab2/Lab/vcg_connectomics/EM/segEM/skeletonData/processed/cortex_training_raw.h5')
>>> show_array(im/255.0, 'im-train') # image volume
>>> show_skeleton('/n/pfister_lab2/Lab/vcg_connectomics/EM/segEM/skeletonData/processed/cortex_training_skeletons.h5') # ground truth label from segEM
>>> output_base_path = 'the/output/path/you/saved/skeleton/data/into'
>>> show(output_base_path + 'skeleton_context.h5', 'skel-context') # processed dilated skeletons
>>> show_grad_as_color(output_base_path + 'grad_distance_context.h5', 'skel-flux') # skeleton flux - the target training volume.
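For intuition about what the flux target (grad_distance_context.h5) roughly contains: one common construction is a field of unit vectors pointing from each voxel inside a dilated context toward its nearest skeleton voxel, as in the sketch below. The exact definition used by createSkeletonsTrainData.py (sign, normalization, anisotropic spacing, context radius) may differ.
import numpy as np
from scipy.ndimage import distance_transform_edt

def skeleton_to_flux(skeleton_mask, context_radius=10.0):
    # skeleton_mask: boolean (z, y, x) array; returns a flux field of shape (3, z, y, x).
    # Distance to, and coordinates of, the nearest skeleton voxel for every voxel.
    dist, nearest = distance_transform_edt(~skeleton_mask, return_indices=True)
    coords = np.indices(skeleton_mask.shape, dtype=np.float32)
    direction = nearest.astype(np.float32) - coords          # vectors pointing toward the skeleton
    norm = np.maximum(np.linalg.norm(direction, axis=0), 1e-8)
    flux = direction / norm                                  # normalize to unit vectors
    flux[:, dist > context_radius] = 0                       # keep the field only inside the context band
    return flux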
Once we have the ground-truth flux volumes, we can start training the model. To train for flux, the script trainFlux.py needs to be called with appropriate arguments. I suggest keeping a 'job' file that can be saved for later reference, as below. All the h5 files needed for training are created by createSkeletonsTrainData.py; at this stage we just need to fill in appropriate values for every argument:
#!/bin/bash
python trainFlux.py -- \
-o <path to save model files, ./output> \
-en <experiment name, my_first_flux_training?> \
-lr <learning rate, 1e-3?> \
--iteration-total <total iterations, 40000?> \
--iteration-save <save iteration per this step, 2000?> \
-g <num gpus to use, 4?> \
-c <num of dataloading threads, 10?> \
-b <batch size, 8?> \
-mi <model input size, maybe 32, 128, 128> \
-ac <model architecture, keep fluxNet> \
--task <keep 4 for flux training> \
--out-channel <keep 3 for flux> \
--in-channel <keep 1 for flux> \
--data-aug <True if you want data augmentation> \
-dn <path to segEM image files, separated by '@'; the same applies to the arguments below> \
-fn <path to segEM flux files: grad_distance_context.h5> \
-ln <path to segEM flux context files: skeleton_context.h5> \
-sp <path to segEM sampling locations file: context_positions.h5> \
-skn <path to segEM skeleton file: skeleton.h5>
To visualize the training curves, a TensorBoard server can be started in the output directory (e.g. tensorboard --logdir <output dir>).
The script EvaluateFlux.py can be used to judge the performance of the model at multiple checkpoints and with different threshold values for skeleton extraction from flux.
For the SegEM dataset, the input image paths and ground truths are hard-coded, but the script can easily be extended to other datasets by implementing a getDataset function that provides all volumes needed for running inference and calculating the ERL (estimated run length) and PR (precision-recall) metrics.
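As a rough illustration, a getDataset implementation for a new dataset could return something along these lines; the names and return structure here are assumptions, so check evaluateFlux.py for the exact interface it expects.
import h5py
import numpy as np

def read_h5(path, dataset='main'):
    # Read a single dataset from an h5 file; the dataset name 'main' is an assumption.
    with h5py.File(path, 'r') as f:
        return np.asarray(f[dataset])

def getDataset(subset):
    # Return everything needed for running inference and computing the ERL / PR metrics.
    base = '/path/to/my_dataset/'                          # illustrative path
    return {
        'image': read_h5(base + subset + '_raw.h5'),       # input EM volume
        'gt_skeletons': base + subset + '_skeletons.h5',   # ground-truth skeleton file
        'resolution': (11, 11, 28),                        # voxel size in nm, matching res above
    }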
EvaluateFlux.py expects the following cmd args:
#!/bin/bash
ipython --pdb evaluateFlux.py -- \
--exp-name <Experiment name, e.g. VISOR40_singleScale_noAug_eval> \
-g <num of gpus to use for inference, e.g. 1> \
-c <num of parallel data loaders, e.g. 6> \
-b <input batch size, e.g. 16> \
-mi <network input size, e.g. 64,192,192> \
-ac <model architecture, e.g. fluxNet> \
--task <index of the task; for flux it is 4> \
--out-channel 3 \
--in-channel 1 \
--method ours \
--tune True \
--dataset <which dataset to evaluate, e.g. segEM> \
--set <which subset to run eval on, e.g. val; possible values: train, val, test> \
--model-dir <model directory, e.g. /n/home11/averma/pytorch_connectomics/outputs/visor40/visor40_singleScale_noAug/; the script runs eval using all checkpoints> \
--div-threshold <threshold for extracting skeletons, e.g. 0.60 (between 0 and 1); optional, if not provided the script runs all hard-coded values>
Note: If the same experiment is run again, the script skips model inference and skeleton extraction when the output files are already present in the output paths. To force these steps to run anyway, use --force-run-inference True (to rerun the model) and --force-run-skeleton True (to rerun skeleton extraction).
The final metric scores are saved in a JSON file as a Python dict:
result = {model_checkpoint_1: {threshold_value_1: {pr: score, erl: score, ...},
                               threshold_value_2: {pr: score, erl: score, ...},
                               ...},
          model_checkpoint_2: {threshold_value_1: {pr: score, erl: score, ...},
                               ...},
          ...}
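A small, hypothetical snippet for picking the best checkpoint/threshold pair from that file (assuming the structure above and a key named 'erl'; the file name is illustrative):
import json

with open('results.json') as f:    # path/name of the saved metrics file is illustrative
    results = json.load(f)

# Flatten to (checkpoint, threshold, erl) triples and take the one with the highest ERL.
best = max(((ckpt, thr, scores['erl'])
            for ckpt, thresholds in results.items()
            for thr, scores in thresholds.items()),
           key=lambda t: t[2])
print('best checkpoint: {}, threshold: {}, ERL: {:.2f}'.format(*best))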