We designed an IRN_m network for multi-object ratio tasks, and an IRN_p network for pair-ratio estimation tasks.
If you run into any problem or bug, please tell me ([email protected]). Thanks (^,^).
- Recommended environment: Python 3.6, TensorFlow 1.14.0, Keras 2.2.4.
- Necessary libraries: numpy, opencv-python, argparse, scikit-learn, openpyxl, pickle, matplotlib.
For quick experiments, we provide Code_PureExperiments, which automatically runs each network 5 times by default and computes the average and SD of the MSE and MLAE. In PureExperiments, the dataset is regenerated automatically on every run.
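For reference, here is a minimal sketch of how the two reported metrics could be computed. It assumes MLAE follows the definition in Daniel's paper (log2 of the mean absolute error, with ratios scaled to percentages, plus 1/8); the implementation in this repository may differ in details.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error over all predicted ratios.
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

def mlae(y_true, y_pred):
    # Assumed definition (from Daniel's paper): log2(mean absolute error in percent + 1/8).
    mae_percent = np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))) * 100.0
    return np.log2(mae_percent + 0.125)
```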
- I'll take Task1.1 Pie3_6 (codes) as an example to show how to use the code.
python Net_VGG.py --gpu 0 (--times 5)
python Net_RN.py --gpu 1 (--times 5)
python Net_RN_seg.py --gpu 2 (--times 5)
python Net_IRNm.py --gpu 3 (--times 5) # or `python Net_IRNp.py --gpu 3 (--times 5)` for pair tasks.
- If you need to check some sample images of the dataset in a local directory, you can run the following script to generate a small dataset under the current path './datasets/'. This small dataset contains 600/200/200 training/val/test images, including the original images, the segmented sub-images and the ground truths.
python Dataset_generator.py
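To take a quick look at a few generated samples, something like the following works; the file layout under './datasets/' is an assumption, so adjust the glob pattern to whatever Dataset_generator.py actually writes.

```python
import glob
import cv2
import matplotlib.pyplot as plt

# Hypothetical layout: adapt the pattern to the actual output of Dataset_generator.py.
paths = sorted(glob.glob('./datasets/**/*.png', recursive=True))[:4]

for i, p in enumerate(paths):
    img = cv2.imread(p, cv2.IMREAD_GRAYSCALE)
    plt.subplot(1, len(paths), i + 1)
    plt.imshow(img, cmap='gray')
    plt.axis('off')
plt.show()
```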
For most tasks, we store the best model, i.e. the one that obtains the lowest loss on the val_set (strictly speaking, this applies to the condition where the train_set and test_set share the same features). Here is an example of the output of an RN network.
- RN_0.p ~ RN_4.p: the pickle files corresponding to each experiment. Each one contains the MSE and MLAE of the train/val/test sets and the loss history of the train/val sets.
- RN_avg.p: the pickle file that summarizes all experiments. It contains the average and SD of the MSE and MLAE of the train/test sets.
- folders RN_0 ~ RN_4: each one contains the model & weights of the trained network, and the predicted results & ground truth.
- Note: I also compute the ERROR RATE in the POINT CLOUD tasks, which can be found in their pickle files.
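The result files can be inspected directly with pickle. A small sketch; the dictionary keys are not documented here, so the code simply prints whatever is stored.

```python
import pickle

# Results of a single run (e.g. the first RN experiment).
with open('RN_0.p', 'rb') as f:
    result = pickle.load(f)
print(result.keys() if isinstance(result, dict) else type(result))

# Summary over all runs: average and SD of the MSE and MLAE.
with open('RN_avg.p', 'rb') as f:
    summary = pickle.load(f)
print(summary)
```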
However, we also have some generalization tasks whose training and test sets have different features, e.g. PieNumber, PieLineWidth and PieColor. In these conditions, we store not only the best results obtained with the lowest loss on the val_set, but also the best results on the train_set. That's because VGG and RN can only fit the train_set well, but cannot fit the val_set and test_set. If we look at the loss curve of VGG in PieColor_fixedTrain or of RN in PieLineWidth, we can find that the val_loss may already have reached its lowest value while the train_loss has still not converged. That's because the network is optimized only on the train_set, and the val_set is totally different from the train_set, so the val_loss has little relation to the train_loss. Therefore, I store both results to show the best performance that the network can achieve on the train_set and on the val_set respectively.
The following figure shows an example file VGG_0.p in the PieLineWidth task.
To make the generalization ability of our network more powerful, we redesigned the IRN_m network, as shown in the following figure. It brings great improvements in the conditions where (1) the training and testing sets are different, e.g. Task1.1 PieNumber, or (2) the object number is large, e.g. Task1.3 Pie3_12.
Details:
(1) The train/val/test sets contain 60000/20000/20000 charts respectively. We use the Adam optimizer (lr=0.0001) to train the network.
(2) During training, we shuffle the dataset before each epoch, and save the best model, i.e. the one with the lowest MSE loss on the validation set (a minimal training sketch is given after this list).
(3) Noise was added directly during dataset generation. Position-length and Point cloud are the tasks whose results differ most from those reported in Daniel's paper when I use the Adam optimizer.
(These experiments focus on verifying the generalization ability of networks.)
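A minimal Keras sketch of the training setup described in (1) and (2): Adam with lr=0.0001, shuffling before every epoch, and keeping the checkpoint with the lowest validation MSE. The tiny CNN and the random data below are placeholders, not the actual IRN_m model or chart data.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint

# Stand-in model: a tiny CNN regressor, NOT the real IRN_m architecture.
model = Sequential([
    Conv2D(8, 3, activation='relu', input_shape=(100, 100, 1)),
    Flatten(),
    Dense(6, activation='sigmoid'),   # e.g. a 6-dim ratio vector
])
model.compile(optimizer=Adam(lr=1e-4), loss='mse')

# Random placeholder data standing in for the generated charts and their ratios.
x_train, y_train = np.random.rand(64, 100, 100, 1), np.random.rand(64, 6)
x_val, y_val = np.random.rand(16, 100, 100, 1), np.random.rand(16, 6)

# Keep only the weights that reach the lowest MSE loss on the validation set.
checkpoint = ModelCheckpoint('best_val.h5', monitor='val_loss',
                             save_best_only=True, mode='min')

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=5, batch_size=32,
          shuffle=True,               # reshuffle the training set before each epoch
          callbacks=[checkpoint])
```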
Note that for most tasks, we use the best model on the validation set to compute the final MSE, MLAE, etc. However, VGG, VGG_seg and RN don't have a strong generalization ability, so they cannot deal with the validation/testing sets in PieNumber and PieLineWidth. It may happen that the network has already obtained its lowest loss on the validation set while it still hasn't converged on the training set. Therefore, to evaluate them better, only for VGG and RN in the PieNumber and PieLineWidth tasks we use the best model on the training set (instead of on the validation set) to compute the MSE on the training set, while we still use the best model on the validation set to compute the MSE on the testing set.
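In practice this just means keeping two checkpoints during the same run, one tracking the training loss and one tracking the validation loss. A short sketch (file names are arbitrary):

```python
from keras.callbacks import ModelCheckpoint

# Best model on the training set: used to report the training MSE for VGG/RN
# on the generalization tasks (PieNumber, PieLineWidth).
best_on_train = ModelCheckpoint('best_train.h5', monitor='loss',
                                save_best_only=True, mode='min')

# Best model on the validation set: still used to report the test-set MSE.
best_on_val = ModelCheckpoint('best_val.h5', monitor='val_loss',
                              save_best_only=True, mode='min')

# model.fit(..., callbacks=[best_on_train, best_on_val])
```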
[Pie3_6 Codes] [Pie3_12 Codes]
- This task tests the performance when the maximum object number is large and the number varies greatly. The object number in both training and testing sets is 3 to 6 for Pie3_6 and 3 to 12 for Pie3_12. I think 12 is large enough, since with a larger number the chart would look messy.
- It's clear that VGG and RN perform worse as the number of objects increases, as the following results show, whereas our IRN_m network still performs very well. The MLAE of VGG seems to increase more significantly than its MSE.
MSE(MLAE) | VGG | RN | IRN_p | IRN_m (!!!) |
---|---|---|---|---|
Pie3_6: Train set | 0.00036(0.67) | 0.00435(2.26) | 0.00016(-0.25) | 0.00012(-0.29) |
Pie3_6: Test set | 0.00038(0.70) | 0.00438(2.26) | 0.00017(-0.22) | 0.00012(-0.28) |
Pie3_12: Train set | 0.00089(1.24) | 0.00705(2.54) | 0.00033(0.12) | 0.00023(0.00) |
Pie3_12: Test set | 0.00098(1.29) | 0.00727(2.56) | 0.00041(0.25) | 0.00024(0.02) |
[FixedTrain Codes] [RandomColor Codes]
- We designed two tasks in PieColor. (1) FixedTrain: the training set only uses 6 colors, while the testing set uses random colors. (2) RandomColor: both the training and testing sets use random colors.
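A rough sketch of the difference between the two settings; the specific 6-color palette below is an assumption, and the generator in this repository may use other colors.

```python
import random

# Assumed 6-color palette for the FixedTrain training charts (illustrative only).
FIXED_PALETTE = [(228, 26, 28), (55, 126, 184), (77, 175, 74),
                 (152, 78, 163), (255, 127, 0), (166, 86, 40)]

def sample_color(task, split):
    if task == 'FixedTrain' and split == 'train':
        return random.choice(FIXED_PALETTE)      # only 6 possible colors
    # RandomColor (both splits) and the FixedTrain test split: any RGB color.
    return tuple(random.randint(0, 255) for _ in range(3))
```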
FixedTrain | VGG | RN | IRN_m (!!!) |
---|---|---|---|
Train set | 0.00040(0.76) | 0.00443(2.28) | 0.00014(-0.24) |
Test set | 0.06982(3.90) | 0.08715(4.38) | 0.00480(1.45) |
RandomColor | VGG | RN | IRN_m (!!!) |
---|---|---|---|
Train set | 0.00051(0.86) | 0.00492(2.36) | 0.00015(-0.22) |
Test set | 0.00095(0.95) | 0.00599(2.41) | 0.00015(-0.21) |
The range of the object number is different between the training and testing sets. By default, the pie charts in the training set contain 3 to 6 sectors, while those in the testing set contain 7 to 9 sectors. For VGG, RN and IRN_m, all outputs are 9-dim vectors (see the sketch after the results table below).
- Only our IRN_m and IRN_p can get a good result on the testing set. (1) Our network can deal with the condition that the training and testing sets have a different object number. (2) Our network seems to converge faster than VGG and RN. The validation loss of our IRN_m network looks like it fluctuates more strongly than that of VGG, but that is not actually the case, because their validation losses differ by orders of magnitude.
MSE(MLAE) | VGG | VGG_seg | RN | IRN_p | IRN_m (!!!) |
---|---|---|---|---|---|
Train set | 0.00023(0.18) | 0.00022(0.11) | 0.00289(1.70) | 0.00015(-0.56) | 0.00010(-0.57) |
Test set | 0.13354(4.56) | 0.14972(4.79) | 0.15874(4.84) | 0.00087(0.97) | 0.00058(0.81) |
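Since a chart can contain anywhere from 3 to 9 sectors while the networks always output 9 values, the ground truth presumably has to be padded to a fixed length. A small sketch, assuming one value per sector and zeros for the unused entries (the actual encoding in this repository may differ):

```python
import numpy as np

def make_target(sector_ratios, out_dim=9):
    # Pad a variable-length list of sector values to a fixed 9-dim target vector.
    # Assumption: missing sectors are encoded as zeros.
    target = np.zeros(out_dim, dtype=np.float32)
    target[:len(sector_ratios)] = sector_ratios
    return target

# Example: a training chart with 4 sectors.
print(make_target([0.4, 0.3, 0.2, 0.1]))
# -> [0.4 0.3 0.2 0.1 0.  0.  0.  0.  0. ]
```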
The line width is different between the training and testing sets in this task. By default, the line width of the pie charts in the training set is 1, while the width in the testing set is 2 or 3. In addition, the output of the networks is a 6-dim vector, and each chart contains 3 to 6 pie sectors.
- Due to the different line widths, PieLineWidth is unlike PieNumber, whose training and testing sets share the same appearance domain. However, the result is surprising: we found that both IRN_p and IRN_m can get a good result on the testing set. That means segmenting the objects in advance and directly using a CNN to extract their individual features really does have an effect.
- For our IRN_m network, the val_loss declines together with the train_loss in the early stage and stays low for many epochs. We can see that the IRN_m network is able to perform as well on the val_set as on the train_set. However, because we always optimize the network on the train_set, it is normal for the val_loss to become worse when the network tries to get even better results on the train_set.
MSE(MLAE) | VGG | RN | IRN_p | IRN_m (!!!) |
---|---|---|---|---|
Train set | 0.00036(0.69) | 0.00429(2.26) | 0.00065(0.59) | 0.00018(0.01) |
Test set | 0.06459(4.26) | 0.05459(4.08) | 0.00160(1.27) | 0.00032(0.33) |
(These experiments are the same as those in Daniel's paper.)
For the following experiments, I only show the MSE and MLAE on the testing sets.
Codes: [Bar charts] [Pie charts]
MSE(MLAE) | VGG | RN | IRN_m (!!!) |
---|---|---|---|
Bar chart | 0.00016(0.21) | 0.00394(2.34) | 0.00014(-0.31) |
Pie chart | 0.00028(0.57) | 0.00390(2.34) | 0.00021(0.11) |
Codes: [MULTI] [Type1] [Type2] [Type3] [Type4] [Type5]
MSE(MLAE) | VGG | RN | IRN_p (!!!) |
---|---|---|---|
Type1 | 0.000004(-1.77) | 0.000546(0.80) | 0.000008(-1.49) |
Type2 | 0.000005(-1.66) | 0.000485(0.72) | 0.000007(-1.57) |
Type3 | 0.000006(-1.63) | 0.000524(0.78) | 0.000007(-1.54) |
Type4 | 0.000004(-1.80) | 0.000494(0.74) | 0.000010(-1.34) |
Type5 | 0.000004(-1.77) | 0.000509(0.77) | 0.000009(-1.42) |
Multi | 0.000011(-1.41) | 0.000507(0.76) | 0.000008(-1.49) |
Codes: [Num10] [Num100] [Num1000]
MSE(MLAE) | VGG | RN | IRN_p (!!!) |
---|---|---|---|
Base10 | 0.000099(-0.17) | 0.002772(2.06) | 0.000016(-1.26) |
Base100 | 0.099914(4.77) | 0.005228(2.56) | 0.000045(-0.65) |
Base1000 | 0.101107(4.79) | 0.022654(3.58) | 0.000894(1.29) |