Skip to content

Latest commit

 

History

History
executable file
·
104 lines (62 loc) · 7.28 KB

README.MD

File metadata and controls

executable file
·
104 lines (62 loc) · 7.28 KB

Improved Localization Accuracy by LocNet for Faster R-CNN

1. Introduction

This project is a Simplified Faster R-CNN improved by LocNet (Loc-Faster-RCNN for short) implementation based on Faster R-CNN by chenyuntc. It aims in:

  • Improve the localization accuracy of Faster R-CNN by using LocNet in the Fast R-CNN part.
  • The first public implementation of the original paper. The author of the paper didn't release their version.
  • Match the performance reported in original paper.

And it has the following features:

  • It can be run as pure Python code, no more build affair. (cuda code moves to cupy, Cython acceleration are optional)

This implementation is slightlly different from the original paper:

  • Skip pooling is not used here. Informations from conv5_3 layer(the feature map of original Faster R-CNN) is enough for my task, so skip pooling is droped in this repo. What's more, with the advent of new methods like Feature Pyramid Networks, skip pooling seems to be obsolete :)
  • The RPN net is exactly same as Faster R-CNN, which means only 3X3 conv is applied, rather than 3X3 and 5X5 conv nets in the original paper.
  • Training strategy. The original paper train the RPN and LocNet alternately, but losses of RPN and LocNet are backproped at the same time in this repo.

prob_thre :

  • Hyperparameters in Loc-Faster-RCNN are mostly like Faster R-CNN except for prob_thre.
  • prob_thre is the threshold of probability used when predicting the bounding box, if px or py is greater than prob_thre, this row or column is considered to be part of some object.
  • Different detection tasks may have different appropriate prob_thre to achive best performance. If most objects in the detection task are dense blocks, a higher prob_thre may achive better performance.
  • You can choose your own prob_thre according to your task characteristics. Use eval_prob_thre function in train.py to find out the best prob_thre for your task. Remember to set load_path variable in the utils/config.py to your best model before calling this function.

2. Performance

2.1 Pascal VOC

Training and test set of Pascal VOC 2007 are used in this repo.

2.1.1 mAP

The best prob_thre for Pascal VOC is 0.5. When using prob_thre=0.5, the performance of Loc-Faster-RCNN is listed as follows. So with dataset like Pascal VOC, Loc-Faster-RCNN can not achieve better result than Faster R-CNN. However, when apppied to dataset with lots of small and dense objects, Loc-Faster-RCNN is likely to achieve better performance.

Implementation mAP
Loc-Faster-RCNN 0.6527
Faster R-CNN 0.7097

2.1.2 Differences between models predictions in Pascal VOC

  • LocNet part improves localization accuracy of Loc-Faster-RCNN by predicting the probability rather than locations. This helps when models is used to detection small objects or objects that are not so obvious. Like shown in the first 2 rows below, Loc-Faster-RCNN detected a person(row 1) and plant(row 2) even the objects are too small and not obvious.
  • However, LocNet part also hinders the model from identifying small parts of objects,which are more densely connected with the background rather than the main part of that object, like tail of a cat or wings of a bird ,as shown in the 3th~5th rows bellow.
  • What's more, if objects are overlaping or densely connected with each other in the same image, Loc-Faster-RCNN also have difficulty in drawing accurate bounding boxes around objects, as shown in the last row bellow.
Ground Truth Loc-Faster-RCNN Faster R-CNN

2.2 Text detection dataset

ICDAR-2011 and ICDAR-2013 are used in training and eveluating.

TBD.

3. Install dependencies

This repo is built basically on Faster R-CNN. You can check this repo to see dependencies.

4. Train

Compared with Faster R-CNN, Loc-Faster-RCNN is a little bit harder to train. If same initinal learning rate of 1e-3 is applied, the model may not converge after several epoches because px pr py would be nan. So if you encounter the same problem when using Loc-Faster-RCNN on your own dataset, maybe a smaller learning rate of 1e-4 or 1e-5 should work.

Troubleshooting

More

  • model structure
  • maybe : skip pooling
  • Maybe : conv 3X3 and conv 5X5 in RPN
  • High likely : Feature Pyramid Network as backbone
  • High likely : RoI Align rather than RoI Pooling

Acknowledgement

This work builds on many excellent works, which include:


Licensed under MIT, see the LICENSE for more detail.

Contribution Welcome.

If you encounter any problem, feel free to open an issue.

Correct me if anything is wrong or unclear.