This is an unofficial implementation of the paper LEDNet.
This repo contains the whole model architecture for the LEDNet model. You can also view the soon-to-be-released official implementation at this repo. Apart from this, I have tried to replicate the model as faithfully as the paper allows, and have drawn on the ENet model for details the paper leaves unspecified. Hope you will find this model useful 😄
Currently this repo contains the directory `model` with the whole model architecture (the training and testing code is not included yet; see below). So to use it in your segmentation task, just place the directory `model` in your working directory. After that:
```python
from model import return_model
# ...

# The line below initializes the LEDNet model for 128*128 images.
# The input (input_nc) and output (output_nc) channel counts can be
# set to whatever your task needs.
seg_model = return_model(input_nc=3, output_nc=22, netG='lednet_128')
```
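To sanity-check the model, you can push a dummy batch through it. This is just a sketch: it assumes `return_model` returns a standard PyTorch `nn.Module` and that the network outputs per-class logits at the input resolution.

```python
import torch
from model import return_model

seg_model = return_model(input_nc=3, output_nc=22, netG='lednet_128')
seg_model.eval()

dummy = torch.randn(1, 3, 128, 128)  # one 128*128 RGB image
with torch.no_grad():
    out = seg_model(dummy)
print(out.shape)  # expected: torch.Size([1, 22, 128, 128])
```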
The model architecture has already been tested. I will soon add the training and testing procedure for the VOC segmentation task and for the Cityscapes dataset.
Although most of the design has been taken directly from what is specified in the original paper, a few deviations are best spelled out here:
- As specified in the original ENet paper, I have used `PReLU` activation in the encoder part but `ReLU` in the decoder part. Bias terms are, however, kept throughout the network.
- In every downsampling block, the activation is applied after the concatenation of the two parallel operations (see the sketch after this list).
- `BatchNorm2d` has been used in every `SSnbt` module, as the results were noticeably worse in its absence.
- `Dropout2d` is applied in every `SSnbt` module after the concatenation of its left and right branches (illustrated in the SS-nbt sketch below).
- Most importantly, for the upsampling at the end of the network and also in the `APN` module, I have used bilinear interpolation. I initially tried `ConvTranspose2d`, but it led to very poor results and checkerboard artifacts in the final outputs (see the snippet below).
- This model also works for 128*128 and 256*256 images, in contrast to the input size of images in the original paper.
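To make the downsampling note concrete, here is a minimal sketch of such a block in the ENet style this repo draws on: a strided convolution in parallel with max-pooling, concatenated along the channel axis, with `BatchNorm2d` and the activation applied only after the concatenation. The class name and channel arithmetic are illustrative, not necessarily what the `model` directory uses.

```python
import torch
import torch.nn as nn

class DownsamplingBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # The conv branch produces (out_ch - in_ch) channels so that
        # concatenation with the pooled input yields out_ch channels.
        self.conv = nn.Conv2d(in_ch, out_ch - in_ch, kernel_size=3,
                              stride=2, padding=1, bias=True)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.PReLU()  # PReLU in the encoder, as noted above

    def forward(self, x):
        # Activation is applied after the two parallel branches are merged.
        x = torch.cat([self.conv(x), self.pool(x)], dim=1)
        return self.act(self.bn(x))
```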
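Similarly, the placement of `BatchNorm2d` and `Dropout2d` inside the `SSnbt` module can be sketched as below. This is a simplified illustration only (the real module uses more factorized and dilated convolutions); the point is the channel split, the per-branch `BatchNorm2d`, and the `Dropout2d` right after the two branches are concatenated.

```python
import torch
import torch.nn as nn

class SSnbtSketch(nn.Module):
    """Simplified SS-nbt: split channels in half, run each half through
    factorized convolutions, concatenate, apply Dropout2d, then add the
    residual and channel-shuffle. Layer counts/dilations are illustrative."""
    def __init__(self, channels, dropout_p=0.1):
        super().__init__()
        half = channels // 2
        self.left = nn.Sequential(
            nn.Conv2d(half, half, (3, 1), padding=(1, 0), bias=True),
            nn.PReLU(),
            nn.Conv2d(half, half, (1, 3), padding=(0, 1), bias=True),
            nn.BatchNorm2d(half),
            nn.PReLU(),
        )
        self.right = nn.Sequential(
            nn.Conv2d(half, half, (1, 3), padding=(0, 1), bias=True),
            nn.PReLU(),
            nn.Conv2d(half, half, (3, 1), padding=(1, 0), bias=True),
            nn.BatchNorm2d(half),
            nn.PReLU(),
        )
        self.dropout = nn.Dropout2d(dropout_p)
        self.act = nn.PReLU()

    @staticmethod
    def channel_shuffle(x, groups=2):
        n, c, h, w = x.size()
        x = x.view(n, groups, c // groups, h, w)
        return x.transpose(1, 2).contiguous().view(n, c, h, w)

    def forward(self, x):
        left, right = x.chunk(2, dim=1)                 # channel split
        out = torch.cat([self.left(left), self.right(right)], dim=1)
        out = self.dropout(out)                         # Dropout2d after concat
        return self.channel_shuffle(self.act(out + x))  # residual + shuffle
```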
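Finally, the upsampling choice amounts to the following one-liner: `F.interpolate` in bilinear mode avoids the checkerboard artifacts that the learned `ConvTranspose2d` upsampling produced here. The tensor shape below is just an example.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 22, 64, 64)  # e.g. coarse per-class logits

# Bilinear upsampling, as used at the end of the network and in APN:
up = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=True)
print(up.shape)  # torch.Size([1, 22, 128, 128])
```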