This work has been published on arXiv: *ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation*.
Currently the network can be trained on three datasets:
Datasets | Input Resolution | Output Resolution^ | # of classes
---|---|---|---
CamVid | 480x360 | 60x45 | 11
Cityscapes | 1024x512 | 128x64 | 19
SUN RGBD | 256x200 | 32x25 | 37
^ Encoder output resolution; the decoder output resolution is the same as that of the input image. The folder arrangement of the datasets expected by our data loader is explained in detail here.
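As the table suggests (and as the `--labelHeight`/`--labelWidth` values in the training commands below confirm), the encoder output is 1/8 of the input resolution in each dimension. A minimal illustration of this relation, not code from the repository:

```lua
-- Encoder output (label) resolution is the input resolution divided by 8,
-- e.g. Cityscapes: 1024x512 input -> 128x64 encoder output.
local imWidth, imHeight = 1024, 512
local labelWidth, labelHeight = imWidth / 8, imHeight / 8
print(labelWidth, labelHeight)   -- 128   64
```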
- run.lua : main file
- opts.lua : contains all the input options used by the training script
- data : data loaders for the datasets listed above
- models : all the model architectures are defined here
- train.lua : loading of models and error calculation
- test.lua : calculate testing error and save confusion matrices (see the sketch below)
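For reference, here is a minimal sketch of how per-class accuracies and a confusion matrix can be accumulated with `optim.ConfusionMatrix` from Torch's standard optim package. This is illustrative only and not the repository's actual test.lua code; the class names and dummy tensors are placeholders:

```lua
require 'torch'
require 'optim'

-- Hypothetical 3-class example; the real test.lua evaluates full label maps.
local classes = {'road', 'building', 'sky'}
local confusion = optim.ConfusionMatrix(classes)

for i = 1, 100 do
   local scores = torch.randn(#classes)      -- would come from net:forward(input)
   local target = torch.random(1, #classes)  -- would come from the ground-truth labels
   confusion:add(scores, target)
end

confusion:updateValids()
print(('global accuracy: %.2f%%'):format(confusion.totalValid * 100))
print(('mean class accuracy: %.2f%%'):format(confusion.averageValid * 100))
torch.save('confusion.t7', confusion)        -- confusion matrices can be persisted like this
```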
Example command for training the encoder on Cityscapes:

```
th run.lua --dataset cs --datapath /Cityscapes/dataset/path/ --model models/encoder.lua --save /save/trained/model/ --imHeight 256 --imWidth 512 --labelHeight 32 --labelWidth 64
```

Example command for training the decoder on Cityscapes:

```
th run.lua --dataset cs --datapath /Cityscapes/dataset/path/ --model models/decoder.lua --imHeight 256 --imWidth 512 --labelHeight 256 --labelWidth 512
```
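The flags in these commands are defined in opts.lua. Below is a hypothetical sketch of how such options are typically declared with Torch's built-in `torch.CmdLine` parser; the option names are taken from the commands above, but the defaults and help strings are illustrative, not the repository's actual values:

```lua
require 'torch'

local cmd = torch.CmdLine()
cmd:option('--dataset',     'cs',                 'dataset to train on (e.g. cs for Cityscapes)')
cmd:option('--datapath',    '',                   'path to the dataset root')
cmd:option('--model',       'models/encoder.lua', 'model definition file')
cmd:option('--save',        '',                   'directory where the trained model is saved')
cmd:option('--cachepath',   '',                   'directory for caching the loaded dataset as .t7')
cmd:option('--imHeight',    512,                  'input image height')
cmd:option('--imWidth',     1024,                 'input image width')
cmd:option('--labelHeight', 64,                   'target label height')
cmd:option('--labelWidth',  128,                  'target label width')
local opt = cmd:parse(arg)

print(opt.imHeight, opt.imWidth)   -- parsed options are available as fields of opt
```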
Use the `cachepath` option to save the loaded dataset in `.t7` format, so that it does not have to be prepared from scratch on every run.
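A minimal sketch of the kind of caching this option enables, using `torch.save`/`torch.load` from standard Torch; the hard-coded opt table, the cache file name, and the prepareDataset helper below are illustrative placeholders, not the repository's actual loader:

```lua
require 'torch'
require 'paths'

-- 'opt' would normally come from opts.lua; hard-coded here for illustration.
local opt = { cachepath = './cache', datapath = '/Cityscapes/dataset/path/' }

-- Hypothetical helper standing in for the repository's real data loader.
local function prepareDataset(datapath)
   return { data = torch.Tensor(2, 3, 64, 128), labels = torch.Tensor(2, 64, 128) }
end

local cacheFile = paths.concat(opt.cachepath, 'trainData.t7')   -- hypothetical file name
local trainData
if paths.filep(cacheFile) then
   trainData = torch.load(cacheFile)        -- fast path: reuse the cached .t7 dataset
else
   paths.mkdir(opt.cachepath)
   trainData = prepareDataset(opt.datapath) -- slow path: prepare from scratch, once
   torch.save(cacheFile, trainData)         -- cache as .t7 for subsequent runs
end
```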