First commit
HengWeiBin committed Oct 27, 2022
0 parents, commit c27dccb
Showing 73 changed files with 14,091 additions and 0 deletions.
21 changes: 21 additions & 0 deletions LICENSE
MIT License

Copyright (c) 2022 Jiacong Xu

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
158 changes: 158 additions & 0 deletions README.md
# PIDNet: A Real-time Semantic Segmentation Network Inspired from PID Controller

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pidnet-a-real-time-semantic-segmentation/real-time-semantic-segmentation-on-camvid)](https://paperswithcode.com/sota/real-time-semantic-segmentation-on-camvid?p=pidnet-a-real-time-semantic-segmentation) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pidnet-a-real-time-semantic-segmentation/real-time-semantic-segmentation-on-cityscapes)](https://paperswithcode.com/sota/real-time-semantic-segmentation-on-cityscapes?p=pidnet-a-real-time-semantic-segmentation)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pidnet-a-real-time-semantic-segmentation/real-time-semantic-segmentation-on-cityscapes-1)](https://paperswithcode.com/sota/real-time-semantic-segmentation-on-cityscapes-1?p=pidnet-a-real-time-semantic-segmentation) [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)

This is the official repository for our recent work: PIDNet ([PDF](https://arxiv.org/pdf/2206.02066)).

## Highlights
<p align="center">
<img src="figs/cityscapes_score.jpg" alt="overview-of-our-method" width="500"/></br>
<span align="center">Comparison of inference speed and accuracy for real-time models on the Cityscapes test set.</span>
</p>

* **Towards Real-time Applications**: PIDNet can be directly applied to real-time applications such as autonomous driving and medical imaging.
* **A Novel Three-branch Network**: An additional boundary branch is introduced into the two-branch network to mimic the PID controller architecture and remedy the overshoot issue of previous models.
* **More Accurate and Faster**: PIDNet-S achieves 78.6% mIOU at 93.2 FPS on the Cityscapes test set and 80.1% mIOU at 153.7 FPS on the CamVid test set. Moreover, PIDNet-L is the most accurate (80.6% mIOU) among all real-time networks for Cityscapes.

## Updates
- Fixed the data bug for CamVid; the new version of the arXiv preprint will be available on Jun 13th. (Jun/09/2022)
- Our paper was marked as state of the art in [Papers with Code](https://paperswithcode.com/task/real-time-semantic-segmentation). (Jun/06/2022)
- Our paper was submitted to arXiv for public access. (Jun/04/2022)
- The training and testing codes and trained models for PIDNet are available here. (Jun/03/2022)

## Demos

A demo of the segmentation performance of our proposed PIDNets: the original video (left) and the predictions of PIDNet-S (middle) and PIDNet-L (right).
<p align="center">
<img src="figs/video1_all.gif" alt="Cityscapes" width="800"/></br>
<span align="center">Cityscapes Stuttgart demo video #1</span>
</p>

<p align="center">
<img src="figs/video2_all.gif" alt="Cityscapes" width="800"/></br>
<span align="center">Cityscapes Stuttgart demo video #2</span>
</p>

## Overview
<p align="center">
<img src="figs/pidnet.jpg" alt="overview-of-our-method" width="800"/></br>
<span align="center">An overview of the basic architecture of our proposed Proportional-Integral-Derivative Network (PIDNet). </span>
</p>
The P, I and D branches are responsible for detail preservation, context embedding and boundary detection, respectively.

### Detailed Implementation
<p align="center">
<img src="figs/pidnet_table.jpg" alt="overview-of-our-method" width="500"/></br>
<span align="center">Instantiation of the PIDNet for semantic segmentation. </span>
</p>
In the table, "OP, N, C" denotes operation OP with stride N and C output channels; Output is the output size given an input size of 1024; m×RB denotes m residual basic blocks; 2×RBB denotes 2 residual bottleneck blocks; for OP<sub>1</sub>\OP<sub>2</sub>, OP<sub>1</sub> is used in PIDNet-L while OP<sub>2</sub> is applied in PIDNet-M and PIDNet-S. (m, n, C) is set to (2, 3, 32), (2, 3, 64) and (3, 4, 64) for PIDNet-S, PIDNet-M and PIDNet-L, respectively.
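
The branch roles described above can be summarized in a conceptual sketch. This is illustrative only: `pidnet_forward` and the fusion rule below are hypothetical stand-ins, not the repository's model code.

```python
# Conceptual sketch of PIDNet's three parallel branches (hypothetical code).
# P preserves high-resolution detail, I aggregates semantic context, and
# D predicts boundaries that guide the fusion of the other two.

def pidnet_forward(x, p_branch, i_branch, d_branch, fuse):
    detail = p_branch(x)    # P: detail preservation
    context = i_branch(x)   # I: context embedding
    boundary = d_branch(x)  # D: boundary detection
    # Boundary-attention-guided fusion: boundary evidence decides where
    # detail features should dominate over context features.
    return fuse(detail, context, boundary)
```

With toy scalar branches, a convex fusion weighted by the boundary map behaves as expected: a boundary weight of 0.5 mixes detail and context evenly.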

## Models
For easy reproduction, we provide the ImageNet-pretrained models here.

| Model (ImageNet) | PIDNet-S | PIDNet-M | PIDNet-L |
|:-:|:-:|:-:|:-:|
| Link | [download](https://drive.google.com/file/d/1hIBp_8maRr60-B3PF0NVtaA6TYBvO4y-/view?usp=sharing) | [download](https://drive.google.com/file/d/1gB9RxYVbdwi9eO5lbT073q-vRoncpYT1/view?usp=sharing) | [download](https://drive.google.com/file/d/1Eg6BwEsnu3AkKLO8lrKsoZ8AOEb2KZHY/view?usp=sharing) |

Also, the finetuned models on Cityscapes and CamVid are available for direct application in road-scene parsing.

| Model (Cityscapes) | Val (% mIOU) | Test (% mIOU)| FPS |
|:-:|:-:|:-:|:-:|
| PIDNet-S | [78.8](https://drive.google.com/file/d/1JakgBam_GrzyUMp-NbEVVBPEIXLSCssH/view?usp=sharing) | [78.6](https://drive.google.com/file/d/1VcF3NXLQvz2qE3LXttpxWQSdxTbATslO/view?usp=sharing) | 93.2 |
| PIDNet-M | [79.9](https://drive.google.com/file/d/1q0i4fVWmO7tpBKq_eOyIXe-mRf_hIS7q/view?usp=sharing) | [79.8](https://drive.google.com/file/d/1wxdFBzMmkF5XDGc_LkvCOFJ-lAdb8trT/view?usp=sharing) | 42.2 |
| PIDNet-L | [80.9](https://drive.google.com/file/d/1AR8LHC3613EKwG23JdApfTGsyOAcH0_L/view?usp=sharing) | [80.6](https://drive.google.com/file/d/1Ftij_vhcd62WEBqGdamZUcklBcdtB1f3/view?usp=sharing) | 31.1 |

| Model (CamVid) | Val (% mIOU) | Test (% mIOU)| FPS |
|:-:|:-:|:-:|:-:|
| PIDNet-S |-| [80.1](https://drive.google.com/file/d/1h3IaUpssCnTWHiPEUkv-VgFmj86FkY3J/view?usp=sharing) | 153.7 |
| PIDNet-M |-| [82.0](https://drive.google.com/file/d/1rNGTc8LD42h8G3HaedtqwS0un4_-gEbB/view?usp=sharing) | 85.6 |

## Prerequisites
This implementation is based on [HRNet-Semantic-Segmentation](https://github.com/HRNet/HRNet-Semantic-Segmentation). Please refer to their repository for installation and dataset preparation. The inference speed is tested on a single RTX 3090 using the method introduced by [SwiftNet](https://arxiv.org/pdf/1903.08469.pdf). No third-party acceleration library is used, so you can try [TensorRT](https://github.com/NVIDIA/TensorRT) or other approaches for faster inference.
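
The SwiftNet-style measurement protocol boils down to untimed warm-up iterations followed by device-synchronized timing. A framework-agnostic sketch (`measure_fps` is a hypothetical helper, not part of this repository; the `sync` hook stands in for `torch.cuda.synchronize` on GPU):

```python
import time

def measure_fps(run_once, warmup=10, iters=100, sync=lambda: None):
    """Average FPS over `iters` timed runs after `warmup` untimed runs.

    `sync` should block until pending device work finishes (e.g. pass
    torch.cuda.synchronize when benchmarking on GPU); on CPU a no-op
    is fine. Warm-up runs let caches and autotuners settle first.
    """
    for _ in range(warmup):
        run_once()
    sync()  # make sure warm-up work is fully drained before timing
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    sync()  # include all queued work in the measured interval
    elapsed = time.perf_counter() - start
    return iters / elapsed
```

Without the synchronization calls, asynchronous GPU kernels would still be running when the timer stops, which inflates the reported FPS.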

## Usage

### 0. Prepare the dataset

* Download the [Cityscapes](https://www.cityscapes-dataset.com/) and [CamVid](http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/) datasets and unzip them in `data/cityscapes` and `data/camvid` dirs.
* Check if the paths contained in lists of `data/list` are correct for dataset images.
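
One quick way to verify the lists is to script the check. `check_list_file` below is a hypothetical helper; it assumes each line of a `.lst` file contains whitespace-separated paths relative to the data root:

```python
import os

def check_list_file(list_path, data_root="data"):
    """Return the referenced paths in a .lst file that do not exist.

    Assumes each line holds whitespace-separated relative paths
    (e.g. an image path followed by its label path).
    """
    missing = []
    with open(list_path) as f:
        for line in f:
            for rel in line.split():
                full = os.path.join(data_root, rel)
                if not os.path.exists(full):
                    missing.append(full)
    return missing
```

An empty return value means every path in the list resolves under the data root.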

#### :smiley_cat: Instructions for preparing the CamVid data (open for discussion) :smiley_cat:

* Download the images and annotations from [Kaggle](https://www.kaggle.com/datasets/carlolepelaars/camvid), where the image resolution is 960x720 (original);
* Unzip the data and put all the images and all the colored labels into `data/camvid/images/` and `data/camvid/labels/`, respectively;
* Following the train/val/test split used in [SegNet-Tutorial](https://github.com/alexgkendall/SegNet-Tutorial), we have generated the dataset lists in `data/list/camvid/`;
* Done! (We have opened an issue for everyone interested in CamVid to discuss where to download the data and whether the split in [SegNet-Tutorial](https://github.com/alexgkendall/SegNet-Tutorial) is correct. Do not directly use the split provided on [Kaggle](https://www.kaggle.com/datasets/carlolepelaars/camvid), which is wrong and leads to abnormally high accuracy. We have revised the CamVid content in the paper, and you will see the correct results once it is announced.)

### 1. Training

* Download the ImageNet pretrained models and put them into `pretrained_models/imagenet/` dir.
* For example, to train PIDNet-S on Cityscapes with a batch size of 12 on 2 GPUs:
````bash
python tools/train.py --cfg configs/cityscapes/pidnet_small_cityscapes.yaml GPUS (0,1) TRAIN.BATCH_SIZE_PER_GPU 6
````
* Or train PIDNet-L on Cityscapes using the train and val sets simultaneously, with a batch size of 12 on 4 GPUs:
````bash
python tools/train.py --cfg configs/cityscapes/pidnet_large_cityscapes_trainval.yaml GPUS (0,1,2,3) TRAIN.BATCH_SIZE_PER_GPU 3
````
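
The trailing `KEY VALUE` pairs in these commands override fields of the YAML config (a yacs-style mechanism). Conceptually the merge behaves like the sketch below; `apply_overrides` is illustrative, not the repository's code:

```python
def apply_overrides(cfg, opts):
    """Apply ['KEY', value, 'KEY', value, ...] overrides to a nested dict.

    Dotted keys like 'TRAIN.BATCH_SIZE_PER_GPU' descend into sub-dicts,
    mimicking yacs' CfgNode.merge_from_list.
    """
    for key, value in zip(opts[0::2], opts[1::2]):
        node = cfg
        *parents, leaf = key.split(".")
        for p in parents:
            node = node[p]     # walk down to the enclosing section
        node[leaf] = value     # overwrite the leaf field
    return cfg
```

So `GPUS (0,1) TRAIN.BATCH_SIZE_PER_GPU 6` replaces those two fields of the loaded YAML before training starts.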

### 2. Evaluation

* Download the finetuned models for Cityscapes and CamVid and put them into `pretrained_models/cityscapes/` and `pretrained_models/camvid/` dirs, respectively.
* For example, evaluate PIDNet-S on the Cityscapes val set:
````bash
python tools/eval.py --cfg configs/cityscapes/pidnet_small_cityscapes.yaml \
TEST.MODEL_FILE pretrained_models/cityscapes/PIDNet_S_Cityscapes_val.pt
````
* Or evaluate PIDNet-M on the CamVid test set:
````bash
python tools/eval.py --cfg configs/camvid/pidnet_medium_camvid.yaml \
TEST.MODEL_FILE pretrained_models/camvid/PIDNet_M_Camvid_Test.pt \
DATASET.TEST_SET list/camvid/test.lst
````
* Generate the test results of PIDNet-L on the Cityscapes test set:
````bash
python tools/eval.py --cfg configs/cityscapes/pidnet_large_cityscapes_trainval.yaml \
TEST.MODEL_FILE pretrained_models/cityscapes/PIDNet_L_Cityscapes_test.pt \
DATASET.TEST_SET list/cityscapes/test.lst
````

### 3. Speed Measurement

* Measure the inference speed of PIDNet-S for Cityscapes:
````bash
python models/speed/pidnet_speed.py --a 'pidnet-s' --c 19 --r 1024 2048
````
* Measure the inference speed of PIDNet-M for CamVid:
````bash
python models/speed/pidnet_speed.py --a 'pidnet-m' --c 11 --r 720 960
````

### 4. Custom Inputs

* Put all your images in `samples/` and then run the command below, using Cityscapes-pretrained PIDNet-L on images in `.png` format:
````bash
python tools/custom.py --a 'pidnet-l' --p '../pretrained_models/cityscapes/PIDNet_L_Cityscapes_test.pt' --t '.png'
````
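
Internally, saving a segmentation result amounts to mapping each pixel's predicted class id to an RGB color. A minimal sketch with a hypothetical three-class palette (the repository uses the standard 19-color Cityscapes palette; `colorize` and `toy_palette` are illustrative names):

```python
import numpy as np

def colorize(pred, palette):
    """Map an (H, W) array of class ids to an (H, W, 3) uint8 RGB image."""
    palette = np.asarray(palette, dtype=np.uint8)
    return palette[pred]  # fancy indexing: one lookup per pixel

# Toy 3-class palette using the Cityscapes colors for road, car, sky.
toy_palette = [(128, 64, 128), (0, 0, 142), (70, 130, 180)]
```

The same one-line lookup scales to the full 19-class palette; only the palette table changes.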

## Citation

If you find this implementation useful for your work, please cite our paper:
```
@misc{xu2022pidnet,
      title={PIDNet: A Real-time Semantic Segmentation Network Inspired from PID Controller},
      author={Jiacong Xu and Zixiang Xiong and Shankar P. Bhattacharyya},
      year={2022},
      eprint={2206.02066},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

## Acknowledgement

* Our implementation is based on [HRNet-Semantic-Segmentation](https://github.com/HRNet/HRNet-Semantic-Segmentation).
* The latency measurement code is borrowed from [DDRNet](https://github.com/ydhongHIT/DDRNet).
* Thanks for their nice contributions.

11 changes: 11 additions & 0 deletions configs/__init__.py
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Ke Sun ([email protected])
# ------------------------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from .default import _C as config
from .default import update_config
Binary file added configs/__pycache__/__init__.cpython-38.pyc
Binary file added configs/__pycache__/default.cpython-38.pyc
55 changes: 55 additions & 0 deletions configs/camvid/pidnet_medium_camvid.yaml
CUDNN:
  BENCHMARK: true
  DETERMINISTIC: false
  ENABLED: true
GPUS: (0,1)
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 3
PRINT_FREQ: 10

DATASET:
  DATASET: camvid
  ROOT: data/
  TEST_SET: 'list/camvid/val.lst'
  TRAIN_SET: 'list/camvid/trainval.lst'
  NUM_CLASSES: 11
MODEL:
  NAME: pidnet_m
  NUM_OUTPUTS: 2
  PRETRAINED: "pretrained_models/cityscapes/PIDNet_M_Cityscapes_test.pt"
LOSS:
  USE_OHEM: true
  OHEMTHRES: 0.9
  OHEMKEEP: 131072
  BALANCE_WEIGHTS: [0.4, 1.0]
  SB_WEIGHTS: 1.0
TRAIN:
  IMAGE_SIZE:
  - 960
  - 720
  BASE_SIZE: 960
  BATCH_SIZE_PER_GPU: 6
  SHUFFLE: true
  BEGIN_EPOCH: 0
  END_EPOCH: 200
  RESUME: false
  OPTIMIZER: sgd
  LR: 0.005
  WD: 0.0005
  MOMENTUM: 0.9
  NESTEROV: false
  FLIP: true
  MULTI_SCALE: true
  IGNORE_LABEL: 255
  SCALE_FACTOR: 16
TEST:
  IMAGE_SIZE:
  - 960
  - 720
  BASE_SIZE: 960
  BATCH_SIZE_PER_GPU: 1
  FLIP_TEST: false
  MULTI_SCALE: false
  MODEL_FILE: ''
  OUTPUT_INDEX: 1
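
The `LOSS` section of this config enables Online Hard Example Mining (OHEM): only pixels whose predicted probability for the ground-truth class falls below `OHEMTHRES` contribute to the loss, while at least `OHEMKEEP` pixels are always retained. A hedged sketch of that selection rule (`ohem_select` is illustrative, not the repository's loss implementation):

```python
import numpy as np

def ohem_select(true_class_probs, thres=0.9, keep=4):
    """Indices of 'hard' pixels: probability for the true class below
    `thres`, but never fewer than `keep` pixels overall.

    Sketches the rule controlled by OHEMTHRES / OHEMKEEP.
    """
    order = np.argsort(true_class_probs)   # hardest pixels first
    sorted_probs = true_class_probs[order]
    k = min(keep, len(sorted_probs)) - 1
    # relax the threshold when too few pixels are hard, so that at
    # least `keep` pixels survive the selection
    effective_thres = max(thres, sorted_probs[k])
    return order[sorted_probs <= effective_thres]
```

The cross-entropy loss is then averaged over the selected pixels only, which focuses training on ambiguous regions such as object boundaries.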
55 changes: 55 additions & 0 deletions configs/camvid/pidnet_small_camvid.yaml
CUDNN:
  BENCHMARK: true
  DETERMINISTIC: false
  ENABLED: true
GPUS: (0,1)
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 3
PRINT_FREQ: 10

DATASET:
  DATASET: camvid
  ROOT: data/
  TEST_SET: 'list/camvid/val.lst'
  TRAIN_SET: 'list/camvid/trainval.lst'
  NUM_CLASSES: 11
MODEL:
  NAME: pidnet_s
  NUM_OUTPUTS: 2
  PRETRAINED: "pretrained_models/cityscapes/PIDNet_S_Cityscapes_test.pt"
LOSS:
  USE_OHEM: true
  OHEMTHRES: 0.9
  OHEMKEEP: 131072
  BALANCE_WEIGHTS: [0.4, 1.0]
  SB_WEIGHTS: 1.0
TRAIN:
  IMAGE_SIZE:
  - 960
  - 720
  BASE_SIZE: 960
  BATCH_SIZE_PER_GPU: 6
  SHUFFLE: true
  BEGIN_EPOCH: 0
  END_EPOCH: 200
  RESUME: false
  OPTIMIZER: sgd
  LR: 0.005
  WD: 0.0005
  MOMENTUM: 0.9
  NESTEROV: false
  FLIP: true
  MULTI_SCALE: true
  IGNORE_LABEL: 255
  SCALE_FACTOR: 16
TEST:
  IMAGE_SIZE:
  - 960
  - 720
  BASE_SIZE: 960
  BATCH_SIZE_PER_GPU: 1
  FLIP_TEST: false
  MULTI_SCALE: false
  MODEL_FILE: ''
  OUTPUT_INDEX: 1
55 changes: 55 additions & 0 deletions configs/cityscapes/pidnet_large_cityscapes.yaml
CUDNN:
  BENCHMARK: true
  DETERMINISTIC: false
  ENABLED: true
GPUS: (0,1)
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 6
PRINT_FREQ: 10

DATASET:
  DATASET: cityscapes
  ROOT: data/
  TEST_SET: 'list/cityscapes/val.lst'
  TRAIN_SET: 'list/cityscapes/train.lst'
  NUM_CLASSES: 19
MODEL:
  NAME: pidnet_large
  NUM_OUTPUTS: 2
  PRETRAINED: "pretrained_models/imagenet/PIDNet_L_ImageNet.pth.tar"
LOSS:
  USE_OHEM: true
  OHEMTHRES: 0.9
  OHEMKEEP: 131072
  BALANCE_WEIGHTS: [0.4, 1.0]
  SB_WEIGHTS: 1.0
TRAIN:
  IMAGE_SIZE:
  - 1024
  - 1024
  BASE_SIZE: 2048
  BATCH_SIZE_PER_GPU: 6
  SHUFFLE: true
  BEGIN_EPOCH: 0
  END_EPOCH: 484
  RESUME: false
  OPTIMIZER: sgd
  LR: 0.01
  WD: 0.0005
  MOMENTUM: 0.9
  NESTEROV: false
  FLIP: true
  MULTI_SCALE: true
  IGNORE_LABEL: 255
  SCALE_FACTOR: 16
TEST:
  IMAGE_SIZE:
  - 2048
  - 1024
  BASE_SIZE: 2048
  BATCH_SIZE_PER_GPU: 6
  FLIP_TEST: false
  MULTI_SCALE: false
  MODEL_FILE: ''
  OUTPUT_INDEX: 1