Randomly picked (cherry-picked :)) from one of the experiments:
- model: MODNet
- backbone: ResNet18
- trimaps: randomly generated on each iteration
- data: AISegment, no augmentation, no image harmonization
- training: 100k iterations, batch size 4x8
- W&B report
old model: http://backgroundmatting.herokuapp.com/
- MobileNetV2 (trained within a UNet on the Supervisely dataset)
- ResNet18 (trained within DeepLabv3+ on the Supervisely dataset; a backbone sketch follows this list)
- Pyramid Vision Transformer (5 pyramid stages instead of 4, no pretrained model yet)
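For reference, here is a minimal sketch of how a ResNet18 backbone can expose multi-scale features to a matting decoder. The class name and the returned feature strides are assumptions for illustration, not this repo's actual wiring:

```python
import torch
import torchvision

class Resnet18Backbone(torch.nn.Module):
    """Hypothetical wrapper exposing ResNet18's intermediate feature maps."""
    def __init__(self, pretrained=True):
        super().__init__()
        net = torchvision.models.resnet18(pretrained=pretrained)
        self.stem = torch.nn.Sequential(net.conv1, net.bn1, net.relu)  # stride 2
        self.pool = net.maxpool
        self.layer1 = net.layer1  # stride 4
        self.layer2 = net.layer2  # stride 8
        self.layer3 = net.layer3  # stride 16
        self.layer4 = net.layer4  # stride 32

    def forward(self, x):
        feats = []
        x = self.stem(x)
        feats.append(x)
        x = self.layer1(self.pool(x))
        feats.append(x)
        for layer in (self.layer2, self.layer3, self.layer4):
            x = layer(x)
            feats.append(x)
        return feats  # five feature maps, one per pyramid stage
```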
- Supervisely - to pretrain on coarse annotations
- AISegment - to train with fine annotations
- Places365 - to augment foregrounds with different backgrounds
Trimaps are generated on the fly during training.
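A minimal sketch of such on-the-fly generation from a ground-truth alpha matte, using random erosion and dilation; the kernel-size bounds and the foreground threshold are illustrative assumptions, not the repo's actual parameters:

```python
import cv2
import numpy as np

def random_trimap(alpha, min_kernel=5, max_kernel=26):
    """Build a trimap from an alpha matte in [0, 1]: erode the foreground
    mask to get the certain-foreground region, dilate it to bound the
    certain background, and mark everything in between as unknown (128).
    Kernel bounds are assumed values for illustration."""
    fg = (alpha > 0.95).astype(np.uint8)  # confident foreground mask
    erode_k = np.random.randint(min_kernel, max_kernel)
    dilate_k = np.random.randint(min_kernel, max_kernel)
    eroded = cv2.erode(fg, np.ones((erode_k, erode_k), np.uint8))
    dilated = cv2.dilate(fg, np.ones((dilate_k, dilate_k), np.uint8))
    trimap = np.full_like(fg, 128)        # start with all-unknown
    trimap[dilated == 0] = 0              # certain background
    trimap[eroded == 1] = 255             # certain foreground
    return trimap
```

Because the kernel sizes are re-sampled on every call, the same alpha matte yields a differently sized unknown band on each iteration.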
To get diverse backgrounds for the same foregrounds, one can use the COCO or Places365 datasets. However, the cut-and-paste approach results in unrealistic compositions. To harmonize the composites, one can use Foreground-aware Semantic Representations for Image Harmonization (pre-trained models for 256x256 images) or Region-aware Adaptive Instance Normalization for Image Harmonization (pre-trained models for 512x512 images).
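The cut-and-paste composition mentioned above is the standard compositing equation I = αF + (1 − α)B. A minimal sketch, where the function name and the naive resize of the background are assumptions:

```python
import numpy as np
from PIL import Image

def composite(fg_rgb, alpha, bg_rgb):
    """Alpha-composite a foreground over a random background.
    fg_rgb, bg_rgb: HxWx3 uint8 arrays; alpha: HxW float32 in [0, 1].
    The background is naively resized to the foreground's size."""
    h, w = fg_rgb.shape[:2]
    bg = np.asarray(Image.fromarray(bg_rgb).resize((w, h)), dtype=np.float32)
    a = alpha[..., None]  # HxWx1, broadcasts over the RGB channels
    comp = a * fg_rgb.astype(np.float32) + (1.0 - a) * bg
    return comp.astype(np.uint8)
```

Such composites are what the harmonization models above are then meant to make photometrically consistent.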
The MODNet architecture code is based on the official repo.
The PVT architecture code is based on the official repo.
Einops usage is learned from lucidrains.
Image harmonization is based on Foreground-aware Semantic Representations for Image Harmonization and Region-aware Adaptive Instance Normalization for Image Harmonization.