Randomly picked (cherry-picked :)) from one of the experiments:
- model: MODNet
- backbone: ResNet18
- trimaps: randomly generated on each iteration
- data: AISegment, no augmentation, no image harmonization
- training: 100k iterations, batch size 4x8
- W&B report
old model: http://backgroundmatting.herokuapp.com/
- MobileNetV2 (trained within a UNet on the Supervisely dataset)
- ResNet18 (trained within DeepLabv3+ on the Supervisely dataset; a backbone sketch follows this list)
- Pyramid Vision Transformer (5 pyramid stages instead of 4, no pretrained model yet)
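For reference, here is a minimal sketch of how a ResNet18 backbone can expose multi-scale features to a matting decoder. The class name and the returned feature strides are assumptions for illustration, not this repo's actual wiring:

```python
import torch
import torchvision

class Resnet18Backbone(torch.nn.Module):
    """Hypothetical wrapper exposing ResNet18's intermediate feature maps."""
    def __init__(self, pretrained=True):
        super().__init__()
        net = torchvision.models.resnet18(pretrained=pretrained)
        self.stem = torch.nn.Sequential(net.conv1, net.bn1, net.relu)  # stride 2
        self.pool = net.maxpool
        self.layer1 = net.layer1  # stride 4
        self.layer2 = net.layer2  # stride 8
        self.layer3 = net.layer3  # stride 16
        self.layer4 = net.layer4  # stride 32

    def forward(self, x):
        feats = []
        x = self.stem(x)
        feats.append(x)
        x = self.layer1(self.pool(x))
        feats.append(x)
        for layer in (self.layer2, self.layer3, self.layer4):
            x = layer(x)
            feats.append(x)
        return feats  # five feature maps, one per pyramid stage
```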
- Supervisely - to pretrain on coarse annotations
- AISegment - to train with fine annotations
- Places365 - to augment foregrounds with different backgrounds
Trimaps are generated on the fly during training.
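A minimal sketch of such on-the-fly generation from a ground-truth alpha matte, using random erosion and dilation; the kernel-size bounds and the foreground threshold are illustrative assumptions, not the repo's actual parameters:

```python
import cv2
import numpy as np

def random_trimap(alpha, min_kernel=5, max_kernel=26):
    """Build a trimap from an alpha matte in [0, 1]: erode the foreground
    mask to get the certain-foreground region, dilate it to bound the
    certain background, and mark everything in between as unknown (128).
    Kernel bounds are assumed values for illustration."""
    fg = (alpha > 0.95).astype(np.uint8)  # confident foreground mask
    erode_k = np.random.randint(min_kernel, max_kernel)
    dilate_k = np.random.randint(min_kernel, max_kernel)
    eroded = cv2.erode(fg, np.ones((erode_k, erode_k), np.uint8))
    dilated = cv2.dilate(fg, np.ones((dilate_k, dilate_k), np.uint8))
    trimap = np.full_like(fg, 128)        # start with all-unknown
    trimap[dilated == 0] = 0              # certain background
    trimap[eroded == 1] = 255             # certain foreground
    return trimap
```

Because the kernel sizes are re-sampled on every call, the same alpha matte yields a differently sized unknown band on each iteration.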
To get diverse backgrounds for the same foregrounds, one can use the COCO or Places365 datasets. However, the cut-and-paste approach results in unrealistic compositions. To harmonize the composites, one can use Foreground-aware Semantic Representations for Image Harmonization (pre-trained models for 256x256 images) or Region-aware Adaptive Instance Normalization for Image Harmonization (pre-trained models for 512x512 images).
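The cut-and-paste composition mentioned above is the standard compositing equation I = αF + (1 − α)B. A minimal sketch, where the function name and the naive resize of the background are assumptions:

```python
import numpy as np
from PIL import Image

def composite(fg_rgb, alpha, bg_rgb):
    """Alpha-composite a foreground over a random background.
    fg_rgb, bg_rgb: HxWx3 uint8 arrays; alpha: HxW float32 in [0, 1].
    The background is naively resized to the foreground's size."""
    h, w = fg_rgb.shape[:2]
    bg = np.asarray(Image.fromarray(bg_rgb).resize((w, h)), dtype=np.float32)
    a = alpha[..., None]  # HxWx1, broadcasts over the RGB channels
    comp = a * fg_rgb.astype(np.float32) + (1.0 - a) * bg
    return comp.astype(np.uint8)
```

Such composites are what the harmonization models above are then meant to make photometrically consistent.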
The MODNet architecture code is based on the official repo.
The PVT architecture code is based on the official repo.
Einops usage is learned from lucidrains.
Image harmonization is based on Foreground-aware Semantic Representations for Image Harmonization and Region-aware Adaptive Instance Normalization for Image Harmonization.