VAEs are among the state-of-the-art generative models, but have recently been overshadowed by GANs, most prominently StyleGAN by Karras et al. Unlike StyleGAN, a VAE can encode as well as decode, which is useful in many downstream tasks. In this work we combine the style-based architecture with a VAE and achieve state-of-the-art reconstruction and generation. Following DFC-VAE by Hou et al., we use a perceptual loss, and we compare our results to that work.
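To make the "style-based" part concrete, below is a minimal, hypothetical PyTorch-style sketch of how a single latent code can modulate each decoder block through adaptive instance normalization (AdaIN), in the spirit of StyleGAN. The class and parameter names here are illustrative only; the actual layer definitions live in `VaeLayers` and may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaIN(nn.Module):
    """Adaptive instance norm: the latent code predicts per-channel scale/shift."""
    def __init__(self, latent_dim, channels):
        super().__init__()
        self.affine = nn.Linear(latent_dim, 2 * channels)

    def forward(self, x, w):
        scale, shift = self.affine(w).chunk(2, dim=1)
        x = F.instance_norm(x)
        return x * (1 + scale[:, :, None, None]) + shift[:, :, None, None]

class StyleBlock(nn.Module):
    """One decoder block: upsample, convolve, then re-inject the style code."""
    def __init__(self, latent_dim, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.adain = AdaIN(latent_dim, out_ch)

    def forward(self, x, w):
        x = F.interpolate(x, scale_factor=2, mode="nearest")
        x = F.leaky_relu(self.conv(x), 0.2)
        return self.adain(x, w)
```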
The loss comprises two components (a rough sketch follows this list):
- Reconstruction Loss - a perceptual loss based on pre-trained VGG16 features
- Latent Loss - a KL-divergence term
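As an illustration, the two terms might be combined as follows. This is a hedged PyTorch sketch using `torchvision`'s VGG16; the feature layers and weighting constant are assumptions, not the repository's actual values.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Frozen VGG16 feature extractor for the perceptual (reconstruction) term.
vgg = models.vgg16(pretrained=True).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def vae_loss(x, x_rec, mu, logvar, kl_weight=1.0):
    # Perceptual reconstruction loss: compare VGG16 features of input and reconstruction.
    rec = F.mse_loss(vgg(x_rec), vgg(x))
    # KL divergence between q(z|x) = N(mu, sigma^2) and the standard normal prior.
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl_weight * kl
```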
- `./Model`
  - Layers class: `VaeLayers`
  - The perceptual model class: `PerceptualModel`
  - Our model class: `StyleVae`
- `./Train`
  - The trainer class: `StyleVaeTrainer`
  - Train script: `$ python train.py --load <True|False>`
  - Test script: `$ python test.py`
  - All saved models are stored under `./train_output` for restoring
- `./Data`
  - Dataset class: `Dataset`
  - All data should be saved under `/data/svae/*.png`
Test results can be seen in the visuals section.
We trained the provided model on the FFHQ dataset to produce 256x256 results:
- Adversarial Training - adding an adversarial term to improve generation of hair, fine details, and background, especially in the high-resolution models
- Weighted KL-Divergence - the hierarchical structure of the code injection allows a weighted KL-divergence loss term, letting the VAE encoder apply a different divergence weight to fine vs. coarse features (see the sketch below). This makes sense because fine details such as hair are closer to normally distributed, while coarse features behave differently (e.g., there are not many dark-skinned redheads)
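A hypothetical sketch of such a weighted term, assuming the encoder produces one `(mu, logvar)` pair per injection level; the function name and the example weights are illustrative, not part of the current code.

```python
import torch

def weighted_kl(mus, logvars, level_weights):
    """KL term with a separate weight per code level (coarse -> fine)."""
    total = 0.0
    for mu, logvar, w in zip(mus, logvars, level_weights):
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        total = total + w * kl
    return total

# e.g. regularize coarse levels more strongly than fine ones (illustrative weights):
# loss = rec_loss + weighted_kl(mus, logvars, level_weights=[2.0, 1.0, 0.5, 0.25])
```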
Available soon...