StackGAN

💡 What's new?

  • We use BERT embeddings for the text description instead of the char-CNN-RNN text embeddings that were used in the paper implementation.

Pretrained model

  • Stage 1 trained using BERT embeddings instead of the original char-CNN-RNN text embeddings
  • Stage 2 trained using BERT embeddings instead of the original char-CNN-RNN text embeddings

Paper examples

🐦 Examples for birds (char-CNN-RNN embeddings), more on YouTube:






🌻 Examples for flowers (char-CNN-RNN embeddings), more on YouTube:






📋 Dependencies

git clone https://github.com/sahilkhose/StackGAN-BERT.git
pip3 install -r requirements.txt

Dataset

Check instructions in /input/README.md

cd input/src
python3 data.py

Generating BERT embeddings of annotations

If CUDA is not available, set DEVICE to cpu in input/src/config.py before running:

python3 bert_emb.py  
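The manual DEVICE change described above could also be automated. The following is a minimal sketch, assuming the project uses PyTorch; `pick_device` and `detect_cuda` are hypothetical helper names, not functions from this repository, and the actual contents of input/src/config.py may differ.

```python
def pick_device(cuda_available: bool) -> str:
    """Return "cuda" when a GPU is usable, otherwise fall back to "cpu"."""
    return "cuda" if cuda_available else "cpu"


def detect_cuda() -> bool:
    """Report whether PyTorch can see a CUDA device (False if torch is missing)."""
    try:
        import torch  # assumption: the project depends on PyTorch
    except ImportError:
        return False
    return torch.cuda.is_available()


# Hypothetical replacement for a hard-coded DEVICE value in a config module.
DEVICE = pick_device(detect_cuda())
```

With a line like this in the config, the same file would work on machines with and without a GPU.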

🔧 Training

cd ../../src

Option 1: pass training arguments on the command line (see src/args.py)

python3 train.py --TRAIN_MAX_EPOCH 10 
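To illustrate how flags such as the one above are typically parsed, here is a small `argparse` sketch. It is not the contents of src/args.py: only the flag names `--TRAIN_MAX_EPOCH` and `--conf` come from the commands in this README, while the types, defaults, and help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; only the two flag names are taken from this README.
    parser = argparse.ArgumentParser(
        description="StackGAN training (illustrative sketch)"
    )
    parser.add_argument(
        "--TRAIN_MAX_EPOCH", type=int, default=None,
        help="number of training epochs (no default assumed here)",
    )
    parser.add_argument(
        "--conf", type=str, default=None,
        help="optional path to a YAML config such as ../cfg/s1.yml",
    )
    return parser


# Equivalent to: python3 train.py --TRAIN_MAX_EPOCH 10
args = build_parser().parse_args(["--TRAIN_MAX_EPOCH", "10"])
print(args.TRAIN_MAX_EPOCH, args.conf)
```

Declaring `type=int` means the epoch count arrives as an integer rather than a string, so it can be used directly in a training loop.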

Option 2: pass training arguments through YAML config files (cfg/s1.yml for Stage 1, cfg/s2.yml for Stage 2)

python3 train.py --conf ../cfg/s1.yml

mkdir ../old_outputs
mv ../output ../old_outputs/output_stage-1

python3 train.py --conf ../cfg/s2.yml

mv ../output ../old_outputs/output_stage-2
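The `mkdir` and `mv` steps above keep each stage's results from being mixed into the next run's output directory. The same bookkeeping can be sketched in Python; `archive_output` is a hypothetical helper written for illustration and is not part of this repository.

```python
import shutil
from pathlib import Path


def archive_output(output_dir: Path, archive_root: Path, stage: int) -> Path:
    """Move output_dir to archive_root/output_stage-<stage> and return the new path."""
    if not output_dir.is_dir():
        raise FileNotFoundError(f"no output directory at {output_dir}")
    # Create the archive folder if needed (like `mkdir -p ../old_outputs`).
    archive_root.mkdir(parents=True, exist_ok=True)
    target = archive_root / f"output_stage-{stage}"
    if target.exists():
        # Refuse to merge into or overwrite an earlier archived run.
        raise FileExistsError(f"{target} already exists; refusing to overwrite")
    shutil.move(str(output_dir), str(target))
    return target
```

Called from src after Stage 1 finishes, `archive_output(Path("../output"), Path("../old_outputs"), stage=1)` would mirror the first `mv` command. Unlike a plain `mkdir`, the helper does not fail when `../old_outputs` already exists.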

To launch TensorBoard:

tensorboard --logdir=../output 

📚 Citing StackGAN

If you find StackGAN useful in your research, please consider citing:

@inproceedings{han2017stackgan,
Author = {Han Zhang and Tao Xu and Hongsheng Li and Shaoting Zhang and Xiaogang Wang and Xiaolei Huang and Dimitris Metaxas},
Title = {StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks},
Year = {2017},
booktitle = {{ICCV}},
}

Follow-up work

References

  • Generative Adversarial Text-to-Image Synthesis Paper Code
  • Learning Deep Representations of Fine-grained Visual Descriptions Paper Code