pip install -r requirements.txt
Additional Dependencies:
hpsv2 https://github.com/tgxs002/HPSv2
ImageReward https://github.com/THUDM/ImageReward
deepface https://github.com/serengil/deepface
First, run
python 1_generate_divers_prompts.py
to generate diverse prompts based on the basic prompts in data/training_prompts.csv.
Then, run
bash 2_generate_images.sh
to generate images based on the diverse prompts and the basic prompts. This script also runs a classifier on the generated images.
Then, run
python 3_generate_preferences.py
to generate the preference data.
Then, run
bash 4_train.sh
to train the model using PopAlign.
Finally, run
bash eval.sh
which evaluates the model on identity-specific and identity-neutral prompts. It also runs the classifier to compute the discrepancy metric, as well as a series of scoring models to compute the image quality metrics.
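The exact definition of the discrepancy metric is given in the paper and in the evaluation code. As a rough illustration only, the sketch below assumes the discrepancy is the largest gap between the frequencies of any two classes predicted by the classifier. The function name `discrepancy` and the example labels are hypothetical and are not taken from this repo.

```python
from collections import Counter


def discrepancy(labels, classes):
    """Gap between the most and least frequent class (0.0 means perfectly balanced).

    ASSUMPTION: this max-minus-min definition is illustrative; the metric
    computed by eval.sh may be defined differently.
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    # Classes that never appear get a frequency of 0.
    freqs = [counts.get(c, 0) / len(labels) for c in classes]
    return max(freqs) - min(freqs)


# Hypothetical classifier predictions for 8 generated images.
predicted = ["male", "male", "male", "female", "male", "male", "female", "male"]
print(discrepancy(predicted, classes=["male", "female"]))  # 0.5
```

A value near 0 means the generated images are spread evenly across the classes; a value near 1 means almost all images fall into a single class.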
This codebase trains and evaluates SDXL using the PopAlign objective.
The assets used in this work (datasets, preference models) are publicly available and are used according to their respective licenses. This code is released privately for the purposes of our submission, and will eventually be made public under the Apache 2.0 License (LICENSE).
The training data consists of images generated by SDXL.
Training samples are generated with identity-neutral prompts and identity-specific prompts, then paired to create preference data. See the paper for more details.
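The pairing step can be pictured with the minimal sketch below. It assumes that each sample records the basic prompt it was derived from, and that the image from the identity-specific prompt is treated as the preferred ("chosen") side while the image from the identity-neutral prompt is the "rejected" side. The record fields, file names, and the function `build_preference_pairs` are hypothetical; the actual logic lives in 3_generate_preferences.py and may differ.

```python
from collections import defaultdict


def build_preference_pairs(neutral_samples, specific_samples):
    """Pair every identity-specific image with the identity-neutral images
    that share its basic prompt.

    ASSUMPTION: identity-specific images are 'chosen' and identity-neutral
    images are 'rejected'; this direction is illustrative only.
    """
    neutral_by_prompt = defaultdict(list)
    for rec in neutral_samples:
        neutral_by_prompt[rec["base_prompt"]].append(rec["image"])

    pairs = []
    for rec in specific_samples:
        for rejected in neutral_by_prompt.get(rec["base_prompt"], []):
            pairs.append(
                {"prompt": rec["base_prompt"], "chosen": rec["image"], "rejected": rejected}
            )
    return pairs


# Hypothetical records: one neutral image and two identity-specific images.
neutral = [{"base_prompt": "a doctor", "image": "doctor_neutral.png"}]
specific = [
    {"base_prompt": "a doctor", "image": "doctor_female.png"},
    {"base_prompt": "a doctor", "image": "doctor_male.png"},
]
pairs = build_preference_pairs(neutral, specific)
print(len(pairs))  # 2
```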
We train with the following hyperparameters:
- Learning Rate: 5e-7
- Batch size: 8
- Steps: 750
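For a sense of scale, these values can be collected in a small config object. This is a sketch, not the configuration format used by 4_train.sh: the class `TrainConfig` and its field names are hypothetical, and the sample count assumes that the batch size of 8 is the effective (global) batch size, which this README does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values copied from the hyperparameter list above.
    learning_rate: float = 5e-7
    batch_size: int = 8  # ASSUMPTION: effective batch size across all GPUs
    max_steps: int = 750

    @property
    def samples_seen(self) -> int:
        """Number of training samples consumed over the whole run."""
        return self.batch_size * self.max_steps


cfg = TrainConfig()
print(cfg.samples_seen)  # 6000
```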
We evaluate using HPS v2, PickScore, CLIP, and LAION Aesthetics, as well as the DeepFace classifier for fairness.
We curate our own test data, which is included in this repo under data.
We use the SDXL architecture (U-Net, VAE, CLIP text encoder) and only fine-tune the U-Net with our objective.
We train on 4 NVIDIA A5000 GPUs for less than 1 day per experiment.