feat: add TencentARC PhotoMaker support (leejet#179)

* first efforts at implementing photomaker; lots more to do
* added PhotoMakerIDEncoder model in SD
* fixed some bugs; now photomaker model weights can be loaded into their tensor buffers
* added input id image loading
* added preprocessing input id images
* finished get_num_tensors
* fixed a bug in remove_duplicates
* add a get_learned_condition_with_trigger function to do photomaker stuff
* add a convert_token_to_id function for photomaker to extract trigger word's token id
* making progress; need to implement tokenizer decoder
* making more progress; finishing vision model forward
* debugging vision_model outputs
* corrected clip vision model output
* continue making progress in id fusion process
* finished stacked id embedding; to be tested
* remove garbage file
* debugging graph compute
* more progress; now alloc buffer failed
* fixed wtype issue; input images can only be 1 because of an issue with transformer when batch size > 1 (to be investigated)
* added delayed subject conditioning; now photomaker runs and generates images
* fixed stat_merge_step
* added photomaker lora model (to be tested)
* reworked pmid lora
* finished applying pmid lora; to be tested
* finalized pmid lora
* add a few print tensor; tweak in sample again
* small tweak; still not getting ID faces
* fixed a bug in FuseBlock forward; also remove diag_mask op for vision transformer; getting better results
* disable pmid lora apply for now; 1 input image seems working; > 1 not working
* turn pmid lora apply back on
* fixed a decode bug
* fixed a bug in ggml's conv_2d, and now > 1 input images working
* add style_ratio as a cli param; reworked encode with trigger for attention weights
* merge commit fixing lora free param buffer error
* change default style ratio to 10%
* added an option to offload vae decoder to CPU for mem-limited gpus
* removing image normalization step seems to make ID fidelity much higher
* revert default style ratio back to 20%
* added an option for normalizing input ID images; cleaned up debugging code
* more clean up
* fixed bugs; now failed with cuda error; likely out-of-mem on GPU
* free pmid model params when required
* photomaker working properly now after merging and adapting to GGMLBlock API
* remove tensor renaming; fixing names in the photomaker model file
* updated README.md to include instructions and notes for running PhotoMaker
* a bit clean up
* remove -DGGML_CUDA_FORCE_MMQ; more clean up and README update
* add input image requirement in README
* bring back freeing pmid lora params buffer; simplify pooled output of CLIPvision
* remove MultiheadAttention2; customized MultiheadAttention
* added a WIN32 get_files_from_dir; turn off PhotoMaker if receiving no input images
* update docs
* fix ci error
* make stable-diffusion.h a pure c header file. This reverts commit 27887b6.
* fix ci error
* format code
* reuse get_learned_condition
* reuse pad_tokens
* reuse CLIPVisionModel
* reuse LoraModel
* add --clip-on-cpu
* fix lora name conversion for SDXL

---------

Co-authored-by: bssrdf <[email protected]>
Co-authored-by: leejet <[email protected]>
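The bullets above mention delayed subject conditioning, a `start_merge_step` fix, and a `--style-ratio` CLI parameter. The sketch below shows one way a style ratio could map to the sampling step where conditioning switches from the text-only prompt to the ID-fused prompt. It assumes the rule used by the reference PhotoMaker pipeline (merge step = `style_ratio / 100 * steps`, with ID conditioning only on steps after it); the function names `start_merge_step` and `id_conditioning_schedule` are hypothetical and are not this project's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not this project's API): map a --style-ratio
// percentage to the last sampling step that uses text-only conditioning.
// Assumes the reference PhotoMaker rule: int(style_ratio / 100 * steps).
int start_merge_step(float style_ratio, int sample_steps) {
    assert(sample_steps > 0);
    // Clamp the ratio to the documented 0-100 range.
    if (style_ratio < 0.0f) style_ratio = 0.0f;
    if (style_ratio > 100.0f) style_ratio = 100.0f;
    return static_cast<int>(style_ratio / 100.0f * static_cast<float>(sample_steps));
}

// Per-step schedule: entry i is true when step i (0-based) would use the
// ID-fused conditioning, i.e. only once the merge step has been passed.
std::vector<bool> id_conditioning_schedule(float style_ratio, int sample_steps) {
    const int merge_step = start_merge_step(style_ratio, sample_steps);
    std::vector<bool> schedule(static_cast<std::size_t>(sample_steps), false);
    for (int i = 0; i < sample_steps; ++i) {
        schedule[static_cast<std::size_t>(i)] = i > merge_step;
    }
    return schedule;
}
```

Under these assumptions, a ratio of 20 with 20 sampling steps gives a merge step of 4: steps 0-4 use the text-only prompt and steps 5-19 use the ID-fused one. Lowering the ratio to 10 moves the switch earlier (merge step 2), which is consistent with the README note that a lower ratio follows the input ID more faithfully.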
phyllispeng123 · Mar 12, 2024 · commit a469688
1 parent 6198017

Showing 28 changed files with 3,915 additions and 166 deletions.
diff --git a/README.md b/README.md
@@ -14,6 +14,7 @@ Inference of [Stable Diffusion](https://github.com/CompVis/stable-diffusion) in
     - !!!The VAE in SDXL encounters NaN issues under FP16, but unfortunately, the ggml_conv_2d only operates under FP16. Hence, a parameter is needed to specify the VAE that has fixed the FP16 NaN issue. You can find it here: [SDXL VAE FP16 Fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl_vae.safetensors).
 
 - [SD-Turbo](https://huggingface.co/stabilityai/sd-turbo) and [SDXL-Turbo](https://huggingface.co/stabilityai/sdxl-turbo) support
+- [PhotoMaker](https://github.com/TencentARC/PhotoMaker) support.
 - 16-bit, 32-bit float support
 - 4-bit, 5-bit and 8-bit integer quantization support
 - Accelerated memory-efficient CPU inference
@@ -151,7 +152,7 @@ cmake --build . --config Release
 ### Run
 
 ```
-usage: ./build/bin/sd [arguments]
+usage: ./bin/sd [arguments]
 
 arguments:
   -h, --help                         show this help message and exit
@@ -163,6 +164,9 @@ arguments:
   --taesd [TAESD_PATH]               path to taesd. Using Tiny AutoEncoder for fast decoding (low quality)
   --control-net [CONTROL_PATH]       path to control net model
   --embd-dir [EMBEDDING_PATH]        path to embeddings.
+  --stacked-id-embd-dir [DIR]        path to PHOTOMAKER stacked id embeddings.
+  --input-id-images-dir [DIR]        path to PHOTOMAKER input id images dir.
+  --normalize-input                  normalize PHOTOMAKER input id images
   --upscale-model [ESRGAN_PATH]      path to esrgan model. Upscale images after generate, just RealESRGAN_x4plus_anime_6B supported by now.
   --upscale-repeats                  Run the ESRGAN upscaler this many times (default 1)
   --type [TYPE]                      weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0)
@@ -175,6 +179,7 @@ arguments:
   -n, --negative-prompt PROMPT       the negative prompt (default: "")
   --cfg-scale SCALE                  unconditional guidance scale: (default: 7.0)
   --strength STRENGTH                strength for noising/unnoising (default: 0.75)
+  --style-ratio STYLE-RATIO          strength for keeping input identity (default: 20%)
   --control-strength STRENGTH        strength to apply Control Net (default: 0.9)
                                      1.0 corresponds to full destruction of information in init image
   -H, --height H                     image height, in pixel space (default: 512)
@@ -299,6 +304,39 @@ You can use ESRGAN to upscale the generated images. At the moment, only the [Rea
 sd -m ../models/v1-5-pruned-emaonly.safetensors -p "a lovely cat" --upscale-model ../models/RealESRGAN_x4plus_anime_6B.pth
 ```
 
+#### Using PhotoMaker to personalize image generation
+
+You can use [PhotoMaker](https://github.com/TencentARC/PhotoMaker) to personalize generated images with your own ID. 
+
+**NOTE**: currently PhotoMaker **ONLY** works with **SDXL** (any SDXL model file will work).
+
+Download the PhotoMaker model file (in safetensors format) [here](https://huggingface.co/bssrdf/PhotoMaker). The official release of the model file (in .bin format) does not work with `stable-diffusion.cpp`.
+
+- Specify the PhotoMaker model path using the `--stacked-id-embd-dir PATH` parameter. 
+- Specify the input images path using the `--input-id-images-dir PATH` parameter. 
+  - Input images **must** have the same width and height for preprocessing (to be improved).
+
+In the prompt, make sure a class word is immediately followed by the trigger word `img` (hard-coded for now). The class word can be one of `man`, `woman`, `girl`, or `boy`. If the input ID images contain Asian faces, add `Asian` before the class word.
+
+Another PhotoMaker-specific parameter:
+
+- `--style-ratio (0-100)%`: the default is 20, and values of 10-20 typically give good results. A lower ratio follows the input ID more faithfully (though not necessarily with better quality).
+
+Other parameters recommended for running PhotoMaker:
+
+- `--cfg-scale 5.0`
+- `-H 1024`
+- `-W 1024`
+
+On low-memory GPUs (<= 8GB), running with the `--vae-on-cpu` option is recommended to get artifact-free images.
+
+Example:
+
+```bash
+bin/sd -m ../models/sdxlUnstableDiffusers_v11.safetensors --vae ../models/sdxl_vae.safetensors --stacked-id-embd-dir ../models/photomaker-v1.safetensors --input-id-images-dir ../assets/photomaker_examples/scarletthead_woman -p "a girl img, retro futurism, retro game art style but extremely beautiful, intricate details, masterpiece, best quality, space-themed, cosmic, celestial, stars, galaxies, nebulas, planets, science fiction, highly detailed" -n "realistic, photo-realistic, worst quality, greyscale, bad anatomy, bad hands, error, text" --cfg-scale 5.0 --sampling-method euler -H 1024 -W 1024 --style-ratio 10 --vae-on-cpu -o output.png
+```
+
 ### Docker
 
 #### Building using Docker
@@ -345,3 +383,4 @@ Thank you to all the people who have already contributed to stable-diffusion.cpp
 - [k-diffusion](https://github.com/crowsonkb/k-diffusion)
 - [latent-consistency-model](https://github.com/luosiallen/latent-consistency-model)
 - [generative-models](https://github.com/Stability-AI/generative-models/)
+- [PhotoMaker](https://github.com/TencentARC/PhotoMaker)
diff --git a/assets/photomaker_examples/lenna_woman/lenna.jpg b/assets/photomaker_examples/lenna_woman/lenna.jpg
diff --git a/assets/photomaker_examples/newton_man/newton_0.jpg b/assets/photomaker_examples/newton_man/newton_0.jpg
diff --git a/assets/photomaker_examples/newton_man/newton_1.jpg b/assets/photomaker_examples/newton_man/newton_1.jpg
diff --git a/assets/photomaker_examples/newton_man/newton_2.png b/assets/photomaker_examples/newton_man/newton_2.png
diff --git a/assets/photomaker_examples/newton_man/newton_3.jpg b/assets/photomaker_examples/newton_man/newton_3.jpg
diff --git a/assets/photomaker_examples/scarletthead_woman/scarlett_0.jpg b/assets/photomaker_examples/scarletthead_woman/scarlett_0.jpg
diff --git a/assets/photomaker_examples/scarletthead_woman/scarlett_1.jpg b/assets/photomaker_examples/scarletthead_woman/scarlett_1.jpg
diff --git a/assets/photomaker_examples/scarletthead_woman/scarlett_2.jpg b/assets/photomaker_examples/scarletthead_woman/scarlett_2.jpg
diff --git a/assets/photomaker_examples/scarletthead_woman/scarlett_3.jpg b/assets/photomaker_examples/scarletthead_woman/scarlett_3.jpg
diff --git a/assets/photomaker_examples/yangmi_woman/yangmi_1.jpg b/assets/photomaker_examples/yangmi_woman/yangmi_1.jpg
diff --git a/assets/photomaker_examples/yangmi_woman/yangmi_2.jpeg b/assets/photomaker_examples/yangmi_woman/yangmi_2.jpeg
diff --git a/assets/photomaker_examples/yangmi_woman/yangmi_3.jpg b/assets/photomaker_examples/yangmi_woman/yangmi_3.jpg
diff --git a/assets/photomaker_examples/yangmi_woman/yangmi_4.jpg b/assets/photomaker_examples/yangmi_woman/yangmi_4.jpg
diff --git a/assets/photomaker_examples/yangmi_woman/yangmi_5.jpg b/assets/photomaker_examples/yangmi_woman/yangmi_5.jpg
diff --git a/assets/photomaker_examples/yangmi_woman/yangmi_6.jpg b/assets/photomaker_examples/yangmi_woman/yangmi_6.jpg
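The README section in this diff requires the prompt to contain a class word (`man`, `woman`, `girl`, or `boy`) immediately followed by the trigger word `img`. The sketch below is a hypothetical, word-level pre-check of that rule: it splits the prompt on whitespace, ignores trailing commas and periods, and returns the index of the class word. The commit itself works on CLIP token ids (it mentions a `convert_token_to_id` function for the trigger word), so `find_class_word` is an illustrative assumption, not the project's tokenizer logic.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Drop trailing punctuation such as the comma in "img,".
std::string strip_trailing_punct(std::string word) {
    while (!word.empty() && (word.back() == ',' || word.back() == '.')) {
        word.pop_back();
    }
    return word;
}

// Hypothetical helper: return the word index of the class word that
// immediately precedes the trigger word, or std::nullopt if none does.
std::optional<std::size_t> find_class_word(const std::string& prompt,
                                           const std::string& trigger = "img") {
    static const std::vector<std::string> kClassWords = {"man", "woman", "girl", "boy"};

    // Whitespace split; the real pipeline tokenizes with CLIP instead.
    std::istringstream in(prompt);
    std::vector<std::string> words;
    for (std::string w; in >> w;) {
        words.push_back(strip_trailing_punct(w));
    }

    // The trigger must follow a class word, so start at index 1.
    for (std::size_t i = 1; i < words.size(); ++i) {
        const bool prev_is_class =
            std::find(kClassWords.begin(), kClassWords.end(), words[i - 1]) != kClassWords.end();
        if (words[i] == trigger && prev_is_class) {
            return i - 1;
        }
    }
    return std::nullopt;
}
```

For the README example prompt `a girl img, retro futurism, ...` this returns 1 (the index of `girl`). For `an Asian woman img` it returns 2, because `Asian` only qualifies the class word and is not itself one.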