1. add model zoo readme for clarification;

2. update README.md;

Signed-off-by: lawrence-cj <cjs1020440147@icloud.com>
NVlabs · Dec 13, 2024 · 94142e1
1 parent 407f99a
commit 94142e1
Showing 2 changed files with 81 additions and 3 deletions.
diff --git a/README.md b/README.md
@@ -88,6 +88,7 @@ As a result, Sana-0.6B is very competitive with modern giant diffusion model (e.
 
 - [Env](#-1-dependencies-and-installation)
 - [Demo](#-3-how-to-inference)
+- [Model Zoo](asset/docs/model_zoo.md)
 - [Training](#-2-how-to-train)
 - [Testing](#-4-how-to-inference--test-metrics-fid-clip-score-geneval-dpg-bench-etc)
 - [TODO](#to-do-list)
@@ -126,7 +127,8 @@ python app/app_sana.py \
 
 ### 1. How to use `SanaPipeline` with `🧨diffusers`
 
-run `pip install -U diffusers` before use Sana in diffusers
+1. Run `pip install -U diffusers` before using Sana in diffusers.
+1. Make sure to set `variant` (`bf16`, `fp16`, `fp32`) and `torch_dtype` (`torch.bfloat16`, `torch.float16`, `torch.float32`) to the precision you want.
 
 ```python
 import torch
@@ -311,8 +313,9 @@ We will try our best to release
 # 🤗Acknowledgements
 
 - Thanks to [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha), [PixArt-Σ](https://github.com/PixArt-alpha/PixArt-sigma),
-  [Efficient-ViT](https://github.com/mit-han-lab/efficientvit) and
-  [ComfyUI_ExtraModels](https://github.com/city96/ComfyUI_ExtraModels)
+  [Efficient-ViT](https://github.com/mit-han-lab/efficientvit),
+  [ComfyUI_ExtraModels](https://github.com/city96/ComfyUI_ExtraModels) and
+  [diffusers](https://github.com/huggingface/diffusers)
   for their wonderful work and codebase!
 
 # 📖BibTeX

diff --git a/asset/docs/model_zoo.md b/asset/docs/model_zoo.md
@@ -0,0 +1,75 @@
+## 🔥 1. We provide links to all Sana pth and diffusers safetensors checkpoints below
+
+| Model     | Reso   | pth link                                                                                                | diffusers                                                                                                                                         | Precision     | Description    |
+|-----------|--------|---------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|---------------|----------------|
+| Sana-0.6B | 512px  | [Sana_600M_512px](https://huggingface.co/Efficient-Large-Model/Sana_600M_512px)                         | [Efficient-Large-Model/Sana_600M_512px_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_600M_512px_diffusers)                         | fp16/fp32     | Multi-Language |
+| Sana-0.6B | 1024px | [Sana_600M_1024px](https://huggingface.co/Efficient-Large-Model/Sana_600M_1024px)                       | [Efficient-Large-Model/Sana_600M_1024px_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_600M_1024px_diffusers)                       | fp16/fp32     | Multi-Language |
+| Sana-1.6B | 512px  | [Sana_1600M_512px](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px)                       | [Efficient-Large-Model/Sana_1600M_512px_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px_diffusers)                       | fp16/fp32     | -              |
+| Sana-1.6B | 512px  | [Sana_1600M_512px_MultiLing](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px_MultiLing)   | [Efficient-Large-Model/Sana_1600M_512px_MultiLing_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px_MultiLing_diffusers)   | fp16/fp32     | Multi-Language |
+| Sana-1.6B | 1024px | [Sana_1600M_1024px](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px)                     | [Efficient-Large-Model/Sana_1600M_1024px_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_diffusers)                     | fp16/fp32     | -              |
+| Sana-1.6B | 1024px | [Sana_1600M_1024px_MultiLing](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_MultiLing) | [Efficient-Large-Model/Sana_1600M_1024px_MultiLing_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_MultiLing_diffusers) | fp16/fp32     | Multi-Language |
+| Sana-1.6B | 1024px | [Sana_1600M_1024px_BF16](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_BF16)           | [Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers)           | **bf16**/fp32 | Multi-Language |
+
+## ❗ 2. Make sure to use the correct precision (fp16/bf16/fp32) for training and inference.
+
+### We provide two examples that use fp16 and bf16 weights, respectively.
+
+❗️Make sure to set `variant` and `torch_dtype` in diffusers pipelines to the desired precision.
+
+#### 1). For fp16 models
+
+```python
+import torch
+from diffusers import SanaPipeline
+
+pipe = SanaPipeline.from_pretrained(
+    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
+    variant="fp16",
+    torch_dtype=torch.float16,
+)
+pipe.to("cuda")
+
+pipe.vae.to(torch.bfloat16)
+pipe.text_encoder.to(torch.bfloat16)
+
+prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
+image = pipe(
+    prompt=prompt,
+    height=1024,
+    width=1024,
+    guidance_scale=5.0,
+    num_inference_steps=20,
+    generator=torch.Generator(device="cuda").manual_seed(42),
+)[0]
+
+image[0].save("sana.png")
+```
+
+#### 2). For bf16 models
+
+```python
+# Run `pip install -U diffusers` before using Sana in diffusers.
+import torch
+from diffusers import SanaPAGPipeline
+
+pipe = SanaPAGPipeline.from_pretrained(
+    "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers",
+    variant="bf16",
+    torch_dtype=torch.bfloat16,
+    pag_applied_layers="transformer_blocks.8",
+)
+pipe.to("cuda")
+
+pipe.text_encoder.to(torch.bfloat16)
+pipe.vae.to(torch.bfloat16)
+
+prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
+image = pipe(
+    prompt=prompt,
+    guidance_scale=5.0,
+    pag_scale=2.0,
+    num_inference_steps=20,
+    generator=torch.Generator(device="cuda").manual_seed(42),
+)[0]
+image[0].save('sana.png')
+```
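
Note on the precision rule in `model_zoo.md`: the Precision column and the instruction to keep `variant` and `torch_dtype` consistent can be checked mechanically before a checkpoint is loaded. The sketch below is one possible way to do that. The helper `precision_kwargs` and the names `SUPPORTED_PRECISIONS` and `DTYPE_NAME` are hypothetical and are not part of the Sana repository or of diffusers; the table is copied from the Precision column added in this commit, and whether each repository actually ships a weight file for every listed variant is an assumption that should be verified on the hub.

```python
# Hypothetical helper: neither the Sana repo nor diffusers ships this code.
# The mapping is copied from the Precision column of model_zoo.md in this diff.
SUPPORTED_PRECISIONS = {
    "Efficient-Large-Model/Sana_600M_512px_diffusers": ("fp16", "fp32"),
    "Efficient-Large-Model/Sana_600M_1024px_diffusers": ("fp16", "fp32"),
    "Efficient-Large-Model/Sana_1600M_512px_diffusers": ("fp16", "fp32"),
    "Efficient-Large-Model/Sana_1600M_512px_MultiLing_diffusers": ("fp16", "fp32"),
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers": ("fp16", "fp32"),
    "Efficient-Large-Model/Sana_1600M_1024px_MultiLing_diffusers": ("fp16", "fp32"),
    "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers": ("bf16", "fp32"),
}

# Name of the torch dtype attribute that matches each variant string.
DTYPE_NAME = {"fp16": "float16", "bf16": "bfloat16", "fp32": "float32"}


def precision_kwargs(repo_id: str, variant: str) -> dict:
    """Return a matching variant / dtype-name pair, or raise if the table forbids it."""
    if repo_id not in SUPPORTED_PRECISIONS:
        raise KeyError(f"unknown checkpoint: {repo_id}")
    allowed = SUPPORTED_PRECISIONS[repo_id]
    if variant not in allowed:
        raise ValueError(
            f"{repo_id} lists {'/'.join(allowed)}, not {variant}"
        )
    return {"variant": variant, "torch_dtype_name": DTYPE_NAME[variant]}
```

With torch installed, the result can be turned into `from_pretrained` arguments by resolving the dtype name, for example `torch_dtype=getattr(torch, kw["torch_dtype_name"])` together with `variant=kw["variant"]`. Keeping the dtype as a string lets the check itself run without importing torch.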