1. add model zoo readme for clarification;
2. update README.md;

Signed-off-by: lawrence-cj <cjs1020440147@icloud.com>
lawrence-cj committed Dec 13, 2024
1 parent 407f99a commit 94142e1
Showing 2 changed files with 81 additions and 3 deletions.
9 changes: 6 additions & 3 deletions README.md
@@ -88,6 +88,7 @@ As a result, Sana-0.6B is very competitive with modern giant diffusion model (e.

- [Env](#-1-dependencies-and-installation)
- [Demo](#-3-how-to-inference)
+- [Model Zoo](asset/docs/model_zoo.md)
- [Training](#-2-how-to-train)
- [Testing](#-4-how-to-inference--test-metrics-fid-clip-score-geneval-dpg-bench-etc)
- [TODO](#to-do-list)
@@ -126,7 +127,8 @@ python app/app_sana.py \

### 1. How to use `SanaPipeline` with `🧨diffusers`

-run `pip install -U diffusers` before use Sana in diffusers
+1. Run `pip install -U diffusers` before using Sana in diffusers.
+1. Make sure to set `variant` (bf16, fp16, fp32) and `torch_dtype` (torch.float16, torch.bfloat16, torch.float32) to the precision you want.

```python
import torch
@@ -311,8 +313,9 @@ We will try our best to release
# 🤗Acknowledgements

- Thanks to [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha), [PixArt-Σ](https://github.com/PixArt-alpha/PixArt-sigma),
-[Efficient-ViT](https://github.com/mit-han-lab/efficientvit) and
-[ComfyUI_ExtraModels](https://github.com/city96/ComfyUI_ExtraModels)
+[Efficient-ViT](https://github.com/mit-han-lab/efficientvit),
+[ComfyUI_ExtraModels](https://github.com/city96/ComfyUI_ExtraModels) and
+[diffusers](https://github.com/huggingface/diffusers)
for their wonderful work and codebase!

# 📖BibTeX
75 changes: 75 additions & 0 deletions asset/docs/model_zoo.md
@@ -0,0 +1,75 @@
## 🔥 1. Links to all Sana pth checkpoints and diffusers safetensors

| Model | Reso | pth link | diffusers | Precision | Description |
|-----------|--------|---------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|---------------|----------------|
| Sana-0.6B | 512px | [Sana_600M_512px](https://huggingface.co/Efficient-Large-Model/Sana_600M_512px) | [Efficient-Large-Model/Sana_600M_512px_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_600M_512px_diffusers) | fp16/fp32 | Multi-Language |
| Sana-0.6B | 1024px | [Sana_600M_1024px](https://huggingface.co/Efficient-Large-Model/Sana_600M_1024px) | [Efficient-Large-Model/Sana_600M_1024px_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_600M_1024px_diffusers) | fp16/fp32 | Multi-Language |
| Sana-1.6B | 512px | [Sana_1600M_512px](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px) | [Efficient-Large-Model/Sana_1600M_512px_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px_diffusers) | fp16/fp32 | - |
| Sana-1.6B | 512px | [Sana_1600M_512px_MultiLing](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px_MultiLing) | [Efficient-Large-Model/Sana_1600M_512px_MultiLing_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px_MultiLing_diffusers) | fp16/fp32 | Multi-Language |
| Sana-1.6B | 1024px | [Sana_1600M_1024px](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px) | [Efficient-Large-Model/Sana_1600M_1024px_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_diffusers) | fp16/fp32 | - |
| Sana-1.6B | 1024px | [Sana_1600M_1024px_MultiLing](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_MultiLing) | [Efficient-Large-Model/Sana_1600M_1024px_MultiLing_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_MultiLing_diffusers) | fp16/fp32 | Multi-Language |
| Sana-1.6B | 1024px | [Sana_1600M_1024px_BF16](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_BF16) | [Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers) | **bf16**/fp32 | Multi-Language |
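
In the table above, every diffusers repository name is the pth checkpoint name with `_diffusers` appended, under the `Efficient-Large-Model` organization. The sketch below encodes that pattern together with the half-precision format each row lists. The names `CHECKPOINT_PRECISION`, `diffusers_repo_id`, and `half_precision` are hypothetical and not part of the Sana codebase.

```python
# Hypothetical lookup built from the model zoo table; not part of the Sana repo.
HF_ORG = "Efficient-Large-Model"

# pth checkpoint name -> half-precision format listed in the "Precision" column.
CHECKPOINT_PRECISION = {
    "Sana_600M_512px": "fp16",
    "Sana_600M_1024px": "fp16",
    "Sana_1600M_512px": "fp16",
    "Sana_1600M_512px_MultiLing": "fp16",
    "Sana_1600M_1024px": "fp16",
    "Sana_1600M_1024px_MultiLing": "fp16",
    "Sana_1600M_1024px_BF16": "bf16",
}


def diffusers_repo_id(pth_name: str) -> str:
    """Return the diffusers repo id for a pth checkpoint name from the table."""
    if pth_name not in CHECKPOINT_PRECISION:
        raise KeyError(f"unknown checkpoint: {pth_name!r}")
    return f"{HF_ORG}/{pth_name}_diffusers"


def half_precision(pth_name: str) -> str:
    """Return the half-precision format ("fp16" or "bf16") listed for a checkpoint."""
    return CHECKPOINT_PRECISION[pth_name]


print(diffusers_repo_id("Sana_1600M_1024px_BF16"))
print(half_precision("Sana_1600M_1024px_BF16"))
```

Keeping the table in one dictionary means a typo in a checkpoint name fails early with a `KeyError` instead of triggering a failed download later.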

## ❗ 2. Make sure to use the correct precision (fp16/bf16/fp32) for training and inference.

### We provide two samples, one for fp16 weights and one for bf16 weights.

❗️Make sure to set `variant` and `torch_dtype` in diffusers pipelines to the desired precision.

#### 1). For fp16 models

```python
import torch
from diffusers import SanaPipeline

# Load the fp16 weights in float16.
pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
    variant="fp16",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

# Cast the VAE and the text encoder to bfloat16.
pipe.vae.to(torch.bfloat16)
pipe.text_encoder.to(torch.bfloat16)

prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=5.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
)[0]  # the first element of the pipeline output is the list of images

image[0].save("sana.png")
```

#### 2). For bf16 models

```python
# Run `pip install -U diffusers` before using Sana in diffusers.
import torch
from diffusers import SanaPAGPipeline

# Load the bf16 weights in bfloat16.
pipe = SanaPAGPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers",
    variant="bf16",
    torch_dtype=torch.bfloat16,
    pag_applied_layers="transformer_blocks.8",
)
pipe.to("cuda")

# Cast the text encoder and the VAE to bfloat16.
pipe.text_encoder.to(torch.bfloat16)
pipe.vae.to(torch.bfloat16)

prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
image = pipe(
    prompt=prompt,
    guidance_scale=5.0,
    pag_scale=2.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
)[0]  # the first element of the pipeline output is the list of images

image[0].save("sana.png")
```
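
Both samples pair a `variant` string with a matching `torch_dtype`. A small helper can keep the two in sync; the sketch below is one way to do it, assuming only the two half-precision formats used in the samples. `VARIANT_TO_DTYPE_NAME` and `precision_args` are hypothetical names, not part of Sana or diffusers, and the helper returns the dtype by name so that it runs without torch installed.

```python
# Hypothetical helper; not part of Sana or diffusers.
# Maps the `variant` string to the name of the torch dtype used in the samples.
VARIANT_TO_DTYPE_NAME = {
    "fp16": "float16",
    "bf16": "bfloat16",
}


def precision_args(variant: str) -> tuple[str, str]:
    """Return (variant, torch dtype name) for a supported precision tag."""
    try:
        return variant, VARIANT_TO_DTYPE_NAME[variant]
    except KeyError:
        supported = sorted(VARIANT_TO_DTYPE_NAME)
        raise ValueError(
            f"unsupported variant {variant!r}; expected one of {supported}"
        ) from None


variant, dtype_name = precision_args("bf16")
print(variant, dtype_name)
```

With torch imported, `getattr(torch, dtype_name)` gives the dtype object to pass as `torch_dtype`, next to `variant=variant`.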
