diff --git a/README.md b/README.md
index ab1fe34..774dbaf 100644
--- a/README.md
+++ b/README.md
@@ -36,6 +36,7 @@ As a result, Sana-0.6B is very competitive with modern giant diffusion model (e.
 
 ## 🔥🔥 News
 
+- (🔥 New) \[2024/12/13\] `diffusers` now supports Sana! [All Sana models in diffusers safetensors format](https://huggingface.co/collections/Efficient-Large-Model/sana-673efba2a57ed99843f11f9e) are released, and `SanaPipeline`, `SanaPAGPipeline` and `DPMSolverMultistepScheduler` (with flow matching) are all supported.
 - (🔥 New) \[2024/12/10\] 1.6B BF16 [Sana model](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_BF16) is released for stable fine-tuning.
 - (🔥 New) \[2024/12/9\] We release the [ComfyUI node](https://github.com/Efficient-Large-Model/ComfyUI_ExtraModels) for Sana. [\[Guidance\]](asset/docs/ComfyUI/comfyui.md)
 - (🔥 New) \[2024/11\] All multi-linguistic (Emoji & Chinese & English) SFT models are released: [1.6B-512px](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px_MultiLing), [1.6B-1024px](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_MultiLing), [600M-512px](https://huggingface.co/Efficient-Large-Model/Sana_600M_512px), [600M-1024px](https://huggingface.co/Efficient-Large-Model/Sana_600M_1024px). The metric performance is shown [here](#performance)
@@ -87,6 +88,7 @@ As a result, Sana-0.6B is very competitive with modern giant diffusion model (e.
 
 - [Env](#-1-dependencies-and-installation)
 - [Demo](#-3-how-to-inference)
+- [Model Zoo](asset/docs/model_zoo.md)
 - [Training](#-2-how-to-train)
 - [Testing](#-4-how-to-inference--test-metrics-fid-clip-score-geneval-dpg-bench-etc)
 - [TODO](#to-do-list)
@@ -112,6 +114,8 @@ cd Sana
 - 9GB VRAM is required for 0.6B model and 12GB VRAM for 1.6B model. Our later quantization version will require less than 8GB for inference.
 - All the tests are done on A100 GPUs. Different GPU version may be different.
 
+## 🔛 Choose your model: [Model card](asset/docs/model_zoo.md)
+
 ## 🔛 Quick start with [Gradio](https://www.gradio.app/guides/quickstart)
 
 ```bash
@@ -123,6 +127,70 @@ python app/app_sana.py \
     --model_path=hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth
 ```
 
+### 1. How to use `SanaPipeline` with `🧨diffusers`
+
+1. Run `pip install -U diffusers` before using Sana in diffusers.
+1. Make sure to set `variant` (bf16, fp16, fp32) and `torch_dtype` (torch.bfloat16, torch.float16, torch.float32) to the precision you want.
+
+```python
+import torch
+from diffusers import SanaPipeline
+
+pipe = SanaPipeline.from_pretrained(
+    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
+    variant="fp16",
+    torch_dtype=torch.float16,
+)
+pipe.to("cuda")
+
+pipe.vae.to(torch.bfloat16)
+pipe.text_encoder.to(torch.bfloat16)
+
+prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
+image = pipe(
+    prompt=prompt,
+    height=1024,
+    width=1024,
+    guidance_scale=5.0,
+    num_inference_steps=20,
+    generator=torch.Generator(device="cuda").manual_seed(42),
+)[0]
+
+image[0].save("sana.png")
+```
+
+### 2. How to use `SanaPAGPipeline` with `🧨diffusers`
+
+```python
+# run `pip install -U diffusers` before using Sana in diffusers
+import torch
+from diffusers import SanaPAGPipeline
+
+pipe = SanaPAGPipeline.from_pretrained(
+    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
+    variant="fp16",
+    torch_dtype=torch.float16,
+    pag_applied_layers="transformer_blocks.8",
+)
+pipe.to("cuda")
+
+pipe.text_encoder.to(torch.bfloat16)
+pipe.vae.to(torch.bfloat16)
+
+prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
+image = pipe(
+    prompt=prompt,
+    guidance_scale=5.0,
+    pag_scale=2.0,
+    num_inference_steps=20,
+    generator=torch.Generator(device="cuda").manual_seed(42),
+)[0]
+image[0].save('sana.png')
+```
+
+

3. How to use Sana in this repo

+
 ```python
 import torch
 from app.sana_pipeline import SanaPipeline
@@ -147,8 +215,10 @@ image = sana(
 save_image(image, 'output/sana.png', nrow=1, normalize=True, value_range=(-1, 1))
 ```
+
+
-

Run Sana (Inference) with Docker

+

4. Run Sana (Inference) with Docker

 ```
 # Pull related models
@@ -245,8 +315,9 @@ We will try our best to release
 # 🤗Acknowledgements
 
 - Thanks to [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha), [PixArt-Σ](https://github.com/PixArt-alpha/PixArt-sigma),
-  [Efficient-ViT](https://github.com/mit-han-lab/efficientvit) and
-  [ComfyUI_ExtraModels](https://github.com/city96/ComfyUI_ExtraModels)
+  [Efficient-ViT](https://github.com/mit-han-lab/efficientvit),
+  [ComfyUI_ExtraModels](https://github.com/city96/ComfyUI_ExtraModels) and
+  [diffusers](https://github.com/huggingface/diffusers)
   for their wonderful work and codebase!
 
 # 📖BibTeX
diff --git a/asset/docs/ComfyUI/comfyui.md b/asset/docs/ComfyUI/comfyui.md
index c4bbf68..5c6486b 100644
--- a/asset/docs/ComfyUI/comfyui.md
+++ b/asset/docs/ComfyUI/comfyui.md
@@ -12,7 +12,6 @@
 
 1. All the checkpoints will be downloaded automatically.
 1. KSampler(Flow Euler) is available for now; Flow DPM-Solver will be available soon.
-1. For more information, check the [original city96/ComfyUI_ExtraModels](https://github.com/city96/ComfyUI_ExtraModels).
 
 ```bash
 git clone https://github.com/comfyanonymous/ComfyUI.git
diff --git a/asset/docs/model_zoo.md b/asset/docs/model_zoo.md
new file mode 100644
index 0000000..7368c1e
--- /dev/null
+++ b/asset/docs/model_zoo.md
@@ -0,0 +1,75 @@
+## 🔥 1. Links to all Sana `.pth` checkpoints and diffusers safetensors
+
+| Model | Resolution | pth link | diffusers | Precision | Description |
+|-----------|------------|----------|-----------|---------------|----------------|
+| Sana-0.6B | 512px | [Sana_600M_512px](https://huggingface.co/Efficient-Large-Model/Sana_600M_512px) | [Efficient-Large-Model/Sana_600M_512px_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_600M_512px_diffusers) | fp16/fp32 | Multi-Language |
+| Sana-0.6B | 1024px | [Sana_600M_1024px](https://huggingface.co/Efficient-Large-Model/Sana_600M_1024px) | [Efficient-Large-Model/Sana_600M_1024px_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_600M_1024px_diffusers) | fp16/fp32 | Multi-Language |
+| Sana-1.6B | 512px | [Sana_1600M_512px](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px) | [Efficient-Large-Model/Sana_1600M_512px_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px_diffusers) | fp16/fp32 | - |
+| Sana-1.6B | 512px | [Sana_1600M_512px_MultiLing](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px_MultiLing) | [Efficient-Large-Model/Sana_1600M_512px_MultiLing_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px_MultiLing_diffusers) | fp16/fp32 | Multi-Language |
+| Sana-1.6B | 1024px | [Sana_1600M_1024px](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px) | [Efficient-Large-Model/Sana_1600M_1024px_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_diffusers) | fp16/fp32 | - |
+| Sana-1.6B | 1024px | [Sana_1600M_1024px_MultiLing](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_MultiLing) | [Efficient-Large-Model/Sana_1600M_1024px_MultiLing_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_MultiLing_diffusers) | fp16/fp32 | Multi-Language |
+| Sana-1.6B | 1024px | [Sana_1600M_1024px_BF16](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_BF16) | [Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers) | **bf16**/fp32 | Multi-Language |
+
+## ❗ 2. Make sure to use the correct precision (fp16/bf16/fp32) for training and inference.
+
+### We provide two samples: one for fp16 weights and one for bf16 weights.
+
+❗️Make sure to set `variant` and `torch_dtype` in diffusers pipelines to the desired precision.
+
+#### 1) For fp16 models
+
+```python
+import torch
+from diffusers import SanaPipeline
+
+pipe = SanaPipeline.from_pretrained(
+    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
+    variant="fp16",
+    torch_dtype=torch.float16,
+)
+pipe.to("cuda")
+
+pipe.vae.to(torch.bfloat16)
+pipe.text_encoder.to(torch.bfloat16)
+
+prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
+image = pipe(
+    prompt=prompt,
+    height=1024,
+    width=1024,
+    guidance_scale=5.0,
+    num_inference_steps=20,
+    generator=torch.Generator(device="cuda").manual_seed(42),
+)[0]
+
+image[0].save("sana.png")
+```
+
+#### 2) For bf16 models
+
+```python
+# run `pip install -U diffusers` before using Sana in diffusers
+import torch
+from diffusers import SanaPAGPipeline
+
+pipe = SanaPAGPipeline.from_pretrained(
+    "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers",
+    variant="bf16",
+    torch_dtype=torch.bfloat16,
+    pag_applied_layers="transformer_blocks.8",
+)
+pipe.to("cuda")
+
+pipe.text_encoder.to(torch.bfloat16)
+pipe.vae.to(torch.bfloat16)
+
+prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
+image = pipe(
+    prompt=prompt,
+    guidance_scale=5.0,
+    pag_scale=2.0,
+    num_inference_steps=20,
+    generator=torch.Generator(device="cuda").manual_seed(42),
+)[0]
+image[0].save('sana.png')
+```
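The precision rule in `model_zoo.md` amounts to a per-repo lookup: pick the `variant` that the checkpoint actually ships, then pass the matching `torch_dtype`. Below is a minimal, stdlib-only sketch of that lookup. It assumes that the precisions in the model zoo table are exactly what each diffusers repo provides, and it treats the first entry in each row as the half-precision variant. The `precision_kwargs` helper and both dictionaries are hypothetical illustrations, not part of Sana or diffusers.

```python
# Hypothetical lookup built from the model zoo table: (half-precision variant, full-precision variant).
SANA_DIFFUSERS_PRECISIONS = {
    "Efficient-Large-Model/Sana_600M_512px_diffusers": ("fp16", "fp32"),
    "Efficient-Large-Model/Sana_600M_1024px_diffusers": ("fp16", "fp32"),
    "Efficient-Large-Model/Sana_1600M_512px_diffusers": ("fp16", "fp32"),
    "Efficient-Large-Model/Sana_1600M_512px_MultiLing_diffusers": ("fp16", "fp32"),
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers": ("fp16", "fp32"),
    "Efficient-Large-Model/Sana_1600M_1024px_MultiLing_diffusers": ("fp16", "fp32"),
    "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers": ("bf16", "fp32"),
}

# Name of the torch dtype attribute that matches each variant string.
TORCH_DTYPE_NAMES = {"fp16": "float16", "bf16": "bfloat16", "fp32": "float32"}


def precision_kwargs(repo_id, full_precision=False):
    """Hypothetical helper: return matching `variant` and torch dtype name for a Sana repo."""
    try:
        half, full = SANA_DIFFUSERS_PRECISIONS[repo_id]
    except KeyError:
        raise ValueError(f"unknown Sana diffusers repo: {repo_id}") from None
    variant = full if full_precision else half
    return {"variant": variant, "torch_dtype": TORCH_DTYPE_NAMES[variant]}


if __name__ == "__main__":
    # Prints {'variant': 'bf16', 'torch_dtype': 'bfloat16'}
    print(precision_kwargs("Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers"))
```

To load a pipeline with it, pass `variant=kw["variant"]` and `torch_dtype=getattr(torch, kw["torch_dtype"])` to `SanaPipeline.from_pretrained`, as in the samples above. Keeping the mapping in one table means a new checkpoint only needs one new row, and an unknown repo fails loudly instead of silently loading the wrong precision.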