
Thanks! Works well on windows #13

Open
ai-anchorite opened this issue Dec 3, 2024 · 6 comments

Comments

@ai-anchorite

Thanks for releasing the models and including a full-featured Gradio UI!

The included app.py works well on my 3090 with all 4 memory optimizations active. Generation takes ~20 minutes.

Installed using Python 3.10 with torch 2.3.1+cu121.
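
For anyone curious, the 4 optimizations are presumably some combination of the standard diffusers memory savers below. A minimal sketch, assuming a loaded diffusers pipeline named `pipe` (the exact calls in app.py may differ):

```python
# Sketch only: typical diffusers memory optimizations on a loaded pipeline.
# Note: the two CPU-offload modes are alternatives in recent diffusers
# versions, so in practice one of them stays commented out.
pipe.enable_sequential_cpu_offload()   # stream weights CPU->GPU layer by layer (lowest VRAM, slowest)
# pipe.enable_model_cpu_offload()      # alternative: move whole submodules on/off the GPU
pipe.vae.enable_slicing()              # decode the latent batch one slice at a time
pipe.vae.enable_tiling()               # decode frames in overlapping tiles to cap peak VRAM
```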

@SHYuanBest
Member

Thanks for your feedback.

@SD-fan

SD-fan commented Dec 4, 2024

@ai-anchorite Can't get it to run well under Windows. Tried torch 2.3.1+cu121 and xformers 0.0.27.
The models initially load into VRAM (~23.x GB), but then they get purged instantly and the 3090 sits at only ~5 GB VRAM. Inference takes forever.
Do you have the same experience? If not, would you mind sharing your conda environment (conda list)?
Thanks!

@ai-anchorite
Author

ai-anchorite commented Dec 5, 2024

That model loading behaviour and the low VRAM usage during inference are consistent with CPU offload. Inference, including VAE decode etc., consistently takes me 20 minutes.

Less than 64 GB of RAM may push it into the page file, though?

I forked it to make it installable on Windows via Pinokio here.

No real changes besides uncommenting the 4 optimisations and removing HF Spaces:

- Installed with torch 2.3.1+cu121 and xformers 0.0.27 on Python 3.10.
- Removed the deepspeed and spaces requirements.
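
A quick way to confirm your environment matches (illustrative snippet, not from the repo):

```python
# Environment sanity check (illustrative; not part of app.py).
import torch
import xformers

print(torch.__version__)          # expect 2.3.1+cu121
print(torch.version.cuda)         # expect 12.1
print(torch.cuda.is_available())  # expect True on a working install
print(xformers.__version__)       # expect 0.0.27
```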

@SD-fan

SD-fan commented Dec 5, 2024

@ai-anchorite Thanks for your quick feedback. I was too impatient. Despite some console errors, it runs well.
Thanks again!

@SD-fan

SD-fan commented Dec 5, 2024

@ai-anchorite
I just found out that when running on a 3090 you can disable CPU offloading (comment out pipe.enable_sequential_cpu_offload()) while leaving all other mem optimizations enabled. Then ~20 GB of VRAM is used, the GPU runs at 100% the whole time, and generation takes only 10 min. 👍

Thanks @SHYuanBest for the great work!
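
In code terms, the setup described above would look roughly like this (a sketch, assuming the same pipe object as app.py; only the sequential offload line changes):

```python
# Sketch of the 3090 setup above: sequential CPU offload disabled,
# the remaining memory optimizations kept (names assumed from diffusers).
# pipe.enable_sequential_cpu_offload()  # disabled: this was the slow path
pipe.enable_model_cpu_offload()         # kept: whole submodules move on/off the GPU as needed
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
# Observed result: ~20 GB VRAM in use, GPU at 100%, ~10 min per generation.
```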

@ai-anchorite
Author

> @ai-anchorite I just found out that when running on a 3090 you can disable CPU offloading (comment out pipe.enable_sequential_cpu_offload()) while leaving all other mem optimizations enabled. Then ~20 GB of VRAM is used, the GPU runs at 100% the whole time, and generation takes only 10 min. 👍
>
> Thanks @SHYuanBest for the great work!

Ah nice! I'd been meaning to find time to test that.
