
Thanks! Works well on windows #13

Open
ai-anchorite opened this issue Dec 3, 2024 · 6 comments

Comments

@ai-anchorite

Thanks for releasing the models and including a full-featured Gradio UI!

The included app.py works well on my 3090 with all 4 memory optimizations active. Generation takes ~20 minutes.

Installed using Python 3.10 with torch 2.3.1+cu121.
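
For anyone curious, the 4 optimizations are presumably some combination of the standard diffusers memory savers below. A minimal sketch, assuming a loaded diffusers pipeline named `pipe` (the exact calls in app.py may differ):

```python
# Sketch only: typical diffusers memory optimizations on a loaded pipeline.
# Note: the two CPU-offload modes are alternatives in recent diffusers
# versions, so in practice one of them stays commented out.
pipe.enable_sequential_cpu_offload()   # stream weights CPU->GPU layer by layer (lowest VRAM, slowest)
# pipe.enable_model_cpu_offload()      # alternative: move whole submodules on/off the GPU
pipe.vae.enable_slicing()              # decode the latent batch one slice at a time
pipe.vae.enable_tiling()               # decode frames in overlapping tiles to cap peak VRAM
```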

@SHYuanBest
Member

Thanks for your feedback.

@SD-fan

SD-fan commented Dec 4, 2024

@ai-anchorite Can't get it to run well under Windows. Tried torch 2.3.1+cu121 and xformers 0.0.27.
The models initially load into VRAM (~23.x GB), but then they get purged instantly and the 3090 sits at only ~5 GB VRAM. Inference takes forever.
Do you have the same experience? If not, would you mind sharing your conda environment (conda list)?
Thanks!

@ai-anchorite
Author

ai-anchorite commented Dec 5, 2024

That model loading behaviour and the low VRAM usage during inference are consistent with CPU offload. Inference, including VAE decode etc., consistently takes me 20 minutes.

Less than 64 GB of RAM may push it into the page file, though?

I forked it to make it installable on Windows via Pinokio here.

No real changes besides uncommenting the 4 optimisations and removing HF Spaces:

- Installed with torch 2.3.1+cu121 and xformers 0.0.27 on Python 3.10.
- Removed the deepspeed and spaces requirements.
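
A quick way to confirm your environment matches (illustrative snippet, not from the repo):

```python
# Environment sanity check (illustrative; not part of app.py).
import torch
import xformers

print(torch.__version__)          # expect 2.3.1+cu121
print(torch.version.cuda)         # expect 12.1
print(torch.cuda.is_available())  # expect True on a working install
print(xformers.__version__)       # expect 0.0.27
```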

@SD-fan

SD-fan commented Dec 5, 2024

@ai-anchorite Thanks for your quick feedback. I was too impatient. Despite some console errors, it runs well.
Thanks again!

@SD-fan

SD-fan commented Dec 5, 2024

@ai-anchorite
I just found out that when running on a 3090 you can disable CPU offloading (comment out pipe.enable_sequential_cpu_offload()) while leaving all other mem optimizations enabled. Then ~20 GB of VRAM is used, the GPU runs at 100% the whole time, and generation takes only 10 min. 👍

Thanks @SHYuanBest for the great work!
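
In code terms, the setup described above would look roughly like this (a sketch, assuming the same pipe object as app.py; only the sequential offload line changes):

```python
# Sketch of the 3090 setup above: sequential CPU offload disabled,
# the remaining memory optimizations kept (names assumed from diffusers).
# pipe.enable_sequential_cpu_offload()  # disabled: this was the slow path
pipe.enable_model_cpu_offload()         # kept: whole submodules move on/off the GPU as needed
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
# Observed result: ~20 GB VRAM in use, GPU at 100%, ~10 min per generation.
```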

@ai-anchorite
Author

> @ai-anchorite I just found out that when running on a 3090 you can disable CPU offloading (comment out pipe.enable_sequential_cpu_offload()) while leaving all other mem optimizations enabled. Then ~20 GB of VRAM is used, the GPU runs at 100% the whole time, and generation takes only 10 min. 👍
>
> Thanks @SHYuanBest for the great work!

Ah nice! I'd been meaning to find time to test that.
