
Inference with lower VRAM requirements #18

Merged: 2 commits merged into NVlabs:main on Nov 22, 2024

Conversation

@frutiemax92 (Contributor) commented on Nov 21, 2024

Inference with bfloat16 and float16 works on an RTX 4070; float32 gives OOM.
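(For reference, a minimal sketch of reduced-precision, inference-only execution in plain PyTorch; the model below is a stand-in, not the repo's SanaPipeline API:)

    import torch
    import torch.nn as nn

    # Stand-in model; the real pipeline is SanaPipeline in app/sana_pipeline.py.
    model = nn.Sequential(nn.Conv2d(4, 64, 3, padding=1), nn.SiLU(), nn.Conv2d(64, 4, 3, padding=1))
    model = model.to(device="cuda", dtype=torch.bfloat16)  # or torch.float16
    model.eval()

    with torch.no_grad():  # no autograd graph, so intermediate activations are freed immediately
        x = torch.randn(1, 4, 64, 64, device="cuda", dtype=torch.bfloat16)  # dummy latent, shape illustrative
        y = model(x)

Going from float32 to bfloat16/float16 halves the memory needed for weights and activations, which is why the lower-precision runs fit on the RTX 4070 while float32 does not.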

@nitinmukesh commented on Nov 22, 2024

Thank you for this PR.

Tested on 8 GB VRAM.

Before:
[screenshot]
Inference time: 5-6 min

After your updates:
[screenshot]
Inference time: 2 min

@lawrence-cj merged commit c66ebf9 into NVlabs:main on Nov 22, 2024. 2 checks passed.
@lawrence-cj (Collaborator) commented

Thanks a lot for your work over time.

@FurkanGozukara commented on Nov 22, 2024

This changed how batch size works.

It now processes 6 times if the batch size is 6, and each iteration gets slower and slower.

Something is definitely wrong. Also, VRAM is not cleared after generations :) it uses the entire VRAM after 6 iterations with batch size 6.

[screenshot]

@lawrence-cj (Collaborator) commented

Batch inference is not changed by this PR; the only thing this PR changes is adding a few torch.no_grad() lines. The SanaPipeline class in the current repo only supports batch inference for the same prompt:

num_images_per_prompt=1,

If you input a list of prompts, it will generate them separately.
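(For anyone following along, the pattern this PR adds is the standard inference-only guard; a minimal, self-contained sketch with placeholder modules:)

    import torch
    import torch.nn as nn

    # encoder stands in for any module called during sampling (text encoder, DiT, VAE decoder).
    encoder = nn.Linear(16, 16)
    tokens = torch.randn(2, 16)

    with torch.no_grad():
        embeddings = encoder(tokens)  # no activations are retained for a backward pass
    assert not embeddings.requires_grad

Without the guard, every forward pass keeps intermediate tensors around for a backward pass that inference never runs, which is presumably where the extra VRAM was going.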

@FurkanGozukara commented
Then it is probably because of the changes I had to make to get it to run on Windows; I probably broke something, because I am only giving 1 prompt via Gradio.

https://github.com/FurkanGozukara/Sana/blob/main/app/sana_pipeline.py

@FurkanGozukara commented

> num_images_per_prompt=1,

Your code has this for batch size:

            for _ in range(num_images_per_prompt):
                with torch.no_grad():
                    prompts.append(
                        prepare_prompt_ar(prompt, self.base_ratios, device=self.device, show=False)[0].strip()
                    )

@lawrence-cj (Collaborator) commented

Yes. This is only for generating multiple images from a single prompt. If you want batch inference with an input like ['car', 'car', 'car'], that is not supported for now.
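(A usage sketch of the distinction; the pipe() stub below is illustrative, only num_images_per_prompt matches the snippet above:)

    from typing import List

    def pipe(prompt: str, num_images_per_prompt: int = 1) -> List[str]:
        """Stub standing in for SanaPipeline's call; the real signature may differ."""
        return [f"image of {prompt}"] * num_images_per_prompt

    # Supported: several images from one prompt.
    images = pipe("a red car", num_images_per_prompt=4)

    # Not a true batch: a list of prompts is handled one prompt at a time.
    results = [pipe(p) for p in ["car", "car", "car"]]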

@frutiemax92 (Contributor, PR author) commented

> Then it is probably because of the changes I had to make to get it to run on Windows; I probably broke something, because I am only giving 1 prompt via Gradio.
>
> https://github.com/FurkanGozukara/Sana/blob/main/app/sana_pipeline.py

I am using WSL2 and the sample script from the model's page on GitHub.

@yujincheng08 (Collaborator) commented

This PR breaks batch inference. Fixed in 9da8550.

@FurkanGozukara commented

> This PR breaks batch inference. Fixed in 9da8550.

Thanks, I told you that batch inference was broken.
