Approximate release date of MULTI-GPU inferencing #11
Comments
Thanks for your questions! Multi-GPU inference will be supported next week, and the corresponding inference times on A100 (1~8 GPUs) will be recorded.
Great! So it will be released today or tomorrow? I see you have added an open-source plan, which is awesome by the way, but there is no official release date for the multi-GPU inference. Thanks again @hejingwenhejingwen
It is released now.
Awesome @hejingwenhejingwen Will you be releasing the corresponding inference times on A100 (1~8)? Right now it still seems kind of slow: we tried with 8 A100 (80 GB) GPUs, and it takes 56 to 62 GB of memory per GPU and roughly 45 minutes to enhance a 6-second-long CogVideoX video. I am running multi_gpu_inference.sh; is there anything else we need to do?
My apologies🙃 There was a bug in the previous version where the non-parallel model implementation was always used, even during multi-GPU inference. We have now fixed the bug; could you please pull the latest version and try again with the run_VEnhancer_MultiGPU.sh script?
@ChrisLiu6 I tried with 8 and 4 A100 (40 GB) GPUs and now I get this error:
I did this: and then installed xformers: For 4 GPUs it just prints this out many times and doesn't seem to stop:
If it continuously prints that output, it is working as expected. For the 8-GPU case, the error is likely due to too few frames being parallel-processed by too many GPUs; I'm looking into it.
@ChrisLiu6 thank you for your response. Have you fixed the 8-GPU inference in the latest pushes, or are you still working on it?
@SamitM1 This means that when using 8 GPUs to process 25 frames, one GPU is destined to be idle. Therefore, as a workaround, you can use 7 GPUs. We will soon add logic so that, in cases like yours, fewer GPUs are used automatically. By the way, does the 4-GPU run work fine? Thanks
@ChrisLiu6 The thing is, we want to use more (8) GPUs, assuming it would lead to faster inference. Based on what you said above, does that mean we have to pass in a video whose total number of frames is divisible by 8, for example 48 frames (which would mean 6 frames per GPU)? Basically, is there any way for us to run the following input with 8 GPUs:
Thanks in advance! Edit: 7 GPUs does not work:
@ChrisLiu6
Hi, I've just pushed a commit, and now you should be able to use any number of GPUs without error (however, in some cases some of the GPUs may be idle). Generally speaking, using 4 or 8 GPUs can be a good choice in most cases.
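A minimal sketch of the arithmetic behind the "idle GPU" behavior described above, assuming frames are split into equal chunks of ceil(frames / GPUs); the function name and chunking scheme are illustrative, not the repository's actual implementation:

```python
import math

def gpus_actually_used(num_frames: int, num_gpus: int) -> int:
    """If frames are split into equal chunks of ceil(num_frames / num_gpus),
    fewer GPUs than requested may suffice, leaving the rest idle."""
    frames_per_gpu = math.ceil(num_frames / num_gpus)
    return math.ceil(num_frames / frames_per_gpu)

# 25 frames on 8 GPUs -> ceil(25 / 8) = 4 frames each -> only 7 GPUs get work.
print(gpus_actually_used(25, 8))   # 7
print(gpus_actually_used(48, 8))   # 8 -- 48 frames divide evenly, so no GPU is idle
```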
@ChrisLiu6
why is it taking up more GPU memory right as it finishes? How can I fix this? I am assuming I have to add "torch.cuda.empty_cache()", but we are unsure where. Thanks for all your help!
The OOM comes from the temporal VAE decoding; its parallel inference is not supported yet. But you can reduce the memory usage by modifying the tile size in VEnhancer/video_to_video/video_to_video_model.py, lines 172~174.
@hejingwenhejingwen would love your recommendation for how much to reduce the tile size for our particular case, as we have 40 GB of memory per GPU.
Please try self.frame_chunk_size = 3, self.tile_img_height = 576, self.tile_img_width = 768.
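A sketch of what those edited lines might look like inside the model class in video_to_video_model.py (the attribute names come from the suggestion above; the surrounding code and defaults are assumptions):

```python
# Around lines 172~174 of VEnhancer/video_to_video/video_to_video_model.py:
# smaller tiles and chunks trade decoding speed for lower peak GPU memory
# during temporal VAE decoding.
self.frame_chunk_size = 3    # decode 3 latent frames per VAE call
self.tile_img_height = 576   # tile height in pixels
self.tile_img_width = 768    # tile width in pixels
```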
@hejingwenhejingwen I tried everything:
but it still failed:
You can make chunks of all latents before you pass them to the VAE decoder. After you finish one chunk, move it to the CPU.
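A minimal sketch of that idea, assuming a PyTorch-style VAE with a decode() method and latent frames on the leading dimension; the function and variable names are illustrative, not the repository's actual API:

```python
import torch

@torch.no_grad()
def decode_in_chunks(vae, latents: torch.Tensor, chunk_size: int = 3) -> torch.Tensor:
    """Decode latent frames a few at a time, moving each decoded chunk to the CPU
    so only one chunk of pixel-space frames occupies GPU memory at once."""
    decoded = []
    for start in range(0, latents.shape[0], chunk_size):
        chunk = latents[start:start + chunk_size]   # latent frames stay on GPU for decoding
        frames = vae.decode(chunk)                   # hypothetical decode() returning pixel frames
        decoded.append(frames.cpu())                 # offload the result immediately
        torch.cuda.empty_cache()                     # release the freed GPU memory blocks
    return torch.cat(decoded, dim=0)
```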
Hey guys, great work with this. We were wondering if and (approximately) when you will be releasing the multi-GPU inferencing. Also, what is the time taken with default settings to enhance a 6-second-long CogVideoX-generated video on an H100 (which is more powerful and efficient than an A100)? And once multi-GPU inferencing is implemented, approximately how long would it take to enhance a CogVideoX-generated video with 8 A100s or 8 H100s?
Thanks in advance