Some questions and confirmation information #1

Open
zRzRzRzRzRzRzR opened this issue Sep 15, 2024 · 7 comments

Comments

@zRzRzRzRzRzRzR

zRzRzRzRzRzRzR commented Sep 15, 2024

Dear Development Team,

Hello, I have successfully installed and run the model according to the requirements in the README, but I have encountered some issues and look forward to your response.

  1. The repository does not seem to provide example prompt requirements, such as the expected language or how long a prompt should be. By reading the code, I gathered the following information:
  • No negative_prompt
  • The length of the input prompt should be < 77 tokens (the CLIP limit; see the token-count sketch below)
  • The input must be in English.
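
As a quick way to check the 77-token limit, the prompt can be run through a CLIP tokenizer. This is a minimal sketch assuming the pipeline uses the standard openai/clip-vit-large-patch14 tokenizer; the tokenizer actually bundled with this checkpoint may differ.

from transformers import CLIPTokenizer

# Assumption: a standard CLIP tokenizer; swap in the repo's own if it differs.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
prompt = "A little girl is riding a bicycle at high speed. Focused, detailed, realistic."
n_tokens = len(tokenizer(prompt).input_ids)  # count includes BOS/EOS special tokens
print(n_tokens)  # CLIP truncates anything beyond 77 tokens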

In this case, I am unsure how to structure the prompt, so I simply wrote a prompt:

A little girl is riding a bicycle at high speed. Focused, detailed, realistic.

and set the seed to 42:

import os, random
import numpy as np
import torch

def set_seed(seed):
    random.seed(seed)  # use the seed argument, not a hardcoded 42
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)

I set the output to 720x480 according to the README, and configured it as follows:

with torch.cuda.amp.autocast(dtype=torch.bfloat16):
    video = pipe(
        prompt,
        negative_prompt="",
        num_inference_steps=50,
        guidance_scale=7.5,
        width=768,
        height=432,  # alternatives: 480x288, 624x352, 432x240, 768x432
        frames=8*20,
    )

It occupied 67904 MiB of GPU memory. The other parameters remained unchanged, with 50 sampling steps. The final video can be found here:

sample_1_seed0.mp4

Is this the expected result?

  2. I did not see any relevant details about I2V in the code, nor any place where an image can be used as input. Should I understand that this open-source model is a T2V model?

  3. It seems that there is no parameter to control the frame rate.

  • According to the promotional material, the model can generate videos at 24 frames per second.

However, the video I generated has a frame rate of only 8 fps, with a total of 40 frames, as verified using the following command:

ffprobe -v 0 -of csv=p=0 -select_streams v:0 -show_entries stream=r_frame_rate sample_1_seed0.mp4

8/1
  • I tried adjusting the frame rate based on the parameter I saw. Generating a 20-second video at a frame rate of 24 results in an out-of-memory (OOM) error on a single A100 GPU, even with the following settings:

    [screenshot of the attempted settings]

Is this because the open-source model only outputs 8 frames per second?

Additionally, there may be some issues in the code within the repository:

  1. device = None

     This should be changed to device = "cuda", or device="cuda" should be added when constructing the pipeline:

     pipe = VchitectXLPipeline(args.ckpt_path, device="cuda")

     Otherwise, a "tensors not on the same device" error occurs during the positional embedding step.

  2. https://github.com/Vchitect/Vchitect-2.0/tree/master/models/__pycache__
    Should this be deleted? It seems unnecessary.

Looking forward to your response.

@zRzRzRzRzRzRzR changed the title from 详细说明指导 ("Detailed explanation and guidance") to Some questions and confirmation information on Sep 15, 2024
@foreverpiano

Same question here; this is currently hard to reproduce.

@WeichenFan
Collaborator

Hi ZR,

Thanks for your interest!

The repository does not seem to provide example prompt requirements, such as the language and how to structure the length properly. By reading the code, I gathered the following information:
No negative_prompt
The length of the input prompt should be < 77 Tokens (CLIP)
The input must be in English.

  • We will add more info to the README ASAP:
    • A negative prompt is supported.
    • The input prompt can be longer than 77 tokens (T5 accepts longer prompts; CLIP cannot, but that is fine).
    • Yes, the input must be in English.

Regarding the video length, we apologize that the current open-source version only supports videos shorter than 10 seconds. The timeline for open-sourcing additional versions, including the I2V model, is still undecided.
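
For example, a call with a non-empty negative prompt would look like the sketch below, based on the pipeline call shown earlier in this thread; the negative prompt string itself is only an illustration.

with torch.cuda.amp.autocast(dtype=torch.bfloat16):
    video = pipe(
        prompt,
        negative_prompt="blurry, low quality, distorted",  # illustrative only
        num_inference_steps=50,
        guidance_scale=7.5,
        width=768,
        height=432,
        frames=8*8,  # 8 seconds at the default 8 fps
    )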

@fahdmirza

@zRzRzRzRzRzRzR could you kindly share your full steps, from start to end, for how you created this video? How did you download the checkpoints, which library versions did you use, and so on? I am trying to follow their README, but it is incomplete. I am getting the following error:

from models.VchitectXL import VchitectXLTransformerModel
  File "/home/Ubuntu/Vchitect-2.0/models/VchitectXL.py", line 34, in <module>
    from torch.distributed.tensor.parallel import (
ImportError: cannot import name 'PrepareModuleOutput' from 'torch.distributed.tensor.parallel' (/home/Ubuntu/miniconda3/envs/VchitectXL/lib/python3.11/site-packages/torch/distributed/tensor/parallel/__init__.py)
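
This ImportError usually means the installed PyTorch predates PrepareModuleOutput. A minimal check, under the assumption that a newer PyTorch release provides the symbol:

import torch

print(torch.__version__)
try:
    from torch.distributed.tensor.parallel import PrepareModuleOutput  # noqa: F401
    print("PrepareModuleOutput is available")
except ImportError:
    print("this torch build is too old for the import; try a newer release")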

@fakerybakery

Regarding the video length, we apologize that the current open-source version only supports videos shorter than 10 seconds. The timeline for open-sourcing additional versions, including the I2V model, is still undecided.

Hi,
Just to clarify, the examples from the webpage were generated using a different model than the open-source model?
Thanks

@zRzRzRzRzRzRzR
Author

@zRzRzRzRzRzRzR could you kindly share your full steps, from start to end, for how you created this video? How did you download the checkpoints, which library versions did you use, and so on? I am trying to follow their README, but it is incomplete. I am getting the following error:

ImportError: cannot import name 'PrepareModuleOutput' from 'torch.distributed.tensor.parallel' (/home/Ubuntu/miniconda3/envs/VchitectXL/lib/python3.11/site-packages/torch/distributed/tensor/parallel/__init__.py)

I will upload it later.

@zRzRzRzRzRzRzR
Author

Step 1: test.txt has only one line:

A little girl is riding a bicycle at high speed. Focused, detailed, realistic.

Step 2: modify the code in inference.py:

def infer(args):
    pipe = VchitectXLPipeline(args.ckpt_path)
    idx = 0

Change it to:

def infer(args):
    pipe = VchitectXLPipeline(args.ckpt_path, device="cuda")
    idx = 0

Step 3: if you want to change the number of frames (the video generation length):

with torch.cuda.amp.autocast(dtype=torch.bfloat16):
    video = pipe(
        prompt,
        negative_prompt="",
        num_inference_steps=50,
        guidance_scale=7.5,
        width=768,
        height=432,  # alternatives: 480x288, 624x352, 432x240, 768x432
        frames=10*8,  # change here: seconds * fps (the default is 8 fps)
    )
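
To spell out the arithmetic in that comment: the frames argument is a total frame count, so the clip length in seconds is multiplied by the effective fps. A tiny hypothetical helper (not part of the repo):

def total_frames(seconds, fps=8):
    # The open-source model runs at 8 fps by default,
    # so a 10-second clip needs 10 * 8 = 80 frames.
    return seconds * fps

print(total_frames(10))  # 80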

Step 4: run the program:

CUDA_VISIBLE_DEVICES=8 python inference.py --test_file test.txt --save_dir output --ckpt_path Vchitect-XL-2B

(--ckpt_path should be the absolute path of the model you downloaded.)

This will run; I believe it will help you.

@zRzRzRzRzRzRzR
Author

Regarding the video length, we apologize that the current open-source version only supports videos shorter than 10 seconds. The timeline for open-sourcing additional versions, including the I2V model, is still undecided.

How can I change the frame rate? The video currently generates at 8 frames per second.
