ExVideo

ExVideo is a post-tuning technique for enhancing the capabilities of video generation models. Using it, we have extended Stable Video Diffusion to generate long videos of up to 128 frames.

Example: Text-to-video via extended CogVideoX-5B

Generate a video using CogVideoX-5B and our extension module. See ExVideo_cogvideox_test.py.

demo.mp4

Example: Text-to-video via extended Stable Video Diffusion

Generate a video using a text-to-image model and our image-to-video model. See ExVideo_svd_test.py.

github_title.mp4

Train

  • Step 1: Install additional packages
pip install lightning deepspeed
  • Step 2: Download the base model (from HuggingFace or ModelScope) and save it to models/stable_video_diffusion/svd_xt.safetensors.

  • Step 3: Prepare datasets

path/to/your/dataset
├── metadata.json
└── videos
    ├── video_1.mp4
    ├── video_2.mp4
    └── video_3.mp4

where metadata.json lists the path of each video, relative to the dataset directory:

[
    {
        "path": "videos/video_1.mp4"
    },
    {
        "path": "videos/video_2.mp4"
    },
    {
        "path": "videos/video_3.mp4"
    }
]
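For larger datasets, writing metadata.json by hand gets tedious. The following is a minimal sketch that generates it from the contents of the videos folder. The helper name `build_metadata` is hypothetical and not part of this repository, and the sketch assumes every video is an .mp4 file placed directly under videos/.

```python
import json
from pathlib import Path


def build_metadata(dataset_root):
    """Write metadata.json listing every .mp4 found in <dataset_root>/videos.

    Hypothetical helper, not part of the ExVideo scripts. Paths are stored
    relative to the dataset root, matching the layout shown above.
    """
    root = Path(dataset_root)
    entries = [
        {"path": video.relative_to(root).as_posix()}
        for video in sorted((root / "videos").glob("*.mp4"))
    ]
    with open(root / "metadata.json", "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=4)
    return entries
```

Calling `build_metadata("path/to/your/dataset")` writes the file next to the videos folder and returns the list of entries it wrote.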
  • Step 4: Run
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" python -u ExVideo_svd_train.py \
  --pretrained_path "models/stable_video_diffusion/svd_xt.safetensors" \
  --dataset_path "path/to/your/dataset" \
  --output_path "path/to/save/models" \
  --steps_per_epoch 8000 \
  --num_frames 128 \
  --height 512 \
  --width 512 \
  --dataloader_num_workers 2 \
  --learning_rate 1e-5 \
  --max_epochs 100
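Before launching a long multi-GPU run like the one above, it may be worth checking that every entry in metadata.json points to an existing file. This is only a sketch: `missing_videos` is a hypothetical helper, not something ExVideo_svd_train.py provides, and it assumes the dataset layout described in Step 3.

```python
import json
from pathlib import Path


def missing_videos(dataset_root):
    """Return the metadata.json paths that do not exist on disk.

    Hypothetical pre-flight check; assumes each entry has a "path" key
    relative to the dataset root, as in the Step 3 example.
    """
    root = Path(dataset_root)
    with open(root / "metadata.json", encoding="utf-8") as f:
        entries = json.load(f)
    return [entry["path"] for entry in entries if not (root / entry["path"]).is_file()]
```

An empty list means every listed video was found. Any returned paths should be fixed or removed from metadata.json before training.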
  • Step 5: Post-process checkpoints

Compute the Exponential Moving Average (EMA) of the model weights and save the result in safetensors format.

python ExVideo_ema.py --output_path "path/to/save/models/lightning_logs/version_xx" --gamma 0.9
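To illustrate what an EMA over checkpoints means, here is a minimal sketch in plain Python. It assumes that `--gamma` is the decay factor and that checkpoints are averaged from oldest to newest. This is a common formulation, but it is not confirmed to match what ExVideo_ema.py does internally. The function name and the dict-of-lists weight format are hypothetical stand-ins for real tensors.

```python
def ema_of_checkpoints(checkpoints, gamma=0.9):
    """Exponential moving average over a sequence of weight dicts.

    checkpoints: list of {parameter_name: list of floats}, oldest first.
    Assumed update rule (not taken from ExVideo_ema.py):
        ema = gamma * ema + (1 - gamma) * weights
    The first checkpoint initializes the average.
    """
    ema = None
    for state in checkpoints:
        if ema is None:
            # Copy the first checkpoint so the inputs are never mutated.
            ema = {name: list(values) for name, values in state.items()}
            continue
        for name, values in state.items():
            ema[name] = [
                gamma * old + (1.0 - gamma) * new
                for old, new in zip(ema[name], values)
            ]
    return ema
```

With this rule, a larger gamma weights older checkpoints more heavily and gives a smoother average. A smaller gamma tracks the most recent checkpoints more closely.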
  • Step 6: Enjoy your model

The EMA model is saved at path/to/save/models/lightning_logs/version_xx/checkpoints/epoch=xx-step=yyy-ema.safetensors. Load it in ExVideo_svd_test.py to generate videos with your model.
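Because the epoch and step numbers in the file name vary between runs, a small lookup can save some manual searching. The sketch below assumes the file naming shown above (a `step=` number followed by the `-ema.safetensors` suffix). `find_ema_checkpoint` is a hypothetical helper and not part of the repository.

```python
import re
from pathlib import Path


def find_ema_checkpoint(version_dir):
    """Return the '*-ema.safetensors' file with the highest step, or None.

    Hypothetical helper; version_dir is the lightning_logs/version_xx folder.
    """
    candidates = []
    for path in (Path(version_dir) / "checkpoints").glob("*-ema.safetensors"):
        match = re.search(r"step=(\d+)", path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Pick the file whose parsed step count is largest.
    return max(candidates, key=lambda item: item[0])[1]
```

The returned path can then be used wherever ExVideo_svd_test.py expects the model file.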