I4VGen: Image as Free Stepping Stone for Text-to-Video Generation
Official PyTorch implementation of the arXiv 2024 paper: https://arxiv.org/abs/2406.02230
I4VGen: Image as Free Stepping Stone for Text-to-Video Generation
Xiefan Guo, Jinlin Liu, Miaomiao Cui, Liefeng Bo, Di Huang
https://xiefan-guo.github.io/i4vgen
Abstract: I4VGen is a novel video diffusion inference pipeline that leverages advanced image techniques to enhance pre-trained text-to-video diffusion models and requires no additional training. Instead of the vanilla text-to-video inference pipeline, I4VGen consists of two stages: anchor image synthesis and anchor image-augmented text-to-video synthesis. Correspondingly, a simple yet effective generation-selection strategy is employed to obtain a visually realistic and semantically faithful anchor image, and an innovative noise-invariant video score distillation sampling (NI-VSDS) is developed to animate the image into a dynamic video by distilling motion knowledge from video diffusion models, followed by a video regeneration process to refine the video. Extensive experiments show that the proposed method produces videos with higher visual realism and textual fidelity.
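For orientation, here is a minimal, high-level sketch of the two-stage pipeline described in the abstract. All component callables are hypothetical stand-ins, not the repo's actual API; names and signatures are assumptions for illustration only.

```python
# High-level sketch of the I4VGen inference pipeline (illustrative only).
# The callables passed in stand for the real components in this repo;
# their names and signatures are assumptions, not the actual API.
from typing import Callable, List

import torch


def i4vgen_inference(
    prompt: str,
    t2i_sample: Callable[[str], torch.Tensor],                      # text -> candidate image
    score_image: Callable[[torch.Tensor, str], float],              # image-text alignment score
    ni_vsds_animate: Callable[[torch.Tensor, str], torch.Tensor],   # NI-VSDS: image -> coarse video
    regenerate_video: Callable[[torch.Tensor, str], torch.Tensor],  # video diffusion refinement
    num_candidates: int = 4,
) -> torch.Tensor:
    # Stage 1: anchor image synthesis with a generation-selection strategy.
    candidates: List[torch.Tensor] = [t2i_sample(prompt) for _ in range(num_candidates)]
    anchor_image = max(candidates, key=lambda img: score_image(img, prompt))

    # Stage 2: anchor image-augmented text-to-video synthesis.
    # NI-VSDS animates the static anchor image into a dynamic video by
    # distilling motion knowledge from the video diffusion model ...
    coarse_video = ni_vsds_animate(anchor_image, prompt)
    # ... followed by a video regeneration pass to refine the result.
    return regenerate_video(coarse_video, prompt)
```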
- Linux and Windows are supported, but we recommend Linux for performance and compatibility reasons.
- All experiments are conducted on a single NVIDIA V100 GPU (32 GB).
Python libraries: See environments/animatediff_environment.yaml for the exact library dependencies. You can use the following commands to create and activate the AnimateDiff Python environment:
```bash
# Create conda environment
conda env create -f environments/animatediff_environment.yaml
# Activate conda environment
conda activate animatediff_env
```
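After activating the environment, a quick sanity check like the following (a convenience snippet, not part of the repo) confirms that PyTorch sees your GPU before you start inference:

```python
# Quick sanity check that the activated environment sees PyTorch and a GPU.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("VRAM (GB):", torch.cuda.get_device_properties(0).total_memory / 1024**3)
```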
Inference setup: Please refer to the official repo of AnimateDiff; the setup guide is listed here. `mm-sd-v15-v2` and `stable-diffusion-v1-5` are used in our experiments.
Name | HuggingFace | Type |
---|---|---|
mm-sd-v15-v2 | Link | Motion module |
stable-diffusion-v1-5 | Link | Base T2I diffusion model |
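A download sketch using `huggingface_hub` is shown below. The repo IDs, filenames, and local paths are assumptions based on the usual hosting locations; follow the Link column above for the authoritative sources.

```python
# Sketch of downloading the two checkpoints with huggingface_hub.
# Repo IDs, filenames, and local paths are assumptions; adjust to match the
# links in the table above and your own directory layout.
from huggingface_hub import hf_hub_download, snapshot_download

# Motion module: a single checkpoint file (assumed to live in the AnimateDiff HF repo).
motion_module_path = hf_hub_download(
    repo_id="guoyww/animatediff",        # assumption
    filename="mm_sd_v15_v2.ckpt",        # assumption
    local_dir="models/Motion_Module",    # assumption; match your config
)

# Base T2I diffusion model: the full Stable Diffusion 1.5 repository.
sd15_path = snapshot_download(
    repo_id="runwayml/stable-diffusion-v1-5",                   # assumption
    local_dir="models/StableDiffusion/stable-diffusion-v1-5",   # assumption
)
print(motion_module_path, sd15_path)
```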
Generating videos: Before generating videos, make sure you have set up the required Python environment and downloaded the corresponding checkpoints, then run the following command:
```bash
python -m scripts.animate_animatediff --config configs/animatediff_configs/i4vgen_animatediff.yaml
```
In `configs/animatediff_configs/i4vgen_animatediff.yaml` and `ArgumentParser`, arguments for inference (a programmatic sketch follows the list):
- `motion_module`: path to the motion module, i.e., `mm-sd-v15-v2`
- `pretrained_model_path`: path to the base T2I diffusion model, i.e., `stable-diffusion-v1-5`
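A minimal sketch of pointing these keys at your local checkpoints before running the script, assuming OmegaConf (commonly used for AnimateDiff-style configs) is available; the key names come from the list above, while the paths and output filename are assumptions:

```python
# Point the inference config at local checkpoints (paths are assumptions).
from omegaconf import OmegaConf

cfg = OmegaConf.load("configs/animatediff_configs/i4vgen_animatediff.yaml")
cfg.motion_module = "models/Motion_Module/mm_sd_v15_v2.ckpt"                 # assumed path
cfg.pretrained_model_path = "models/StableDiffusion/stable-diffusion-v1-5"   # assumed path
OmegaConf.save(cfg, "configs/animatediff_configs/i4vgen_animatediff_local.yaml")
```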
Python libraries: See environments/lavie_environment.yaml for the exact library dependencies. You can use the following commands to create and activate the LaVie Python environment:
```bash
# Create LaVie conda environment
conda env create -f environments/lavie_environment.yaml
# Activate LaVie conda environment
conda activate lavie_env
```
Inference setup: Please refer to the official repo of LaVie. The `base` version is employed in our experiments. Download the pre-trained `lavie_base` and `stable-diffusion-v1-4` checkpoints.
Name | HuggingFace | Type |
---|---|---|
lavie_base | Link | LaVie model |
stable-diffusion-v1-4 | Link | Base T2I diffusion model |
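Before moving on to inference, a quick path check like the one below can save a failed run. The paths here are placeholders, not the repo's expected layout; adjust them to wherever you stored `lavie_base` and `stable-diffusion-v1-4`.

```python
# Optional sanity check that downloaded checkpoints exist where your config points.
# The paths below are placeholders; edit them to match your setup.
from pathlib import Path

expected = [
    Path("models/lavie_base.pt"),           # placeholder path for the LaVie checkpoint
    Path("models/stable-diffusion-v1-4"),   # placeholder path for the SD 1.4 repo
]
for p in expected:
    status = "found" if p.exists() else "MISSING"
    print(f"{status}: {p}")
```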
Generating videos: Before generating videos, make sure you have set up the required Python environment and downloaded the corresponding checkpoints, then run the following command:
```bash
python scripts/animate_lavie.py --config configs/lavie_configs/i4vgen_lavie.yaml
```
In `configs/lavie_configs/i4vgen_lavie.yaml` and `ArgumentParser`, arguments for inference (an illustrative sketch follows the list):
- `ckpt_path`: path to the LaVie model, i.e., `lavie_base`
- `sd_path`: path to the base T2I diffusion model, i.e., `stable-diffusion-v1-4`
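To illustrate how a config flag and these keys typically fit together, here is a hedged sketch of the usual `argparse` + OmegaConf wiring. Only `--config` is confirmed by the command shown earlier; the override flags and merge logic below are assumptions, not the repo's actual CLI.

```python
# Illustrative wiring of --config plus optional overrides for ckpt_path / sd_path.
# Only --config is confirmed by the command above; the rest is an assumption.
import argparse

from omegaconf import OmegaConf

parser = argparse.ArgumentParser()
parser.add_argument("--config", type=str, required=True)
parser.add_argument("--ckpt_path", type=str, default=None)  # path to lavie_base (assumed flag)
parser.add_argument("--sd_path", type=str, default=None)    # path to stable-diffusion-v1-4 (assumed flag)
args = parser.parse_args()

cfg = OmegaConf.load(args.config)
if args.ckpt_path is not None:
    cfg.ckpt_path = args.ckpt_path
if args.sd_path is not None:
    cfg.sd_path = args.sd_path
print(OmegaConf.to_yaml(cfg))
```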
```bibtex
@article{guo2024i4vgen,
  title   = {I4VGen: Image as Free Stepping Stone for Text-to-Video Generation},
  author  = {Guo, Xiefan and Liu, Jinlin and Cui, Miaomiao and Bo, Liefeng and Huang, Di},
  journal = {arXiv preprint arXiv:2406.02230},
  year    = {2024}
}
```
The code is built upon AnimateDiff and LaVie; we thank all the contributors for open-sourcing their work.