Open-Pandora: An Open World Video Generation Model

Building on the maitrix-org/Pandora project on GitHub, we have open-sourced training code and model weights for Pandora. Training has two main stages: alignment and finetuning. We have also released the latest Pandora model weights, which were trained for 600,000 steps on the WebVid dataset.

Demo

You can control the model in real time using text. It currently supports 5 rounds of autoregressive prediction, which together produce a 10-second video. Alternatively, you can generate a single video from one prompt, as in the examples below:

Results with a resolution of 320×512.

| 2s 320×512 | 2s 320×512 | 2s 320×512 |
| --- | --- | --- |
| Wind flows the leaves. | The red car moves along the path. | Green hills of tuscany, italy, time-lapse. |
| The car moves forward. | A bonfire is lit in the middle of a field. | Pouring honey onto some slices of bread. |

Results with a resolution of 576×1024.

| 2s 576×1024 | 2s 576×1024 |
| --- | --- |
| A sailboat sailing in rough seas with a dramatic sunset | Two young women studying in a library. |
| A brown and white cow eating hay. | A bald eagle flying over a tree filled forest. |
| Two eggs are fried in a frying pan on the stove. | A boat sits on the shore of a lake with mt fuji in the background, camera zooms in. |
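
The multi-round text control described at the top of this section can be pictured as a simple loop. The sketch below is illustrative only and does not call the model: `Rollout`, `CLIP_SECONDS` and `MAX_ROUNDS` are hypothetical names, and the 2-second clip length per round is an assumption inferred from 5 rounds yielding a 10-second video.

```python
from dataclasses import dataclass, field

CLIP_SECONDS = 2  # assumed length of the clip produced by each round
MAX_ROUNDS = 5    # number of rounds the demo currently supports


@dataclass
class Rollout:
    """Tracks the prompts issued so far and the resulting video length."""
    prompts: list = field(default_factory=list)

    def step(self, prompt: str) -> int:
        """Record one control round and return the total seconds generated."""
        if len(self.prompts) >= MAX_ROUNDS:
            raise RuntimeError("round limit reached")
        # A real model would generate the next clip here, conditioned on
        # the frames produced so far and on the new prompt.
        self.prompts.append(prompt)
        return len(self.prompts) * CLIP_SECONDS
```

Calling `step` once per prompt makes the round limit explicit: the fifth call returns 10, and a sixth raises an error.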

News

  • [2024/09/24] 🎉 We have released the first version of the model weights, available on Hugging Face. This model can be directly used for inference on the original Pandora project.
  • [2024/09/24] 🎉 The training code for the alignment and finetuning stages is available.
  • [2024/09/24] 🎉 Supports video output at 576×1024 resolution.

Setup

conda create -n pandora python=3.11.0
conda activate pandora
conda install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -U xformers==0.0.24+cu121 --index-url https://download.pytorch.org/whl/cu121
bash build_envs.sh  

If your GPU doesn't support CUDA 12.1, you can also install with CUDA 11.8:

conda create -n pandora python=3.11.0
conda activate pandora
conda install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -U xformers==0.0.24+cu118 --index-url https://download.pytorch.org/whl/cu118
bash build_envs.sh  
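
After installation, it can be useful to confirm that the pinned versions are the ones actually installed. The following is a minimal sketch using only the Python standard library; `check_versions` and `PINNED` are hypothetical helpers and not part of this repository.

```python
from importlib import metadata

# Versions pinned by the setup commands above.
PINNED = {
    "torch": "2.2.2",
    "torchvision": "0.17.2",
    "torchaudio": "2.2.2",
    "xformers": "0.0.24",
}


def check_versions(pinned=PINNED):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Local build tags such as "+cu121" are ignored when comparing.
        ok = found is not None and found.split("+")[0] == wanted
        report[name] = (found, ok)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_versions().items():
        print(f"{name}: {found} {'OK' if ok else 'MISMATCH'}")
```

Run it inside the activated `pandora` environment; any line marked `MISMATCH` points to a package that is missing or installed at a different version.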

Inference

Gradio Demo

  1. Download the model checkpoint from Hugging Face.
  2. Run the following command in your terminal:
CUDA_VISIBLE_DEVICES={cuda_id} python gradio_app.py  --ckpt_path {path_to_ckpt}

You can then interact with the model through the Gradio interface.
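
If you prefer to start the demo from Python, the same command can be wrapped as shown below. This is a sketch under the assumption that it runs from the repository root, where `gradio_app.py` lives; `gradio_command` and `launch` are hypothetical helper names.

```python
import os
import subprocess
import sys


def gradio_command(cuda_id, ckpt_path):
    """Return (env, argv) equivalent to the shell command above."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(cuda_id))
    argv = [sys.executable, "gradio_app.py", "--ckpt_path", str(ckpt_path)]
    return env, argv


def launch(cuda_id, ckpt_path):
    """Start the demo, failing early if the checkpoint path does not exist."""
    if not os.path.exists(ckpt_path):
        raise FileNotFoundError(ckpt_path)
    env, argv = gradio_command(cuda_id, ckpt_path)
    return subprocess.run(argv, env=env, check=True)
```

Checking the checkpoint path first surfaces a wrong path immediately instead of partway through startup.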

Training Your Own Model

Before training the model, ensure that you have downloaded our model locally. Set $MODEL_DIR as the model path and $HOST_GPU_NUM as the number of GPUs. Run the following command to align the outputs of the Large Language Model (LLM) and the Text Encoder:

python3 -m torch.distributed.launch \
    --nproc_per_node=$HOST_GPU_NUM --nnodes=1 --master_addr=127.0.0.1 --master_port=10042 --node_rank=0 \
    trainer.py \
    --model_path $MODEL_DIR \
    --base config/config.yaml \
    --train \
    --do_alignment \
    --logdir output/ckp \
    --devices $HOST_GPU_NUM \
    lightning.trainer.num_nodes=1

Then, use the following command to finetune the model and obtain the final version. It is identical to the alignment command except that it omits --do_alignment:

python3 -m torch.distributed.launch \
    --nproc_per_node=$HOST_GPU_NUM --nnodes=1 --master_addr=127.0.0.1 --master_port=10042 --node_rank=0 \
    trainer.py \
    --model_path $MODEL_DIR \
    --base config/config.yaml \
    --train \
    --logdir output/ckp \
    --devices $HOST_GPU_NUM \
    lightning.trainer.num_nodes=1
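
Because the two stages differ only in the `--do_alignment` flag, both commands can be generated from one function. The sketch below only builds the argument lists and launches nothing; `stage_command` and `training_pipeline` are hypothetical helpers that mirror the commands above.

```python
def stage_command(model_dir, num_gpus, do_alignment):
    """Build the argv for one training stage, mirroring the commands above."""
    argv = [
        "python3", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}", "--nnodes=1",
        "--master_addr=127.0.0.1", "--master_port=10042", "--node_rank=0",
        "trainer.py",
        "--model_path", str(model_dir),
        "--base", "config/config.yaml",
        "--train",
    ]
    if do_alignment:
        argv.append("--do_alignment")  # the only flag that differs between stages
    argv += [
        "--logdir", "output/ckp",
        "--devices", str(num_gpus),
        "lightning.trainer.num_nodes=1",
    ]
    return argv


def training_pipeline(model_dir, num_gpus):
    """Return the alignment command followed by the finetuning command."""
    return [
        stage_command(model_dir, num_gpus, do_alignment=True),
        stage_command(model_dir, num_gpus, do_alignment=False),
    ]
```

To run both stages in order, pass each list to `subprocess.run(cmd, check=True)` so that finetuning starts only if alignment succeeds.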

The project is continuously improving, and we look forward to your contributions and participation.

References

Citation

If you find our work useful in your research, please cite us using the following BibTeX entry:

@misc{OpenPandora2024,
  author = {OpenSparseLLMs},
  title = {{Open-Pandora: An Open World Video Generation Model}},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/OpenSparseLLMs/Open-Pandora}},
}
