GitHub - OpenSparseLLMs/Open-Pandora: Open-Pandora: On-the-fly Control Video Generation

Open-Pandora: An Open World video generation model

Based on the maitrix-org/Pandora project on GitHub, we have open-sourced the training code and models for the Pandora project. Training has two main stages: alignment and finetuning. We have also released the latest Pandora model weights, which were trained for 600,000 steps (60w) on the WebVid dataset.

Demo

You can control the model in real time using text. The demo currently supports 5 rounds of autoregressive prediction, which together generate a 10-second video. Alternatively, you can generate a single video, as in the following examples:

Results with a resolution of 320×512.

2s 320×512	2s 320×512	2s 320×512

Wind flows the leaves.	The red car moves along the path.	Green hills of tuscany, italy, time-lapse.

The car moves forward.	A bonfire is lit in the middle of a field.	Pouring honey onto some slices of bread.

Results with a resolution of 576×1024.

2s 576×1024	2s 576×1024

A sailboat sailing in rough seas with a dramatic sunset	Two young women studying in a library.

A brown and white cow eating hay.	A bald eagle flying over a tree filled forest.

Two eggs are fried in a frying pan on the stove.	A boat sits on the shore of a lake with mt fuji in the background, camera zooms in.
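
The real-time control mode described at the top of this section chains several short predictions into one longer video. The sketch below only illustrates the timing arithmetic (5 rounds, 10 seconds in total). It assumes each round contributes one 2-second segment, matching the 2s clips shown above; `Segment` and `plan_rollout` are hypothetical names and not part of this repository.

```python
from dataclasses import dataclass

MAX_ROUNDS = 5         # limit stated for interactive control
SECONDS_PER_ROUND = 2  # assumption: one 2 s segment per round


@dataclass
class Segment:
    round_index: int  # 1-based round number
    prompt: str       # text action given for this round
    start_s: int      # segment start time in the final video
    end_s: int        # segment end time in the final video


def plan_rollout(prompts):
    """Map one text prompt per round onto consecutive time segments."""
    if len(prompts) > MAX_ROUNDS:
        raise ValueError(f"at most {MAX_ROUNDS} rounds are supported")
    segments = []
    for i, prompt in enumerate(prompts):
        start = i * SECONDS_PER_ROUND
        segments.append(Segment(i + 1, prompt, start, start + SECONDS_PER_ROUND))
    return segments


plan = plan_rollout([
    "The car moves forward.",
    "The car moves forward.",
    "The car moves forward.",
    "The car moves forward.",
    "The car moves forward.",
])
print(f"{len(plan)} rounds, total length {plan[-1].end_s} s")
```

With all five rounds used, the last segment ends at 10 seconds, which is the maximum video length quoted above.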

News

[2024/09/24] 🎉 We have released the first version of the model weights, available on Hugging Face. This model can be directly used for inference on the original Pandora project.
[2024/09/24] 🎉 The training code for the alignment and finetuning stages is available.
[2024/09/24] 🎉 Supports video output at 576×1024 resolution.

Setup

conda create -n pandora python=3.11.0
conda activate pandora
conda install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -U xformers==0.0.24+cu121 --index-url https://download.pytorch.org/whl/cu121
bash build_envs.sh

If your GPU doesn't support CUDA 12.1, you can also install with CUDA 11.8:

conda create -n pandora python=3.11.0
conda activate pandora
conda install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -U xformers==0.0.24+cu118 --index-url https://download.pytorch.org/whl/cu118
bash build_envs.sh
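
After either install path, it can help to confirm that the pinned versions actually ended up in the environment. The following is a minimal sketch using only the Python standard library. It assumes the packages register under the distribution names `torch`, `torchvision`, `torchaudio`, and `xformers`; `check_environment` and `base_version` are hypothetical helpers, not part of this repository.

```python
from importlib import metadata

# Version pins copied from the install commands above.
EXPECTED = {
    "torch": "2.2.2",
    "torchvision": "0.17.2",
    "torchaudio": "2.2.2",
    "xformers": "0.0.24",
}


def base_version(version):
    """Drop a local build suffix such as '+cu121' from a version string."""
    return version.split("+", 1)[0]


def check_environment(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, pin in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        ok = installed is not None and base_version(installed) == pin
        report[name] = (installed, ok)
    return report


for name, (installed, ok) in check_environment().items():
    status = "ok" if ok else "MISMATCH"
    print(f"{name:12s} installed={installed} expected={EXPECTED[name]} [{status}]")
```

This check only compares version strings. It does not verify that a GPU is visible or that the CUDA build matches your driver.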

Inference

Gradio Demo

Download the model checkpoint from Hugging Face.
Run the following command in your terminal:

CUDA_VISIBLE_DEVICES={cuda_id} python gradio_app.py --ckpt_path {path_to_ckpt}

You can then interact with the model through the Gradio interface.
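
If you prefer to start the demo from a Python script, the same launch can be assembled programmatically. This is a sketch under assumptions: `build_gradio_launch` is a hypothetical helper, the GPU index and checkpoint path are placeholders, and only the `--ckpt_path` flag shown in the command above is used.

```python
import os
import shlex
import sys


def build_gradio_launch(cuda_id, ckpt_path):
    """Return (argv, env) equivalent to the shell command above."""
    env = dict(os.environ)
    # Restrict the process to one GPU, as CUDA_VISIBLE_DEVICES={cuda_id} does.
    env["CUDA_VISIBLE_DEVICES"] = str(cuda_id)
    argv = [sys.executable, "gradio_app.py", "--ckpt_path", str(ckpt_path)]
    return argv, env


# Placeholder values; replace with your GPU index and checkpoint location.
argv, env = build_gradio_launch(0, "path/to/ckpt")
print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], shlex.join(argv))

# To actually launch from the repository root:
#   import subprocess
#   subprocess.run(argv, env=env, check=True)
```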

Training Your Own Model

Before training the model, ensure that you have downloaded our model locally. Set $MODEL_DIR as the model path and $HOST_GPU_NUM as the number of GPUs. Run the following command to align the outputs of the Large Language Model (LLM) and the Text Encoder:

python3 -m torch.distributed.launch \
    --nproc_per_node=$HOST_GPU_NUM --nnodes=1 --master_addr=127.0.0.1 --master_port=10042 --node_rank=0 \
    trainer.py \
    --model_path $MODEL_DIR \
    --base config/config.yaml \
    --train \
    --do_alignment \
    --logdir output/ckp \
    --devices $HOST_GPU_NUM \
    lightning.trainer.num_nodes=1

Then, use the following command to finetune the model to obtain the final version:

python3 -m torch.distributed.launch \
    --nproc_per_node=$HOST_GPU_NUM --nnodes=1 --master_addr=127.0.0.1 --master_port=10042 --node_rank=0 \
    trainer.py \
    --model_path $MODEL_DIR \
    --base config/config.yaml \
    --train \
    --logdir output/ckp \
    --devices $HOST_GPU_NUM \
    lightning.trainer.num_nodes=1
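
The two training commands above are identical except for the `--do_alignment` flag, which is present only in the alignment stage. The sketch below makes that difference explicit by building both argument lists from one function. `build_train_command` is a hypothetical helper, and the model path and GPU count are placeholder values; every flag is copied from the commands above.

```python
import shlex


def build_train_command(model_dir, num_gpus, do_alignment, master_port=10042):
    """Build the single-node launch command for one training stage."""
    cmd = [
        "python3", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",
        "--nnodes=1",
        "--master_addr=127.0.0.1",
        f"--master_port={master_port}",
        "--node_rank=0",
        "trainer.py",
        "--model_path", str(model_dir),
        "--base", "config/config.yaml",
        "--train",
    ]
    if do_alignment:
        # Stage 1 only: align the LLM outputs with the text encoder.
        cmd.append("--do_alignment")
    cmd += [
        "--logdir", "output/ckp",
        "--devices", str(num_gpus),
        "lightning.trainer.num_nodes=1",
    ]
    return cmd


# Placeholder values for $MODEL_DIR and $HOST_GPU_NUM.
align_cmd = build_train_command("path/to/model", 8, do_alignment=True)
finetune_cmd = build_train_command("path/to/model", 8, do_alignment=False)

print(shlex.join(align_cmd))
print(shlex.join(finetune_cmd))
```

Running the stages in this order (alignment first, then finetuning) follows the sequence described above. Note that both commands write to the same `--logdir`, exactly as in the original commands.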

The project is continuously improving, and we look forward to your contributions and participation.

References

Repositories: maitrix-org/Pandora
Related Article: Pandora: Towards General World Model with Natural Language Actions and Video States

Citation

If you find our work useful in your research, please cite us using the following BibTeX entry:

@misc{OpenPandora2024,
  author = {OpenSparseLLMs},
  title = {{Open-Pandora: An Open World Video Generation Model}},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/OpenSparseLLMs/Open-Pandora}},
}

