Train Code for VAE Used in Paper #19

Open
Ferry1231 opened this issue Aug 14, 2024 · 19 comments

@Ferry1231

Dear researchers, I have been reading your team's paper, found it incredibly insightful, and was inspired to attempt a reproduction of the work. Given the limited resources in my lab, and from a learning perspective, I plan to start by training the model on smaller datasets such as CIFAR-10. However, I've run into some difficulties with the VAE encoder and couldn't find a VAE model that fits it well.

Could you share the training code for the VAE used in the paper? Also, what does the "vae_stride" parameter mean?

Thank you for your work.

@LTH14
Owner

LTH14 commented Aug 14, 2024

Thanks for your interest! We follow the VAE training from VQGAN and LDM. Please use this codebase and follow this config.
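For reference, the LDM-style KL autoencoder is trained with a pixel reconstruction loss plus a very small KL regularization toward a standard normal prior (the full recipe also adds perceptual LPIPS and adversarial terms). Below is a minimal sketch of that core objective, assuming an LDM-style AutoencoderKL whose encode() returns a DiagonalGaussianDistribution; it is an illustration, not the authors' training code. On the "vae_stride" question: the f in config names like kl-f16 is the VAE's spatial downsampling factor, and vae_stride presumably denotes the same factor (e.g. stride 16 maps a 256×256 image to a 16×16 latent grid).

import torch.nn.functional as F

def kl_autoencoder_loss(x, vae, kl_weight=1e-6):
    # x: image batch; vae: LDM-style AutoencoderKL. The LPIPS and
    # discriminator terms used in the real training are omitted for brevity.
    posterior = vae.encode(x)              # diagonal Gaussian over latents
    z = posterior.sample()                 # reparameterized latent sample
    x_rec = vae.decode(z)                  # reconstructed images
    rec_loss = F.l1_loss(x_rec, x)         # pixel-space L1 reconstruction
    kl_loss = posterior.kl().mean()        # KL to a standard normal prior
    return rec_loss + kl_weight * kl_loss  # KL weight is tiny in LDM configs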

@LTH14
Owner

LTH14 commented Aug 14, 2024

You need to copy the AutoencoderKL class to this file in the VQGAN codebase.

@Ferry1231
Author

Thank you! I got it.

@gzhuinjune

Hello! Can the VQGAN I trained earlier for RCG be used directly here, i.e. is vae_ckpt simply the VQGAN checkpoint? Did you change any other details? I still want to switch to a floor-plan dataset, and I'm worried about mismatches, so I'd appreciate knowing every change you made: how should the RGB range and the data augmentation be set, and are there any other modifications relative to the official code? Thank you for your patience!

@gzhuinjune

Unfortunately, I never managed to get ideal results with RCG. This framework seems to have far fewer components. Thanks for your help!

@LTH14
Owner

LTH14 commented Aug 27, 2024

@gzhuinjune A major difference here is that the VAE in this paper does not rely on the "quantization" step of VQGAN. Of course, this framework can also use a VQ-based tokenizer, but a non-VQ tokenizer should work better. You can start with a commonly used non-VQ tokenizer like the one below:

from diffusers.models import AutoencoderKL
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema")
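For context, a hedged sketch of how such a tokenizer is typically used (the diffusers calls are the library's real API; the 0.18215 scaling factor is the standard one for the SD VAE, and the shapes assume 256×256 inputs):

import torch
from diffusers.models import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema")
x = torch.randn(1, 3, 256, 256)                       # image batch in [-1, 1]
with torch.no_grad():
    z = vae.encode(x).latent_dist.sample() * 0.18215  # (1, 4, 32, 32) latents
    x_rec = vae.decode(z / 0.18215).sample            # back to (1, 3, 256, 256)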

@LTH14
Owner

LTH14 commented Aug 28, 2024

@gzhuinjune You can't use that one, because it was trained on ImageNet. Use the Stable Diffusion VAE I mentioned above: it was trained on OpenImages and generalizes much better. Try its reconstruction quality on your dataset first. Of course, if the performance is not good, you will still need to train a VAE on your own dataset.
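One quick way to run that reconstruction check, as a hedged sketch (load_your_image is a hypothetical placeholder for your own data loading; everything else uses the real diffusers API):

import torch
from diffusers.models import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").eval()
x = load_your_image()                 # hypothetical: (1, 3, H, W) in [-1, 1]
with torch.no_grad():
    x_rec = vae.decode(vae.encode(x).latent_dist.mode()).sample
mse = torch.mean((x - x_rec) ** 2)
psnr = 10 * torch.log10(4.0 / mse)    # peak-to-peak range of [-1, 1] is 2
print(f"reconstruction PSNR: {psnr:.2f} dB")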

@gzhuinjune

You referenced vqgan.py above; where is that actually used? I assume AutoencoderKL is not supposed to go in there? Sorry, I still don't understand which code in the VQGAN repo I should use for training.
[image]
And is the SD VAE you mentioned the one shown above?

@gzhuinjune

Which file should vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema") be written in? Is it main.py? And then I copy the class into vqgan and train directly with the custom-dataset instructions from the official VQGAN repo, right?
[image]

@gzhuinjune

[image]
Are these the OpenImages pretrained weights?

@gzhuinjune

You said to copy the AutoencoderKL class into vqgan, so why is the import written as from diffusers.models import AutoencoderKL? Thanks!

@LTH14
Owner

LTH14 commented Aug 28, 2024

from diffusers.models import AutoencoderKL lets you directly use the VAE already trained for Stable Diffusion. But if you need to train your own, you have to copy AutoencoderKL (https://github.com/CompVis/latent-diffusion/blob/main/ldm/models/autoencoder.py#L285) into the VQGAN codebase and train it there.

@gzhuinjune

Hello, which class in vqgan should the copied AutoencoderKL replace? If I just paste it in, it won't be called anywhere, right? Should I still use the main.py training script example for a custom dataset, and how do I invoke the new class from main.py? Thank you for patiently answering a beginner.

@LTH14
Owner

LTH14 commented Aug 28, 2024

Copy it into taming/models/vqgan.py, then use this config. You need to change the ldm paths in that config to the corresponding paths in the vqgan codebase.
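To illustrate the path change, a hedged YAML fragment in the style of the LDM KL autoencoder configs (the field values are illustrative, not the exact config referenced above):

model:
  # was: target: ldm.models.autoencoder.AutoencoderKL
  target: taming.models.vqgan.AutoencoderKL  # points at the copied class
  params:
    embed_dim: 16  # illustrative value
    ...
# Any other ldm.* module paths in the config (e.g. the loss target) need the
# same ldm.* -> taming.* substitution, and the copied class's own ldm imports
# must be resolved inside the taming codebase as well.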

@gzhuinjune

Thank you so much, and I wish you all the best!

@gzhuinjune

I understand now.

@fengyang0317

Is there any reason for using taming-transformers instead of the latent-diffusion or stable-diffusion codebases?

@LTH14
Owner

LTH14 commented Oct 22, 2024

@fengyang0317 not really -- I just chose one.

@fengyang0317

fengyang0317 commented Oct 23, 2024

I see. Thank you so much.
