Vqgan training #52
Tested this on random noise and it runs. I'll try adapting it to webdataset on some clusters and see how it does!
Thanks to LAION (Ryu), I found https://arxiv.org/abs/2212.03185, which improves on MoVQ.
I'm starting to add the Projected GAN technique from here. It still seems to be state of the art on quite a few datasets, even though it is from 2021. The main idea is that instead of feeding raw images to the discriminator, you feed it hierarchical features computed by a pretrained timm model, which makes training converge faster.
In other news, I was finally able to add the ImageNet training dataset to the cluster, so I will soon be testing the f16 pre-trained model with MoVQ and spectral norm added.
I'll add "Finite Scalar Quantization: VQ-VAE Made Simple", since it seems very interesting. It seems to lead to "Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation", which appears to achieve a better FID than diffusion models.
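For reference, a rough sketch of what finite scalar quantization does, as I understand the paper: each latent channel is squashed to a bounded range, rounded to one of a few integer levels with a straight-through gradient, and the per-channel digits are combined into a single token index. The function name `fsq_quantize` and the level counts `[8, 5, 5, 5]` are illustrative assumptions, not something taken from this PR.

```python
import torch


def fsq_quantize(z, levels):
    """Hypothetical FSQ sketch. z: (..., len(levels)). Returns (codes, indices)."""
    lv = torch.tensor(levels, dtype=z.dtype, device=z.device)
    eps = 1e-3
    half_l = (lv - 1) * (1 - eps) / 2
    # Even level counts need a half-step offset so rounding yields exactly L values.
    offset = torch.where(lv % 2 == 0, torch.full_like(lv, 0.5), torch.zeros_like(lv))
    shift = torch.atanh(offset / half_l)  # keeps z = 0 mapped to 0
    bounded = torch.tanh(z + shift) * half_l - offset
    rounded = bounded.round()
    # Straight-through estimator: forward uses the rounded value,
    # backward uses the gradient of `bounded`.
    quantized = bounded + (rounded - bounded).detach()
    half_width = torch.div(lv, 2, rounding_mode="floor")
    codes = quantized / half_width  # normalized to roughly [-1, 1]
    # Mixed-radix encoding of the per-channel digits into one token id.
    basis = torch.cumprod(torch.tensor([1] + list(levels[:-1])), dim=0).to(z.device)
    digits = (rounded + half_width).long()
    indices = (digits * basis).sum(dim=-1)
    return codes, indices


z = torch.randn(2, 16, 4, requires_grad=True)
codes, indices = fsq_quantize(z, [8, 5, 5, 5])  # implicit codebook of 8*5*5*5 = 1000
```

There is no learned codebook and no commitment loss here; the codebook is implied by the product of the level counts, which is the main simplification over a VQ-VAE quantizer.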
This is a draft PR for adding VQGAN training. It's still quite rough around the edges, but it might work reasonably well after some bug fixes.