The influence of VAE feature dim #53

Open
Tom-zgt opened this issue Sep 27, 2024 · 1 comment
Comments

Tom-zgt commented Sep 27, 2024

I'm currently following your excellent work, MAR, and would like to understand how the VAE feature dimension affects model performance. I saw that the paper experiments with 16- and 8-dimensional VAE features. Have you tried 32 dimensions or more? @LTH14
[Screenshot attached, 2024-09-27, 1:43:40 PM]

Owner

LTH14 commented Sep 27, 2024

Thanks for your interest! Note that here KL-16 and KL-8 denote the downsampling stride of the tokenizer: KL-16 downsamples a 256x256x3 image into 16x16x16 tokens, and KL-8 downsamples it into 32x32x4 tokens.
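
To make the shapes quoted above concrete, here is a minimal sketch of the arithmetic. It is not taken from the MAR codebase; the helper name `latent_shape` is hypothetical, and the channel counts (16 and 4) are simply the values stated in this comment.

```python
def latent_shape(image_size, stride, channels):
    """Return (height, width, channels) of the token grid for a square image.

    Hypothetical helper for illustration only: the spatial side is the
    image side divided by the tokenizer's downsampling stride.
    """
    if image_size % stride != 0:
        raise ValueError("image_size must be divisible by stride")
    side = image_size // stride
    return (side, side, channels)


# Channel counts are the ones quoted in the comment above (assumed here).
kl16 = latent_shape(256, stride=16, channels=16)
kl8 = latent_shape(256, stride=8, channels=4)

for name, (h, w, c) in (("KL-16", kl16), ("KL-8", kl8)):
    # h * w is the number of tokens; h * w * c is the total latent size.
    print(f"{name}: {h}x{w}x{c}, tokens={h * w}, values={h * w * c}")
```

With these numbers the two tokenizers produce the same total number of latent values (4096) but split them differently: KL-16 gives 256 tokens of 16 channels, while KL-8 gives 1024 tokens of 4 channels.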

We don't have an ablation on this feature dimension in the paper. A higher VAE dimension typically improves reconstruction performance. However, we also found that the higher the VAE feature dimension, the harder it is for the simple DiffLoss to model it, so it is a trade-off.
