I'm currently following your excellent work on MAR, and I would like to know how the VAE feature dimension affects model performance. I saw that you experimented with 16- and 8-dimensional VAE features in the paper. Have you tried 32 or more dimensions? @LTH14
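Thanks for your interest! Note that the 16 and 8 in KL-16 and KL-8 denote the downsampling stride of the tokenizer, not the feature dimension: KL-16 downsamples a 256x256x3 image into a 16x16 grid of 16-dimensional tokens (16x16x16), and KL-8 downsamples it into a 32x32 grid of 4-dimensional tokens (32x32x4).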
We don't have an ablation on this feature dimension in the paper. A higher VAE dimension typically improves reconstruction performance. However, we also found that the higher the VAE feature dimension, the harder it is for the simple DiffLoss to model it, so it is a trade-off.
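To make the stride versus feature-dimension distinction concrete, here is a minimal sketch of the shape arithmetic for the two tokenizers mentioned above. The helper names `latent_shape` and `num_values` are hypothetical and are not taken from the MAR codebase; the only inputs are the image size, strides, and channel counts quoted in this thread.

```python
from typing import Tuple


def latent_shape(image_size: int, stride: int, channels: int) -> Tuple[int, int, int]:
    """Return (height, width, channels) of the latent grid for a square image.

    Hypothetical helper for illustration only; assumes the tokenizer
    downsamples each spatial side by exactly `stride`.
    """
    if image_size % stride != 0:
        raise ValueError("image_size must be divisible by stride")
    side = image_size // stride
    return (side, side, channels)


def num_values(shape: Tuple[int, int, int]) -> int:
    """Total number of scalar values in a latent of the given shape."""
    height, width, channels = shape
    return height * width * channels


# Configurations quoted in the reply: the number in the name is the stride,
# and the feature dimension is the last entry of the shape.
kl16 = latent_shape(256, stride=16, channels=16)
kl8 = latent_shape(256, stride=8, channels=4)

print("KL-16:", kl16, "values:", num_values(kl16))
print("KL-8: ", kl8, "values:", num_values(kl8))
```

Under these assumptions both tokenizers produce the same number of latent values per image (4096). They differ in how those values are split between the number of tokens and the per-token feature dimension that the DiffLoss has to model.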