Why put a VAE to encode and decode outside the MAR module? #58
Comments
Using a VAE (or, more generally, an autoencoder) to encode the image is standard practice for current image generative models. See these two papers: https://arxiv.org/abs/2012.09841 and https://arxiv.org/pdf/2112.10752
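To make the division of labor concrete, here is a minimal dependency-free sketch of the pipeline shape. The encoder/decoder stand-ins and all sizes (8x downsampling, 4 latent channels) are illustrative assumptions, not the repo's actual code: the point is only that the generative model sits between the VAE encoder and decoder and never touches raw pixels.

```python
# Toy sketch (not the real MAR/VAE code): the generative model operates on a
# compressed latent, while the pretrained VAE encoder/decoder sit outside it.

def vae_encode(h, w, c):
    # Stand-in for a trained VAE encoder: 8x spatial downsampling, 4 latent channels.
    return (h // 8, w // 8, 4)

def vae_decode(h, w, c):
    # Stand-in for the trained VAE decoder: 8x upsampling back to RGB pixel space.
    return (h * 8, w * 8, 3)

image_shape = (256, 256, 3)
latent_shape = vae_encode(*image_shape)   # what the generative model is trained on
output_shape = vae_decode(*latent_shape)  # decoded back to pixels at the end

print(latent_shape)   # (32, 32, 4)
print(output_shape)   # (256, 256, 3)
```

The compression (256x256x3 pixels down to 32x32x4 latents, a ~48x reduction) is the practical reason both cited papers adopt this design: modeling in latent space is far cheaper than modeling pixels directly.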
Hi, I have a new question: Why
Using learnable parameters is one way to do positional embedding. We need these two parameters to indicate the location of each patch.
@LTH14 Yeah, I know you are using them for positional embedding. But the positional embedding seems to always be the same for every patch in a single image? Because there is no patch position-related operation on
There are seq_len + buffer_size different position embeddings in them.
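A minimal sketch of what this answer describes, with assumed sizes (seq_len, buffer_size, and dim here are arbitrary, not the repo's values): the learnable table holds one distinct row per token position, so adding it to the token sequence gives every patch a position-dependent embedding even though no explicit per-patch indexing appears elsewhere.

```python
import random

random.seed(0)
seq_len, buffer_size, dim = 256, 64, 16

# One distinct (learnable, in training) vector per position:
# seq_len + buffer_size rows in total.
pos_embed = [[random.gauss(0, 1) for _ in range(dim)]
             for _ in range(seq_len + buffer_size)]

# Identical input tokens, to show the embedding alone distinguishes positions.
tokens = [[0.0] * dim for _ in range(seq_len + buffer_size)]

# Element-wise addition of the table: row i is added to token i.
tokens_with_pos = [[t + p for t, p in zip(tok, pe)]
                   for tok, pe in zip(tokens, pos_embed)]

print(len(tokens_with_pos))                      # 320 positions
print(tokens_with_pos[0] != tokens_with_pos[1])  # True: each position differs
```

So the position information enters through which row of the table each patch is added to, not through any operation on the patch contents themselves.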
Oh, I see. Thanks!
Sorry, I'm a newbie to this field. I was confused: since MAR already has an encoder-decoder module, why use a VAE encoder to encode the input images into a gaussian noise image, then, after the MAR module, use the VAE decoder to decode MAR's output? Thanks for your kind explanation.