
Why put a VAE to encode and decode outside the MAR module? #58

DeepDuke opened this issue Oct 11, 2024 · 6 comments

Comments

@DeepDuke

DeepDuke commented Oct 11, 2024

Sorry, I'm a newbie to this field. I was confused: since MAR already has an encoder-decoder module, why use a VAE encoder to encode the input images into latents, and then, after the MAR module, use the VAE decoder to decode MAR's output? Thanks for your kind explanation.

@LTH14
Owner

LTH14 commented Oct 11, 2024

Using a VAE (or, in general, an AE) to encode the image is standard practice for current image generative models. See these two papers: https://arxiv.org/abs/2012.09841 and https://arxiv.org/pdf/2112.10752
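A minimal sketch of that pipeline, assuming hypothetical `vae` and `generator` objects (the names, the `encode`/`decode`/`sample`/`loss` methods, and the shapes are illustrative, not this repo's actual API):

```python
import torch

# The VAE compresses each image into a small latent grid; the
# generative model (MAR here) is trained and sampled entirely in
# that latent space; the VAE decoder maps latents back to pixels.

def training_step(vae, generator, images: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        latents = vae.encode(images)    # frozen VAE: (N, 3, 256, 256) -> (N, c, 16, 16)
    return generator.loss(latents)      # only the generator is trained

def sample(vae, generator, num_images: int) -> torch.Tensor:
    latents = generator.sample(num_images)  # generate in latent space
    return vae.decode(latents)              # decode back to pixel space
```

Modeling the compressed latent grid instead of raw pixels is far cheaper computationally, which is the main motivation in the two papers above.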

@DeepDuke
Author

DeepDuke commented Oct 14, 2024

Hi, I have a new question: why are self.encoder_pos_embed_learned and self.decoder_pos_embed_learned set as learnable parameters? They seem to have no direct association with patch position values. @LTH14

@LTH14
Owner

LTH14 commented Oct 14, 2024

Using learnable parameters is one way to do positional embedding. We need these two parameters to indicate the location of each patch.
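Concretely, a learnable positional embedding is just an nn.Parameter added to the token sequence. A minimal sketch, with illustrative names and shapes rather than the repo's exact code:

```python
import torch
import torch.nn as nn

class WithLearnedPosEmbed(nn.Module):
    def __init__(self, seq_len: int, buffer_size: int, dim: int):
        super().__init__()
        # One trainable vector per position, optimized end to end
        # with the rest of the model.
        self.encoder_pos_embed_learned = nn.Parameter(
            torch.zeros(1, seq_len + buffer_size, dim))
        nn.init.trunc_normal_(self.encoder_pos_embed_learned, std=0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len + buffer_size, dim). The addition
        # broadcasts over the batch; position i always receives
        # its own row of the parameter.
        return x + self.encoder_pos_embed_learned
```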

@DeepDuke
Author

DeepDuke commented Oct 14, 2024

@LTH14 Yeah, I know you are using them to do positional embedding. But the positional embedding seems to always be the same for every patch in a single image, because there is no patch-position-related operation on self.encoder_pos_embed_learned and self.decoder_pos_embed_learned.

@LTH14
Owner

LTH14 commented Oct 14, 2024

There are seq_len+buffer_size different position embeddings in them.
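That is, the parameter has one row per position, and adding it to the token sequence broadcasts over the batch, so each patch position receives a different learned vector even though no explicit indexing appears in the code. A quick self-contained check (the sizes are illustrative):

```python
import torch

seq_len, buffer_size, dim = 256, 64, 768        # illustrative sizes
pos_embed = torch.randn(1, seq_len + buffer_size, dim)

x = torch.zeros(2, seq_len + buffer_size, dim)  # batch of 2 token sequences
out = x + pos_embed                             # broadcasts over the batch

# Different positions in the same image get different embeddings...
assert not torch.allclose(out[0, 0], out[0, 1])
# ...while the same position in different images gets the same one.
assert torch.allclose(out[0, 5], out[1, 5])
```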

@DeepDuke
Author

DeepDuke commented Oct 14, 2024

> There are seq_len+buffer_size different position embeddings in them.

Oh, I see. Thanks!
