
Questions about changing ViT to 378 input resolution, but got poor results #46

Open
OpenJarvisAI opened this issue Apr 14, 2024 · 5 comments

Comments

@OpenJarvisAI

Hi, I have already tried using ViT-336 and ConvNeXt + Qwen LLM, which works great and gives really good performance.

But when I tried another CLIP ViT model with an input size of 378, keeping everything else the same (including the training data), the results were extremely poor.

To be precise:

  1. The loss is lower: normally I get 0.9-1.0, but with the 378-input CLIP it reaches 0.7-0.8. The inference results, however, are very poor.
  2. The CLIP model I used was Apple's DNFS_vit_G_378 model.
  3. I have changed the ConvNeXt input resolution accordingly (a quick alignment check is sketched below).

Any idea why? It's really weird that a better, larger ViT gives worse results.
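
For reference, here is the alignment arithmetic I'm assuming between the two branches. This is just a sketch: the 14px patch size, the 32x total ConvNeXt downsampling, and the helper names are my own assumptions, not MGM's actual config code.

```python
# The ConvNeXt branch supplies high-resolution keys/values for the ViT query
# tokens, so its final feature map should tile the ViT patch grid exactly.

def vit_grid(image_size: int, patch_size: int = 14) -> int:
    """Number of patch tokens per side for a square ViT input."""
    assert image_size % patch_size == 0, "image size must be divisible by patch size"
    return image_size // patch_size

def convnext_input_for_grid(grid: int, total_stride: int = 32) -> int:
    """ConvNeXt input size that yields a grid x grid final feature map."""
    return grid * total_stride

for vit_size in (336, 378):
    g = vit_grid(vit_size)  # 24 for 336px, 27 for 378px
    print(f"ViT {vit_size}px -> {g}x{g} tokens ({g * g} total), "
          f"ConvNeXt input should be {convnext_input_for_grid(g)}px")
```

Under that assumption, moving the ViT from 336 to 378 means the auxiliary ConvNeXt input should move from 768 to 864 for the grids to line up.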

@hhaAndroid

same #43

@yanwei-li
Member

Hi, thanks for your report. If there are no other bugs, I guess you can try the following steps to locate the problem:

  1. If the performance is quite low (more than a 10% drop), there may be bugs in the implementation.
  2. Only apply DNFS_vit_G_378 without patch info mining to see whether the performance is satisfactory (a standalone sanity check of the tower is sketched after this list).
  3. If the previous steps all look good, try a larger ConvNeXt, such as CLIP-convnext_xxlarge, because a better ViT requires a stronger ConvNeXt to provide candidate keys and values for reference.
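
Before step 2, it may also help to sanity-check the 378px tower completely outside MGM, to rule out a stale 336px preprocessing pipeline or a mis-loaded checkpoint. Below is a minimal sketch with open_clip; the hub id is an assumption that the checkpoint is Apple's DFN CLIP at 378px, so substitute whatever checkpoint you actually used.

```python
import torch
import open_clip

# Assumption: the 378px tower is Apple's DFN CLIP on the Hugging Face hub.
# Replace HUB_ID with the checkpoint you actually trained with.
HUB_ID = "hf-hub:apple/DFN5B-CLIP-ViT-H-14-378"

model, _, preprocess = open_clip.create_model_and_transforms(HUB_ID)
model.eval()

# The eval transform should resize/crop to 378, not a 336 value carried over
# from the previous vision tower's image processor config.
print(preprocess)

# Smoke test: the tower should accept a 378x378 batch without shape errors
# and produce a pooled image embedding.
with torch.no_grad():
    feats = model.encode_image(torch.randn(1, 3, 378, 378))
print("pooled image feature shape:", tuple(feats.shape))
```

If the transform still targets 336, or the forward pass complains about positional-embedding shapes, the problem is more likely in the integration than in the released weights.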

@OpenJarvisAI
Author

Hi, I'm starting to doubt whether Apple's ViT weights are correct; it seems they may have posted the wrong weights...

Meanwhile, do you have any plans to use ViT-H or ViT-bigG?

@yanwei-li
Member

Hi, we do not plan to retrain the model with a larger ViT, because it would exceed our current resources.

@OpenJarvisAI
Author

@yanwei-li Hi, which directions are you currently working on to further improve the performance of MGM?
