Questions about changing ViT to 378 input resolution, but got poor results #46
Comments
Same as #43.
Hi, thanks for your report. If there are no other bugs, you could try the following steps to locate the problem:
Hi, I'm starting to doubt whether Apple's ViT weights are correct; it seems they may have posted the wrong weights. Meanwhile, do you have any plans to use ViT-H or ViT-bigG?
Hi, we do not plan to retrain the model with a larger ViT, because it would exceed our current resources.
@yanwei-li Hi, which directions are you currently working on to further improve MGM's performance?
Hi, I already tried using the ViT-336 and ConvNeXt encoders with a Qwen LLM, which worked well and gave good performance.
But when I tried another CLIP ViT model with an input size of 378, keeping everything else the same (including the training data), the results were extremely poor.
To be precise:
Any reason for this? It is really weird that a better and larger ViT gives worse results.
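One common cause of this kind of degradation when changing a ViT's input resolution is a mismatch in the positional embeddings: a checkpoint trained at 336 px (a 24x24 patch grid for patch size 14) has positional embeddings that do not directly fit a 378-px input (a 27x27 grid), and loading them without interpolation silently corrupts the spatial encoding. This is a hedged sketch, not MGM's actual loading code; the function name `resize_pos_embed` and the CLS-token-first layout (CLIP-style) are assumptions:

```python
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed: torch.Tensor, old_size: int, new_size: int) -> torch.Tensor:
    """Bicubically interpolate ViT positional embeddings to a new patch grid.

    pos_embed: (1, 1 + old_size**2, dim), CLS token first (CLIP-style layout assumed).
    Returns:   (1, 1 + new_size**2, dim).
    """
    cls_tok, grid = pos_embed[:, :1], pos_embed[:, 1:]
    dim = grid.shape[-1]
    # (1, N, dim) -> (1, dim, H, W) so we can interpolate spatially
    grid = grid.reshape(1, old_size, old_size, dim).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=(new_size, new_size), mode="bicubic", align_corners=False)
    # back to (1, H*W, dim) and re-attach the CLS token
    grid = grid.permute(0, 2, 3, 1).reshape(1, new_size * new_size, dim)
    return torch.cat([cls_tok, grid], dim=1)

# Example: patch size 14 -> 336 px gives a 24x24 grid, 378 px gives 27x27.
pe_336 = torch.randn(1, 1 + 24 * 24, 1024)
pe_378 = resize_pos_embed(pe_336, 24, 27)
print(pe_378.shape)  # torch.Size([1, 730, 1024])
```

If the 378-px checkpoint's positional embeddings (or the image preprocessing mean/std) are being loaded with a shape or normalization mismatch, checking this resize path is a cheap first diagnostic.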