About ideas to further enhance performance #78
Comments
I also changed the LLM to Qwen1.5, and the performance improved somewhat.
I feel that our directions are very similar; if you are interested, you can leave contact information so we can communicate.
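Concretely, something like this is what I mean by swapping the LLM (a minimal sketch; the checkpoint name is a real HF model, but the projector wiring is an illustrative assumption, not Mini-Gemini's actual code):

```python
# Sketch: swap the LLM backbone of a LLaVA/Mini-Gemini-style model to Qwen1.5.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

llm = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-7B-Chat", torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat")

# The vision-to-language projector must match the new LLM's hidden size,
# so it has to be re-initialized and re-aligned on image-text pairs
# before the SFT stage.
vision_hidden = 1024                      # e.g. CLIP ViT-L/14 output dim
llm_hidden = llm.config.hidden_size       # 4096 for Qwen1.5-7B
projector = torch.nn.Sequential(
    torch.nn.Linear(vision_hidden, llm_hidden),
    torch.nn.GELU(),
    torch.nn.Linear(llm_hidden, llm_hidden),
)
```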
ALLaVA already has a Chinese version. What do you mean by DeepSeek's hybrid? Mini-Gemini is already a hybrid architecture.
There is a Chinese version of ALLaVA, but both the Chinese and English versions are dirty. The Chinese version in particular has many image-text mismatches, translation misalignments, and translation hallucinations. For example, grep for "宁静湖畔" in allava-cn; the results are very likely image-text mismatches. So both allava-en and allava-cn need cleaning, and adding the cleaned allava-cn also improves the benchmark numbers. Mini-Gemini has a hybrid structure, but in my experiments DeepSeek-VL's is slightly better. By InternLM-XComposer data I specifically mean the SFT-stage data, such as AOKVQA, OKVQA, and LVIS.
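For the cleaning, here is a minimal sketch of one way to flag image-text mismatches with CLIP similarity (the dataset field names and the threshold are assumptions for illustration, not the actual ALLaVA schema; for the Chinese captions you would want a Chinese/multilingual CLIP checkpoint, or score the paired English caption instead, since OpenAI CLIP's tokenizer is English-only):

```python
import json
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

def match_score(image_path: str, caption: str) -> float:
    """Return CLIP image-text cosine similarity; low values suggest a mismatch."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())

# Keep only samples above an empirically chosen threshold;
# "image" / "caption" keys and the 0.20 cutoff are hypothetical.
with open("allava_cn.json") as f:
    samples = json.load(f)
clean = [s for s in samples if match_score(s["image"], s["caption"]) > 0.20]
```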
How did you clean the ALLaVA data and translate it into the Chinese version? Would you share the data after cleaning? That would be very nice. Also, has InternLM-XComposer open-sourced their SFT data?
Hi, I have experimented with the Mini-Gemini architecture on the Qwen series models, and it performs well.
However, the performance is not as strong as some SOTA small models such as MiniCPM-V 2 and LLaVA-UHD, which use very large input resolutions and image-slicing techniques.
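For reference, the slicing idea roughly looks like this (a minimal sketch; the tile size and grid are illustrative assumptions, not either paper's exact recipe):

```python
from PIL import Image

def slice_image(path: str, tile: int = 336, grid: int = 2):
    """Return [global thumbnail] + grid*grid local tiles, each tile x tile px,
    so the vision encoder sees both the whole image and high-res crops."""
    img = Image.open(path).convert("RGB")
    big = img.resize((tile * grid, tile * grid))
    tiles = [big.crop((c * tile, r * tile, (c + 1) * tile, (r + 1) * tile))
             for r in range(grid) for c in range(grid)]
    thumbnail = img.resize((tile, tile))
    return [thumbnail] + tiles
```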
As such, I am just wondering: how can we push the boundary of Mini-Gemini further, and make Mini-Gemini great again?
The baseline I currently get from Qwen-7B is about the same as Gemma-7B's on MMMU, which is not very satisfying.
Here are some thoughts on my mind for further improvement:

- add the ALLaVA data (there is also a Chinese version);
- try DeepSeek-VL's style of hybrid vision encoder;
- add InternLM-XComposer's data.

So here is what I want to discuss: how exactly should we make these improvements?
Hoping for your discussion and insights to guide me on the right path.