
Commit

Merge branch 'github-page' of https://github.com/X-LANCE/MULTI-Benchmark into github-page
JamesZhutheThird committed Dec 7, 2023
2 parents 8a206c2 + 74069bf commit 84f68e2
Showing 2 changed files with 7 additions and 1 deletion.
2 changes: 1 addition & 1 deletion index.html
@@ -162,7 +162,7 @@ <h1 class="title is-1 publication-title">MULTI: Multimodal Understanding Leaderb
<!-- </div>-->
<!-- </div>-->
<h3 class="subtitle is-size-3-tablet has-text-left pb-">
-<p style="text-align:justify; line-height:150%; margin-left: -130px; margin-right: -130px; font-size: 20px">We introduce <b>MULTI</b>: a multi-level, multi-disciplinary, and multi-type cross-modal test benchmark, aimed at evaluating the performance of multimodal generative large models under different conditions and scenarios. We collected and annotated more than 18K questions from exams, quizzes, textbooks, websites and other resources, most of which underwent at least two rounds of human annotation and checking, and three rounds of script cleaning. Some questions were manually adapted to make them more suitable for evaluating the comprehensive ability of the model. These questions involve four educational levels: junior high school, high school, college and social exams, covering Chinese, mathematics, English, physics, chemistry, biology, history, geography, politics, information technology, driving test and other disciplines and fields, including single choice, multiple choice, fill in the blank (given range and fully open), and open-ended discussions.
+<p style="text-align:justify; line-height:150%; margin-left: -75px; margin-right: -75px; font-size: 20px">We introduce <b>MULTI</b>: a multi-level, multi-disciplinary, and multi-type cross-modal test benchmark, aimed at evaluating the performance of multimodal generative large models under different conditions and scenarios. We collected and annotated more than 18K questions from exams, quizzes, textbooks, websites and other resources, most of which underwent at least two rounds of human annotation and checking, and three rounds of script cleaning. Some questions were manually adapted to make them more suitable for evaluating the comprehensive ability of the model. These questions involve four educational levels: junior high school, high school, college and social exams, covering Chinese, mathematics, English, physics, chemistry, biology, history, geography, politics, information technology, driving test and other disciplines and fields, including single choice, multiple choice, fill in the blank (given range and fully open), and open-ended discussions.
<br><br>We manually selected 500 questions to form a difficult subset, which is used to evaluate the model's extreme performance. These questions often contain multiple images and formulas, test the model's comprehensive understanding of multiple images, and require complex and rigorous logical reasoning. The performance of this part of the data will be displayed separately on the leaderboard.
<br><br>We tested on GPT-3.5 and open-source multimodal large models<sup>*</sup>, and the results show that even the advanced GPT-3.5 only achieved 43.28% accuracy, showing a huge room for improvement. We believe that MULTI will motivate the community to build the next generation of multimodal foundation models, to achieve expert-level artificial general intelligence.
<br><br> <p style="font-size:15px"><sup>*</sup> Based on v0.3.0-20231115 version of the data, tested on SC/MC/FIB three question types.</p>
6 changes: 6 additions & 0 deletions static/css/index.css
@@ -49,16 +49,22 @@ body {

.publication-title {
font-family: 'Google Sans', sans-serif;
+padding-left: -50px;
+padding-right: -50px;
}

.publication-authors {
font-family: 'Google Sans', sans-serif;
+padding-left: -50px;
+padding-right: -50px;
}

.publication-venue {
color: #555;
width: fit-content;
font-weight: bold;
+padding-left: -10px;
+padding-right: -10px;
}

.publication-awards {
