From 7d77559c4e8c2715f7919e6cdf0cdf70681c17d0 Mon Sep 17 00:00:00 2001
From: Minyang Tian <69544994+mtian8@users.noreply.github.com>
Date: Mon, 4 Nov 2024 10:18:16 -0600
Subject: [PATCH] Update README.md

---
 README.md | 35 ++++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/README.md b/README.md
index 8978716..14feccd 100644
--- a/README.md
+++ b/README.md
@@ -7,6 +7,8 @@ This repo contains the evaluation code for the paper "[SciCode: A Research Codin
 
 ## πNews
 
+**[2024-11-04]: Leaderboard is on! Check [here](https://scicode-bench.github.io/leaderboard/). We have also added Claude Sonnet 3.5 (new) results.**
+
 **[2024-10-01]: We have added OpenAI o1-mini and o1-preview results.**
 
 **[2024-09-26]: SciCode is accepted at NeurIPS D&B Track 2024.**
@@ -23,21 +25,24 @@ SciCode sources challenging and realistic research-level coding problems across
 
 ## π Leaderboard
 
-| Model | Subproblem | Main Problem |
-|---------------------------|------------|--------------|
-| **OpenAI o1-preview** | **28.5** | **7.7** |
-| Claude3.5-Sonnet | 26 | 4.6 |
-| GPT-4o | 25 | 1.5 |
-| GPT-4-Turbo | 22.9 | 1.5 |
-| OpenAI o1-mini | 22.2 | 1.5 |
-| Gemini 1.5 Pro | 21.9 | 1.5 |
-| Claude3-Opus | 21.5 | 1.5 |
-| Deepseek-Coder-v2 | 21.2 | 3.1 |
-| Claude3-Sonnet | 17 | 1.5 |
-| Qwen2-72B-Instruct | 17 | 1.5 |
-| Llama-3.1-70B-Instruct | 16.3 | 1.5 |
-| Mixtral-8x22B-Instruct | 16.3 | 0 |
-| Llama-3-70B-Chat | 14.6 | 0 |
+| Models | Main Problem Resolve Rate | Subproblem |
+|--------------------------|-------------------------------------|-------------------------------------|
+| π₯ OpenAI o1-preview |