Commit

Deployed 50d3671 with MkDocs version: 1.6.0
github-actions[bot] committed Jul 15, 2024
1 parent f8a8c9e commit 57bc3e5
Showing 2 changed files with 3 additions and 3 deletions.
4 changes: 2 additions & 2 deletions index.html
@@ -1187,8 +1187,8 @@ <h2 id="benchmark-statistics">Benchmark Statistics</h2>
</tbody>
</table>
<p><a class="glightbox" href="figures/SciCode_chart.png" data-type="image" data-width="auto" data-height="auto" data-desc-position="bottom"><img alt="Image Title" src="figures/SciCode_chart.png" /></a></p>
-<p style="text-align: center;">** Distribution of Main Problems **Right:** Distribution of Subproblems</p>
-<p>**Left:</p>
+<p style="text-align: center;">**Left:** Distribution of Main Problems **Right:** Distribution of Subproblems</p>

<h2 id="experiment-results">Experiment Results</h2>
<p>We evaluate our model using zero-shot prompts. We keep the prompts general and design different ones for different evaluation setups only to inform the model about the tasks. We keep prompts the same across models and fields, and they contain the model’s main and sub-problem instructions and code for previous subproblems. The standard setup means the model is tested without background knowledge and carrying over generated solutions to previous subproblems. The scientists' annotated background provides the necessary knowledge and reasoning steps to solve the problems, shifting the evaluation’s focus more towards the models’ coding and instruction-following capabilities.
<a class="glightbox" href="figures/Standard_Setup.png" data-type="image" data-width="auto" data-height="auto" data-desc-position="bottom"><img alt="Image Title" src="figures/Standard_Setup.png" /></a>
