First, create a `keys.cfg` file at the root of the repository and add your API keys for the different providers as follows:

```
OPENAI_KEY = 'your_api_key'
ANTHROPIC_KEY = 'your_api_key'
GOOGLE_KEY = 'your_api_key'
```
If you're using litellm, which supports a variety of providers including vLLM, Hugging Face, and Together AI, make sure to include the relevant API key in the `keys.cfg` file (please refer to the litellm documentation for the key each provider expects). Then use `litellm/*` as the model name when running the command.
For example, to use Together AI's models, you'll need to add the following to your `keys.cfg`:

```
TOGETHERAI_API_KEY = 'your_api_key'
```
To generate code using a Together AI model (e.g., `Meta-Llama-3.1-70B-Instruct-Turbo`), go to the root of this repo and run:

```bash
python eval/scripts/gencode_json.py --model litellm/together_ai/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
```
To generate code using GPT-4o (with default settings), go to the root of this repo and run:

```bash
python eval/scripts/gencode_json.py --model gpt-4o
```
If you want to include scientist-annotated background in the prompts, use the `--with-background` flag:

```bash
python eval/scripts/gencode_json.py --model gpt-4o --with-background
```
Please note that we do not plan to release the ground truth code for each problem to the public. However, we have made a dev set available that includes the ground truth code in `eval/data/problems_dev.jsonl`.
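If you want a quick look at what a dev problem record contains, one option (assuming a Unix-like shell with Python on your path) is to pretty-print the first line of the file:

```bash
# Pretty-print the first JSON record of the dev set
head -n 1 eval/data/problems_dev.jsonl | python -m json.tool
```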
In this repository, we only support evaluating with previously generated code for each step.
When running the `gencode_json.py` script, you can use the following options (an example that combines several of them follows the list):

- `--model`: Specifies the model name to be used for generating code (e.g., `gpt-4o` or `litellm/together_ai/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo`).
- `--output-dir`: Directory where the generated code outputs will be saved. Default is `eval_results/generated_code`.
- `--input-path`: Path to the JSONL file describing the problems. Default is `eval/data/problems_all.jsonl`.
- `--prompt-dir`: Directory where prompt files are saved. Default is `eval_results/prompt`.
- `--with-background`: If enabled, includes the scientist-annotated background in the prompts.
- `--temperature`: Controls the randomness of the output. Default is 0.
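As a sketch of how these options combine (the output and prompt directory names below are only illustrative, not defaults), a run with background included might look like:

```bash
# Generate code with GPT-4o, writing code and prompts to custom directories
python eval/scripts/gencode_json.py \
  --model gpt-4o \
  --output-dir eval_results/generated_code_background \
  --prompt-dir eval_results/prompt_background \
  --temperature 0 \
  --with-background
```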
Download the numeric test results and save them as `eval/data/test_data.h5`.
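To confirm the file landed in the right place and is readable, a minimal sanity check (assuming the `h5py` package is installed) is:

```bash
# Open the HDF5 file read-only and list a few of its top-level keys
python -c "import h5py; f = h5py.File('eval/data/test_data.h5', 'r'); print(list(f.keys())[:5])"
```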
To evaluate the generated code using a specific model, go to the root of this repo and use the following command:

```bash
python eval/scripts/test_generated_code.py --model "model_name"
```

Replace `"model_name"` with the appropriate model name, and include `--with-background` if the code was generated with scientist-annotated background.
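For example, evaluating code that was generated by GPT-4o with background included would look like:

```bash
python eval/scripts/test_generated_code.py --model gpt-4o --with-background
```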