## **Generating Code with LLMs**

### 1. Set Up Your API Keys

First, create a `keys.cfg` file at the root of the repository and add your API keys for the different providers as follows:
```
OPENAI_KEY = 'your_api_key'
ANTHROPIC_KEY = 'your_api_key'
GOOGLE_KEY = 'your_api_key'
```
If you're using **litellm**, which supports a variety of providers including **vllm**, **Hugging Face**, and **Together AI**, make sure to include the relevant API key in the `keys.cfg` file. Please refer to the docs [here](https://docs.litellm.ai/docs/providers). Then use `litellm/*` as the model name when running the command.

For example, to use **Together AI**'s models, you'll need to add the following to your `keys.cfg`:

```
TOGETHERAI_API_KEY = 'your_api_key'
```
### 2. Generating Code

To generate code using a **Together AI** model (e.g., `Meta-Llama-3.1-70B-Instruct-Turbo`), go to the root of this repo and run:

```bash
python eval/scripts/gencode_json.py --model litellm/together_ai/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
```
To generate code using **GPT-4o** (with default settings), go to the root of this repo and run:

```bash
python eval/scripts/gencode_json.py --model gpt-4o
```
If you want to include **scientist-annotated background** in the prompts, use the `--with-background` flag:

```bash
python eval/scripts/gencode_json.py --model gpt-4o --with-background
```
Please note that we do not plan to release the ground truth code for each problem to the public. However, we have made a dev set available that includes the ground truth code in `eval/data/problems_dev.jsonl`.
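
To get a feel for the data, you can pretty-print the first dev-set problem from the command line; each line of the file is a standalone JSON object (the exact fields may vary between releases):

```bash
# Peek at the first problem in the dev set (one JSON object per line)
head -n 1 eval/data/problems_dev.jsonl | python -m json.tool
```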
In this repository, **we only support evaluating with previously generated code for each step.**
### Command-Line Arguments
When running the `gencode_json.py` script, you can use the following options:

- `--model`: Specifies the model name to be used for generating code (e.g., `gpt-4o` or `litellm/together_ai/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo`).
- `--output-dir`: Directory where the generated code outputs will be saved. Default is `eval_results/generated_code`.
- `--input-path`: Path to the JSONL file describing the problems. Default is `eval/data/problems_all.jsonl`.
- `--prompt-dir`: Directory where prompt files are saved. Default is `eval_results/prompt`.
- `--with-background`: If enabled, includes the scientist-annotated problem background in the prompts.
- `--temperature`: Controls the randomness of the generation. Default is 0.
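
For instance, a run that combines several of these options might look like the following (the output directory and temperature values here are illustrative, not recommended settings):

```bash
# GPT-4o with background prompts, a custom output directory, and mild sampling
python eval/scripts/gencode_json.py \
    --model gpt-4o \
    --with-background \
    --output-dir eval_results/generated_code_background \
    --temperature 0.7
```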
---
## **Evaluating the Generated Code**

### 1. Download Numeric Test Data
Download the [numeric test results](https://drive.google.com/drive/folders/1W5GZW6_bdiDAiipuFMqdUhvUaHIj6-pR?usp=drive_link) and save the file as `eval/data/test_data.h5`.
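
As a quick sanity check, you can confirm the download is a readable HDF5 file by listing its top-level groups (this assumes you have the `h5py` package installed; the group names themselves depend on the data release):

```bash
python -c "import h5py; print(list(h5py.File('eval/data/test_data.h5', 'r').keys()))"
```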
### 2. Run the Evaluation

To evaluate the generated code using a specific model, go to the root of this repo and use the following command:
```bash
python eval/scripts/test_generated_code.py --model "model_name"
```
Replace `"model_name"` with the appropriate model name, and include the `--with-background` flag if the code was generated with **scientist-annotated background**.
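
For example, to evaluate code that was generated by **GPT-4o** with background prompts:

```bash
python eval/scripts/test_generated_code.py --model gpt-4o --with-background
```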