## **Generating Code with LLMs**

### 1. Set Up Your API Keys

First, create a `keys.cfg` file at the root of the repository and add your API keys for the different providers as follows:
```
OPENAI_KEY = 'your_api_key'
ANTHROPIC_KEY = 'your_api_key'
GOOGLE_KEY = 'your_api_key'
```
If you're using **litellm**, which supports a variety of providers including **vllm**, **Hugging Face**, and **Together AI**, make sure to include the relevant API key in the `keys.cfg` file. Please refer to the docs [here](https://docs.litellm.ai/docs/providers). Then use `litellm/*` as the model name when running the command.

For example, to use **Together AI**'s models, you'll need to add the following to your `keys.cfg`:

```
TOGETHERAI_API_KEY = 'your_api_key'
```
### 2. Generating Code

To generate code using a **Together AI** model (e.g., `Meta-Llama-3.1-70B-Instruct-Turbo`), go to the root of this repo and run:

```bash
python eval/scripts/gencode_json.py --model litellm/together_ai/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
```
To generate code using **GPT-4o** (with default settings), go to the root of this repo and run:

```bash
python eval/scripts/gencode_json.py --model gpt-4o
```
If you want to include **scientist-annotated background** in the prompts, use the `--with-background` flag:

```bash
python eval/scripts/gencode_json.py --model gpt-4o --with-background
```
Please note that we do not plan to release the ground truth code for each problem to the public. However, we have made a dev set available that includes the ground truth code in `eval/data/problems_dev.jsonl`.
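
To get a feel for the data, you can pretty-print the first dev-set problem from the command line; each line of the file is a standalone JSON object (the exact fields may vary between releases):

```bash
# Peek at the first problem in the dev set (one JSON object per line)
head -n 1 eval/data/problems_dev.jsonl | python -m json.tool
```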
In this repository, **we only support evaluating with previously generated code for each step.**
### Command-Line Arguments
When running the `gencode_json.py` script, you can use the following options:

- `--model`: Specifies the model name to be used for generating code (e.g., `gpt-4o` or `litellm/together_ai/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo`).
- `--output-dir`: Directory where the generated code outputs will be saved. Default is `eval_results/generated_code`.
- `--input-path`: Path to the JSONL file describing the problems. Default is `eval/data/problems_all.jsonl`.
- `--prompt-dir`: Directory where prompt files are saved. Default is `eval_results/prompt`.
- `--with-background`: If enabled, includes the scientist-annotated problem background in the prompts.
- `--temperature`: Controls the randomness of the generation. Default is 0.
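
For instance, a run that combines several of these options might look like the following (the output directory and temperature values here are illustrative, not recommended settings):

```bash
# GPT-4o with background prompts, a custom output directory, and mild sampling
python eval/scripts/gencode_json.py \
    --model gpt-4o \
    --with-background \
    --output-dir eval_results/generated_code_background \
    --temperature 0.7
```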
---
## **Evaluating the Generated Code**

### 1. Download Numeric Test Data
Download the [numeric test results](https://drive.google.com/drive/folders/1W5GZW6_bdiDAiipuFMqdUhvUaHIj6-pR?usp=drive_link) and save the file as `eval/data/test_data.h5`.
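
As a quick sanity check, you can confirm the download is a readable HDF5 file by listing its top-level groups (this assumes you have the `h5py` package installed; the group names themselves depend on the data release):

```bash
python -c "import h5py; print(list(h5py.File('eval/data/test_data.h5', 'r').keys()))"
```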
### 2. Run the Evaluation

To evaluate the generated code using a specific model, go to the root of this repo and use the following command:
```bash
python eval/scripts/test_generated_code.py --model "model_name"
```
Replace `"model_name"` with the appropriate model name, and include the `--with-background` flag if the code was generated with **scientist-annotated background**.
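
For example, to evaluate code that was generated by **GPT-4o** with background prompts:

```bash
python eval/scripts/test_generated_code.py --model gpt-4o --with-background
```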