## **Generating Code with LLMs**

### 1. Set Up Your API Keys

First, create a `keys.cfg` file at the root of the repository and add your API keys for the different providers as follows:

```
OPENAI_KEY = 'your_api_key'
ANTHROPIC_KEY = 'your_api_key'
GOOGLE_KEY = 'your_api_key'
```

If you're using **litellm**, which supports a variety of providers including **vllm**, **Hugging Face**, and **Together AI**, make sure to include the relevant API key in the `keys.cfg` file. Please refer to the docs [here](https://docs.litellm.ai/docs/providers). Then, use `litellm/*` as the model name when running the command.

For example, to use **Together AI**'s models, you'll need to add the following to your `keys.cfg`:

```
TOGETHERAI_API_KEY = 'your_api_key'
```

### 2. Generating Code

To generate code using a **Together AI** model (e.g., `Meta-Llama-3.1-70B-Instruct-Turbo`), go to the root of this repo and run:

```bash
python eval/scripts/gencode_json.py --model litellm/together_ai/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
```

To generate code using **GPT-4o** (with default settings), go to the root of this repo and run:

```bash
python eval/scripts/gencode_json.py --model gpt-4o
```

If you want to include **scientist-annotated background** in the prompts, use the `--with-background` flag:

```bash
python eval/scripts/gencode_json.py --model gpt-4o --with-background
```

Please note that we do not plan to release the ground truth code for each problem to the public. However, we have made a dev set available that includes the ground truth code in `eval/data/problems_dev.jsonl`.

In this repository, **we only support evaluating with previously generated code for each step.**
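If you want to work with the dev problems instead of the full set, you can point the `--input-path` option (described below) at the dev split. This is a hypothetical example, assuming `problems_dev.jsonl` follows the same format as `problems_all.jsonl`:

```bash
# Hypothetical: generate code for the dev problems, which ship with ground-truth code
python eval/scripts/gencode_json.py --model gpt-4o --input-path eval/data/problems_dev.jsonl
```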

When running the `gencode_json.py` script, you can use the following options:

- `--model`: Specifies the model name to be used for generating code (e.g., `gpt-4o` or `litellm/together_ai/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo`).
- `--output-dir`: Directory where the generated code outputs will be saved. Default is `eval_results/generated_code`.
- `--input-path`: Path to the JSONL file containing the problem descriptions. Default is `eval/data/problems_all.jsonl`.
- `--prompt-dir`: Directory where prompt files are saved. Default is `eval_results/prompt`.
- `--with-background`: If set, includes the scientist-annotated problem background in the prompts.
- `--temperature`: Controls the randomness of the output. Default is 0.
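These options can be combined. As an illustration, the following command generates GPT-4o code with the scientist-annotated background and writes the outputs to a custom directory (the directory name is just an example):

```bash
# Example only: the output directory name below is arbitrary
python eval/scripts/gencode_json.py --model gpt-4o --with-background \
    --temperature 0 --output-dir eval_results/generated_code_with_background
```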

---

## **Evaluating the Generated Code**

### 1. Download Numeric Test Data

Download the [numeric test results](https://drive.google.com/drive/folders/1W5GZW6_bdiDAiipuFMqdUhvUaHIj6-pR?usp=drive_link) and save the file as `eval/data/test_data.h5`.

### 2. Run the Evaluation

To evaluate the generated code using a specific model, go to the root of this repo and use the following command:

```bash
python eval/scripts/test_generated_code.py --model "model_name"
```

Replace `"model_name"` with the appropriate model name, and include `--with-background` if the code was generated with **scientist-annotated background**.
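
For example, to evaluate code that was generated by `gpt-4o` with the background prompts (as in the generation example above), the command would look like this, assuming the evaluation script expects the same model name that was used during generation:

```bash
python eval/scripts/test_generated_code.py --model gpt-4o --with-background
```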
