Flatness-Aware Prompt Selection Improves Accuracy and Sample Efficiency

This is the official documentation for the paper Flatness-Aware Prompt Selection Improves Accuracy and Sample Efficiency.

Table of Contents

  • Installation
  • Obtain scores
  • Result format
  • Tune Alpha
  • Customization
  • Contact Us
  • Relevant paper

Installation

To run the code, first install the required dependencies:

pip install -r requirements.txt

Obtain scores

Compute the metric scores for the prompts as follows:

CUDA_VISIBLE_DEVICES=0 python main.py \
--model="gpt2" \
--dataset=agnews \
--num_seeds=1 \
--all_shots=4 \
--subsample_test_set=512 \
--approx
  • all_shots: number of demonstrations
  • model: the selected model
  • dataset: dataset name
  • subsample_test_set: size of the test subset used to speed up evaluation; None means using the full test set
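
For convenience, here is a minimal sketch (not part of the repository) that sweeps several shot counts by invoking main.py with the flags documented above; the shot counts and GPU index are arbitrary placeholders:

# Sketch: sweep shot counts by calling main.py with the flags documented above.
# The shot counts below are placeholders -- adjust them to your experiments.
import os
import subprocess

env = {**os.environ, "CUDA_VISIBLE_DEVICES": "0"}
for shots in [1, 2, 4, 8]:
    subprocess.run(
        [
            "python", "main.py",
            "--model=gpt2",
            "--dataset=agnews",
            "--num_seeds=1",
            f"--all_shots={shots}",
            "--subsample_test_set=512",
            "--approx",
        ],
        env=env,
        check=True,
    )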

Result format

After running the command above, you'll get a results pickle file. For each experiment, we store a result tree in the following format:

{
  seed_id: {
    prompt_id: {
      // prompt-level info
      id: prompt_id,
      promt: prompt_text,
      sen: sen_score,
      mi: mi_score,
      perf: performance (accuracy),
    },
    // seed-level info: correlations across prompts
    sen_p: ...,
    sen_s: ...,
    mi_p: ...,
    mi_s: ...,
  }
  // top-level info (e.g., average sensitivity, average accuracy) is computed by
  // the print_results function; it is not stored in the pickle
}
  • id: the prompt id
  • promt: the prompt text
  • sen: the sensitivity of the prompt
  • mi: mutual information of the prompt
  • perf: accuracy of the prompt
  • sen_p: Pearson correlation between performance and sensitivity
  • sen_s: Spearman correlation between performance and sensitivity
  • mi_p: Pearson correlation between performance and mutual information
  • mi_s: Spearman correlation between performance and mutual information
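
As a rough illustration, here is a minimal sketch for loading and inspecting such a pickle; the file path is a placeholder, and the key names follow the schema above:

# Sketch: inspect a result pickle with the schema described above.
# "results/agnews_gpt2.pkl" is a placeholder path; use whatever file your run produced.
import pickle

with open("results/agnews_gpt2.pkl", "rb") as f:
    results = pickle.load(f)

for seed_id, seed_info in results.items():
    print(f"seed {seed_id}: sen_s={seed_info.get('sen_s')}, mi_s={seed_info.get('mi_s')}")
    for value in seed_info.values():
        if isinstance(value, dict):  # prompt-level entries
            print(f"  prompt {value.get('id')}: sen={value.get('sen')}, "
                  f"mi={value.get('mi')}, acc={value.get('perf')}")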

Tune Alpha

After obtaining the correlations between the metric scores and performance on the dev set, we tune the alpha that maximizes the correlation (or another criterion such as NDCG). We then fix alpha and run on the large test set.
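
A sketch of what this tuning loop could look like: the combined score alpha * (-sen) + (1 - alpha) * mi and the function name are assumptions for illustration, not the paper's exact formulation.

# Hypothetical alpha-tuning sketch. The combined score
# alpha * (-sensitivity) + (1 - alpha) * mutual_information is an assumption,
# not necessarily the paper's exact formulation.
import numpy as np
from scipy.stats import spearmanr

def tune_alpha(sen, mi, acc, grid=np.linspace(0.0, 1.0, 101)):
    """sen, mi, acc: per-prompt dev-set scores (array-like)."""
    sen, mi, acc = map(np.asarray, (sen, mi, acc))
    best_alpha, best_corr = None, -np.inf
    for alpha in grid:
        combined = alpha * (-sen) + (1 - alpha) * mi
        corr, _ = spearmanr(combined, acc)  # rank correlation with dev accuracy
        if corr > best_corr:
            best_alpha, best_corr = alpha, corr
    return best_alpha, best_corr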

Customization

To use your own custom prompts, modify promptset in main.py.
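
As a purely hypothetical illustration (the actual structure expected by main.py may differ, e.g., templates plus label words), a custom prompt set might look like:

# Hypothetical illustration only -- check the existing promptset definition
# in main.py for the exact structure it expects.
promptset = [
    "Article: {text}\nTopic:",
    "News: {text}\nCategory:",
    "{text}\nWhat is the topic of this article?",
]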

Contact Us

If you have any questions, suggestions, or concerns, please reach out to us.

Relevant paper

If you find this repository/data helpful, cite the following work:

@article{shen2023flatnessaware,
      title={Flatness-Aware Prompt Selection Improves Accuracy and Sample Efficiency}, 
      author={Lingfeng Shen and Weiting Tan and Boyuan Zheng and Daniel Khashabi},
      year={2023},
      eprint={2305.10713},
      archivePrefix={arXiv},
      primaryClass={cs.CL}, 
      url={https://arxiv.org/abs/2305.10713}
}