We provide the training data for PA-RAG, available at
https://drive.google.com/file/d/1agP7fi1iX-3qFK7XFBvRu6rC5X_-M8Iy/view?usp=drive_link
Includes:
- `sft_data.json`: 58.9k instruction fine-tuning data
- `dpo_data_ri.json`: 11.8k response informativeness preference data
- `dpo_data_rr.json`: 13.4k response robustness preference data
- `dpo_data_cq.json`: 22.5k citation quality preference data
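A minimal sketch for inspecting the downloaded files, assuming each file is a top-level JSON list of examples (check the actual files; the field layout inside each example is not documented here):

```python
import json

def count_records(path):
    """Return the number of examples in one of the released JSON files.

    Assumes the file is a top-level JSON list; verify against the
    downloaded data before relying on this.
    """
    with open(path) as f:
        return len(json.load(f))

# e.g. count_records("sft_data.json") should report roughly 58.9k examples
```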
The questions used for constructing our training data are sourced from ASQA, WebQuestions, and Natural Questions. Detailed statistics are as follows:
| | IFT | RI | RR | CQ |
| --- | --- | --- | --- | --- |
| ASQA | 1,714 | 1,046 | 962 | 631 |
| WebQ | 1,681 | 326 | 357 | 653 |
| NQ | 55,463 | 10,416 | 12,080 | 21,241 |
| Sum | 58,858 | 11,788 | 13,399 | 22,525 |
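As a sanity check, the per-source counts in the table sum to the totals in the last row:

```python
# Per-source counts copied from the table above (ASQA, WebQ, NQ).
counts = {
    "IFT": [1714, 1681, 55463],
    "RI":  [1046, 326, 10416],
    "RR":  [962, 357, 12080],
    "CQ":  [631, 653, 21241],
}

# Sum each column and compare against the "Sum" row.
totals = {name: sum(column) for name, column in counts.items()}
print(totals)  # {'IFT': 58858, 'RI': 11788, 'RR': 13399, 'CQ': 22525}
```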
The data for evaluation is available at
https://drive.google.com/file/d/1vn5O_PtUnV3rOC7CAbSsZITG6NQ1EZtx/view?usp=drive_link.
The questions are sourced from the test splits of ASQA, WebQuestions, Natural Questions, and TriviaQA. The documents are retrieved with the dense retriever GTR from the Wikipedia dump of December 20, 2018.
We use the LLaMA-Factory framework to train our models. We selected three general LLMs as the base RAG generators: Llama2-7b-chat, Llama2-13b-chat, and Llama3-8b-instruct.
We utilized full fine-tuning for all training stages and employed the same hyperparameter settings for all models.
During the instruction fine-tuning phase, we set the batch size to 128, the learning rate to 2e-5, and trained for one epoch.
In the preference optimization phase, we set the batch size to 64 and trained for one epoch in each stage. For the response informativeness and response robustness optimization stages, the learning rate is 2e-6; for the citation quality optimization stage, it is 2e-7.
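The per-stage settings above can be summarized in one place; the stage keys below are informal labels for this sketch, not LLaMA-Factory configuration identifiers:

```python
# Hyperparameters per training stage, as described above.
# All stages use full fine-tuning and train for one epoch.
STAGES = {
    "ift":    {"batch_size": 128, "learning_rate": 2e-5, "epochs": 1},
    "dpo_ri": {"batch_size": 64,  "learning_rate": 2e-6, "epochs": 1},
    "dpo_rr": {"batch_size": 64,  "learning_rate": 2e-6, "epochs": 1},
    "dpo_cq": {"batch_size": 64,  "learning_rate": 2e-7, "epochs": 1},
}
```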
Inference with the zero-shot setting:
```
CUDA_VISIBLE_DEVICES=0 python inference/inference_vllm.py \
    --model model_path \
    --prompt_file prompts/default.json \
    --eval_file data_path (e.g., data/asqa_dev.json) \
    --output_file output_path \
    --shot 0 \
    --ndoc 5
```
Download the NLI model TRUE before evaluation.
```
CUDA_VISIBLE_DEVICES=0 python inference/eval.py --f response_to_eval_path --no_rouge --citation
```