- [2024.11.10] We release the training and evaluation code, models, and datasets for SEALONG.
Basic Dependencies:
- Python >= 3.10
- PyTorch >= 2.4.0
- CUDA Version >= 12.1
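To quickly check whether your environment satisfies these requirements, a short sanity check (standard Python/PyTorch calls) might look like this:

```python
import sys
import torch

print(sys.version_info)           # expect Python >= 3.10
print(torch.__version__)          # expect PyTorch >= 2.4.0
print(torch.version.cuda)         # expect CUDA >= 12.1
print(torch.cuda.is_available())  # True if a compatible GPU is visible
```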
Install required packages:
```bash
git clone https://github.com/SihengLi99/SEALONG
cd SEALONG
pip install -r requirements.txt
```
Model Usage:
```python
import transformers
import torch

model_id = "Siheng99/Llama-3.1-8B-Instruct-SEALONG"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
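```

Since SEALONG targets long-context reasoning, a typical call places a long document and a question in the same user turn. The snippet below (reusing the `pipeline` object created above) is only an illustrative sketch; `long_document` and the question are placeholders, and the actual evaluation prompts used by the scripts may differ.

```python
long_document = "..."  # placeholder: a long passage or a set of retrieved documents
question = "According to the documents above, which team won the final?"  # hypothetical

messages = [
    {"role": "user", "content": f"{long_document}\n\nQuestion: {question}"},
]
outputs = pipeline(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1]["content"])  # assistant reply text
```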
Data Usage:
```python
from datasets import load_dataset

dataset = load_dataset("Siheng99/Llama-3.1-8B-Instruct-SEALONG-Dataset")
print(dataset)
print(dataset["train"][0])
```
Evaluation:
```bash
bash scripts/eval_longbench_qa.sh
```
Note: Set MODEL_NAME_OR_PATH to the desired target model.
Download MuSiQue:
```bash
cd data
gdown 'https://drive.google.com/uc?export=download&id=1tGdADlNjWFaHLeZZGShh2IRcpO6Lv24h'
unzip musique_data_v1.0.zip -d musique && mv musique/data/* musique/
rm -r musique/data && rm musique_data_v1.0.zip
```
Process MuSiQue:
```bash
bash scripts/process_data.sh
```
Synthesize Training Data:
```bash
bash scripts/synthesize.sh
```
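As a rough mental model only (not the actual pipeline implemented by `scripts/synthesize.sh`), self-improvement of this kind samples several candidate answers per question, scores each candidate by its agreement with the other samples, and keeps a high- and a low-scoring answer as a preference pair. The sketch below illustrates that idea with a naive token-overlap score; every name in it is hypothetical.

```python
# Naive illustration: rank sampled answers by average Jaccard overlap with the
# other samples, then form a (chosen, rejected) pair for preference training.
def agreement_score(answer: str, others: list[str]) -> float:
    tokens = set(answer.lower().split())
    scores = []
    for other in others:
        other_tokens = set(other.lower().split())
        union = tokens | other_tokens
        scores.append(len(tokens & other_tokens) / len(union) if union else 0.0)
    return sum(scores) / len(scores) if scores else 0.0

samples = ["Answer A ...", "Answer B ...", "Answer C ..."]  # hypothetical model samples
ranked = sorted(samples, key=lambda s: agreement_score(s, [o for o in samples if o is not s]))
chosen, rejected = ranked[-1], ranked[0]
```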
Alternatively, use the released SEALONG dataset directly:
```python
from datasets import load_dataset

dataset = load_dataset("Siheng99/Llama-3.1-8B-Instruct-SEALONG-Dataset")
dataset.save_to_disk("/path/to/your/save_dir")
```
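A dataset saved this way is reloaded with `load_from_disk` (standard `datasets` API); whether `DATASET` in the fine-tuning scripts expects a Hub name or a local directory depends on the scripts themselves:

```python
from datasets import load_from_disk

dataset = load_from_disk("/path/to/your/save_dir")
print(dataset)
```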
Fine-tuning:
Set MODEL_NAME_OR_PATH and DATASET in the scripts before fine-tuning.
ORPO:
```bash
# QLoRA
bash scripts/finetune_orpo_qlora_xtuner.sh

# Full-parameter
bash scripts/finetune_orpo_xtuner.sh
```
SFT:
You may also opt for SFT; however, our findings indicate that ORPO achieves superior performance (see Table 5 in our paper).
```bash
# QLoRA
bash scripts/finetune_sft_qlora_xtuner.sh

# Full-parameter
bash scripts/finetune_sft_xtuner.sh
```
In our experiments, we use QLoRA for memory efficiency, but we also test full-parameter training; with ORPO and full-parameter training, a learning rate of 5e-6 yields decent performance.
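For intuition only, the core of ORPO adds an odds-ratio penalty on top of the usual SFT loss on the chosen response. The sketch below shows that penalty term in isolation; it is not the xtuner implementation invoked by the scripts above, and `beta` and the length-normalized log-probabilities are illustrative inputs.

```python
import torch
import torch.nn.functional as F

def orpo_odds_ratio_term(chosen_logps, rejected_logps, beta=0.1):
    """Odds-ratio penalty: push the odds of the chosen response above the rejected one.

    chosen_logps / rejected_logps: length-normalized log-probabilities, shape (batch,).
    """
    # log odds(p) = log(p / (1 - p)) = log p - log(1 - p)
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))
    return -beta * F.logsigmoid(log_odds_chosen - log_odds_rejected).mean()

# Toy usage with made-up log-probabilities:
print(orpo_odds_ratio_term(torch.tensor([-0.5]), torch.tensor([-1.5])))
```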
If SEALONG is useful for your research or applications, please cite it with the following BibTeX:
```bibtex
@article{li2024large,
  title={Large Language Models Can Self-Improve in Long-context Reasoning},
  author={Li, Siheng and Yang, Cheng and Cheng, Zesen and Liu, Lemao and Yu, Mo and Yang, Yujiu and Lam, Wai},
  journal={arXiv preprint arXiv:2411.08147},
  year={2024}
}
```
We gratefully acknowledge the following projects that SEALONG builds upon: