Large Language Models Can Self-Improve in Long-context Reasoning


🌸 "Just when the caterpillar thought the world was over, it became a butterfly." 🦋

arXiv | HF Paper | HF Model & Data | License: MIT

📰 News

  • [2024.11.10] Released the training and evaluation code, models, and datasets for SEALONG.

🛠️ Requirements and Installation

Basic Dependencies:

  • Python >= 3.10
  • PyTorch >= 2.4.0
  • CUDA Version >= 12.1
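
A quick way to confirm these requirements are met (assuming torch is already installed):

import sys
import torch

print(sys.version)                # expect >= 3.10
print(torch.__version__)          # expect >= 2.4.0
print(torch.version.cuda)         # expect >= 12.1
print(torch.cuda.is_available())  # expect True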

Install required packages:

git clone https://github.com/SihengLi99/SEALONG
cd SEALONG
pip install -r requirements.txt

🔑 Usage

Model Usage:

import transformers
import torch

model_id = "Siheng99/Llama-3.1-8B-Instruct-SEALONG"

# Load the model in bfloat16 and place it automatically across available GPUs.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
# The last message in the generated conversation is the assistant's reply.
print(outputs[0]["generated_text"][-1])
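
Since SEALONG targets long-context reasoning, a more representative query places a long document before the question. A minimal sketch reusing the pipeline above; the file name and prompt layout are illustrative assumptions, not the exact template from the paper:

# Hedged sketch: long-context QA, the setting SEALONG is trained for.
with open("report.txt") as f:  # hypothetical long document
    long_document = f.read()

messages = [
    {
        "role": "user",
        "content": f"{long_document}\n\nBased on the document above, "
                   "who chaired the committee?",
    }
]

outputs = pipeline(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1])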

Data Usage:

from datasets import load_dataset
dataset = load_dataset("Siheng99/Llama-3.1-8B-Instruct-SEALONG-Dataset")
print(dataset)
print(dataset["train"][0])

📊 Evaluation

bash scripts/eval_longbench_qa.sh

Note: Set MODEL_NAME_OR_PATH in the script to the model you want to evaluate (e.g., Siheng99/Llama-3.1-8B-Instruct-SEALONG). The script runs question-answering evaluation on LongBench.

🔥 Training

Data Preparation

1. Synthesizing Your Own Data

Download MuSiQue:

cd data
gdown 'https://drive.google.com/uc?export=download&id=1tGdADlNjWFaHLeZZGShh2IRcpO6Lv24h'
unzip musique_data_v1.0.zip -d musique && mv musique/data/* musique/ 
rm -r musique/data && rm musique_data_v1.0.zip

Process MuSiQue:

bash scripts/process_data.sh

Synthesize Training Data:

bash scripts/synthesize.sh
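
For orientation, the idea behind this step: sample several reasoning trajectories per question, score each by Minimum Bayes Risk (its agreement with the other samples), and keep high- and low-scoring outputs as a preference pair. A minimal sketch under those assumptions; the encoder choice and function names are placeholders, not the actual scripts/synthesize.sh pipeline:

from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder scorer

def mbr_scores(outputs):
    # Score each sampled output by its mean similarity to the other samples.
    embs = encoder.encode(outputs, normalize_embeddings=True)
    sims = embs @ embs.T
    n = len(outputs)
    return [(sims[i].sum() - 1.0) / (n - 1) for i in range(n)]

def build_preference_pair(outputs):
    # Highest-agreement sample becomes "chosen", lowest becomes "rejected".
    scores = mbr_scores(outputs)
    best = max(range(len(scores)), key=scores.__getitem__)
    worst = min(range(len(scores)), key=scores.__getitem__)
    return {"chosen": outputs[best], "rejected": outputs[worst]}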

2. Using Our Pre-synthesized Data

from datasets import load_dataset
dataset = load_dataset("Siheng99/Llama-3.1-8B-Instruct-SEALONG-Dataset")
dataset.save_to_disk("/path/to/your/save_dir")
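
To reload the saved copy later (for example, from the fine-tuning scripts), load_from_disk mirrors the call above:

from datasets import load_from_disk
dataset = load_from_disk("/path/to/your/save_dir")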

Fine-tuning

Set MODEL_NAME_OR_PATH and DATASET in the scripts before fine-tuning.

ORPO:

# QLoRA
bash scripts/finetune_orpo_qlora_xtuner.sh
# Full-parameter
bash scripts/finetune_orpo_xtuner.sh
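
For reference, a minimal sketch of the ORPO objective (Hong et al., 2024) that these scripts optimize: the supervised loss on the chosen response plus a weighted odds-ratio term that favors the chosen response over the rejected one. Variable names are illustrative, not xtuner's API:

import torch
import torch.nn.functional as F

def orpo_loss(logp_chosen, logp_rejected, nll_chosen, lam=0.1):
    # logp_* are length-normalized log-likelihoods of each response, so
    # exp(logp) lies in (0, 1) and the odds log(p / (1 - p)) are well defined.
    log_odds_chosen = logp_chosen - torch.log1p(-torch.exp(logp_chosen))
    log_odds_rejected = logp_rejected - torch.log1p(-torch.exp(logp_rejected))
    # Penalized when the rejected response is favored over the chosen one.
    ratio_term = -F.logsigmoid(log_odds_chosen - log_odds_rejected)
    return nll_chosen + lam * ratio_term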

SFT:

You may also opt for SFT; however, our findings indicate that ORPO achieves superior performance (see Table 5 in our paper).

# QLoRA
bash scripts/finetune_sft_qlora_xtuner.sh
# Full-parameter
bash scripts/finetune_sft_xtuner.sh

In our experiments we use QLoRA for memory efficiency, but we also test full-parameter training; with ORPO, a learning rate of 5e-6 yields decent full-parameter performance.

📑 Citation

If SEALONG is useful for your research or applications, please cite it with the following BibTeX:

@article{li2024large,
  title={Large Language Models Can Self-Improve in Long-context Reasoning},
  author={Li, Siheng and Yang, Cheng and Cheng, Zesen and Liu, Lemao and Yu, Mo and Yang, Yujiu and Lam, Wai},
  journal={arXiv preprint arXiv:2411.08147},
  year={2024}
}

👍 Acknowledgement

We gratefully acknowledge the open-source projects that SEALONG builds upon, including XTuner (fine-tuning) and LongBench (evaluation).
