GitHub - stevie1023/LIRE: Code for the paper "LIRE: Listwise Reward Enhancement for Preference Alignment" (accepted to Findings of ACL 2024)


The codebase is built on RRHF; please refer to RRHF for setting up the environment and generating the training data. The code includes the SFT, RRHF, SLiC, DPO, and LIRE losses for quick and easy use. Adjust the hyperparameters and any other custom settings to suit your setup.
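To make the difference between a pairwise and a listwise objective concrete, here is a minimal sketch in plain Python. It is not taken from `train.py`: the function names `listwise_reward_loss` and `dpo_loss` are hypothetical, and the listwise form (negative expected reward under a temperature-scaled softmax over the candidates' sequence log-probabilities) is an assumption about the general shape of a LIRE-style loss, not a copy of the repository's implementation. The pairwise function follows the standard DPO formula.

```python
import math


def listwise_reward_loss(seq_logps, rewards, temperature=1.0):
    """Negative expected reward over a list of candidate responses.

    seq_logps: policy log-probability of each candidate response.
    rewards:   scalar reward for each candidate (same order).
    Assumed form, for illustration only.
    """
    scaled = [lp / temperature for lp in seq_logps]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * r for p, r in zip(probs, rewards))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of log-probabilities."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin)
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Three candidates for one prompt; the first has the highest reward.
    logps = [-12.0, -10.0, -11.0]
    rewards = [1.0, 0.2, -0.5]
    print(listwise_reward_loss(logps, rewards))
    print(dpo_loss(-12.0, -10.0, -12.5, -10.5))
```

The listwise loss falls as probability mass moves toward high-reward candidates, so it uses every response in the list at once, whereas the DPO loss only compares one chosen and one rejected response per call.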

Folders and files (2 commits):

data_generation
README.md
default_offload_opt_param.json
train.py
train.sh
train_alpaca.sh
train_alpaca_prompt.py
