Code for the paper "LIRE: Listwise Reward Enhancement for Preference Alignment" (accepted to Findings of ACL 2024).
The codebase is built upon the code of the RRHF paper; please refer to it for setting up the environment and generating training data. We include the SFT, RRHF, SLiC, DPO, and LIRE losses in the code for quick and easy use. Please adjust the hyperparameters and other custom settings to fit your setup.
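For orientation, here is a minimal sketch of what a listwise reward-weighted loss can look like. It is not the implementation in this repository. It assumes the objective is a softmax over the per-response sequence log-probabilities (scaled by a temperature), with each resulting probability weighted by that response's reward score. The function name `listwise_reward_loss`, its arguments, and the default temperature are all hypothetical and chosen only for illustration.

```python
import math


def listwise_reward_loss(seq_logprobs, rewards, temperature=1.0):
    """Illustrative listwise loss for one prompt with several candidate responses.

    seq_logprobs: policy log-probability of each candidate response (assumed input).
    rewards:      reward score of each candidate response (assumed input).
    temperature:  hypothetical scaling factor applied before the softmax.
    """
    if not seq_logprobs or len(seq_logprobs) != len(rewards):
        raise ValueError("seq_logprobs and rewards must be non-empty and equal in length")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Softmax over the candidates, computed stably by subtracting the maximum.
    scaled = [lp / temperature for lp in seq_logprobs]
    max_scaled = max(scaled)
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Negative expected reward under the candidate distribution:
    # minimizing it moves probability mass toward higher-reward responses.
    return -sum(p * r for p, r in zip(probs, rewards))
```

In an actual training loop the same computation would be done on tensors so that gradients flow back to the policy; plain Python floats are used here only to keep the sketch self-contained.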