ORPO #1419

Merged
merged 12 commits into from
Mar 18, 2024

Conversation

@winglian (Collaborator) commented Mar 18, 2024

Paper: https://arxiv.org/abs/2403.07691 (authors: Jiwoo Hong, Noah Lee, and James Thorne)
Adapted from https://github.com/xfactlab/orpo

@jiwooya1000

Tested using Mistral and the Argilla UltraFeedback binarized dataset.
[Screenshot of the training run, 2024-03-18]

The log odds ratio is increasing, which matches the paper:

> We wrap the log odds ratio with the log sigmoid function so that L_OR could be minimized by increasing the log odds ratio between y_w and y_l.
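
To make the quoted relationship concrete, here is a minimal sketch of the odds-ratio term in plain Python. It is not the code from this PR or from xfactlab/orpo; the function names (`log_odds`, `log_sigmoid`, `odds_ratio_loss`) are hypothetical, and the inputs are assumed to be length-averaged sequence log-probabilities (strictly negative) for the chosen response y_w and the rejected response y_l.

```python
import math


def log_odds(logp: float) -> float:
    """Return log(p / (1 - p)) given log p, assuming logp < 0."""
    return logp - math.log1p(-math.exp(logp))


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def odds_ratio_loss(logp_chosen: float, logp_rejected: float):
    """Return (L_OR, log odds ratio) for one chosen/rejected pair.

    L_OR = -log sigmoid(log odds(y_w) - log odds(y_l)), so it gets
    smaller as the log odds ratio gets larger.
    """
    log_or = log_odds(logp_chosen) - log_odds(logp_rejected)
    return -log_sigmoid(log_or), log_or


# Illustrative values only, not taken from the run in this PR.
loss_equal, lor_equal = odds_ratio_loss(-1.0, -1.0)
loss_better, lor_better = odds_ratio_loss(-0.5, -1.5)
print(lor_equal, loss_equal)
print(lor_better, loss_better)
```

With equal log-probabilities the log odds ratio is 0 and the term equals log 2. As the chosen response becomes more likely than the rejected one, the log odds ratio rises and the term shrinks, which is the trend described above.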

@winglian winglian merged commit 2ea70eb into main Mar 18, 2024
6 of 7 checks passed
@winglian winglian deleted the orpo branch March 18, 2024 17:10
seungduk-yanolja pushed a commit to Y-IAB/axolotl that referenced this pull request Mar 19, 2024
* orpo trainer

* rl handling for orpo

* support for remove_unused_columns

* orpo fixes

* fix loader for orpo

* chore: lint

* fix default for remove_unused_columns

* roll ORPO into the main AxolotlTrainer so it can be compatible with some of the other techniques like relora

* better handling of system message for orpo

* revert system prompt changes for chat templates

* no need for else condition

* split dataset parsing into its own component
djsaunde pushed a commit that referenced this pull request Dec 17, 2024