Hi! I noticed that this method seems to optimize preferences directly on the base model rather than on an SFT model. As far as I know, the base model does not have instruction-following ability, so preference optimization methods such as DPO usually need to undergo SFT training first.
My question is: is the experimental setting in this article reasonable?
We also tried the setting of conducting SFT first, and then doing DPO. Please refer to Table 3 in the Appendix for detailed results regarding this setting.
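For reference, the SFT-then-DPO setting discussed here typically looks like the pipeline below. This is a minimal sketch assuming the Hugging Face TRL library; the base checkpoint name, dataset files, and hyperparameters are placeholders (not the ones used in the paper), and exact trainer argument names vary across TRL versions.

```python
# Sketch of the two-stage SFT -> DPO pipeline, using Hugging Face TRL.
# All paths, dataset files, and hyperparameters below are illustrative placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer, SFTConfig, DPOTrainer, DPOConfig

base_name = "meta-llama/Llama-2-7b-hf"  # hypothetical base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_name)

# Stage 1: supervised fine-tuning on instruction data, so the model acquires
# instruction-following ability before preference optimization.
# The dataset is expected to contain a "text" field with formatted examples.
sft_dataset = load_dataset("json", data_files="sft_data.json", split="train")
sft_trainer = SFTTrainer(
    model=AutoModelForCausalLM.from_pretrained(base_name),
    train_dataset=sft_dataset,
    args=SFTConfig(output_dir="sft-checkpoint", num_train_epochs=1),
)
sft_trainer.train()
sft_trainer.save_model("sft-checkpoint")

# Stage 2: DPO on (prompt, chosen, rejected) preference pairs, starting from
# the SFT checkpoint rather than the raw base model.
pref_dataset = load_dataset("json", data_files="preference_pairs.json", split="train")
dpo_trainer = DPOTrainer(
    model=AutoModelForCausalLM.from_pretrained("sft-checkpoint"),
    args=DPOConfig(output_dir="dpo-checkpoint", beta=0.1),
    train_dataset=pref_dataset,
    processing_class=tokenizer,  # older TRL versions use `tokenizer=` instead
)
dpo_trainer.train()
```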