Labels: bug (Something isn't working), pending (This problem is yet to be addressed)
This discussion was converted from issue #6679 on January 17, 2025 04:05.
Reminder
System Info
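I built DPO training data by sampling multiple outputs from the model and having an LLM compare them to pick the better and worse responses. After training, the model produces repetitive output, and the repetition is even more frequent on prompts from the training data. Looking at the loss, I don't think it guarantees that the model generates according to the target. Should the current losses be improved? For example, for the simpo, orpo, and sigmoid losses, I think an SFT (CE loss) term should be added to each so that the output does not drift away from the target.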
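To make the concern concrete, here is a minimal sketch of the sigmoid (standard DPO) loss with an added SFT term on the chosen response. It is plain Python on precomputed sequence log-probabilities, not the training framework's implementation; the function name `dpo_with_sft_loss`, the `sft_weight` parameter, and all the numbers are assumptions for illustration only.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_with_sft_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    chosen_len: int,
    beta: float = 0.1,
    sft_weight: float = 1.0,  # hypothetical coefficient for the CE term
):
    """Return (total, dpo, sft) for one preference pair.

    The *_logp inputs are summed token log-probabilities of each response.
    """
    # Sigmoid DPO loss: depends only on the difference of log-ratios.
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    dpo = -log_sigmoid(margin)
    # SFT (cross-entropy) term: length-normalized NLL of the chosen response.
    sft = -policy_chosen_logp / chosen_len
    return dpo + sft_weight * sft, dpo, sft


# Two hypothetical policies with the same margin over the reference model.
# Policy B assigns a much lower likelihood to the chosen response than A.
ref = dict(ref_chosen_logp=-20.0, ref_rejected_logp=-25.0, chosen_len=10)
total_a, dpo_a, sft_a = dpo_with_sft_loss(-20.0, -30.0, **ref)
total_b, dpo_b, sft_b = dpo_with_sft_loss(-40.0, -50.0, **ref)

print(f"A: dpo={dpo_a:.4f} sft={sft_a:.4f} total={total_a:.4f}")
print(f"B: dpo={dpo_b:.4f} sft={sft_b:.4f} total={total_b:.4f}")
```

In this toy case both policies get the same sigmoid loss because only the margin matters, even though policy B has pushed the likelihood of the chosen response far down. Only the added CE term separates them, which is the behaviour the proposal above asks for.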
Reproduction
Others
No response