Support user-defined prompt processing strategies for dpo #1248
This is great so far. For consistency, it would be great if the structure matched the SFT format:
datasets:
  - path: repo
    type:
      system_prompt: ""
      field_system: system
      field_instruction: input
      field_output: output
      format: "[INST] {instruction} [/INST]"
      no_input_format: "[INST] {instruction} [/INST]"
Ah, I missed that. I have adapted it; it now expects the following config:
datasets:
  - path: repo
    split: train
    type:
      field_system: sys
      field_prompt: question
      field_chosen: correct
      field_rejected: wrong
      prompt_format: "<|im_start|>system\n{system}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\nThe answer in JSON format is:"
      chosen_format: "{chosen}<|im_end|>"
      rejected_format: "{rejected}<|im_end|>"
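To make the intended behaviour of this config concrete, here is a minimal sketch of how the field names and format strings could be applied to one dataset row. It is an illustration under assumptions, not the code from this PR: the function name `build_dpo_transform`, the fallback field names, and the output keys `prompt`, `chosen` and `rejected` are hypothetical.

```python
from typing import Any, Callable, Dict


def build_dpo_transform(cfg: Dict[str, str]) -> Callable[[Dict[str, Any]], Dict[str, str]]:
    """Return a function that maps a raw dataset row to prompt/chosen/rejected strings.

    Hypothetical helper: the fallback field names below are assumptions,
    not taken from the PR.
    """
    field_system = cfg.get("field_system", "system")
    field_prompt = cfg.get("field_prompt", "prompt")
    field_chosen = cfg.get("field_chosen", "chosen")
    field_rejected = cfg.get("field_rejected", "rejected")

    def transform(row: Dict[str, Any]) -> Dict[str, str]:
        # Fill each user-supplied format string from the configured columns.
        prompt = cfg["prompt_format"].format(
            system=row.get(field_system, ""),
            prompt=row[field_prompt],
        )
        chosen = cfg["chosen_format"].format(chosen=row[field_chosen])
        rejected = cfg["rejected_format"].format(rejected=row[field_rejected])
        return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

    return transform


# Values mirror the "type:" mapping from the config above (prompt shortened).
cfg = {
    "field_system": "sys",
    "field_prompt": "question",
    "field_chosen": "correct",
    "field_rejected": "wrong",
    "prompt_format": "<|im_start|>system\n{system}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n",
    "chosen_format": "{chosen}<|im_end|>",
    "rejected_format": "{rejected}<|im_end|>",
}

row = {"sys": "Be brief.", "question": "2 + 2?", "correct": "4", "wrong": "5"}
sample = build_dpo_transform(cfg)(row)
print(sample["prompt"] + sample["chosen"])
```

Note that literal braces in a format string would need to be doubled (`{{` and `}}`) with `str.format`, which matters for prompts that embed JSON.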
@winglian I've fixed the build errors; I think this can be merged now.
ec5d853 to 36397ab
@nopperl I rebased this and updated the pydantic models for validation.
lgtm, @NanoCode012 ?
* support user-defined prompt processing strategies for dpo
* interpret dict dataset types as user-defined
* fix lint errors
* setup pydantic config for validation of User defined DPO
---------
Co-authored-by: Wing Lian <[email protected]>
Description
Support user-defined prompt pre-processing strategies for DPO training, similar to instruction finetuning (in src/axolotl/prompt_strategies/user_defined.py). Specifically, it supports defining different names and a format string for the prompt, chosen and rejected fields. Example config:
How has this been tested?
Tested it on a custom dataset, might make sense to add test cases.
Types of changes