
Support user-defined prompt processing strategies for DPO #1248

Merged 4 commits into main on Feb 26, 2024

Conversation

@nopperl (Contributor) commented on Feb 2, 2024

Description

Support user-defined prompt pre-processing strategies for DPO training, similar to instruction fine-tuning (see src/axolotl/prompt_strategies/user_defined.py). Specifically, this adds support for defining custom names and format strings for the prompt, chosen, and rejected fields. Example config:

datasets:
  - ds_type: json
    data_files:
      - test_dpo.jsonl
    split: train
    type: user_defined.default
    field_system: sys
    field_prompt: question
    field_chosen: correct
    field_rejected: wrong
    prompt_format: "<|im_start|>system\n{system}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\nThe answer in JSON format is:"
    chosen_format: "{chosen}<|im_end|>"
    rejected_format: "{rejected}<|im_end|>"
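
For illustration, here is a minimal sketch of how these format strings could be applied to a single dataset row. transform_row is a hypothetical helper using plain str.format interpolation; the actual strategy implementation in axolotl may differ.

def transform_row(row: dict, cfg: dict) -> dict:
    # Look up each source field by its configured name and interpolate it
    # into the corresponding format string.
    prompt = cfg["prompt_format"].format(
        system=row[cfg["field_system"]],
        prompt=row[cfg["field_prompt"]],
    )
    chosen = cfg["chosen_format"].format(chosen=row[cfg["field_chosen"]])
    rejected = cfg["rejected_format"].format(rejected=row[cfg["field_rejected"]])
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}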

How has this been tested?

Tested on a custom dataset; it might make sense to add test cases.


@winglian (Collaborator) commented on Feb 6, 2024

This is great so far. For consistency, it would be great if the structure matched the SFT format:
https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/README.md?plain=1#L388-L400

datasets:
  - path: repo
    type:
      system_prompt: ""
      field_system: system
      field_instruction: input
      field_output: output
      format: "[INST] {instruction} [/INST]"
      no_input_format: "[INST] {instruction} [/INST]"

@nopperl (Contributor, Author) commented on Feb 7, 2024

Ah, I missed that. I have adapted it; it now expects the following config:

datasets:
  - path: repo
    split: train
    type:
      field_system: sys
      field_prompt: question
      field_chosen: correct
      field_rejected: wrong
      prompt_format: "<|im_start|>system\n{system}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\nThe answer in JSON format is:"
      chosen_format: "{chosen}<|im_end|>"
      rejected_format: "{rejected}<|im_end|>"
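
For reference, a JSONL record matching these field names could look like this (hypothetical example data):

{"sys": "You are a helpful assistant.", "question": "What is the capital of France?", "correct": "{\"answer\": \"Paris\"}", "wrong": "{\"answer\": \"Berlin\"}"}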

@nopperl (Contributor, Author) commented on Feb 16, 2024

@winglian I've fixed the build errors; I think this can be merged now.

@winglian (Collaborator) commented

@nopperl I rebased this and updated the pydantic models for validation.
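
As a rough sketch of what such validation could look like (field names taken from the config above; pydantic is assumed here, and the actual models in axolotl differ in detail):

from typing import Optional

from pydantic import BaseModel


# Hypothetical model: validates the keys of a user-defined DPO dataset type.
class UserDefinedDPOType(BaseModel):
    field_system: Optional[str] = None
    field_prompt: Optional[str] = None
    field_chosen: Optional[str] = None
    field_rejected: Optional[str] = None
    prompt_format: Optional[str] = None
    chosen_format: Optional[str] = None
    rejected_format: Optional[str] = None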

@winglian (Collaborator) left a review comment

lgtm, @NanoCode012?

@winglian merged commit 1e3d530 into axolotl-ai-cloud:main on Feb 26, 2024
7 checks passed
djsaunde pushed a commit that referenced this pull request Dec 17, 2024
* support user-defined prompt processing strategies for dpo

* interpret dict dataset types as user-defined

* fix lint errors

* setup pydantic config for validation of User defined DPO

---------

Co-authored-by: Wing Lian <[email protected]>