
Support user-defined prompt processing strategies for DPO #1248

Merged 4 commits into main on Feb 26, 2024

Conversation

@nopperl (Contributor) commented on Feb 2, 2024

Description

Support user-defined prompt pre-processing strategies for DPO training, similar to instruction fine-tuning (see src/axolotl/prompt_strategies/user_defined.py). Specifically, this adds support for defining custom names and format strings for the prompt, chosen, and rejected fields. Example config:

datasets:
  - ds_type: json
    data_files:
      - test_dpo.jsonl
    split: train
    type: user_defined.default
    field_system: sys
    field_prompt: question
    field_chosen: correct
    field_rejected: wrong
    prompt_format: "<|im_start|>system\n{system}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\nThe answer in JSON format is:"
    chosen_format: "{chosen}<|im_end|>"
    rejected_format: "{rejected}<|im_end|>"
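
For illustration, here is a minimal sketch of how these format strings could be applied to a single dataset row. transform_row is a hypothetical helper using plain str.format interpolation; the actual strategy implementation in axolotl may differ.

def transform_row(row: dict, cfg: dict) -> dict:
    # Look up each source field by its configured name and interpolate it
    # into the corresponding format string.
    prompt = cfg["prompt_format"].format(
        system=row[cfg["field_system"]],
        prompt=row[cfg["field_prompt"]],
    )
    chosen = cfg["chosen_format"].format(chosen=row[cfg["field_chosen"]])
    rejected = cfg["rejected_format"].format(rejected=row[cfg["field_rejected"]])
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}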

How has this been tested?

Tested on a custom dataset; it might make sense to add test cases.


@winglian (Collaborator) commented on Feb 6, 2024

This is great so far. For consistency, it would be great if the structure matched the SFT format:
https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/README.md?plain=1#L388-L400

datasets:
  - path: repo
    type:
      system_prompt: ""
      field_system: system
      field_instruction: input
      field_output: output
      format: "[INST] {instruction} [/INST]"
      no_input_format: "[INST] {instruction} [/INST]"

@nopperl (Contributor, Author) commented on Feb 7, 2024

Ah, I missed that. I have adapted it; it now expects the following config:

datasets:
  - path: repo
    split: train
    type:
      field_system: sys
      field_prompt: question
      field_chosen: correct
      field_rejected: wrong
      prompt_format: "<|im_start|>system\n{system}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\nThe answer in JSON format is:"
      chosen_format: "{chosen}<|im_end|>"
      rejected_format: "{rejected}<|im_end|>"
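
For reference, a JSONL record matching these field names could look like this (hypothetical example data):

{"sys": "You are a helpful assistant.", "question": "What is the capital of France?", "correct": "{\"answer\": \"Paris\"}", "wrong": "{\"answer\": \"Berlin\"}"}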

@nopperl (Contributor, Author) commented on Feb 16, 2024

@winglian I've fixed the build errors; I think this can be merged now.

@winglian (Collaborator) commented

@nopperl I rebased this and updated the pydantic models for validation.
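
As a rough sketch of what such validation could look like (field names taken from the config above; pydantic is assumed here, and the actual models in axolotl differ in detail):

from typing import Optional

from pydantic import BaseModel


# Hypothetical model: validates the keys of a user-defined DPO dataset type.
class UserDefinedDPOType(BaseModel):
    field_system: Optional[str] = None
    field_prompt: Optional[str] = None
    field_chosen: Optional[str] = None
    field_rejected: Optional[str] = None
    prompt_format: Optional[str] = None
    chosen_format: Optional[str] = None
    rejected_format: Optional[str] = None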

@winglian (Collaborator) left a review comment

lgtm, @NanoCode012?

@winglian merged commit 1e3d530 into axolotl-ai-cloud:main on Feb 26, 2024
7 checks passed
djsaunde pushed a commit that referenced this pull request Dec 17, 2024
* support user-defined prompt processing strategies for dpo

* interpret dict dataset types as user-defined

* fix lint errors

* setup pydantic config for validation of User defined DPO

---------

Co-authored-by: Wing Lian <[email protected]>