What's the correct way to use Chat preference datasets for DPO? #1400

Closed
abhinand5 opened this issue Mar 13, 2024 · 4 comments

Comments

@abhinand5 (Contributor) commented Mar 13, 2024

What piece of documentation is affected?

https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/docs/rlhf.md

What part(s) of the article would you like to see updated?

The dataset-loading section assumes that the prompt, chosen, and rejected fields are plain strings, but many recently popular datasets, such as argilla/dpo-mix-7k, store these fields as lists of chat messages (dicts/JSON) that first need to be rendered into strings.

I'm assuming we have to run tokenizer.apply_chat_template on these fields first and then train with a custom dataset type.
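
Something like this rough sketch is what I have in mind (the tokenizer is just a placeholder, and I'm assuming the last turn of chosen/rejected is the assistant response, as in argilla/dpo-mix-7k):

import json
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
dataset = load_dataset("argilla/dpo-mix-7k", split="train")

with open("dpo-mix-formatted.jsonl", "w") as f:
    for row in dataset:
        # chosen/rejected are lists of {"role", "content"} messages; render
        # the shared conversation prefix as the prompt string and keep the
        # final assistant turns as the chosen/rejected completions.
        record = {
            "prompt": tokenizer.apply_chat_template(
                row["chosen"][:-1], tokenize=False, add_generation_prompt=True
            ),
            "chosen": row["chosen"][-1]["content"],
            "rejected": row["rejected"][-1]["content"],
        }
        f.write(json.dumps(record) + "\n")

The preformatted file can then be loaded with a custom dataset type: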

rl: dpo
datasets:
  - path: dpo-mix-formatted.jsonl
    ds_type: json
    split: train
    type:
      field_prompt: prompt
      field_chosen: chosen
      field_rejected: rejected
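
A formatted line in dpo-mix-formatted.jsonl would then look roughly like this (the contents are made up for illustration, and the template markers depend on the tokenizer):

{"prompt": "<|user|>\nWhat does DPO stand for?</s>\n<|assistant|>\n", "chosen": "Direct Preference Optimization.", "rejected": "Data Parallel Optimization."}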

Training is currently running with this setup, but I'm not sure this is the correct way to do it. Can anyone confirm?

Additional Information

No response

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this feature has not been requested yet.
  • I have provided enough information for the maintainers to understand and evaluate this request.
@NanoCode012 (Collaborator) commented

You can run preprocess with debug enabled (see the README) to check whether it's tokenizing correctly.
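
For example, with the config path as a placeholder:

python -m axolotl.cli.preprocess your-dpo-config.yml --debug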

@abhinand5 (Contributor, Author) commented Mar 13, 2024

Thanks for the suggestion, @NanoCode012.

It doesn't print anything for the above configuration (it does print for my SFT dataset, by the way). I need to check what's going wrong.

@abhinand5 (Contributor, Author) commented

Well, it turns out preprocess doesn't print debug information for DPO datasets at all; I tried a few different datasets. Maybe I'll work on this and raise a PR.

@abhinand5 (Contributor, Author) commented

Alright, this issue was fixed in #1397.

I've just added the option to debug RL datasets in preprocess and raised a PR (#1404). Closing this!
