What's the correct way to use Chat preference datasets for DPO? #1400

Closed
abhinand5 opened this issue Mar 13, 2024 · 4 comments

Comments

@abhinand5 (Contributor) commented Mar 13, 2024

What piece of documentation is affected?

https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/docs/rlhf.md

What part(s) of the article would you like to see updated?

The dataset-loading section assumes that the prompt, chosen, and rejected fields are plain strings, but many recently popular datasets, such as argilla/dpo-mix-7k, store these fields as lists of chat messages (dicts/JSON) that first need to be rendered into strings.

I'm assuming we have to run tokenizer.apply_chat_template on these fields first and then train with a custom dataset type.
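
Something like this rough sketch is what I have in mind (the tokenizer is just a placeholder, and I'm assuming the last turn of chosen/rejected is the assistant response, as in argilla/dpo-mix-7k):

import json
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
dataset = load_dataset("argilla/dpo-mix-7k", split="train")

with open("dpo-mix-formatted.jsonl", "w") as f:
    for row in dataset:
        # chosen/rejected are lists of {"role", "content"} messages; render
        # the shared conversation prefix as the prompt string and keep the
        # final assistant turns as the chosen/rejected completions.
        record = {
            "prompt": tokenizer.apply_chat_template(
                row["chosen"][:-1], tokenize=False, add_generation_prompt=True
            ),
            "chosen": row["chosen"][-1]["content"],
            "rejected": row["rejected"][-1]["content"],
        }
        f.write(json.dumps(record) + "\n")

The preformatted file can then be loaded with a custom dataset type: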

rl: dpo
datasets:
  - path: dpo-mix-formatted.jsonl
    ds_type: json
    split: train
    type:
      field_prompt: prompt
      field_chosen: chosen
      field_rejected: rejected
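
A formatted line in dpo-mix-formatted.jsonl would then look roughly like this (the contents are made up for illustration, and the template markers depend on the tokenizer):

{"prompt": "<|user|>\nWhat does DPO stand for?</s>\n<|assistant|>\n", "chosen": "Direct Preference Optimization.", "rejected": "Data Parallel Optimization."}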

Training is currently running with this setup, but I'm not sure this is the correct way to do it. Can anyone confirm?

Additional Information

No response

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this feature has not been requested yet.
  • I have provided enough information for the maintainers to understand and evaluate this request.
@NanoCode012 (Collaborator) commented

You can run preprocess with debug enabled (see the README) to check whether it's tokenizing correctly.
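
For example, with the config path as a placeholder:

python -m axolotl.cli.preprocess your-dpo-config.yml --debug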

@abhinand5 (Contributor, Author) commented Mar 13, 2024

Thanks for the suggestion, @NanoCode012.

It doesn't print anything for the above configuration (it does print for my SFT dataset, by the way). I need to check what's going wrong.

@abhinand5 (Contributor, Author) commented

Well, it turns out preprocess doesn't print debug information for DPO datasets at all; I tried a few different datasets. Maybe I'll work on this and raise a PR.

@abhinand5 (Contributor, Author) commented

Alright, this issue was fixed in #1397.

I've just added the option to debug RL datasets in preprocess and raised a PR (#1404). Closing this!
