Request for SFT Training scripts and implementation details #71
Comments
Hi @HCY123902, we used the same SFT training script as the original alignment-handbook repo: https://github.com/huggingface/alignment-handbook/blob/main/scripts/run_sft.py, and the command for SFT training is as follows:
As for HuggingFaceH4/ultrachat_200k, we didn't do any specific processing of it. This means we train on all turns if the dialogue is multi-turn. I hope this helps! Best,
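For concreteness, below is a minimal sketch of what "no specific processing, train on all turns" can look like with alignment-handbook-style preprocessing. It is not the authors' exact script or command; the tokenizer and chat template are placeholder assumptions.

```python
# Minimal sketch (assumption: alignment-handbook-style preprocessing, not the authors' exact code).
# It renders every turn of a multi-turn ultrachat_200k dialogue into a single SFT training string,
# which is what "training on all turns" amounts to before tokenization and packing.
from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

# meta-llama/Meta-Llama-3-8B is a gated repo; swap in any tokenizer you have access to.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Base models ship without a chat template, so the SFT recipe has to supply one.
# This template is a placeholder assumption, not the one used for the released checkpoints.
tokenizer.chat_template = (
    "{% for message in messages %}"
    "{{ '<|' + message['role'] + '|>\n' + message['content'] + eos_token + '\n' }}"
    "{% endfor %}"
)

example = dataset[0]
text = tokenizer.apply_chat_template(example["messages"], tokenize=False)
print(text[:500])  # every user and assistant turn appears in the rendered SFT target
```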
Thank you for clarifying this.
Hi, following up on this, may I kindly ask if it's possible to provide a separate file for SFT training? The command as provided above is not runnable.
Hi @OscarXZQ, sorry, there was a typo in my previous comment. Best,
Thank you for sharing your research work. I have a question related to the supervised fine-tuning step, which, according to the paper, is used to initialize the base model before running SimPO. While the SFT configuration file is provided at training_configs/llama-3-8b-base-sft.yaml, may I ask for the SFT training script itself?

In issue #27, there is a comment asking about how HuggingFaceH4/ultrachat_200k is processed for SFT. I would like to know this too. HuggingFaceH4/ultrachat_200k samples are multi-turn dialogues, so I am curious about what labels are used for SFT.
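As a quick, purely illustrative way to see the multi-turn structure the question refers to, one can print the role/content pairs of a single sample; the split name ("train_sft") and column name ("messages") are assumptions based on the public dataset card.

```python
# Illustrative only: peek at the multi-turn structure of one ultrachat_200k sample.
from datasets import load_dataset

ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
for turn in ds[0]["messages"]:
    print(f"{turn['role']:>9}: {turn['content'][:80]}")
```

Per the reply above, the SFT target is the full rendered dialogue (all turns), rather than only the final assistant response.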