
Request for SFT Training scripts and implementation details #71

Open

HCY123902 opened this issue Oct 19, 2024 · 4 comments

HCY123902 commented Oct 19, 2024

Thank you for sharing your research work. I have a question related to the supervised fine-tuning step, which, according to the paper, is used to initialize the base model before running SimPO. While the SFT configuration file is provided at training_configs/llama-3-8b-base-sft.yaml, may I ask for the SFT training script itself?

In issue #27, there is a comment asking how HuggingFaceH4/ultrachat_200k is processed for SFT; I would like to know this as well. Since HuggingFaceH4/ultrachat_200k samples are multi-turn dialogues, I am curious what labels are used for SFT.

HCY123902 changed the title from "Training scripts for SFT" to "Request for SFT Training scripts and implementation details" on Oct 19, 2024
yumeng5 (Collaborator) commented Oct 19, 2024

Hi @HCY123902

We used the same SFT training script as the original alignment-handbook repo: https://github.com/huggingface/alignment-handbook/blob/main/scripts/run_sft.py

And the command for SFT training is as follows:

ACCELERATE_LOG_LEVEL=info accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py training_configs/llama-3-8b-base-sft.yaml

As for HuggingFaceH4/ultrachat_200k, we didn't do any specific processing of it, which means we train on all turns when a dialogue is multi-turn.
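For concreteness, here is a minimal, purely illustrative sketch of what that typically amounts to with the alignment-handbook/TRL preprocessing (this is not the repo's code; the chat template below is a hypothetical stand-in for the one the SFT config provides):

```python
# Illustrative sketch only -- not code from the SimPO repo. It assumes the
# standard alignment-handbook preprocessing: every turn of a multi-turn
# ultrachat_200k dialogue is rendered with the tokenizer's chat template, and
# the SFT labels are just the token ids of the full rendered text (causal LM
# next-token prediction), with no per-turn masking.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
if tokenizer.chat_template is None:
    # The base tokenizer ships without a chat template; the SFT config supplies
    # one. This simple Jinja template is a hypothetical stand-in.
    tokenizer.chat_template = (
        "{% for message in messages %}"
        "<|{{ message['role'] }}|>\n{{ message['content'] }}\n"
        "{% endfor %}"
    )

ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

def render(example):
    # example["messages"] is a list of {"role": ..., "content": ...} turns;
    # all of them (user and assistant) end up in the training string.
    example["text"] = tokenizer.apply_chat_template(example["messages"], tokenize=False)
    return example

ds = ds.map(render)
# An SFT trainer (e.g. trl's SFTTrainer) then trains on ds["text"] with plain
# next-token prediction, so every turn contributes to the loss.
```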

I hope this helps!

Best,
Yu

HCY123902 (Author) commented

Thank you for clarifying this.

OscarXZQ commented Nov 1, 2024

Hi, following up on this, may I kindly ask if it's possible to provide a separate script for SFT training? The run_simpo.py script seems to hard-code the preference optimization stage, which makes the command

ACCELERATE_LOG_LEVEL=info accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/run_simpo.py training_configs/llama-3-8b-base-sft.yaml

as provided above not runnable.

yumeng5 (Collaborator) commented Nov 1, 2024

Hi @OscarXZQ

Sorry, there was a typo in my previous comment: run_simpo.py should be replaced with run_sft.py (the script from the original alignment-handbook repo). I have fixed the typo.

Best,
Yu
