
issue with DPOTrainer class #131

Open
sorydi3 opened this issue Dec 20, 2024 · 3 comments


sorydi3 commented Dec 20, 2024

I am encountering issues with the last three parameters in the DPOTrainer configuration. The problem seems related to the version I am using. To work around the issue and proceed with fine-tuning, I had to comment out the following parameters:

trainer = DPOTrainer(
    # The model to be trained
    model=model,
    # Training configuration from above
    args=training_args,
    # Dataset containing preferred/rejected response pairs
    train_dataset=dataset,
    # Tokenizer for processing inputs
    processing_class=tokenizer,
    # DPO-specific temperature parameter that controls the strength of the preference model
    # Lower values (like 0.1) make the model more conservative in following preferences
    # beta=0.1,
    # Maximum length of the input prompt in tokens
    # max_prompt_length=1024,
    # Maximum combined length of prompt + response in tokens
    # max_length=1536,
)

Will omitting these parameters (i.e., commenting them out) affect the quality or performance of the fine-tuned model?

https://colab.research.google.com/github/huggingface/smol-course/blob/main/2_preference_alignment/notebooks/dpo_finetuning_example.ipynb#scrollTo=eIIzGIKZtKcq&line=1&uniqifier=1


Xe commented Dec 20, 2024

I ran into this on stream today. Which version of TRL should I use?

@Danselem

I had a similar issue on Google Colab. A workaround is to install trl==0.12.1 instead of trl on Colab.
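For reference, the pin would look like this in a Colab cell (assuming a fresh runtime; if trl was already imported, restart the runtime after installing):

!pip install trl==0.12.1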


sorydi3 commented Dec 22, 2024

After some quick research, I discovered that in the latest version of TRL (0.13.0), these parameters have been moved to the DPOConfig class.

To resolve the issue, the parameters causing errors need to be moved to the DPOConfig definition.

config = DPOConfig(
    ...
    beta=0.1,                # DPO-specific temperature parameter
    max_prompt_length=1024,  # Maximum length of the input prompt in tokens
    max_length=1536,         # Maximum combined length of prompt + response in tokens
)

https://github.com/huggingface/trl/blob/4c71daf461d86226ee36c30531d481c33c3e618e/trl/trainer/dpo_trainer.py#L149
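For context, here is a minimal sketch of how the two pieces fit together on trl 0.13.0, reusing the model, dataset, and tokenizer variables from the snippet in the original post (output_dir is a hypothetical value):

from trl import DPOConfig, DPOTrainer

# The DPO-specific parameters now live on DPOConfig instead of DPOTrainer
training_args = DPOConfig(
    output_dir="dpo-output",   # hypothetical output directory
    beta=0.1,                  # DPO temperature controlling preference strength
    max_prompt_length=1024,    # maximum prompt length in tokens
    max_length=1536,           # maximum prompt + response length in tokens
)

# The trainer itself no longer accepts beta / max_prompt_length / max_length
trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()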
