I am encountering errors with the last three parameters passed to DPOTrainer. The problem seems to be related to the TRL version I am using. To work around it and proceed with fine-tuning, I had to comment out the following parameters:
trainer = DPOTrainer(
    # The model to be trained
    model=model,
    # Training configuration from above
    args=training_args,
    # Dataset containing preferred/rejected response pairs
    train_dataset=dataset,
    # Tokenizer for processing inputs
    processing_class=tokenizer,
    # DPO-specific temperature parameter that controls the strength of the preference model
    # Lower values (like 0.1) make the model more conservative in following preferences
    #beta=0.1,
    # Maximum length of the input prompt in tokens
    #max_prompt_length=1024,
    # Maximum combined length of prompt + response in tokens
    #max_length=1536,
)
Will omitting these parameters (i.e., commenting them out) affect the quality or performance of the fine-tuned model?
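If they are simply left out, training should fall back to whatever defaults the installed DPOConfig ships for beta, max_prompt_length, and max_length, which may differ from the values used in the notebook. A minimal sketch for checking those defaults locally (the output_dir value is just a placeholder):

from trl import DPOConfig

# Inspect the defaults that apply when the parameters are omitted.
cfg = DPOConfig(output_dir="tmp")
print(cfg.beta, cfg.max_prompt_length, cfg.max_length)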
After some quick research, I discovered that in the latest version of TRL (0.13.0), these parameters have been moved to the DPOConfig class.
To resolve the issue, the parameters that cause the errors need to be moved out of the DPOTrainer call and into the DPOConfig definition.
config = DPOConfig(
    ...
    beta=0.1,                # DPO-specific temperature parameter
    max_prompt_length=1024,  # Maximum length of the input prompt in tokens
    max_length=1536,         # Maximum combined length of prompt + response in tokens
)
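Putting the two pieces together, a minimal sketch of the updated setup under TRL 0.13.0 could look like the following. The output_dir value and the trainer.train() call are illustrative placeholders; model, dataset, and tokenizer are assumed to be the same objects used above:

from trl import DPOConfig, DPOTrainer

# In TRL 0.13.0 the beta and length settings belong to DPOConfig rather than DPOTrainer.
training_args = DPOConfig(
    output_dir="dpo_output",   # placeholder output directory
    beta=0.1,                  # DPO-specific temperature parameter
    max_prompt_length=1024,    # Maximum length of the input prompt in tokens
    max_length=1536,           # Maximum combined length of prompt + response in tokens
)

trainer = DPOTrainer(
    model=model,                 # The model to be trained
    args=training_args,          # DPOConfig now carries beta and the length limits
    train_dataset=dataset,       # Dataset containing preferred/rejected response pairs
    processing_class=tokenizer,  # Tokenizer for processing inputs
)

trainer.train()

With this layout the same values are still applied, so removing them from the DPOTrainer call should not change the resulting model as long as they are set in DPOConfig instead.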
https://colab.research.google.com/github/huggingface/smol-course/blob/main/2_preference_alignment/notebooks/dpo_finetuning_example.ipynb#scrollTo=eIIzGIKZtKcq&line=1&uniqifier=1