I am encountering errors with the last three parameters passed to DPOTrainer. The problem seems to be related to the TRL version I am using. To work around it and proceed with fine-tuning, I had to comment out the following parameters:
trainer = DPOTrainer(
    # The model to be trained
    model=model,
    # Training configuration from above
    args=training_args,
    # Dataset containing preferred/rejected response pairs
    train_dataset=dataset,
    # Tokenizer for processing inputs
    processing_class=tokenizer,
    # DPO-specific temperature parameter that controls the strength of the preference model
    # Lower values (like 0.1) make the model more conservative in following preferences
    #beta=0.1,
    # Maximum length of the input prompt in tokens
    #max_prompt_length=1024,
    # Maximum combined length of prompt + response in tokens
    #max_length=1536,
)
Will omitting these parameters (i.e., commenting them out) affect the quality or performance of the fine-tuned model?
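If they are simply left out, training should fall back to whatever defaults the installed DPOConfig ships for beta, max_prompt_length, and max_length, which may differ from the values used in the notebook. A minimal sketch for checking those defaults locally (the output_dir value is just a placeholder):

from trl import DPOConfig

# Inspect the defaults that apply when the parameters are omitted.
cfg = DPOConfig(output_dir="tmp")
print(cfg.beta, cfg.max_prompt_length, cfg.max_length)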
After some quick research, I discovered that in the latest version of TRL (0.13.0), these parameters have been moved to the DPOConfig class.
To resolve the issue, the parameters that cause the errors need to be moved out of the DPOTrainer call and into the DPOConfig definition.
config = DPOConfig(
    ...
    beta=0.1,                # DPO-specific temperature parameter
    max_prompt_length=1024,  # Maximum length of the input prompt in tokens
    max_length=1536,         # Maximum combined length of prompt + response in tokens
)
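Putting the two pieces together, a minimal sketch of the updated setup under TRL 0.13.0 could look like the following. The output_dir value and the trainer.train() call are illustrative placeholders; model, dataset, and tokenizer are assumed to be the same objects used above:

from trl import DPOConfig, DPOTrainer

# In TRL 0.13.0 the beta and length settings belong to DPOConfig rather than DPOTrainer.
training_args = DPOConfig(
    output_dir="dpo_output",   # placeholder output directory
    beta=0.1,                  # DPO-specific temperature parameter
    max_prompt_length=1024,    # Maximum length of the input prompt in tokens
    max_length=1536,           # Maximum combined length of prompt + response in tokens
)

trainer = DPOTrainer(
    model=model,                 # The model to be trained
    args=training_args,          # DPOConfig now carries beta and the length limits
    train_dataset=dataset,       # Dataset containing preferred/rejected response pairs
    processing_class=tokenizer,  # Tokenizer for processing inputs
)

trainer.train()

With this layout the same values are still applied, so removing them from the DPOTrainer call should not change the resulting model as long as they are set in DPOConfig instead.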
https://colab.research.google.com/github/huggingface/smol-course/blob/main/2_preference_alignment/notebooks/dpo_finetuning_example.ipynb#scrollTo=eIIzGIKZtKcq&line=1&uniqifier=1