FOAK Cross Entropy Loss Will Not Work with New Loss Functions After Transformers 4.46 #98

Open
fabianlim opened this issue Oct 29, 2024 · 1 comment
Assignees
Labels
future Will be affected in future versions (e.g., deprecation) urgent Time sensitivity involved.

Comments

@fabianlim
Contributor

fabianlim commented Oct 29, 2024

The loss functions were significantly refactored for transformers 4.46, which will render the cross-entropy patching ineffective. We need a different ModelPatcherRule for the new transformers version. CC: @anhuong

huggingface/transformers#34191

So there are now 3 possibilities:

  1. custom_loss_function is passed into Trainer
  2. model has migrated to the custom_loss_function API
  3. model has not migrated (like Granite now)
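
As a rough sketch of how the three cases could be told apart: the snippet below is illustrative only. `LossCase`, `classify_loss_case`, and both boolean inputs are hypothetical names, and how to actually detect each condition in transformers is left open. It also assumes that a loss function passed to the Trainer takes precedence over whatever the model does.

```python
from enum import Enum


class LossCase(Enum):
    """The three situations listed above (hypothetical names)."""
    TRAINER_CUSTOM_LOSS = 1   # case 1: loss function passed into Trainer
    MODEL_MIGRATED = 2        # case 2: model uses the new loss API
    MODEL_NOT_MIGRATED = 3    # case 3: model still computes its own loss


def classify_loss_case(trainer_has_custom_loss: bool, model_migrated: bool) -> LossCase:
    # Assumption: a Trainer-level loss function overrides the model's own loss,
    # so it is checked first.
    if trainer_has_custom_loss:
        return LossCase.TRAINER_CUSTOM_LOSS
    if model_migrated:
        return LossCase.MODEL_MIGRATED
    return LossCase.MODEL_NOT_MIGRATED
```

A patching rule could then branch on the returned case instead of re-deriving the conditions in several places.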

For 3: this is the easy case, because it requires no code changes.

For 1: I'm thinking we do not patch anything, because if a user wants to pass their own loss function, we can't control which one they use.

For 2: in this case we want to patch fixed_cross_entropy, but this should be done on a per-model basis. So we need to somehow have the model instantiate the loss function (e.g., ForCausalLMLoss), patch fixed_cross_entropy only during this instantiation, and restore the original once it is done.
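
A minimal sketch of the "patch during instantiation, then restore" idea for case 2, using a context manager. Everything here is a stand-in: `loss_utils` is a dummy namespace rather than the real transformers module, and `patched_attr`, `fast_cross_entropy`, and `make_causal_lm_loss` are hypothetical names. It also assumes the loss factory binds fixed_cross_entropy at creation time; if the real loss looks the name up at call time, a scoped patch like this would not stick and a different approach would be needed.

```python
import contextlib
import types


@contextlib.contextmanager
def patched_attr(owner, name, replacement):
    """Temporarily replace owner.<name>; always restore the original on exit."""
    original = getattr(owner, name)
    setattr(owner, name, replacement)
    try:
        yield
    finally:
        setattr(owner, name, original)


def fixed_cross_entropy(logits, labels):
    return "original"  # placeholder for the stock implementation


def fast_cross_entropy(logits, labels):
    return "fast"  # placeholder for the FOAK kernel


# Dummy stand-in for the module that owns fixed_cross_entropy (hypothetical).
loss_utils = types.SimpleNamespace(fixed_cross_entropy=fixed_cross_entropy)


def make_causal_lm_loss():
    # Assumption: the loss captures fixed_cross_entropy when it is created.
    ce = loss_utils.fixed_cross_entropy

    def loss(logits, labels):
        return ce(logits, labels)

    return loss


# Only the loss built inside the block sees the patched function.
with patched_attr(loss_utils, "fixed_cross_entropy", fast_cross_entropy):
    patched_loss = make_causal_lm_loss()

unpatched_loss = make_causal_lm_loss()
```

The `try`/`finally` ensures the original function is put back even if instantiation raises, so other models in the same process are not affected.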

@fabianlim changed the title from "FOAK Cross Entropy Loss Will Not Work with New Loss Functions" to "FOAK Cross Entropy Loss Will Not Work with New Loss Functions After Transformers 4.46" Oct 29, 2024
@fabianlim
Contributor Author

@anhuong should we try to resolve this before coming back to #93?
