How to fine-tune a distilled model (DeiT)? #1057
manuel-rdz asked this question in Q&A (unanswered)
Hello,

I'm new to distilled models, and I'm trying to use a DeiT model with 20 output classes. Usually I just replace the head of a transformer model with a new linear layer. But I noticed that distilled models (at least DeiT) have two heads, `head` and `head_dist`, and return a tuple of two sets of logits during prediction (I assume one set from each head).

So my questions are:

- What is the correct way to replace the heads for fine-tuning? (replace both, only one, etc.)
- How should the two returned sets of logits be used to calculate the loss? (use only one set, a combination of both, etc.)

Thank you :)
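For anyone landing here, a minimal sketch of one common approach: replace *both* heads for the new number of classes, train each head against the hard labels (when no teacher model is used), and average the two logit sets at inference. `DistilledToy` below is a hypothetical stand-in for the real DeiT backbone; only the `head` / `head_dist` attribute names and the tuple output are taken from the question. This is not the official DeiT recipe, just an illustration.

```python
# Sketch of fine-tuning a DeiT-style distilled model for 20 classes.
# DistilledToy is a toy stand-in for the real model; the attribute names
# 'head' and 'head_dist' mirror the two heads described in the question.
import torch
import torch.nn as nn

class DistilledToy(nn.Module):
    """Minimal stand-in for a distilled ViT: a backbone plus two heads."""
    def __init__(self, embed_dim=32, num_classes=1000):
        super().__init__()
        self.backbone = nn.Linear(8, embed_dim)             # pretend feature extractor
        self.head = nn.Linear(embed_dim, num_classes)       # classification head
        self.head_dist = nn.Linear(embed_dim, num_classes)  # distillation head

    def forward(self, x):
        feats = self.backbone(x)
        # Distilled models return a tuple: one set of logits per head.
        return self.head(feats), self.head_dist(feats)

model = DistilledToy()

# Replace BOTH heads so their output size matches the new 20-class task.
num_classes = 20
model.head = nn.Linear(model.head.in_features, num_classes)
model.head_dist = nn.Linear(model.head_dist.in_features, num_classes)

x = torch.randn(4, 8)                           # dummy batch of 4 samples
targets = torch.randint(0, num_classes, (4,))   # dummy hard labels
logits_cls, logits_dist = model(x)

# With no teacher available, one common choice is to apply the same
# hard-label loss to both heads...
criterion = nn.CrossEntropyLoss()
loss = 0.5 * criterion(logits_cls, targets) + 0.5 * criterion(logits_dist, targets)

# ...and average the two logit sets at inference time.
inference_logits = (logits_cls + logits_dist) / 2
print(inference_logits.shape)   # torch.Size([4, 20])
```

If you are loading the model through timm, `model.reset_classifier(20)` is intended to re-create the classifier head(s) for a new class count; check that it resets both heads on distilled variants in your installed version before relying on it.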