How to fine-tune a distilled model (DeiT)? #1057
manuel-rdz asked this question in Q&A (unanswered)
Hello,

I'm new to distilled models, and I'm trying to use a DeiT model with 20 output classes. Usually I just replace the head of a transformer model with a new linear layer. But I noticed that distilled models (at least DeiT) have two heads, `head` and `head_dist`, and return a tuple of two sets of logits during prediction (I assume one set from each head).

So my questions are:

- What is the correct way to replace the heads for fine-tuning? (replace both, only one, etc.)
- How should the two returned sets of logits be used to calculate the loss? (use only one set, a combination of both, etc.)

Thank you :)
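For anyone landing here, a minimal sketch of one common approach: replace *both* heads for the new number of classes, train each head against the hard labels (when no teacher model is used), and average the two logit sets at inference. `DistilledToy` below is a hypothetical stand-in for the real DeiT backbone; only the `head` / `head_dist` attribute names and the tuple output are taken from the question. This is not the official DeiT recipe, just an illustration.

```python
# Sketch of fine-tuning a DeiT-style distilled model for 20 classes.
# DistilledToy is a toy stand-in for the real model; the attribute names
# 'head' and 'head_dist' mirror the two heads described in the question.
import torch
import torch.nn as nn

class DistilledToy(nn.Module):
    """Minimal stand-in for a distilled ViT: a backbone plus two heads."""
    def __init__(self, embed_dim=32, num_classes=1000):
        super().__init__()
        self.backbone = nn.Linear(8, embed_dim)             # pretend feature extractor
        self.head = nn.Linear(embed_dim, num_classes)       # classification head
        self.head_dist = nn.Linear(embed_dim, num_classes)  # distillation head

    def forward(self, x):
        feats = self.backbone(x)
        # Distilled models return a tuple: one set of logits per head.
        return self.head(feats), self.head_dist(feats)

model = DistilledToy()

# Replace BOTH heads so their output size matches the new 20-class task.
num_classes = 20
model.head = nn.Linear(model.head.in_features, num_classes)
model.head_dist = nn.Linear(model.head_dist.in_features, num_classes)

x = torch.randn(4, 8)                           # dummy batch of 4 samples
targets = torch.randint(0, num_classes, (4,))   # dummy hard labels
logits_cls, logits_dist = model(x)

# With no teacher available, one common choice is to apply the same
# hard-label loss to both heads...
criterion = nn.CrossEntropyLoss()
loss = 0.5 * criterion(logits_cls, targets) + 0.5 * criterion(logits_dist, targets)

# ...and average the two logit sets at inference time.
inference_logits = (logits_cls + logits_dist) / 2
print(inference_logits.shape)   # torch.Size([4, 20])
```

If you are loading the model through timm, `model.reset_classifier(20)` is intended to re-create the classifier head(s) for a new class count; check that it resets both heads on distilled variants in your installed version before relying on it.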