The DeepIce model contains a method called no_weight_decay() which is intended to specify that the cls_token parameter should not be subject to weight decay during training:
@torch.jit.ignore
def no_weight_decay(self) -> Set:
    """cls_token should not be subject to weight decay during training."""
    return {"cls_token"}
However, optimizer_grouped_parameters are not specified during training, so this method has no effect.
I believe that in the original 2nd place code, FastAI's wrapper around AdamW handled this automatically.
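For illustration, here is a minimal sketch of how the names returned by no_weight_decay() could be turned into optimizer parameter groups. The helper name build_param_groups, the suffix-matching rule for nested parameter names, and the weight decay value 0.05 are all assumptions made for this example, not code from this repository or from the original 2nd place solution.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    no_decay_names: Set[str],
    weight_decay: float,
) -> List[Dict[str, Any]]:
    """Split parameters into a decayed group and a non-decayed group.

    Hypothetical helper: a name is excluded from weight decay if it equals
    an entry of no_decay_names or ends with "." + entry (assumed rule to
    cover models nested inside a wrapper module).
    """
    decay, no_decay = [], []
    for name, param in named_params:
        # Skip frozen parameters; objects without the attribute are kept.
        if not getattr(param, "requires_grad", True):
            continue
        excluded = any(
            name == key or name.endswith("." + key) for key in no_decay_names
        )
        (no_decay if excluded else decay).append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


# Stand-in (name, parameter) pairs; with a real model these would come from
# model.named_parameters() and model.no_weight_decay().
example = [
    ("cls_token", "P0"),
    ("blocks.0.weight", "P1"),
    ("encoder.cls_token", "P2"),
]
groups = build_param_groups(example, {"cls_token"}, weight_decay=0.05)
```

With a real model, the returned list could be passed as the first argument to torch.optim.AdamW, which accepts a list of dicts with a "params" key and per-group options such as "weight_decay".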