Hi! I just read your DMT paper and really appreciate your work. However, I can't fully understand this statement in the paper: "It can be interpreted that a relatively larger γ1 represents a more emphasized entropy minimization, a larger γ2 represents a more emphasized mutual learning. Large γ values are often better for high-noise scenarios, or to maintain larger intermodel disagreement." Could you please explain it? Thanks a lot!
@Hugo-cell111 FYI, a larger γ corresponds to larger differences in loss weighting. Loss weighting is the core of the dynamic loss, hence the word "emphasized".
γ1 is used when models predict the same label, which corresponds to entropy minimization.
γ2 is used when models predict different labels, which corresponds to mutual learning.
As for the last statement on high noise and disagreement, it is more empirical. You can think of a large γ as having an effect similar to an overall lower learning rate (though not exactly, given the exponential dynamic weight): the models won't take large steps towards noisy labels or towards each other.
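To make the weighting intuition concrete, here is a minimal sketch. It assumes the dynamic weight takes the exponential form `p ** gamma`, where `p` is a confidence in [0, 1]; the function name `dynamic_weight` and its arguments are hypothetical and not taken from the DMT code base, and any further cases the actual loss may handle are left out.

```python
def dynamic_weight(p, same_label, gamma1, gamma2):
    """Hypothetical per-sample loss weight of the assumed form p ** gamma.

    p          -- confidence in [0, 1] (assumed input, not the repo's API)
    same_label -- True if both models predict the same label
    gamma1     -- exponent used when the labels agree (entropy minimization)
    gamma2     -- exponent used when the labels differ (mutual learning)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    gamma = gamma1 if same_label else gamma2
    return p ** gamma


# Compare a confident and an unconfident sample under a small and a large gamma.
confidences = [0.9, 0.5]
for gamma in (1.0, 5.0):
    w_high, w_low = (dynamic_weight(p, True, gamma, gamma) for p in confidences)
    print(f"gamma={gamma}: weights={w_high:.3f}, {w_low:.3f}, "
          f"ratio={w_high / w_low:.1f}")
```

Under this assumed form, raising γ shrinks every weight (the "lower learning rate" effect) and widens the gap between confident and unconfident samples (the "larger differences in loss weighting").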