- fix(Sign): self-graft correctly. Previously, we did `update.sign() * update.norm()`, omitting the required division by the norm of `update.sign()`. Now, it's `F.normalize(update.sign()) * update.norm()`. This changes the required learning rates for self-grafted `tg.optim.Sign`.
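To see why the learning rate changes, here is a minimal pure-Python sketch of the two formulas on a flat list. It is an illustration, not the library's code. It assumes a single global L2 norm over the whole update, whereas `F.normalize` normalizes along `dim=1` by default, so the per-tensor behaviour in `tg.optim.Sign` may differ. The function names `old_self_graft` and `new_self_graft` are hypothetical.

```python
import math


def sign(x):
    # Elementwise sign; zero maps to zero, as with torch.sign
    return [(v > 0) - (v < 0) for v in x]


def l2_norm(x):
    # Global L2 norm over the whole (flattened) vector; an assumption of this sketch
    return math.sqrt(sum(v * v for v in x))


def old_self_graft(update):
    # Previous behaviour: sign direction scaled by the update's norm, with no
    # normalization, so the result's norm grows with the number of nonzero entries
    n = l2_norm(update)
    return [s * n for s in sign(update)]


def new_self_graft(update):
    # Fixed behaviour: normalize the sign direction to unit norm first,
    # then rescale it to the update's norm
    s = sign(update)
    s_norm = l2_norm(s)
    if s_norm == 0:
        # All-zero update: nothing to graft
        return [0.0 for _ in update]
    n = l2_norm(update)
    return [v / s_norm * n for v in s]


update = [3.0, -4.0, 0.0, 12.0]
print(l2_norm(update))
print(l2_norm(new_self_graft(update)))
print(l2_norm(old_self_graft(update)))
```

Under these assumptions the fixed version returns a step with the same norm as `update` (13 here, up to float rounding), while the old version's norm is larger by a factor of sqrt(k), where k is the number of nonzero entries (sqrt(3) here). Because the old step scaled with tensor size, learning rates tuned for it are too small after the fix.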