Releases · ClashLuke/TrueGrad
4.0.3
- fix(Sign): self-graft correctly; previously, we computed `update.sign() * update.norm()`, omitting the required division by the sign vector's own norm. Now it is `F.normalize(update.sign()) * update.norm()` (see the sketch below). This changes the required learning rates for self-grafted `tg.optim.Sign`.
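A minimal sketch of the corrected grafting step, assuming the update is treated as a single flattened vector (the exact normalization axis inside `tg.optim.Sign` may differ):

```python
import torch
import torch.nn.functional as F

def self_grafted_sign_update(update: torch.Tensor) -> torch.Tensor:
    # Old (<= 4.0.2): update.sign() * update.norm() -- the sign vector was
    # rescaled without first being divided by its own norm.
    # New (4.0.3): normalize the sign vector, then restore the original
    # update's magnitude, so only the direction comes from sign(update).
    sign = update.sign().flatten()
    grafted = F.normalize(sign, dim=0) * update.norm()
    return grafted.view_as(update)
```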
4.0.2
- Use `WeightDecayChain` in `OptimizerOptimizer`
4.0.1
- Add missing `params_flat` in `Graft`
4.0.0
- Add configurable weight decay via `WeightDecayChain`
  - L1/L2 Decay
  - Decay to Init/EMA
- Remove the `decay_to_init` flag. Use `weight_decay_cls=tg.WeightDecayChain(tg.optim.WeightDecayToInit())` instead (see the sketch below).
- Remove the `default_to_adam` flag. Use `default_to_baseline`.
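A hedged usage sketch of the new weight-decay configuration, assuming `TGAdamW` accepts the `weight_decay_cls` argument named above (constructor details may differ from the actual API):

```python
import torch
import truegrad as tg
import truegrad.optim  # make tg.optim importable as a submodule

model = torch.nn.Linear(4, 4)

# Before 4.0.0 (removed):
#   opt = tg.optim.TGAdamW(model.parameters(), decay_to_init=True)

# From 4.0.0 on, weight-decay behaviour is composed via WeightDecayChain:
opt = tg.optim.TGAdamW(
    model.parameters(),
    weight_decay_cls=tg.WeightDecayChain(tg.optim.WeightDecayToInit()),
)
```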
2.2.0
- Improve TG-Optimizer extensibility by adding a `TrueGrad` base optimizer class
- Add (TG-)LaProp
2.1.0
- feat(nn.functional): allow parameters in more `truegrad.nn.functional` ops
- fix(functional): allow odd shapes in `truegrad.functional.einsum`'s backward
- feat(utils): allow the combination of `truegrad.nn` with `truegrad.utils.patch_model`
- fix(TGAdamW): improve stability

Together, these features allow performant usage of off-the-shelf HuggingFace Transformers using `truegrad.utils.patch_torch` (see the sketch below).
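A hedged end-to-end sketch, assuming `transformers` is installed and that `TGAdamW` lives in `truegrad.optim` with a standard PyTorch-optimizer constructor (both assumptions go beyond these notes):

```python
import torch
import transformers
from truegrad.utils import patch_torch
from truegrad.optim import TGAdamW

patch_torch()  # patch torch / torch.nn.functional so TrueGrad statistics are collected

# Any off-the-shelf HuggingFace model should now work unmodified.
model = transformers.BertModel.from_pretrained("bert-base-uncased")
optimizer = TGAdamW(model.parameters(), lr=1e-4)

# Regular training step with a toy objective
input_ids = torch.randint(0, model.config.vocab_size, (2, 16))
loss = model(input_ids).last_hidden_state.square().mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```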
2.0.0
- Feature: Patch `torch` and `torch.nn.functional` in `truegrad.utils.patch_torch`
- Feature: Add `chunk`, `split` and `transpose` to `truegrad.functional`
- Fix: publicly expose `truegrad.nn.functional`
- Fix: use the patched `chunk`, `split` and `transpose` functions in `truegrad.nn.functional.multi_head_attention_forward` (closes #1)
1.0.0
- Add `truegrad.nn.functional`
- Extend `truegrad.nn`
- Add `truegrad.utils.patch_torch`
- Add `truegrad.functional.TrueGradTensor` to store `sum_grad_squared` (-> fixed `truegrad.functional.reshape`)
0.1.0
- Add BackPack as a possible backend
- Add a `default_to_adam` option to `TGAdamW`
- Rename `square_grad` to `sum_grad_squared`
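For context, a hedged sketch of the per-parameter quantity the BackPack backend supplies, using the separately maintained `backpack-for-pytorch` package (how TrueGrad consumes it internally is not shown here):

```python
import torch
from backpack import backpack, extend
from backpack.extensions import SumGradSquared

# BackPACK's SumGradSquared extension stores, per parameter, the sum over the
# batch of squared per-sample gradients -- the statistic TrueGrad calls
# sum_grad_squared (named square_grad before 0.1.0).
model = extend(torch.nn.Linear(4, 2))
loss_fn = extend(torch.nn.MSELoss())

x, y = torch.randn(8, 4), torch.randn(8, 2)
loss = loss_fn(model(x), y)
with backpack(SumGradSquared()):
    loss.backward()

for p in model.parameters():
    print(p.sum_grad_squared.shape)  # same shape as the parameter itself
```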