diff --git a/README.md b/README.md
index 051c8606a..53166456a 100644
--- a/README.md
+++ b/README.md
@@ -12,6 +12,20 @@
 
 ## What's New
 
+## Nov 28, 2024
+* More optimizers
+  * Add MARS optimizer (https://arxiv.org/abs/2411.10438, https://github.com/AGI-Arena/MARS)
+  * Add LaProp optimizer (https://arxiv.org/abs/2002.04839, https://github.com/Z-T-WANG/LaProp-Optimizer)
+  * Add masking from 'Cautious Optimizers' (https://arxiv.org/abs/2411.16085, https://github.com/kyleliang919/C-Optim) to Adafactor, Adafactor Big Vision, AdamW (legacy), Adopt, Lamb, LaProp, Lion, NadamW, RMSPropTF, SGDW
+  * Clean up some docstrings and type annotations for the optimizers and factory
+* Add MobileNet-V4 Conv Medium models pretrained on in12k and fine-tuned on in1k @ 384x384
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e250_r384_in12k_ft_in1k
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e250_r384_in12k
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e180_ad_r384_in12k
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e180_r384_in12k
+* Add a small cs3darknet model, quite good for its speed
+  * https://huggingface.co/timm/cs3darknet_focus_s.ra4_e3600_r256_in1k
+
 ## Nov 12, 2024
 * Optimizer factory refactor
   * New factory works by registering optimizers using an OptimInfo dataclass w/ some key traits
@@ -463,12 +477,14 @@ Included optimizers available via `timm.optim.create_optimizer_v2` factory metho
 * `adahessian` by [David Samuel](https://github.com/davda54/ada-hessian) - https://arxiv.org/abs/2006.00719
 * `adamp` and `sgdp` by [Naver ClovAI](https://github.com/clovaai) - https://arxiv.org/abs/2006.08217
 * `adan` an implementation of Adan adapted from https://github.com/sail-sg/Adan - https://arxiv.org/abs/2208.06677
-* `adopt` - adapted from https://github.com/iShohei220/adopt - https://arxiv.org/abs/2411.02853
+* `adopt` an implementation of ADOPT adapted from https://github.com/iShohei220/adopt - https://arxiv.org/abs/2411.02853
 * `lamb` an implementation of Lamb and LambC (w/ trust-clipping) cleaned up and modified to support use with XLA - https://arxiv.org/abs/1904.00962
+* `laprop` an implementation of LaProp from https://github.com/Z-T-WANG/LaProp-Optimizer - https://arxiv.org/abs/2002.04839
 * `lars` an implementation of LARS and LARC (w/ trust-clipping) - https://arxiv.org/abs/1708.03888
 * `lion` and implementation of Lion adapted from https://github.com/google/automl/tree/master/lion - https://arxiv.org/abs/2302.06675
 * `lookahead` adapted from impl by [Liam](https://github.com/alphadl/lookahead.pytorch) - https://arxiv.org/abs/1907.08610
-* `madgrad` - and implementation of MADGRAD adapted from https://github.com/facebookresearch/madgrad - https://arxiv.org/abs/2101.11075
+* `madgrad` an implementation of MADGRAD adapted from https://github.com/facebookresearch/madgrad - https://arxiv.org/abs/2101.11075
+* `mars` an implementation of MARS from https://github.com/AGI-Arena/MARS - https://arxiv.org/abs/2411.10438
 * `nadam` an implementation of Adam w/ Nesterov momentum
 * `nadamw` an impementation of AdamW (Adam w/ decoupled weight-decay) w/ Nesterov momentum. A simplified impl based on https://github.com/mlcommons/algorithmic-efficiency
 * `novograd` by [Masashi Kimura](https://github.com/convergence-lab/novograd) - https://arxiv.org/abs/1905.11286
@@ -477,6 +493,7 @@ Included optimizers available via `timm.optim.create_optimizer_v2` factory metho
 * `sgdw` and implementation of SGD w/ decoupled weight-decay
 * `fused` optimizers by name with [NVIDIA Apex](https://github.com/NVIDIA/apex/tree/master/apex/optimizers) installed
 * `bnb` optimizers by name with [BitsAndBytes](https://github.com/TimDettmers/bitsandbytes) installed
+* `cadamw`, `clion`, and more 'Cautious' optimizers from https://github.com/kyleliang919/C-Optim - https://arxiv.org/abs/2411.16085
 * `adam`, `adamw`, `rmsprop`, `adadelta`, `adagrad`, and `sgd` pass through to `torch.optim` implementations
 
 ### Augmentations
diff --git a/timm/optim/adafactor.py b/timm/optim/adafactor.py
index e11b0a9f0..c426e30a1 100644
--- a/timm/optim/adafactor.py
+++ b/timm/optim/adafactor.py
@@ -83,7 +83,7 @@ def __setstate__(self, state):
         super().__setstate__(state)
         for group in self.param_groups:
             group.setdefault('caution', False)
-            group.setdefault('min_dim_size_to_factor', 32)
+            group.setdefault('min_dim_size_to_factor', 16)
 
     @staticmethod
     def _get_lr(param_group, param_state):
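The `timm/optim/adafactor.py` hunk above only changes the `__setstate__` fallback for `min_dim_size_to_factor` from 32 to 16, so param groups restored without that key now fall back to 16. A minimal sketch of setting the knob explicitly; the keyword name comes from the hunk, while passing it through `create_optimizer_v2`'s extra kwargs is an assumption about the factory's pass-through behavior.

```python
import timm
from timm.optim import create_optimizer_v2

model = timm.create_model('resnet18')  # any model, resnet18 is just a small placeholder

# min_dim_size_to_factor is named for the smallest dimension size at which the
# factored (row/col) second-moment estimate is used; 16 mirrors the new
# __setstate__ fallback in the hunk above.
optimizer = create_optimizer_v2(
    model,
    opt='adafactor',
    lr=1e-3,
    min_dim_size_to_factor=16,  # assumed to pass through to the Adafactor constructor
)
```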
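On the README side, `mars`, `laprop`, and the 'Cautious' variants such as `cadamw` are added as names in the `create_optimizer_v2` list, so selecting them should only be a matter of the `opt` string. A minimal usage sketch, assuming the registered names match the bullets above; the learning rates and weight decay values are placeholders.

```python
import timm
from timm.optim import create_optimizer_v2

model = timm.create_model('resnet18')  # placeholder model

# MARS and LaProp, via the `mars` / `laprop` names from the optimizer list.
opt_mars = create_optimizer_v2(model, opt='mars', lr=3e-3, weight_decay=0.02)
opt_laprop = create_optimizer_v2(model, opt='laprop', lr=4e-4)

# Cautious-masked AdamW, per the `cadamw` / `clion` naming in the README hunk.
opt_cadamw = create_optimizer_v2(model, opt='cadamw', lr=1e-3, weight_decay=0.05)
```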
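The new MobileNet-V4 Conv Medium and cs3darknet weights are regular `timm.create_model` targets; a short sketch using one of the Hugging Face Hub tags listed in the Nov 28 entry (the other tags follow the same pattern).

```python
import torch
import timm

# in12k pretrain -> in1k fine-tune @ 384x384, from the Nov 28 entry.
model = timm.create_model(
    'mobilenetv4_conv_medium.e250_r384_in12k_ft_in1k',
    pretrained=True,
).eval()

# The small cs3darknet weights load the same way.
small = timm.create_model('cs3darknet_focus_s.ra4_e3600_r256_in1k', pretrained=True).eval()

with torch.inference_mode():
    out = model(torch.randn(1, 3, 384, 384))
print(out.shape)  # torch.Size([1, 1000]) for the in1k fine-tuned head
```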