Update README, missed small discrep in adafactor min dim update
rwightman committed Nov 29, 2024
1 parent e3e434b commit 886eb77
Showing 2 changed files with 20 additions and 3 deletions.
21 changes: 19 additions & 2 deletions README.md
@@ -12,6 +12,20 @@

## What's New

## Nov 28, 2024
* More optimizers (a usage sketch follows this list)
* Add MARS optimizer (https://arxiv.org/abs/2411.10438, https://github.com/AGI-Arena/MARS)
* Add LaProp optimizer (https://arxiv.org/abs/2002.04839, https://github.com/Z-T-WANG/LaProp-Optimizer)
* Add masking from 'Cautious Optimizers' (https://arxiv.org/abs/2411.16085, https://github.com/kyleliang919/C-Optim) to Adafactor, Adafactor Big Vision, AdamW (legacy), Adopt, Lamb, LaProp, Lion, NadamW, RMSPropTF, SGDW
* Cleanup some docstrings and type annotations re optimizers and factory
* Add MobileNet-V4 Conv Medium models pretrained on in12k and fine-tuned in1k @ 384x384
* https://huggingface.co/timm/mobilenetv4_conv_medium.e250_r384_in12k_ft_in1k
* https://huggingface.co/timm/mobilenetv4_conv_medium.e250_r384_in12k
* https://huggingface.co/timm/mobilenetv4_conv_medium.e180_ad_r384_in12k
* https://huggingface.co/timm/mobilenetv4_conv_medium.e180_r384_in12k
* Add small cs3darknet, quite good for the speed
* https://huggingface.co/timm/cs3darknet_focus_s.ra4_e3600_r256_in1k
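
A brief usage sketch of the additions above, assuming the factory names match the optimizer list later in this README (`mars`, `laprop`, and c-prefixed 'cautious' variants such as `cadamw`); the hyperparameter values are placeholders, not recommendations:

```python
# Hedged sketch, not part of this commit: selecting the newly added
# optimizers and models through timm's existing factory functions.
import timm
from timm.optim import create_optimizer_v2

# One of the new in12k -> in1k fine-tuned MobileNet-V4 Conv Medium weights
model = timm.create_model('mobilenetv4_conv_medium.e250_r384_in12k_ft_in1k', pretrained=True)

opt_mars = create_optimizer_v2(model, opt='mars', lr=3e-3, weight_decay=0.02)
opt_laprop = create_optimizer_v2(model, opt='laprop', lr=4e-4)
# 'Cautious' variants are selected via their c-prefixed names, e.g. cadamw / clion
opt_cadamw = create_optimizer_v2(model, opt='cadamw', lr=5e-4, weight_decay=0.05)
```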

## Nov 12, 2024
* Optimizer factory refactor
* New factory works by registering optimizers using an OptimInfo dataclass w/ some key traits
@@ -463,12 +477,14 @@ Included optimizers available via `timm.optim.create_optimizer_v2` factory method
* `adahessian` by [David Samuel](https://github.com/davda54/ada-hessian) - https://arxiv.org/abs/2006.00719
* `adamp` and `sgdp` by [Naver ClovAI](https://github.com/clovaai) - https://arxiv.org/abs/2006.08217
* `adan` an implementation of Adan adapted from https://github.com/sail-sg/Adan - https://arxiv.org/abs/2208.06677
* `adopt` - adapted from https://github.com/iShohei220/adopt - https://arxiv.org/abs/2411.02853
* `adopt` ADOPT adapted from https://github.com/iShohei220/adopt - https://arxiv.org/abs/2411.02853
* `lamb` an implementation of Lamb and LambC (w/ trust-clipping) cleaned up and modified to support use with XLA - https://arxiv.org/abs/1904.00962
* `laprop` optimizer from https://github.com/Z-T-WANG/LaProp-Optimizer - https://arxiv.org/abs/2002.04839
* `lars` an implementation of LARS and LARC (w/ trust-clipping) - https://arxiv.org/abs/1708.03888
* `lion` an implementation of Lion adapted from https://github.com/google/automl/tree/master/lion - https://arxiv.org/abs/2302.06675
* `lookahead` adapted from impl by [Liam](https://github.com/alphadl/lookahead.pytorch) - https://arxiv.org/abs/1907.08610
* `madgrad` - and implementation of MADGRAD adapted from https://github.com/facebookresearch/madgrad - https://arxiv.org/abs/2101.11075
* `madgrad` an implementation of MADGRAD adapted from https://github.com/facebookresearch/madgrad - https://arxiv.org/abs/2101.11075
* `mars` MARS optimizer from https://github.com/AGI-Arena/MARS - https://arxiv.org/abs/2411.10438
* `nadam` an implementation of Adam w/ Nesterov momentum
* `nadamw` an implementation of AdamW (Adam w/ decoupled weight-decay) w/ Nesterov momentum. A simplified impl based on https://github.com/mlcommons/algorithmic-efficiency
* `novograd` by [Masashi Kimura](https://github.com/convergence-lab/novograd) - https://arxiv.org/abs/1905.11286
@@ -477,6 +493,7 @@ Included optimizers available via `timm.optim.create_optimizer_v2` factory method
* `sgdw` an implementation of SGD w/ decoupled weight-decay
* `fused<name>` optimizers by name with [NVIDIA Apex](https://github.com/NVIDIA/apex/tree/master/apex/optimizers) installed
* `bnb<name>` optimizers by name with [BitsAndBytes](https://github.com/TimDettmers/bitsandbytes) installed
* `cadamw`, `clion`, and more 'Cautious' optimizers from https://github.com/kyleliang919/C-Optim - https://arxiv.org/abs/2411.16085 (the masking idea is sketched after this list)
* `adam`, `adamw`, `rmsprop`, `adadelta`, `adagrad`, and `sgd` pass through to `torch.optim` implementations
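
For the 'cautious' variants, the masking idea from the paper can be rendered roughly as follows; this is an illustrative sketch of the concept applied to a plain update/gradient tensor pair, not timm's implementation:

```python
import torch

def cautious_update(update: torch.Tensor, grad: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    # Keep only update components whose sign agrees with the current gradient,
    # then rescale by the surviving fraction so the overall step size is
    # roughly preserved (one common formulation of the cautious mask).
    mask = (update * grad > 0).to(update.dtype)
    return update * mask / mask.mean().clamp(min=eps)
```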

### Augmentations
2 changes: 1 addition & 1 deletion timm/optim/adafactor.py
@@ -83,7 +83,7 @@ def __setstate__(self, state):
        super().__setstate__(state)
        for group in self.param_groups:
            group.setdefault('caution', False)
            group.setdefault('min_dim_size_to_factor', 32)
            group.setdefault('min_dim_size_to_factor', 16)

    @staticmethod
    def _get_lr(param_group, param_state):
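
For context, `min_dim_size_to_factor` controls when Adafactor falls back from its factored (row/column) second-moment estimate to a full estimate; the commit aligns this `setdefault` with a default of 16. A rough illustrative check, not the exact rule used in timm:

```python
def should_factor(shape, min_dim_size_to_factor: int = 16) -> bool:
    """Illustrative sketch: use the factored second-moment estimate only for
    parameters that are at least 2D and whose two largest dims both meet the
    minimum size threshold; otherwise keep an unfactored estimate."""
    if len(shape) < 2:
        return False
    d0, d1 = sorted(shape)[-2:]
    return d0 >= min_dim_size_to_factor and d1 >= min_dim_size_to_factor
```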
