Replies: 2 comments 4 replies
-
@seyeeet I'm working on improving that with new/future models, and I have a few gists under my name with some recent hparams for NFNets, MLP vision models, etc., but there won't ever be full coverage. For the transformer models, you can find suitable hparams in related codebases that use some of timm's features: the official code for DeiT/CaiT/LeViT/ResMLP and Swin uses timm components, augmentations, etc. for the train pipeline, so the hparams posted in their code and papers can be used here as well (with the exception of the distillation and repeat aug in DeiT). Also, the hparams in the recent 'How to train your ViT' paper would be easy to apply here; just use 'adamw' when the paper mentions adam.
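A minimal sketch of that last point, assuming a recent timm version; the model name, lr, and weight decay below are illustrative placeholders, not hparams from any paper:

```python
import timm
from timm.optim import create_optimizer_v2

# Build a ViT from timm; the model name is just an example.
model = timm.create_model('vit_base_patch16_224', pretrained=False, num_classes=1000)

# Where a paper's recipe says "adam", use opt='adamw' in timm so weight
# decay is decoupled. lr / weight_decay values here are placeholders.
optimizer = create_optimizer_v2(
    model,
    opt='adamw',
    lr=1e-3,
    weight_decay=0.05,
)
```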
-
@AlexeyAB The 21k runs were handled by the Google researchers on their research TPU cloud. I was provisioned with some V100s to run smaller trials, and I focused on exploring heavy augreg schemes to compare from-scratch vs transfer on the smaller datasets. As you note, all of the Google hparams (and their train code) are step based, not epoch based, so you have to do the conversion. The warmup is sometimes less than a full epoch, so yes, 1 and 10 is the closest for the 1k example. I think your hparams look in the right ballpark. One thing you'll run into (and I did too) when comparing timm results to theirs: my RandAugment is a bit different, quite importantly so in how some of the aug scales are handled with respect to the magnitude values. I think my
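A quick sketch of the step-to-epoch conversion; the dataset size, batch size, and step counts below are assumptions for illustration, not values from any specific recipe:

```python
# Convert a step-based schedule (as in the Google ViT code) to the
# epoch-based values timm's train script expects.
train_images = 1_281_167      # ImageNet-1k train set size
global_batch_size = 4096      # assumed effective batch size; use your own
total_steps = 10_000          # example value from a step-based recipe
warmup_steps = 500            # example value

steps_per_epoch = train_images / global_batch_size        # ~313 steps
epochs = total_steps / steps_per_epoch                     # ~32 epochs
# Warmup can be shorter than one epoch; round up to at least 1 epoch.
warmup_epochs = max(1, round(warmup_steps / steps_per_epoch))

print(f"epochs={epochs:.1f}, warmup_epochs={warmup_epochs}")
```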
-
I was wondering if there is anywhere we can find the hyperparameters for each model (especially the transformers) that can reproduce results close to the papers? The models are very complicated and each hyperparameter can take a lot of time to figure out, so it would be great if the hyperparameters that were actually used could be shared.