EfficientNetV2-rw-S speed test from the "ResNet strikes back" article #1089

mrT23 · 2022-01-14T15:44:56Z


Hi,
A small issue I want to raise:

According to the paper "ResNet strikes back", EfficientNetV2-rw-S has an inference speed of 823 im/sec while achieving a top-1 score of 80.6% (Table 4).
For comparison, ResNet50 achieves 80.4% at an inference speed of 2536 im/sec. So ResNet50 is ~3x faster and achieves similar accuracy.

The speed of EfficientNetV2-rw-S in the article seems low.
In my speed measurements, ResNet50 achieves 2819 im/sec (pretty similar to yours), but EfficientNetV2-rw-S achieves 2323 im/sec.
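
Throughput numbers like these depend heavily on how they are measured (batch size, warm-up, device synchronization, memory format and, above all, input resolution). A minimal timing sketch, assuming a hypothetical `run_batch` callable that runs one forward pass on a fixed-size batch; this is an illustration, not timm's own benchmark script:

```python
import time
from typing import Callable


def measure_throughput(
    run_batch: Callable[[], None],
    batch_size: int,
    warmup_iters: int = 10,
    timed_iters: int = 50,
) -> float:
    """Return images/sec for a callable that processes one batch per call.

    `run_batch` is a hypothetical stand-in for a model forward pass. On GPU it
    should also synchronize the device (e.g. torch.cuda.synchronize()) so the
    timing covers kernel execution, not just asynchronous kernel launches.
    """
    # Warm-up iterations are not timed (kernel selection, caches, allocator).
    for _ in range(warmup_iters):
        run_batch()
    start = time.perf_counter()
    for _ in range(timed_iters):
        run_batch()
    elapsed = time.perf_counter() - start
    return batch_size * timed_iters / elapsed
```

Two measurements are only comparable if `run_batch` uses the same input resolution, batch size, precision and memory format for both models.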

It might be worth re-checking and validating the speed measurement of EfficientNetV2-rw-S.

P.S.
My motivation for all this testing was the tweet:
https://twitter.com/tanmingxing/status/1481362887272636417
It made me wonder whether EfficientNetV2 is indeed a big improvement that gives the best speed-accuracy tradeoff today. After digging a bit, I found that EfficientNetV2 still couples architecture with resolution, so the comparisons in their article are not fully fair. With the same resolution and training, it doesn't seem to outperform ResNet50.

rwightman · 2022-01-14T17:45:21Z

Maintainer

@mrT23 the inference numbers are at the default inference resolution, which is 384 x 384 for the rw-s: 823 img/sec in NCHW and ~1000 in NHWC. At 224 x 224 it's closer to 3000 in NHWC and 2220 in NCHW. But resolution scaling is part of the EfficientNet model scaling scheme, and part of the top-1 scores.
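
As a rough back-of-the-envelope check on the numbers above, assuming per-image cost scales with pixel count (real throughput also depends on memory bandwidth, kernel choice, etc.):

```python
# Conv-net compute scales roughly with input pixels (H * W), so going from
# 224x224 to 384x384 multiplies the per-image cost by about:
pixel_ratio = (384 / 224) ** 2
print(f"pixel ratio 384 vs 224: {pixel_ratio:.2f}")

# EfficientNetV2-rw-S inference throughput (img/sec) as quoted in this thread
throughput = {
    ("NHWC", 224): 3000,  # "closer to 3000"
    ("NHWC", 384): 1000,  # "~1000"
    ("NCHW", 224): 2220,
    ("NCHW", 384): 823,
}
for fmt in ("NHWC", "NCHW"):
    speedup = throughput[(fmt, 224)] / throughput[(fmt, 384)]
    print(f"{fmt}: 224 is {speedup:.2f}x faster than 384")
```

This prints a pixel ratio of 2.94 and speedups of 3.00 (NHWC) and 2.70 (NCHW), which is consistent with the gap between 823 and ~2300 img/sec being mostly a resolution effect.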

EDIT: It should be noted that the scores in the RSB table come from applying our ResNet50-optimized recipe to other nets; training that net with the R50 recipe was awful and is not at all recommended. The EfficientNetV2 small in the paper was 83.9 top-1, and my variant and train hparams reached 83.8, which is not achievable with a standard ResNet until you approach 200/200-D ResNets with a bit of resolution scaling (maybe 256-288).

1 reply

rwightman Jan 14, 2022
Maintainer

Regarding Mingxing Tan's comment: in accuracy, ConvNeXt Base is comparable to the small EfficientNetV2. Running some throughput tests (with a few tweaks I made to ConvNeXt), EffV2 is faster. ConvNeXt is just over 800 img/sec inference in both memory formats, while, as per above, EffV2 can hit 1000 img/sec with NHWC.
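
NCHW and NHWC describe the memory layout of activation tensors. In PyTorch the logical shape stays (N, C, H, W) and "channels last" (NHWC) only changes the strides. A small pure-Python sketch of the two stride patterns; the helper names are hypothetical, but the values match what PyTorch reports for a tensor converted with `memory_format=torch.channels_last`:

```python
from typing import Tuple


def contiguous_strides(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Element strides for a row-major (C-contiguous) array of `shape`."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def nchw_strides(n: int, c: int, h: int, w: int) -> Tuple[int, ...]:
    # Default layout: each channel is a contiguous H*W plane.
    return contiguous_strides((n, c, h, w))


def nhwc_strides(n: int, c: int, h: int, w: int) -> Tuple[int, ...]:
    # Channels last: the C values of one pixel are adjacent in memory.
    # Strides are reported in (N, C, H, W) order, as PyTorch does for a
    # channels_last tensor whose logical shape is still NCHW.
    sn, sh, sw, sc = contiguous_strides((n, h, w, c))
    return (sn, sc, sh, sw)


print(nchw_strides(2, 3, 4, 5))  # (60, 20, 5, 1)
print(nhwc_strides(2, 3, 4, 5))  # (60, 1, 15, 3)
```

Tensor Core convolution kernels generally prefer NHWC, which is one reason the same network can be faster in that format; how much it helps varies by architecture, as the EffV2 vs ConvNeXt numbers suggest.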

rwightman · 2022-01-14T18:46:12Z

Maintainer

Interestingly, a few other nets from the same family that I've worked with recently can also hit 83.8 top-1 (almost exactly) and compare favourably to either of these in inference throughput. All are variants of RegNetZ (essentially an EfficientNet with grouped convs): my D8 (regnetz_d8), D32 (regnetz_d32), and a more recent attempt to match the paper's 4.0G model (regnetz_040, coming soon). They run at 1200-1500 img/sec (NHWC) inference at the 256-288 resolutions needed to hit that top-1. Train throughput is higher too, but not as high as it could be due to some long-standing cuDNN kernel issues with grouped convs (particularly impacting the backward pass).
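
For context on "grouped convs": a k x k conv whose channels are split into `groups` groups has `groups` times fewer weights and multiply-accumulates than the dense version; EfficientNet's MBConv blocks use the depthwise extreme (one group per channel). A small cost sketch with a hypothetical helper (bias ignored); the group counts below are arbitrary and are not the configuration of regnetz_d8/regnetz_d32:

```python
def conv2d_cost(c_in: int, c_out: int, k: int, h_out: int, w_out: int, groups: int = 1):
    """Weights and multiply-accumulates (MACs) for a 2D conv layer without bias.

    Each output channel only sees c_in // groups input channels, so grouping
    divides both the weight count and the MACs by `groups`.
    """
    assert c_in % groups == 0 and c_out % groups == 0
    params = k * k * (c_in // groups) * c_out
    macs = params * h_out * w_out  # one MAC per weight per output position
    return params, macs


for g in (1, 32, 256):  # dense, grouped, depthwise (for 256 channels)
    params, macs = conv2d_cost(256, 256, k=3, h_out=16, w_out=16, groups=g)
    print(f"groups={g:>3}: params={params:>7}, MACs={macs:>10}")
```

Fewer MACs do not automatically mean higher throughput: if the grouped-conv kernels are poorly optimized (as with the cuDNN issues mentioned above), the wall-clock gain can be much smaller than the FLOP reduction.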

I've been training quite a few RegNet-Y/Z models (and my own 'V' pre-act variant of Y) recently, all on TPU v3 instances with my XLA branch, because they train well on TPUs, whereas on GPU the train throughput isn't ideal due to the issue mentioned above.

4 replies

mrT23 Jan 14, 2022
Author

OK, got it.

I am strongly against this coupling of architecture and resolution. In practice, to achieve the best speed-accuracy tradeoff, the optimal resolution on ImageNet is indeed above 224 (somewhere in the region of 288-360, I estimate).

However, if we want to compare architecture quality, it is vital to compare at the same resolution. Using a base resolution different from the common 224 can, and will, lead to false comparisons. It's just an unfair trick.

rwightman Jan 14, 2022
Maintainer

I'm not really certain how it's 'unfair' if you are comparing runtime traits, accuracy numbers, FLOPs, etc. AND you are specifying the resolution as well. The resolution at which a given architecture does best is in fact related to that architecture: start decreasing the capacity of the architecture and it can no longer benefit from higher resolution (or at least the returns diminish rapidly).

Continuing up the ResNet model stack into larger models without increasing the resolution is quite pointless, in my opinion. I feel all models should be scaled with resolution, so I have no issue comparing them at different resolutions. In fact, if you compare models on an 'activation count' metric, the relationship between activation count and accuracy under resolution scaling appears quite consistent with what you get by increasing/decreasing activation count through other model traits like width/depth (within reason).

mrT23 Jan 14, 2022
Author

If our goal is to estimate which model is better for transfer learning, scaling models may confuse us: comparing model A at resolution 384 to model B at resolution 224 is comparing apples to oranges, in my opinion. It will be hard to estimate from such a comparison which model will perform better downstream.

One of the things I like about the "ResNet Strikes Back" paper is that it provides a solid, unified baseline for training architectures. This brings us closer to a fair, agreed-upon way of properly measuring and comparing architectures.

rwightman Jan 14, 2022
Maintainer

The more I've been looking at different architectures lately, the more activation count has stood out as one of the most consistent predictors of 'model capacity', at least in terms of classification performance here. I'd be surprised if that isn't exactly the same for transfer learning. I hope to look at this further in the CLIP pretraining and zero-shot trials I'm working on, but that's TBD.

Aside from the specifics of architecture components such as self-attention, convs, MLPs, etc., activation count can be varied by changing the depth, width, and resolution (seq len)... so again, I do not feel it is unfair to include resolution, nor that comparisons make sense without it.
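
To make the three knobs concrete, a toy sketch (not how timm or any paper measures it): a hypothetical stack of identical 3x3 convs, counting activations as the number of elements in every conv output tensor, and comparing how width, depth and resolution change MACs versus activations:

```python
def plain_conv_stack_cost(width: int, depth: int, res: int, k: int = 3):
    """MACs and activation count for a toy stack of `depth` k x k convs,
    each mapping `width` -> `width` channels at res x res (stride 1, 'same'
    padding). Activations = elements of every conv output tensor.
    """
    macs_per_layer = k * k * width * width * res * res
    acts_per_layer = width * res * res
    return depth * macs_per_layer, depth * acts_per_layer


base = plain_conv_stack_cost(width=64, depth=4, res=56)
variants = {
    "2x width": plain_conv_stack_cost(width=128, depth=4, res=56),
    "2x depth": plain_conv_stack_cost(width=64, depth=8, res=56),
    "2x resolution": plain_conv_stack_cost(width=64, depth=4, res=112),
}
for name, (macs, acts) in variants.items():
    print(f"{name}: MACs x{macs / base[0]:.0f}, activations x{acts / base[1]:.0f}")
```

In this toy model, depth and resolution scaling grow activations in proportion to MACs (x2/x2 and x4/x4), while width scaling grows them only with the square root of MACs (x4 MACs, x2 activations), which is why resolution is such an effective lever on activation count.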
