Replies: 3 comments 6 replies
-
@sowmen I've added the ability to customize the norm layer and activation for a lot of the models in this collection, so it shouldn't be too challenging. Actually adding such model defs and training them from scratch isn't currently in my plans. Something (mostly) like the below would work for EfficientNet:
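A minimal sketch of what that customization might look like, assuming a recent timm release where the EfficientNet builder accepts a `norm_layer` kwarg and exposes `GroupNormAct` under `timm.layers` (older releases used `timm.models.layers`); the 8-group setting is an illustrative choice that happens to divide every channel width in `efficientnet_b0`:

```python
from functools import partial

import timm
import torch
from timm.layers import GroupNormAct  # assumption: current import path; older releases: timm.models.layers

# Build an EfficientNet-B0 with GroupNorm (fused with the activation) in place
# of BatchNorm. No pretrained weights exist for this config, so it would need
# to be trained from scratch.
model = timm.create_model(
    'efficientnet_b0',
    pretrained=False,
    norm_layer=partial(GroupNormAct, num_groups=8),
)

x = torch.randn(2, 3, 224, 224)
print(model(x).shape)  # expected: torch.Size([2, 1000])
```

Other models in the collection that accept a `norm_layer` kwarg can be built the same way, but any model built like this starts from random weights.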
-
I would like to have this feature as well. Some models with FRN layers + TLU would be great for people like us with low GPU VRAM. Replacing BN with FRN+TLU across all the different models isn't a straightforward task, and pretrained versions of these models could help people like us a lot, but no one is working on pretraining them :(
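For reference, a rough self-contained sketch of the kind of swap described above: an FRN+TLU module following the Filter Response Normalization paper (Singh & Krishnan, 2019), plus an illustrative helper (`replace_bn_with_frn` is a made-up name) that walks a model and replaces each `nn.BatchNorm2d` in place. Note this discards the pretrained BN statistics and leaves the original activations in the network, so the result still needs significant retraining.

```python
import torch
import torch.nn as nn
import timm


class FRNTLU2d(nn.Module):
    """Filter Response Norm + Thresholded Linear Unit for NCHW feature maps.
    Normalizes by the per-channel mean of squared activations (no batch
    dependence), then applies an elementwise max(y, tau)."""
    def __init__(self, num_features, eps=1e-6):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(1, num_features, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_features, 1, 1))
        self.tau = nn.Parameter(torch.zeros(1, num_features, 1, 1))
        self.eps = eps

    def forward(self, x):
        nu2 = x.pow(2).mean(dim=(2, 3), keepdim=True)   # mean square over H, W
        x = x * torch.rsqrt(nu2 + self.eps)              # FRN normalization
        return torch.max(self.gamma * x + self.beta, self.tau)  # TLU


def replace_bn_with_frn(module):
    """Recursively swap nn.BatchNorm2d children for FRN+TLU (illustrative helper)."""
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(module, name, FRNTLU2d(child.num_features))
        else:
            replace_bn_with_frn(child)
    return module


# Usage sketch: the pretrained conv weights are kept, but the BN statistics are
# lost, so this is only a starting point and needs retraining.
model = replace_bn_with_frn(timm.create_model('efficientnet_b0', pretrained=True))
```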
-
@mobassir94 it's a huge amount of time, work & hardware resources to define and train a series of new models w/ GN, so it's unlikely to happen unless someone else takes it on. One of the likely reasons why this isn't common is that GN models are no faster, and actually use more GPU RAM, than a BN network. You can train with stability at smaller batch sizes, but the extra memory use forces you down to those smaller batch sizes even sooner than a BN-based net would.
-
Group Normalization seems to be effective for low-batch training on less powerful h/w. Any plans to replace BN or add options for GN to existing models like EfficientNet? If not, can you suggest a way I can add GN to existing EfficientNet models? I was hoping to try the BiT (Big Transfer) technique.
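One note on the BiT angle: timm already ships pretrained BiT ResNetV2 models, which use GroupNorm together with weight-standardized convolutions, so those are one way to experiment with GN on limited hardware without training anything from scratch. A sketch, assuming the model name used by older timm releases (newer releases may list it under a different tag; `timm.list_models('*bit*')` shows what's available):

```python
import timm
import torch

# Load a pretrained BiT-M ResNetV2-50 (GroupNorm + weight standardization) and
# reset the classifier head for a 10-class fine-tuning task.
model = timm.create_model('resnetv2_50x1_bitm', pretrained=True, num_classes=10)

x = torch.randn(1, 3, 224, 224)
print(model(x).shape)  # expected: torch.Size([1, 10])
```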