Do timm models work with any size of input image? #1653
-
I noticed that a timm model accepts an input image of any size without raising an error.
I guess the input image size is changed when transforms are applied (perhaps with transforms.Resize).
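To make this concrete, a minimal sketch of what I'm seeing (resnet50 is just an arbitrary example): the model itself runs on an off-size input, while the default data config describes the resolution the transforms would resize to.

```python
import timm
import torch
from timm.data import resolve_data_config, create_transform

# Any timm model; resnet50 is just an example.
model = timm.create_model('resnet50', pretrained=False)

# The model itself accepts a differently sized input without error...
x = torch.randn(1, 3, 300, 300)
print(model(x).shape)

# ...while the default data config / transform pipeline resizes to the
# model's native training resolution (e.g. 224x224 for resnet50).
config = resolve_data_config({}, model=model)
print(config['input_size'])
print(create_transform(**config))
```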
Replies: 1 comment
-
@bekhzod-olimov most purely convolutional nets accept any image size; this is normal. It can actually be beneficial to input images that are 20-40% larger than the train size at inference time (termed train-test discrepancy), especially if training used extensive augmentation. So yeah, I'd be hesitant to try to 'constrain' or force a fixed image size, and many people also fine-tune at much higher resolution.

Transformer and hybrid CNN-transformer models often have fixed resolutions (and will error out on differently sized inputs). Their resolution can only be set (sometimes changed from the original weights) at model creation time, because they have position embeddings and blocking that are based on a pre-determined image size; when the resolution can be changed, it is done by interpolating to a larger (or smaller) embedding. In the pretrained_cfg you asked about, you can see that most models with this limitation have a `fixed_input_size` flag set.
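A quick sketch of the first point (resnet50 standing in for any purely convolutional net): the same weights run at several input sizes, so inferring at a resolution 20-40% above the train size is just a different tensor shape, not a code change.

```python
import timm
import torch

# A purely convolutional net accepts arbitrary input sizes; global pooling
# collapses whatever spatial grid the backbone produces.
model = timm.create_model('resnet50', pretrained=False, num_classes=0).eval()

with torch.no_grad():
    for size in (224, 256, 288, 320):  # e.g. 288 is ~28% above a 224 train size
        x = torch.randn(1, 3, size, size)
        print(size, model(x).shape)  # pooled feature shape is the same every time
```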
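And a sketch of the second point: reading the pretrained_cfg for the fixed-size flag, and setting a different resolution at model creation time (the model names here are just examples).

```python
import timm

# Fixed-resolution models generally advertise it in their pretrained_cfg.
vit = timm.create_model('vit_base_patch16_224', pretrained=False)
print(vit.pretrained_cfg.get('fixed_input_size'), vit.pretrained_cfg.get('input_size'))

cnn = timm.create_model('resnet50', pretrained=False)
print(cnn.pretrained_cfg.get('fixed_input_size'), cnn.pretrained_cfg.get('input_size'))

# For such models the resolution is chosen at creation time; with pretrained
# weights the position embeddings are interpolated to the new grid.
vit_384 = timm.create_model('vit_base_patch16_224', pretrained=False, img_size=384)
```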