Is Swin S3-Small different from the original repo? #1906
In timm (pytorch-image-models/timm/models/swin_transformer.py, lines 776 to 782 at commit f677190), S3-Small uses window sizes (14, 14, 14, 7). However, as far as I can see in https://github.com/microsoft/Cream/blob/main/AutoFormerV2/configs/S3-S.yaml, the original repo uses (14, 14, 14, 14). Is this a mistake? I also see that timm has pretrained weights for S3-Small. Were they trained by @rwightman or converted from the microsoft/Cream repo?
@gau-nernst Yes, I did take the checkpoints from the Cream repo. It was a while ago now, but I think the actual weights didn't match their config, so I adjusted the model config as needed to load them. At 224x224, a window size of 14 doesn't make sense for the last stage in any case: the feature map there is 7x7, so you'd need an input size of 448x448 to support (14, 14, 14, 14). The resulting model in timm matches the reported S3-Small accuracy (83.758 in timm), so I think their configs are just out of sync with the weights.
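The window-size argument above is just arithmetic, so here is a small sketch of it. It assumes a Swin-style layout with a patch size of 4 and 2x downsampling between each of the 4 stages, and it assumes each stage's feature map should tile evenly into windows. The helpers `stage_resolutions`, `windows_fit` and `rel_pos_bias_rows` are hypothetical and not part of timm. The last helper reflects that in the standard Swin implementation the relative position bias table has (2W - 1)^2 rows for a window of size W, which is one way a window-size mismatch between a config and a checkpoint could show up when loading weights.

```python
# Hypothetical helpers (not timm APIs): sanity-check Swin window sizes per stage.
# Assumes patch_size=4 and 2x spatial downsampling between consecutive stages.

def stage_resolutions(img_size, patch_size=4, num_stages=4):
    """Side length of the square feature map at each stage."""
    res = img_size // patch_size
    sizes = []
    for _ in range(num_stages):
        sizes.append(res)
        res //= 2
    return sizes

def windows_fit(img_size, window_sizes, patch_size=4):
    """True if every stage's feature map is at least, and a multiple of, its window size."""
    sizes = stage_resolutions(img_size, patch_size, len(window_sizes))
    return all(s >= w and s % w == 0 for s, w in zip(sizes, window_sizes))

def rel_pos_bias_rows(window_size):
    """Rows in a Swin relative position bias table for a square window."""
    return (2 * window_size - 1) ** 2

print(stage_resolutions(224))                       # [56, 28, 14, 7]
print(windows_fit(224, (14, 14, 14, 7)))            # True
print(windows_fit(224, (14, 14, 14, 14)))           # False: last stage is only 7x7
print(stage_resolutions(448))                       # [112, 56, 28, 14]
print(windows_fit(448, (14, 14, 14, 14)))           # True
print(rel_pos_bias_rows(7), rel_pos_bias_rows(14))  # 169 729
```

So a checkpoint trained with a 7x7 window in the last stage would carry a 169-row bias table there, while a config asking for 14 would expect 729 rows.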