PyTorch official implementation of "Leveraging Inductive Bias in ViT for Medical Image Diagnosis" (Jungmin Ha, Euihyun Yoon, Sungsik Kim, Jinkyu Kim, and Jaekoo Lee, BMVC 2024).
An overview of our proposed model. Built upon the Vision Transformer, we use the following three building blocks: (1) a Stem Block, (2) SWA Blocks for the 1st and 2nd stages, and (3) DA Blocks for the 3rd and 4th stages. For image classification, the output feature map undergoes Global Average Pooling (GAP) followed by an MLP. For segmentation, feature maps from all stages are fused with a Feature Pyramid Network (FPN). (b, c, d) Detailed illustrations of Local Attention, Shifted-Window Attention, and Deformable Attention.
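As a rough illustration of this four-stage layout, here is a minimal PyTorch sketch; it is not the paper's implementation. The `Stage` mixer below is a simple residual depthwise-conv stand-in for the SWA/DA attention blocks, and all module names, channel dimensions, and the single-linear classification head are illustrative assumptions.

```python
import torch
import torch.nn as nn


class Stem(nn.Module):
    """Convolutional stem (stride-4 patchify) standing in for the Stem Block."""
    def __init__(self, in_ch=3, dim=64):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=7, stride=4, padding=3)
        self.norm = nn.BatchNorm2d(dim)

    def forward(self, x):
        return self.norm(self.proj(x))


class Stage(nn.Module):
    """One hierarchical stage. A residual depthwise-conv mixer stands in for
    the paper's SWA (stages 1-2) and DA (stages 3-4) attention blocks."""
    def __init__(self, in_dim, out_dim, downsample):
        super().__init__()
        self.down = (nn.Conv2d(in_dim, out_dim, kernel_size=2, stride=2)
                     if downsample else nn.Identity())
        self.mix = nn.Sequential(
            nn.Conv2d(out_dim, out_dim, 3, padding=1, groups=out_dim),
            nn.Conv2d(out_dim, out_dim, 1),
            nn.GELU(),
        )

    def forward(self, x):
        x = self.down(x)
        return x + self.mix(x)


class HybridViTSketch(nn.Module):
    """Four-stage backbone: GAP + MLP head for classification; the per-stage
    feature maps could instead feed an FPN for segmentation."""
    def __init__(self, num_classes=2, dims=(64, 128, 256, 512)):
        super().__init__()
        self.stem = Stem(3, dims[0])
        self.stages = nn.ModuleList([
            Stage(dims[max(i - 1, 0)], dims[i], downsample=i > 0)
            for i in range(4)
        ])
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),              # Global Average Pooling (GAP)
            nn.Flatten(),
            nn.Linear(dims[-1], num_classes),     # simplified MLP head
        )

    def forward(self, x):
        x = self.stem(x)
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)  # multi-scale features an FPN would fuse
        return self.head(feats[-1])


model = HybridViTSketch(num_classes=2)
logits = model(torch.randn(1, 3, 224, 224))  # -> shape (1, 2)
```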
Comparison of classification and segmentation performance on various datasets. Note that scores in parentheses represent results with the black-hat transform as preprocessing. Bold text indicates the best performance, while underlined text indicates the second-best performance among all models.
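For reference, the black-hat transform mentioned above is a standard morphological operation (closing minus the input) that emphasizes dark, thin structures, such as hair artifacts, against a brighter background. A minimal OpenCV sketch, where the file paths and kernel size are illustrative assumptions:

```python
import cv2

# Black-hat transform: morphological closing(img) - img, which highlights
# dark structures (e.g., hairs) on a bright background.
img = cv2.imread("sample.png", cv2.IMREAD_GRAYSCALE)            # hypothetical input path
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (17, 17))  # illustrative kernel size
blackhat = cv2.morphologyEx(img, cv2.MORPH_BLACKHAT, kernel)
cv2.imwrite("sample_blackhat.png", blackhat)
```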
- PyTorch (> 1.2.0)
- torchvision
- numpy
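A quick sanity check that the environment satisfies these requirements (the version threshold comes from the list above):

```python
import numpy
import torch
import torchvision

print(torch.__version__)        # expected > 1.2.0
print(torchvision.__version__)
print(numpy.__version__)
```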