- Image classification task results
  - ResNetCifar trained from scratch on CIFAR100
  - ResNet trained from scratch on ImageNet1K (ILSVRC2012)
  - DarkNet trained from scratch on ImageNet1K (ILSVRC2012)
  - RepVGG trained from scratch on ImageNet1K (ILSVRC2012)
  - RegNet trained from scratch on ImageNet1K (ILSVRC2012)
  - U2NetBackbone trained from scratch on ImageNet1K (ILSVRC2012)
  - ResNet finetuned from ImageNet21K pretrained weights on ImageNet1K (ILSVRC2012)
  - ResNet finetuned from DINO pretrained weights on ImageNet1K (ILSVRC2012)
  - ViT finetuned from self-trained MAE pretrained weights (400 epochs) on ImageNet1K (ILSVRC2012)
  - ViT finetuned from official MAE pretrained weights (800 epochs) on ImageNet1K (ILSVRC2012)
  - ResNet trained from ImageNet1K pretrained weights on ImageNet21K (Winter 2021 release)
- Object detection task results
  - All detection models trained from scratch on COCO2017
  - All detection models finetuned from Objects365 pretrained weights on COCO2017
  - All detection models trained from COCO2017 pretrained weights on Objects365 (v2, 2020)
  - All detection models trained from scratch on VOC2007 and VOC2012
  - All detection models finetuned from Objects365 pretrained weights on VOC2007 and VOC2012
- Semantic segmentation task results
- Knowledge distillation task results
- Contrastive learning task results
- Masked image modeling task results
- Diffusion model task results
ResNet
Paper: https://arxiv.org/abs/1512.03385
DarkNet
Paper: https://arxiv.org/abs/1804.02767
RepVGG
Paper: https://arxiv.org/abs/2101.03697
RegNet
Paper: https://arxiv.org/abs/2003.13678
ViT
Paper: https://arxiv.org/abs/2010.11929
ResNetCifar differs from the standard ResNet in its first few layers: the stem is adapted for 32x32 CIFAR inputs, as sketched below.
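A minimal sketch of that stem difference in PyTorch (channel widths and layer names here are illustrative; see the training code for the exact definition):

```python
import torch.nn as nn

# ImageNet-style ResNet stem: 7x7 stride-2 conv + 3x3 stride-2 max-pool,
# downsampling a 224x224 input 4x before the residual stages.
imagenet_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)

# CIFAR-style stem: a single 3x3 stride-1 conv and no max-pool,
# so a 32x32 input keeps its resolution entering the residual stages.
cifar_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)
```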
Network | MACs | Params | input size | GPUs | batch size | epochs | Top-1(%) |
---|---|---|---|---|---|---|---|
ResNet18Cifar | 557.935M | 11.220M | 32x32 | 1 RTX A5000 | 128 | 200 | 77.110 |
ResNet34Cifar | 1.164G | 21.328M | 32x32 | 1 RTX A5000 | 128 | 200 | 78.140 |
ResNet50Cifar | 1.312G | 23.705M | 32x32 | 1 RTX A5000 | 128 | 200 | 75.610 |
ResNet101Cifar | 2.531G | 42.697M | 32x32 | 1 RTX A5000 | 128 | 200 | 76.970 |
ResNet152Cifar | 3.751G | 58.341M | 32x32 | 1 RTX A5000 | 128 | 200 | 77.710 |
You can find more model training details in classification_training/cifar100/.
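The MACs and Params columns throughout the classification tables can be approximated with an off-the-shelf profiler. A minimal sketch using the thop package (this repo may use a different counter, and tools can disagree slightly on what they count as a MAC):

```python
import torch
from thop import profile  # pip install thop
from torchvision.models import resnet18

model = resnet18()
x = torch.randn(1, 3, 224, 224)  # use (1, 3, 32, 32) for the CIFAR variants
macs, params = profile(model, inputs=(x,))
print(f"MACs: {macs / 1e9:.3f}G, Params: {params / 1e6:.3f}M")
```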
Network | MACs | Params | input size | GPUs | batch size | epochs | Top-1(%) |
---|---|---|---|---|---|---|---|
ResNet18 | 1.819G | 11.690M | 224x224 | 2 RTX A5000 | 256 | 100 | 70.512 |
ResNet34 | 3.671G | 21.798M | 224x224 | 2 RTX A5000 | 256 | 100 | 73.680 |
ResNet50 | 4.112G | 25.557M | 224x224 | 2 RTX A5000 | 256 | 100 | 76.300 |
ResNet101 | 7.834G | 44.549M | 224x224 | 2 RTX A5000 | 256 | 100 | 77.380 |
ResNet152 | 11.559G | 60.193M | 224x224 | 2 RTX A5000 | 256 | 100 | 77.542 |
You can find more model training details in classification_training/imagenet/.
Network | MACs | Params | input size | GPUs | batch size | epochs | Top-1(%) |
---|---|---|---|---|---|---|---|
DarkNetTiny | 412.537M | 2.087M | 256x256 | 2 RTX A5000 | 256 | 100 | 57.786 |
DarkNet19 | 3.663G | 20.842M | 256x256 | 2 RTX A5000 | 256 | 100 | 74.248 |
DarkNet53 | 9.322G | 41.610M | 256x256 | 2 RTX A5000 | 256 | 100 | 76.352 |
You can find more model training details in classification_training/imagenet/.
Network | MACs | Params | input size | GPUs | batch size | epochs | Top-1(%) |
---|---|---|---|---|---|---|---|
RepVGG_A0_deploy | 1.362G | 8.309M | 224x224 | 2 RTX A5000 | 256 | 120 | 72.010 |
RepVGG_A1_deploy | 2.364G | 12.790M | 224x224 | 2 RTX A5000 | 256 | 120 | 74.032 |
RepVGG_A2_deploy | 5.117G | 25.500M | 224x224 | 2 RTX A5000 | 256 | 120 | 76.078 |
RepVGG_B0_deploy | 3.058G | 14.339M | 224x224 | 2 RTX A5000 | 256 | 120 | 74.880 |
RepVGG_B1_deploy | 11.816G | 51.829M | 224x224 | 2 RTX A5000 | 256 | 120 | 77.790 |
RepVGG_B2_deploy | 18.377G | 80.315M | 224x224 | 2 RTX A5000 | 256 | 120 | 78.120 |
You can find more model training details in classification_training/imagenet/.
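The _deploy suffix means the multi-branch training-time blocks have been re-parameterized into single convolutions for inference. The core step is folding BatchNorm into the preceding conv; a minimal sketch (full RepVGG fusion additionally merges the 1x1 and identity branches into the 3x3 kernel):

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a BatchNorm layer into the preceding conv for inference."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    with torch.no_grad():
        # scale = gamma / sqrt(running_var + eps), one factor per out channel
        scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
        fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
        conv_bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
        fused.bias.copy_(bn.bias + (conv_bias - bn.running_mean) * scale)
    return fused
```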
Network | MACs | Params | input size | GPUs | batch size | epochs | Top-1(%) |
---|---|---|---|---|---|---|---|
RegNetX_400MF | 410.266M | 5.158M | 224x224 | 2 RTX A5000 | 4096 | 300 | 69.466 |
RegNetX_600MF | 616.813M | 6.196M | 224x224 | 2 RTX A5000 | 4096 | 300 | 71.754 |
RegNetX_800MF | 820.324M | 7.260M | 224x224 | 2 RTX A5000 | 4096 | 300 | 73.148 |
RegNetX_1_6GF | 1.635G | 9.190M | 224x224 | 2 RTX A5000 | 4096 | 300 | 76.142 |
RegNetX_3_2GF | 3.222G | 15.297M | 224x224 | 2 RTX A5000 | 4096 | 300 | 78.244 |
RegNetX_4_0GF | 4.013G | 22.118M | 224x224 | 2 RTX A5000 | 4096 | 300 | 78.916 |
You can find more model training details in classification_training/imagenet/.
Network | MACs | Params | input size | GPUs | batch size | epochs | Top-1(%) |
---|---|---|---|---|---|---|---|
U2NetBackbone | 13.097G | 26.181M | 224x224 | 2 RTX A5000 | 256 | 100 | 76.038 |
You can find more model training details in classification_training/imagenet/.
Network | MACs | Params | input size | GPUs | batch size | epochs | Top-1(%) |
---|---|---|---|---|---|---|---|
ResNet18 | 1.819G | 11.690M | 224x224 | 2 RTX A5000 | 4096 | 300 | 71.580 |
ResNet34 | 3.671G | 21.798M | 224x224 | 2 RTX A5000 | 4096 | 300 | 76.316 |
ResNet50 | 4.112G | 25.557M | 224x224 | 2 RTX A5000 | 4096 | 300 | 79.484 |
ResNet101 | 7.834G | 44.549M | 224x224 | 2 RTX A5000 | 4096 | 300 | 80.940 |
ResNet152 | 11.559G | 60.193M | 224x224 | 2 RTX A5000 | 4096 | 300 | 81.236 |
You can find more model training details in classification_training/imagenet/.
Network | MACs | Params | input size | GPUs | batch size | epochs | Top-1(%) |
---|---|---|---|---|---|---|---|
ResNet50 | 4.112G | 25.557M | 224x224 | 2 RTX A5000 | 256 | 100 | 77.114 |
ResNet50 | 4.112G | 25.557M | 224x224 | 2 RTX A5000 | 4096 | 300 | 79.418 |
You can find more model training details in classification_training/imagenet/.
Network | MACs | Params | input size | GPUs | batch size | epochs | Top-1(%) |
---|---|---|---|---|---|---|---|
ViT-Small-Patch16 | 4.241G | 21.955M | 224x224 | 2 RTX A5000 | 4096 | 100 | 79.006 |
ViT-Base-Patch16 | 16.849G | 86.377M | 224x224 | 2 RTX A5000 | 4096 | 100 | 83.204 |
ViT-Large-Patch16 | 59.647G | 304.024M | 224x224 | 2 RTX A5000 | 4096 | 100 | 85.020 |
You can find more model training details in classification_training/imagenet/.
Network | MACs | Params | input size | GPUs | batch size | epochs | Top-1(%) |
---|---|---|---|---|---|---|---|
ViT-Base-Patch16 | 16.849G | 86.377M | 224x224 | 2 RTX A5000 | 4096 | 100 | 83.290 |
ViT-Large-Patch16 | 59.647G | 304.024M | 224x224 | 2 RTX A5000 | 4096 | 100 | 85.876 |
You can find more model training details in classification_training/imagenet/.
Network | MACs | Params | input size | GPUs | batch size | epochs | Semantic Softmax Acc(%) |
---|---|---|---|---|---|---|---|
ResNet18 | 1.819G | 11.690M | 224x224 | 2 RTX A5000 | 4096 | 80 | 68.639 |
ResNet34 | 3.671G | 21.798M | 224x224 | 2 RTX A5000 | 4096 | 80 | 71.873 |
ResNet50 | 4.112G | 25.557M | 224x224 | 2 RTX A5000 | 4096 | 80 | 74.664 |
ResNet101 | 7.834G | 44.549M | 224x224 | 2 RTX A5000 | 4096 | 80 | 76.136 |
ResNet152 | 11.559G | 60.193M | 224x224 | 2 RTX A5000 | 4096 | 80 | 75.731 |
You can find more model training details in classification_training/imagenet21k/.
RetinaNet
Paper: https://arxiv.org/abs/1708.02002
FCOS
Paper: https://arxiv.org/abs/1904.01355
CenterNet
Paper: https://arxiv.org/abs/1904.07850
TTFNet
Paper: https://arxiv.org/abs/1909.00700
The ResNet50 backbones use DINO pretrained weights from ImageNet1K.
Trained on the COCO2017 train dataset, tested on the COCO2017 val dataset.
mAP is reported at IoU=0.5:0.95, area=all, maxDets=100 (COCOeval stats[0]).
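Here stats[0] refers to the first entry of pycocotools' COCOeval summary. A minimal sketch of reproducing the reported number (file names are illustrative):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Ground-truth annotations and model predictions in COCO result format.
coco_gt = COCO("instances_val2017.json")
coco_dt = coco_gt.loadRes("detections.json")

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()
print(evaluator.stats[0])  # mAP @ IoU=0.5:0.95, area=all, maxDets=100
```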
Network | resize-style | input size | MACs | Params | GPUs | batch size | epochs | mAP(%) |
---|---|---|---|---|---|---|---|---|
ResNet50-RetinaNet | YoloStyle-640 | 640x640 | 95.558G | 37.969M | 2 RTX A5000 | 32 | 13 | 34.459 |
ResNet50-RetinaNet | YoloStyle-800 | 800x800 | 149.522G | 37.969M | 2 RTX A5000 | 32 | 13 | 36.023 |
ResNet50-RetinaNet | RetinaStyle-800 | 800x1333 | 250.069G | 37.969M | 2 RTX A5000 | 8 | 13 | 35.434 |
ResNet50-FCOS | YoloStyle-640 | 640x640 | 81.943G | 32.291M | 2 RTX A5000 | 32 | 13 | 37.176 |
ResNet50-FCOS | YoloStyle-800 | 800x800 | 128.160G | 32.291M | 2 RTX A5000 | 32 | 13 | 38.745 |
ResNet50-FCOS | RetinaStyle-800 | 800x1333 | 214.406G | 32.291M | 2 RTX A5000 | 8 | 13 | 39.649 |
ResNet18DCN-CenterNet | YoloStyle-512 | 512x512 | 14.854G | 12.889M | 2 RTX A5000 | 64 | 140 | 26.209 |
ResNet18DCN-TTFNet-3x | YoloStyle-512 | 512x512 | 16.063G | 13.737M | 2 RTX A5000 | 64 | 39 | 27.054 |
You can find more model training details in detection_training/coco/.
Trained on the COCO2017 train dataset, tested on the COCO2017 val dataset.
mAP is reported at IoU=0.5:0.95, area=all, maxDets=100 (COCOeval stats[0]).
Network | resize-style | input size | MACs | Params | GPUs | batch size | epochs | mAP(%) |
---|---|---|---|---|---|---|---|---|
ResNet50-RetinaNet | YoloStyle-640 | 640x640 | 95.558G | 37.969M | 2 RTX A5000 | 32 | 13 | 38.930 |
ResNet50-RetinaNet | YoloStyle-800 | 800x800 | 149.522G | 37.969M | 2 RTX A5000 | 32 | 13 | 40.483 |
ResNet50-RetinaNet | RetinaStyle-800 | 800x1333 | 250.069G | 37.969M | 2 RTX A5000 | 8 | 13 | 40.424 |
ResNet50-FCOS | YoloStyle-640 | 640x640 | 81.943G | 32.291M | 2 RTX A5000 | 32 | 13 | 42.871 |
ResNet50-FCOS | YoloStyle-800 | 800x800 | 128.160G | 32.291M | 2 RTX A5000 | 32 | 13 | 44.526 |
ResNet50-FCOS | RetinaStyle-800 | 800x1333 | 214.406G | 32.291M | 2 RTX A5000 | 8 | 13 | 42.848 |
You can find more model training details in detection_training/coco/.
Trained on the Objects365 train dataset, tested on the Objects365 val dataset.
mAP is reported at IoU=0.5:0.95, area=all, maxDets=100 (COCOeval stats[0]).
Network | resize-style | input size | MACs | Params | GPUs | batch size | epochs | mAP(%) |
---|---|---|---|---|---|---|---|---|
ResNet50-RetinaNet | YoloStyle-800 | 800x800 | 149.522G | 37.969M | 8 RTX A5000 | 32 | 13 | 16.360 |
ResNet50-FCOS | RetinaStyle-800 | 800x1333 | 214.406G | 32.291M | 8 RTX A5000 | 32 | 13 | 17.068 |
Trained on the VOC2007 trainval + VOC2012 trainval datasets, tested on the VOC2007 test dataset.
mAP is reported at IoU=0.50, area=all, maxDets=100.
Network | resize-style | input size | MACs | Params | GPUs | batch size | epochs | mAP(%) |
---|---|---|---|---|---|---|---|---|
ResNet50-RetinaNet | YoloStyle-640 | 640x640 | 84.947G | 36.724M | 2 RTX A5000 | 32 | 13 | 81.948 |
ResNet50-FCOS | YoloStyle-640 | 640x640 | 80.764G | 32.153M | 2 RTX A5000 | 32 | 13 | 81.624 |
You can find more model training details in detection_training/voc/.
Trained on the VOC2007 trainval + VOC2012 trainval datasets, tested on the VOC2007 test dataset.
mAP is reported at IoU=0.50, area=all, maxDets=100.
Network | resize-style | input size | MACs | Params | GPUs | batch size | epochs | mAP(%) |
---|---|---|---|---|---|---|---|---|
ResNet50-RetinaNet | YoloStyle-640 | 640x640 | 84.947G | 36.724M | 2 RTX A5000 | 32 | 13 | 90.220 |
ResNet50-FCOS | YoloStyle-640 | 640x640 | 80.764G | 32.153M | 2 RTX A5000 | 32 | 13 | 90.371 |
You can find more model training details in detection_training/voc/.
DeepLabv3+
Paper: https://arxiv.org/abs/1802.02611
U2Net
Paper: https://arxiv.org/abs/2005.09007
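The mIoU column below is the standard mean intersection-over-union over classes. A minimal sketch from a pixel confusion matrix (the repo's exact handling of absent or ignored classes may differ):

```python
import numpy as np

def mean_iou(conf_matrix: np.ndarray) -> float:
    """Mean IoU from a (num_classes, num_classes) confusion matrix,
    where conf_matrix[i, j] counts pixels of class i predicted as class j."""
    intersection = np.diag(conf_matrix)
    union = conf_matrix.sum(axis=0) + conf_matrix.sum(axis=1) - intersection
    iou = intersection / np.maximum(union, 1)  # guard against empty classes
    return float(iou.mean())
```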
Network | input size | MACs | Params | GPUs | batch size | epochs | mIoU(%) |
---|---|---|---|---|---|---|---|
ResNet50-DeepLabv3+ | 512x512 | 25.548G | 26.738M | 2 RTX A5000 | 8 | 128 | 34.659 |
U2Net | 512x512 | 219.012G | 46.191M | 2 RTX A5000 | 8 | 128 | 39.046 |
You can find more model training details in semantic_segmentation_training/ade20k/.
Network | input size | MACs | Params | GPUs | batch size | epochs | mIoU(%) |
---|---|---|---|---|---|---|---|
ResNet50-DeepLabv3+ | 512x512 | 25.548G | 26.738M | 2 RTX A5000 | 32 | 64 | 64.176 |
U2Net | 512x512 | 219.012G | 46.191M | 4 RTX A5000 | 32 | 64 | 66.529 |
You can find more model training details in semantic_segmentation_training/coco/.
KD loss
Paper: https://arxiv.org/abs/1503.02531
DML loss
Paper: https://arxiv.org/abs/1706.00384
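The CE+KD and CE+DML methods below add a distillation term to the usual cross-entropy. A minimal sketch of both terms (the temperature T=4 is an illustrative choice, not necessarily this repo's setting):

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Hinton et al. KD: KL divergence between temperature-softened
    distributions, scaled by T^2 to keep gradient magnitudes stable."""
    log_p = F.log_softmax(student_logits / T, dim=1)
    q = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(log_p, q, reduction="batchmean") * T * T

def dml_loss(logits_a, logits_b):
    """DML: the two networks train jointly and each treats the other's
    prediction as a target; this is one direction, applied symmetrically."""
    log_p = F.log_softmax(logits_a, dim=1)
    q = F.softmax(logits_b.detach(), dim=1)
    return F.kl_div(log_p, q, reduction="batchmean")
```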
Teacher Network | Student Network | method | Freeze Teacher | input size | GPUs | batch size | epochs | Teacher Top-1(%) | Student Top-1(%) |
---|---|---|---|---|---|---|---|---|---|
ResNet152 | ResNet50 | CE+KD | True | 224x224 | 2 RTX A5000 | 256 | 100 | / | 77.352 |
ResNet152 | ResNet50 | CE+DML | False | 224x224 | 2 RTX A5000 | 256 | 100 | 79.274 | 78.122 |
ResNet152 | ResNet50 | CE+KD+ViT Aug | True | 224x224 | 2 RTX A5000 | 4096 | 300 | / | 80.168 |
ResNet152 | ResNet50 | CE+DML+ViT Aug | False | 224x224 | 2 RTX A5000 | 4096 | 300 | 81.508 | 79.810 |
You can find more model training details in distillation_training/imagenet/.
DINO: Emerging Properties in Self-Supervised Vision Transformers
Paper: https://arxiv.org/abs/2104.14294
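The Loss column below is DINO's self-distillation objective: cross-entropy between a centered, sharpened teacher distribution and the student distribution. A minimal sketch (temperatures are the paper's typical values; the full method also uses multi-crop views and an EMA-updated teacher):

```python
import torch.nn.functional as F

def dino_loss(student_out, teacher_out, center, t_s=0.1, t_t=0.04):
    """center is a running mean of teacher outputs; the low teacher
    temperature sharpens its distribution, and centering avoids collapse."""
    teacher_probs = F.softmax((teacher_out - center) / t_t, dim=-1).detach()
    student_log_probs = F.log_softmax(student_out / t_s, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()
```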
Network | input size | GPUs | batch size | epochs | Loss |
---|---|---|---|---|---|
ResNet50-DINO | 224x224 | 4 RTX A5000 | 256 | 400 | 1.997 |
You can find more model training details in contrastive_learning_training/imagenet/.
Network | MACs | Params | input size | GPUs | batch size | epochs | Top-1(%) |
---|---|---|---|---|---|---|---|
ResNet50 | 4.112G | 25.557M | 224x224 | 2 RTX A5000 | 256 | 100 | 77.114 |
ResNet50 | 4.112G | 25.557M | 224x224 | 2 RTX A5000 | 4096 | 300 | 79.418 |
You can find more model training details in classification_training/imagenet/.
MAE: Masked Autoencoders Are Scalable Vision Learners
Paper: https://arxiv.org/abs/2111.06377
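The Loss column below is MAE's reconstruction objective: mean squared error computed only on the masked patches. A minimal sketch (shapes are illustrative; the paper optionally normalizes target pixels per patch):

```python
def mae_reconstruction_loss(pred, target, mask):
    """pred/target: (B, num_patches, patch_dim) patchified pixels;
    mask: (B, num_patches) with 1 for masked (removed) patches."""
    loss = ((pred - target) ** 2).mean(dim=-1)  # per-patch MSE
    return (loss * mask).sum() / mask.sum()     # average over masked patches only
```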
Network | input size | GPUs | batch size | epochs | Loss |
---|---|---|---|---|---|
ViT-Small-Patch16 | 224x224 | 2 RTX A5000 | 256 | 400 | 0.414 |
ViT-Base-Patch16 | 224x224 | 2 RTX A5000 | 256 | 400 | 0.388 |
ViT-Large-Patch16 | 224x224 | 2 RTX A5000 | 256 | 400 | 0.378 |
You can find more model training details in masked_image_modeling_training/imagenet/.
Network | MACs | Params | input size | GPUs | batch size | epochs | Top-1(%) |
---|---|---|---|---|---|---|---|
ViT-Small-Patch16 | 4.241G | 21.955M | 224x224 | 2 RTX A5000 | 4096 | 100 | 79.006 |
ViT-Base-Patch16 | 16.849G | 86.377M | 224x224 | 2 RTX A5000 | 4096 | 100 | 83.204 |
ViT-Large-Patch16 | 59.647G | 304.024M | 224x224 | 2 RTX A5000 | 4096 | 100 | 85.020 |
You can find more model training details in classification_training/imagenet/.
Network | MACs | Params | input size | GPUs | batch size | epochs | Top-1(%) |
---|---|---|---|---|---|---|---|
ViT-Base-Patch16 | 16.849G | 86.377M | 224x224 | 2 RTX A5000 | 4096 | 100 | 83.290 |
ViT-Large-Patch16 | 59.647G | 304.024M | 224x224 | 2 RTX A5000 | 4096 | 100 | 85.876 |
You can find more model training details in classification_training/imagenet/.
Denoising Diffusion Probabilistic Models
Paper: https://arxiv.org/abs/2006.11239
Denoising Diffusion Implicit Models
Paper: https://arxiv.org/abs/2010.02502
High-Resolution Image Synthesis with Latent Diffusion Models
Paper: https://arxiv.org/abs/2112.10752
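DDPM, DDIM, and PLMS in the tables below are sampling methods applied to the same trained network; training itself uses the DDPM noise-prediction objective. A minimal sketch (the unet(x_t, t) call signature is an assumption):

```python
import torch
import torch.nn.functional as F

def ddpm_loss(unet, x0, alphas_cumprod, num_steps=1000):
    """Sample a timestep t, diffuse x0 to x_t with the closed-form
    forward process, and regress the added noise with MSE."""
    b = x0.shape[0]
    t = torch.randint(0, num_steps, (b,), device=x0.device)
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].reshape(b, 1, 1, 1)
    x_t = torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * noise
    return F.mse_loss(unet(x_t, t), noise)
```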
Diffusion UNet trained on the CIFAR10 dataset (DDPM training method). Test image num = 50000.
Sampling method | input size | steps | condition label (train/test) | FID | IS score (mean/std) |
---|---|---|---|---|---|
DDPM | 32x32 | 1000 | False/False | 5.394 | 8.684/0.169 |
DDIM | 32x32 | 50 | False/False | 7.644 | 8.642/0.129 |
PLMS | 32x32 | 20 | False/False | 7.027 | 8.834/0.200 |
DDPM | 32x32 | 1000 | True/True | 3.949 | 8.985/0.139 |
You can find more model training details in diffusion_model_training/cifar10/.
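FID compares Gaussian statistics of Inception-v3 features between generated and reference images. A minimal sketch of the final distance given precomputed feature means and covariances (feature extraction omitted):

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(mu1, sigma1, mu2, sigma2):
    """Frechet distance between two Gaussians fitted to Inception features."""
    diff = mu1 - mu2
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):  # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```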
Diffusion UNet trained on the CIFAR100 dataset (DDPM training method). Test image num = 50000.
Sampling method | input size | steps | condition label (train/test) | FID | IS score (mean/std) |
---|---|---|---|---|---|
DDPM | 32x32 | 1000 | False/False | 9.620 | 9.399/0.138 |
DDIM | 32x32 | 50 | False/False | 13.250 | 8.946/0.150 |
PLMS | 32x32 | 20 | False/False | 11.854 | 9.391/0.202 |
DDPM | 32x32 | 1000 | True/True | 5.209 | 10.880/0.180 |
You can find more model training details in diffusion_model_training/cifar100/.
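The IS score (mean/std) columns report the Inception Score over generated samples. A minimal sketch given Inception class posteriors for the generated images (the 10-way split is the conventional choice and may differ from this repo's setup):

```python
import numpy as np

def inception_score(probs, splits=10):
    """probs: (N, num_classes) Inception posteriors for generated images.
    IS = exp(mean KL(p(y|x) || p(y))), reported as mean/std over splits."""
    scores = []
    for part in np.array_split(probs, splits):
        p_y = part.mean(axis=0, keepdims=True)
        kl = (part * (np.log(part + 1e-12) - np.log(p_y + 1e-12))).sum(axis=1)
        scores.append(np.exp(kl.mean()))
    return float(np.mean(scores)), float(np.std(scores))
```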
Diffusion UNet trained on the CelebA-HQ dataset (DDPM training method). Test image num = 28000.
Sampling method | input size | steps | condition label (train/test) | FID | IS score (mean/std) |
---|---|---|---|---|---|
DDPM | 64x64 | 1000 | False/False | 6.491 | 2.577/0.035 |
DDIM | 64x64 | 50 | False/False | 15.195 | 2.625/0.028 |
PLMS | 64x64 | 20 | False/False | 18.061 | 2.701/0.040 |
You can find more model training details in diffusion_model_training/celebahq/.
Diffusion UNet trained on the FFHQ dataset (DDPM training method). Test image num = 60000.
Sampling method | input size | steps | condition label (train/test) | FID | IS score (mean/std) |
---|---|---|---|---|---|
DDPM | 64x64 | 1000 | False/False | 6.671 | 3.399/0.055 |
DDIM | 64x64 | 50 | False/False | 10.479 | 3.431/0.044 |
PLMS | 64x64 | 20 | False/False | 12.387 | 3.462/0.034 |
You can find more model training details in diffusion_model_training/ffhq/.