Code for the ICLR 2020 paper V4D: 4D Convolutional Neural Networks for Video-level Representation Learning
Model | Backbone | Top-1 (%) | Top-5 (%) |
---|---|---|---|
ARTNet with TSN | ARTNet ResNet18 | 70.7 | 89.3 |
ECO | BN-Inception+3D ResNet18 | 70.0 | 89.4 |
S3D-G | S3D Inception | 74.7 | 93.4 |
Nonlocal Network | 3D ResNet50 | 76.5 | 92.6 |
SlowFast | SlowFast ResNet50 | 77.0 | 92.6 |
I3D | I3D Inception | 72.1 | 90.3 |
Two-stream I3D | I3D Inception | 75.7 | 92.0 |
I3D-S | Slow pathway ResNet50 | 74.9 | 91.5 |
Ours V4D | V4D ResNet50 | 77.4 | 93.1 |
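
For readers unfamiliar with the Top1/Top5 columns, the sketch below shows how top-k accuracy is usually computed from per-class scores. It is an illustration only, not this repository's evaluation code: the function name `topk_accuracy` and the toy scores are hypothetical, and any clip or crop averaging applied before ranking is not covered here.

```python
def topk_accuracy(scores, labels, k=1):
    """Percentage of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy example: 4 videos, 3 classes (made-up numbers, not real results).
scores = [
    [0.1, 0.7, 0.2],  # best class 1, true class 1: hit at k=1
    [0.5, 0.3, 0.2],  # best class 0, true class 1: miss at k=1, hit at k=2
    [0.2, 0.3, 0.5],  # best class 2, true class 0: miss at k=1 and k=2
    [0.6, 0.3, 0.1],  # best class 0, true class 0: hit at k=1
]
labels = [1, 1, 0, 0]

print(topk_accuracy(scores, labels, k=1))  # 50.0
print(topk_accuracy(scores, labels, k=2))  # 75.0
```
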
- Python >=3.6
- PyTorch >=1.3
- torchvision that matches the PyTorch installation.
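
Before running the scripts, it may help to confirm that the environment satisfies the requirements above. The following is a minimal sketch, not part of this repository: the helper names `version_tuple`, `meets_minimum`, and `check_environment` are hypothetical, and the script only checks that torchvision is importable, not that it matches the installed PyTorch build.

```python
import importlib.util
import re
import sys


def version_tuple(version):
    """Turn '1.13.1+cu117' into (1, 13, 1), ignoring local build suffixes."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version, minimum):
    """Return True if `version` is at least `minimum` (both dotted strings)."""
    return version_tuple(version) >= version_tuple(minimum)


def check_environment():
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python >=3.6 is required")
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
    else:
        import torch
        if not meets_minimum(torch.__version__, "1.3"):
            problems.append("PyTorch >=1.3 is required, found " + torch.__version__)
    if importlib.util.find_spec("torchvision") is None:
        problems.append("torchvision is not installed")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment OK" if not issues else "\n".join(issues))
```

Comparing versions as integer tuples avoids the string-comparison pitfall where `"1.10"` would sort before `"1.3"`.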
Train on Kinetics or Mini-Kinetics-200:
./scripts/train_kinetics.sh
./scripts/train_minikinetics.sh
Test the pretrained models on Mini-Kinetics-200 and Kinetics (download our trained models from Google):
./scripts/test_kinetics.sh
./scripts/test_minikinetics.sh
For any questions, please feel free to reach
If you use this method or this code in your research, please cite as:
@inproceedings{zhang2020v4d,
title={V4D: 4D Convolutional Neural Networks for Video-level Representation Learning},
author={Zhang, Shiwen and Guo, Sheng and Huang, Weilin and Scott, Matthew R and Wang, Limin},
booktitle={Proceedings of International Conference on Learning Representations},
year={2020}
}
V4D is licensed under CC-BY-NC 4.0, as found in the LICENSE file. It is released for academic research and non-commercial use only. If you wish to use it for commercial purposes, please contact [email protected].