Code and models for the paper "ECO: Efficient Convolutional Network for Online Video Understanding".
By Mohammadreza Zolfaghari, Kamaljeet Singh, Thomas Brox
- 2018.7.30: Added code and models
- 2018.4.17: Repository for ECO.
This repository contains all the required models and scripts for the paper "ECO: Efficient Convolutional Network for Online Video Understanding".
In this work, we introduce a network architecture that takes long-term content into account and enables fast per-video processing at the same time. The architecture is based on merging long-term content already in the network rather than in a post-hoc fusion. Together with a sampling strategy that exploits the fact that neighboring frames are largely redundant, this yields high-quality action classification and video captioning at up to 230 videos per second, where each video can consist of a few hundred frames. The approach achieves competitive performance across all datasets while being 10x to 80x faster than state-of-the-art methods.
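The sampling idea described above can be sketched as a small script. This is a hypothetical illustration, not code from the repository: it splits a video's frames into N equally sized segments and picks one representative frame index (here, the segment center) per segment, so that a few-hundred-frame video is summarized by N frames.

```Shell
# Hypothetical sketch (not part of the repo): compute one sampled frame
# index per segment. Arguments: total frame count, number of segments.
sample_frames() {
  local total=$1 segments=$2
  local len=$(( total / segments ))
  for (( i = 0; i < segments; i++ )); do
    # emit the center frame of segment i
    echo $(( i * len + len / 2 ))
  done
}

# e.g. a 400-frame video reduced to 4 frames:
sample_frames 400 4   # prints 50, 150, 250, 350
```

Because neighboring frames are largely redundant, processing only these N frames retains the long-term content of the video at a fraction of the cost.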
*(Demo GIFs: action recognition on UCF101 and HMDB51, and video captioning on MSVD; shown are models trained on the UCF101 and Something-Something datasets.)*
- Requirements for Python
- Requirements for Caffe (see: Caffe installation instructions)
Build Caffe
```Shell
cd $caffe_FAST_ROOT/
# Now follow the Caffe installation instructions here:
# http://caffe.berkeleyvision.org/installation.html
make all -j8
```
After successfully completing the installation, you are ready to run all the following experiments.
- Download the initialization and trained models:

  ```Shell
  sh download_models.sh
  ```

- Train ECO Lite on the Kinetics dataset:

  ```Shell
  sh models_ECO_Lite/kinetics/run.sh
  ```
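The training script assumes the downloaded models are in place. A small pre-flight check like the following can fail fast with a clear message instead of a mid-run error; this is a hypothetical helper, not part of the repository, and the paths in the usage comment are placeholders:

```Shell
# Hypothetical pre-flight check (not from the repo): verify that the files
# a training run depends on exist before launching the long-running job.
check_files() {
  local missing=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return $missing
}

# Example usage (placeholder file names):
# check_files models_ECO_Lite/kinetics/solver.prototxt init.caffemodel \
#   && sh models_ECO_Lite/kinetics/run.sh
```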
- Data
- Tables and Results
- Demo
Questions can also be left as issues in the repository. We will be happy to answer them.