Please use the MediaPipe YouTube-8M feature extractor, which extracts both RGB and audio features, instead.
This directory contains binary and library code that can extract YouTube-8M features from images and videos. The code requires the Inception TensorFlow model and our PCA matrix, as outlined in Section 3.3 of our paper. The first time you use this code, it will automatically download the Inception model (75 MB, a TensorFlow GraphDef proto) and the PCA matrix (25 MB, NumPy arrays).
There are two ways to use this code:

- **Binary** `extract_tfrecords_main.py` processes a CSV file of videos (and their labels) and outputs a `tfrecord` file. Files created with this binary match the schema of YouTube-8M dataset files, and are therefore compatible with our training starter code. You can also use the files for inference with your models that are pre-trained on YouTube-8M.
- **Library** `feature_extractor.py`, which can extract features from images.
You can use the binary `extract_tfrecords_main.py` to create `tfrecord` files.
However, this binary assumes that you have OpenCV properly installed (see the
end of this subsection). Assume that you have two videos, `/path/to/vid1` and
`/path/to/vid2`, with multi-integer labels `(52, 3, 10)` and `(7, 67)`,
respectively. To create a `tfrecord` containing the features and labels for
those videos, you must first create a CSV file (e.g. at
`/path/to/vid_dataset.csv`) with the contents:

```
/path/to/vid1,52;3;10
/path/to/vid2,7;67
```
Note that the CSV is comma-separated, but the label field is semicolon-separated to allow multiple labels per video.
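If you are generating the dataset CSV programmatically, the format above can be produced with the standard `csv` module. The video paths and labels below are the hypothetical examples from above:

```python
import csv

# Hypothetical example videos and their multi-integer labels, as above.
videos = [
    ("/path/to/vid1", [52, 3, 10]),
    ("/path/to/vid2", [7, 67]),
]

with open("/tmp/vid_dataset.csv", "w", newline="") as f:
    writer = csv.writer(f)  # a comma separates the path from the label field
    for path, labels in videos:
        # Labels are joined with semicolons inside a single CSV field.
        writer.writerow([path, ";".join(str(label) for label in labels)])
```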
Then, you can create the `tfrecord` by calling the binary:

```
python extract_tfrecords_main.py --input_videos_csv /path/to/vid_dataset.csv \
  --output_tfrecords_file /path/to/output.tfrecord
```
Now, you can use the output file for training and/or inference using our starter code.
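If you just want to sanity-check what the binary wrote, the TFRecord container format is simple enough to walk with the standard library alone: each record is an 8-byte little-endian length, a 4-byte CRC of that length, the serialized payload, and a 4-byte CRC of the payload. The sketch below skips CRC verification entirely; for real training or inference pipelines, prefer TensorFlow's own readers:

```python
import struct

def read_tfrecords(path):
    """Yield the raw serialized records from a TFRecord file.

    A minimal stdlib sketch of the TFRecord framing. CRC fields are
    skipped, not verified.
    """
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                return
            (length,) = struct.unpack("<Q", header)
            f.read(4)              # length CRC (ignored)
            yield f.read(length)   # serialized example payload
            f.read(4)              # payload CRC (ignored)
```

For example, `sum(1 for _ in read_tfrecords("/path/to/output.tfrecord"))` counts the records in the output file.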
`extract_tfrecords_main.py` requires the OpenCV Python bindings to be installed
and linked with ffmpeg. In other words, running this command should print
`True`:

```
python -c 'import cv2; print(cv2.VideoCapture().open("/path/to/some/video.mp4"))'
```
To extract our features from an image file `cropped_panda.jpg`, you can use
this Python code:

```python
import os

import numpy
from PIL import Image

from feature_extractor import YouTube8MFeatureExtractor

# Instantiate the extractor. This is slow the first time it runs on your
# machine, as it needs to download about 100 MB of model files.
extractor = YouTube8MFeatureExtractor()

image_file = os.path.join(extractor._model_dir, 'cropped_panda.jpg')
im = numpy.array(Image.open(image_file))
features = extractor.extract_rgb_frame_features(im)
```
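The features returned here are floats, while the published YouTube-8M dataset stores 8-bit quantized features. Below is a minimal sketch of that quantization, assuming a clip range of `[-2, 2]`; check `extract_tfrecords_main.py` for the exact constants it uses:

```python
import numpy

def quantize(features, min_value=-2.0, max_value=2.0):
    """Clip features to [min_value, max_value] and rescale to uint8.

    The [-2, 2] default range is an assumption borrowed from the starter
    code's quantization settings.
    """
    clipped = numpy.clip(features, min_value, max_value)
    scaled = (clipped - min_value) * (255.0 / (max_value - min_value))
    return numpy.round(scaled).astype(numpy.uint8)
```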
The constructor `extractor = YouTube8MFeatureExtractor()` will create a
directory `~/yt8m/`, if it does not exist, and will download and untar the two
model files (the Inception model and the PCA matrix). If you prefer, you can
point the extractor at another directory:

```python
extractor = YouTube8MFeatureExtractor(model_dir="/path/to/yt8m_files")
```
You can also pre-populate your custom `/path/to/yt8m_files` by manually
downloading the URLs (e.g. using `wget`) and un-tarring them, for example:

```
mkdir -p /path/to/yt8m_files
cd /path/to/yt8m_files
wget http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz
wget http://data.yt8m.org/yt8m_pca.tgz
tar zxvf inception-2015-12-05.tgz
tar zxvf yt8m_pca.tgz
```
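For reference, the PCA step these files support is the one described in Section 3.3 of the paper: center the 2048-D Inception feature, project it onto 1024 principal components, and whiten by the eigenvalues. The sketch below uses synthetic stand-in matrices to illustrate the shapes; the actual extractor loads the mean, eigenvectors, and eigenvalues from the `.npy` files inside `yt8m_pca.tgz`, and the epsilon constant here is an assumption:

```python
import numpy

def apply_pca(frame_features, pca_mean, pca_eigenvecs, pca_eigenvals):
    """Center, project to 1024-D, and whiten a single 2048-D frame feature.

    The 1e-4 epsilon guards against division by very small eigenvalues;
    the exact constant is an assumption.
    """
    centered = frame_features - pca_mean
    projected = centered.dot(pca_eigenvecs)  # (2048,) x (2048, 1024) -> (1024,)
    return projected / numpy.sqrt(pca_eigenvals + 1e-4)

# Synthetic stand-ins with the expected shapes, just to illustrate usage.
rng = numpy.random.RandomState(0)
feat = apply_pca(
    rng.randn(2048),
    pca_mean=rng.randn(2048),
    pca_eigenvecs=rng.randn(2048, 1024),
    pca_eigenvals=rng.rand(1024) + 0.1,
)
```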