
A video database bridging human actions and human-object relationships


RishiDesai/ActionGenome

 
 


Action Genome

This repo contains the README and code snippets for using the Action Genome dataset v1.0.

Prerequisites

Python 3 and ffmpeg are required to use the snippets in this repo.

Get started

Download videos and annotations

Download Charades videos (scaled to 480p) from here and extract (or symlink) them under dataset/ag/videos.

Download Action Genome annotations and place them under dataset/ag/annotations.
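Before dumping frames, it can help to confirm that the layout described above is in place. The following is a minimal sketch, not part of this repo: the helper name missing_paths is hypothetical, and it assumes the annotation files named in the "Annotations structure" section sit directly under dataset/ag/annotations.

```python
from pathlib import Path

# Directories named in this README.
EXPECTED_DIRS = ["videos", "annotations"]

# File names taken from the "Annotations structure" section. That they sit
# directly under annotations/ is an assumption.
EXPECTED_ANNOTATION_FILES = [
    "object_bbox_and_relationship.pkl",
    "person_bbox.pkl",
    "frame_list.txt",
    "object_classes.txt",
    "relationship_classes.txt",
]


def missing_paths(root="dataset/ag"):
    """Return the expected directories and files that are absent under root."""
    root = Path(root)
    missing = [root / d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    ann_dir = root / "annotations"
    missing += [
        ann_dir / name
        for name in EXPECTED_ANNOTATION_FILES
        if not (ann_dir / name).is_file()
    ]
    return missing


if __name__ == "__main__":
    for path in missing_paths():
        print("missing:", path)
```

An empty result means every expected path was found; anything printed points at a download or extraction step that still needs to be done.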

Dump frames

We are not releasing the dumped frames from Charades videos. Instead, you can download the Charades videos from here and dump the frames by following the instructions below.

After placing all 480p videos under dataset/ag/videos, dump the frames into dataset/ag/frames:

python tools/dump_frames.py

The dumped frames are ~74GB. The dumping may take half a day to finish. Note that we have only annotated sampled frames (see the sampling strategy in our paper) rather than all frames. If you prefer to dump all frames, run:

python tools/dump_frames.py --all_frames
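For a rough idea of what dumping every frame of a single video with ffmpeg can look like, here is a hedged sketch. It does not reproduce tools/dump_frames.py and does not implement the paper's sampling strategy. The function names, the per-video output folder, and the %06d.png naming are all assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def build_dump_command(video_path, frames_root="dataset/ag/frames"):
    """Build an ffmpeg command that writes every frame of one video as PNG.

    Assumption: frames go to <frames_root>/<video file name>/%06d.png.
    The real script may use a different layout or naming scheme.
    """
    video_path = Path(video_path)
    pattern = Path(frames_root) / video_path.name / "%06d.png"
    return ["ffmpeg", "-loglevel", "error", "-i", str(video_path), str(pattern)]


def dump_all_frames(video_path, frames_root="dataset/ag/frames"):
    """Create the output folder and run ffmpeg (requires ffmpeg on PATH)."""
    cmd = build_dump_command(video_path, frames_root)
    Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(cmd, check=True)
```

Keeping the command construction separate from the subprocess call makes it possible to inspect or log the command before anything is written to disk.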

Annotations structure

object_bbox_and_relationship.pkl contains a dictionary structured as follows:

{...
    'VIDEO_ID/FRAME_ID':
        [...
            {
                'class': 'book',
                'bbox': (x, y, w, h),
                'attention_relationship': ['looking_at'],
                'spatial_relationship': ['in_front_of'],
                'contacting_relationship': ['holding', 'touching'],
                'visible': True,
                'metadata': 
                    {
                        'tag': 'VIDEO_ID/FRAME_ID',
                        'set': 'train'
                    }
            }
        ...]
...}

Note that 'visible' indicates whether the interacted object is visible in the frame.

person_bbox.pkl contains the person bounding boxes for each frame. Here we release the Faster-RCNN detected person boxes that we used in our paper. In the next version of the dataset, we'll release manually labeled person boxes.
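The structure above can be read with the standard pickle module. The following is a minimal sketch, assuming the file sits at dataset/ag/annotations/object_bbox_and_relationship.pkl and that (x, y) in 'bbox' is the top-left corner. The helper names are hypothetical and not part of this repo.

```python
import pickle


def load_annotations(
    path="dataset/ag/annotations/object_bbox_and_relationship.pkl",
):
    """Load the {'VIDEO_ID/FRAME_ID': [object dicts]} dictionary."""
    with open(path, "rb") as f:
        return pickle.load(f)


def xywh_to_xyxy(bbox):
    """Convert (x, y, w, h) to (x1, y1, x2, y2).

    Assumption: (x, y) is the top-left corner of the box.
    """
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def visible_objects(annotations, frame_key):
    """Return only the objects flagged as visible in the given frame."""
    return [obj for obj in annotations.get(frame_key, []) if obj.get("visible")]


if __name__ == "__main__":
    annotations = load_annotations()
    frame_key = next(iter(annotations))
    for obj in visible_objects(annotations, frame_key):
        print(obj["class"], xywh_to_xyxy(obj["bbox"]), obj["contacting_relationship"])
```

Filtering on 'visible' before touching 'bbox' is a precaution: the README only describes the box for an object that appears in the frame.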

The frame_list.txt contains all frames we've labeled.
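To see which frames were labeled for each video, the list can be grouped by video ID. This sketch assumes each line of frame_list.txt has the same 'VIDEO_ID/FRAME_ID' form as the annotation keys, which the README does not state explicitly. The function name is hypothetical.

```python
from collections import defaultdict


def group_frames_by_video(lines):
    """Group 'VIDEO_ID/FRAME_ID' lines into {video_id: [frame_id, ...]}."""
    frames = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        video_id, frame_id = line.split("/", 1)
        frames[video_id].append(frame_id)
    return dict(frames)


if __name__ == "__main__":
    with open("dataset/ag/annotations/frame_list.txt") as f:
        by_video = group_frames_by_video(f)
    print(len(by_video), "videos with labeled frames")
```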

The object_classes.txt contains all classes of objects.

The relationship_classes.txt contains all classes of human-object relationships.
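Both class files can be turned into name-to-index mappings, for example to build label vectors. This is a sketch under the assumption that each file lists one class name per line. The helper names are hypothetical.

```python
def read_classes(path):
    """Read class names from a text file, assuming one name per line."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def class_to_index(classes):
    """Map each class name to its position in the list."""
    return {name: index for index, name in enumerate(classes)}


if __name__ == "__main__":
    objects = read_classes("dataset/ag/annotations/object_classes.txt")
    relationships = read_classes("dataset/ag/annotations/relationship_classes.txt")
    print(len(objects), "object classes,", len(relationships), "relationship classes")
```

If the names in these files are formatted differently from the strings stored in the pickle, a normalization step would be needed before looking them up.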
