The main purpose of this repository is to fine-tune Facebook's DETR (DEtection TRansformer).
Author: Doramas Báez Bernal
Email: [email protected]
Unlike traditional computer vision techniques, DETR approaches object detection as a direct set prediction problem. It consists of a set-based global loss, which forces unique predictions via bipartite matching, and a Transformer encoder-decoder architecture. Given a fixed small set of learned object queries, DETR reasons about the relations of the objects and the global image context to directly output the final set of predictions in parallel. Due to this parallel nature, DETR is very fast and efficient (paper).
This section lists the main dependencies of the project:
- torch>=1.5.0
- torchvision>=0.6.0
- pycocotools
It is also necessary to download the following directories:
- Dataset for the fine-tuning
- Checkpoints of the model after fine-tuning
Therefore, the project must have the following structure:
path/to/DERT-finetune/
├ dert.ipynb # dert notebook
├ train_custom_coco/ # folder containing dataset for fine-tuning
│ ├ annotations/ # annotation json files
│ ├ image_test/ # Images for testing after fine-tuning
│ ├ train2017/ # train images
│ └ val2017/ # val images
├ outputs/
│ └ checkpoint.pth # checkpoint of the model
└ data/
├ dert_finetune/ # DETR to fine-tune on a dataset
└ images/ # Images for the readme
DETR directly predicts (in parallel) the final set of detections by combining a common CNN with a transformer architecture. During training, bipartite matching uniquely assigns predictions to ground-truth boxes; a prediction with no match should yield a “no object” (∅) class. To this end, DETR adopts an encoder-decoder architecture based on transformers, a popular architecture for sequence prediction. Using self-attention, this architecture predicts all objects at once and is trained end-to-end with a set loss function that performs bipartite matching between predicted and ground-truth objects.
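To make the bipartite matching idea concrete, here is a small illustrative sketch (not the repository's actual matcher) that pairs predictions with ground-truth boxes using the Hungarian algorithm from SciPy, with a simplified cost built from class probability and L1 box distance:

```python
# Illustrative sketch of DETR-style bipartite matching (not the repository's actual matcher).
# The cost is a simplified mix of (negative) class probability and L1 distance between boxes.
import torch
from scipy.optimize import linear_sum_assignment

num_queries, num_targets, num_classes = 5, 2, 3
pred_logits = torch.randn(num_queries, num_classes + 1)  # +1 for the "no object" class
pred_boxes = torch.rand(num_queries, 4)                  # (cx, cy, w, h), normalized
tgt_labels = torch.tensor([0, 2])                        # ground-truth class ids
tgt_boxes = torch.rand(num_targets, 4)

prob = pred_logits.softmax(-1)                           # [num_queries, num_classes + 1]
cost_class = -prob[:, tgt_labels]                        # [num_queries, num_targets]
cost_bbox = torch.cdist(pred_boxes, tgt_boxes, p=1)      # pairwise L1 box distance
cost = cost_class + cost_bbox

# Hungarian algorithm: each ground-truth box is assigned exactly one prediction;
# the remaining queries should predict the "no object" class.
pred_idx, tgt_idx = linear_sum_assignment(cost.numpy())
print(list(zip(pred_idx.tolist(), tgt_idx.tolist())))
```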
The architecture is described in more detail below:
As can be seen in the previous figure, DETR uses a conventional CNN backbone to learn a 2D representation of an input image. The model then flattens it and supplements it with a positional encoding before passing it to the transformer encoder. A transformer decoder then takes as input a small fixed number of learned positional embeddings, which we call object queries, and additionally attends to the encoder output. Finally, each output embedding is passed to a shared feed-forward network (FFN) that predicts either a detection (class and bounding box) or a “no object” class.
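As a quick orientation, the sketch below loads the pre-trained COCO model published by Facebook Research on torch hub and shows that a single forward pass returns a fixed set of 100 predictions (class logits plus normalized boxes); the input here is a dummy tensor just to illustrate the output format:

```python
import torch

# Pre-trained DETR (ResNet-50 backbone) published by Facebook Research on torch hub.
model = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)
model.eval()

# Dummy image batch [batch, 3, H, W]; a real image should be resized and normalized first.
img = torch.rand(1, 3, 800, 800)

with torch.no_grad():
    outputs = model(img)

print(outputs['pred_logits'].shape)  # [1, 100, 92]: 100 queries, 91 COCO classes + "no object"
print(outputs['pred_boxes'].shape)   # [1, 100, 4]: (cx, cy, w, h) normalized to [0, 1]
```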
A dataset has been prepared for the fine-tuning. It contains approximately 900 images taken from the larger COCO dataset, and the subset consists of 3 classes (a small sanity-check sketch follows the list):
- fire hydrant
- parking meter
- stop sign
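To double-check the subset, the annotation file can be inspected with the COCO API; the file name below is only an example, adjust it to whatever json is inside train_custom_coco/annotations/:

```python
from pycocotools.coco import COCO

# Hypothetical annotation file name; use the actual json inside train_custom_coco/annotations/.
coco = COCO('train_custom_coco/annotations/custom_train.json')

cats = coco.loadCats(coco.getCatIds())
print([c['name'] for c in cats])   # expected: ['fire hydrant', 'parking meter', 'stop sign']
print(len(coco.getImgIds()), 'images')
```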
Example of the images used:
The following results were obtained by adapting the model weights (fine-tuning) for 30 epochs:
Official repositories:
- Facebook's DETR (paper)
- Facebook's detectron2 wrapper for DETR
- DETR checkpoints: for the fine-tuning, the classification head will be removed (see the sketch after this list).
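A minimal sketch of that step, following the same approach as the fine-tuning Gist linked under Tutorials (the output file name is arbitrary):

```python
import torch

# Official DETR-R50 checkpoint from the DETR model zoo.
checkpoint = torch.hub.load_state_dict_from_url(
    'https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth',
    map_location='cpu', check_hash=True)

# The classification head is tied to the 91 COCO classes, so its weights are dropped
# before fine-tuning on a dataset with a different number of classes.
del checkpoint['model']['class_embed.weight']
del checkpoint['model']['class_embed.bias']

torch.save(checkpoint, 'detr-r50_no-class-head.pth')  # arbitrary output file name
```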
Requirements:
- Dataset for fine-tuning DETR
- The last checkpoint (inside the outputs folder)
Special mention:
- Build your own dataset
- Example of fine-tuning DETR by woctezuma
- Fork of DETR prepared for fine-tuning on a custom dataset, by woctezuma
Official notebooks:
- An official notebook illustrating DETR
- An official notebook for using COCO API
Tutorials:
- A GitHub Gist explaining how to fine-tune DETR
- A GitHub issue explaining how to load a fine-tuned DETR (a minimal loading sketch is shown below)
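For reference, a minimal sketch of loading the fine-tuned weights from outputs/checkpoint.pth; the num_classes value is an assumption here and must match the value used during fine-tuning:

```python
import torch

# num_classes is an assumption; it must match the value used during fine-tuning.
num_classes = 3
model = torch.hub.load('facebookresearch/detr', 'detr_resnet50',
                       pretrained=False, num_classes=num_classes)

# Load the fine-tuned weights produced by the training run.
checkpoint = torch.load('outputs/checkpoint.pth', map_location='cpu')
model.load_state_dict(checkpoint['model'])
model.eval()
```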