Object Detection

BrunoGeorgevich edited this page Feb 1, 2022 · 3 revisions

TensorFlow Object Detection API

TensorFlow's Object Detection API provides several pre-trained models, such as EfficientDet and ssd_mobilenet_v2_fpnlite.

  • We use the TF2 version; step-by-step installation instructions can be found here.
  • Pre-trained models can be found here.
  • Our training script: ai/training/tfod/main.py.
  • CLI for training: python main.py.

Configuration

  • tfod/config.py
...
TRAIN_IMAGES_DIR = "../data/raw/Images/"
TRAIN_ANNOTS_DIR = "../data/raw/Annotations/"

ANNOTATION_PATTERN = "xywh"  # YOLO format

HAS_TEST_FOLDER = False
TEST_IMAGES_DIR = TRAIN_IMAGES_DIR
TEST_ANNOTS_DIR = TRAIN_ANNOTS_DIR
LABELMAP_PATH = "labelmap.pbtxt"
...
PRETRAINED_MODEL = "models/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8"

CONFIG_PATH = os.path.join(PRETRAINED_MODEL, "pipeline.config")
CKPT_PATH = os.path.join(PRETRAINED_MODEL, "checkpoint")
SUMMARY_PATH = f"{PRETRAINED_MODEL}/results"
...

Please check config.py for more fine-tuning settings.
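As an illustration of what the "xywh" annotation pattern implies, here is a minimal sketch that converts one YOLO-style annotation line into pixel corner coordinates. It assumes the usual YOLO label layout (class id followed by normalized center x, center y, width, and height); the function names are hypothetical and the project's own parsing code may differ.

```python
def yolo_xywh_to_corners(x_c, y_c, w, h, img_w, img_h):
    """Convert a normalized, center-based box to pixel corner coordinates."""
    x_min = (x_c - w / 2) * img_w
    y_min = (y_c - h / 2) * img_h
    x_max = (x_c + w / 2) * img_w
    y_max = (y_c + h / 2) * img_h
    return x_min, y_min, x_max, y_max


def parse_yolo_line(line, img_w, img_h):
    """Parse one 'class x_center y_center width height' line (assumed layout)."""
    class_id, x_c, y_c, w, h = line.split()
    box = yolo_xywh_to_corners(
        float(x_c), float(y_c), float(w), float(h), img_w, img_h
    )
    return int(class_id), box
```

For example, a box centered in a 640x480 image and covering half of each dimension maps to the corners (160, 120) and (480, 360).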

  • labelmap.pbtxt
item {
    id: 1
    name: 'person'
}

item {
    id: 2
    name: 'gun'
}

item {
    id: 3
    name: 'helmet'
}
  • params.txt
0.01;0.5

params.txt holds learning_rate and confidence_threshold (apparently in that order, separated by a semicolon). Both can be changed online, that is, while training is running.
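A minimal sketch of how such a file could be re-read during training, assuming the two semicolon-separated values are the learning rate followed by the confidence threshold. The helper names are hypothetical; the actual logic lives in the training script and may differ.

```python
def parse_params(text):
    """Parse 'learning_rate;confidence_threshold' (assumed order) into floats."""
    learning_rate, confidence_threshold = (
        float(value) for value in text.strip().split(";")
    )
    return learning_rate, confidence_threshold


def read_params(path="params.txt"):
    """Re-read the parameter file, e.g. once per training step or epoch."""
    with open(path, encoding="utf-8") as f:
        return parse_params(f.read())
```

Calling read_params() at a fixed interval inside the training loop would pick up edits made to the file without restarting the run.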

DETR

DETR (DEtection TRansformer) treats object detection as the problem of directly predicting a fixed-size set of objects.

Unlike traditional object detection methods, DETR uses no anchors and does not filter bounding boxes with non-maximum suppression. It applies a standard transformer architecture to the feature map produced by a CNN, and the output is the final set of predictions.
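To make the fixed-size prediction set concrete, here is a minimal pure-Python sketch of how it can be reduced to actual detections without non-maximum suppression. It assumes each prediction slot carries class logits whose last entry is a "no object" class, as in the public DETR implementation; the function names and the 0.7 threshold are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filter_predictions(pred_logits, pred_boxes, threshold=0.7):
    """Keep slots whose best real-class probability exceeds the threshold."""
    kept = []
    for logits, box in zip(pred_logits, pred_boxes):
        probs = softmax(logits)[:-1]  # drop the trailing "no object" class
        score = max(probs)
        if score > threshold:
            kept.append((probs.index(score), score, box))
    return kept
```

Slots that the model assigns to "no object" simply fall below the threshold, so no overlap-based filtering step is needed.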

Configuration

  • datasets/__init__.py
def build_dataset(image_set, args):
    ...
    if args.dataset_file == 'hsc':
        return build_hsc(image_set, args)
    ...
  • datasets/coco.py
def build_hsc(image_set, args):
    ...
    return dataset
...

CenterNet

CenterNet models an object as a single point: the center of its bounding box. Other properties, such as the object's size, are then regressed directly from the image features at the center location.
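The idea can be sketched in a few lines of plain Python: find local peaks in a center heatmap, then read the regressed size at each peak to build a box. This is a simplified illustration under stated assumptions (a single class, plain nested lists, coordinates in heatmap cells, no sub-pixel offset or output stride), not the CenterNet implementation itself; decode_centers is a hypothetical name.

```python
def decode_centers(heatmap, sizes, threshold=0.5):
    """Turn heatmap peaks into (x_min, y_min, x_max, y_max, score) boxes.

    heatmap: 2D list of center scores.
    sizes:   2D list of (width, height) regressed at each cell.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    boxes = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Scores in the 3x3 neighborhood (including the cell itself).
            neighborhood = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            if score < max(neighborhood):
                continue  # not a local peak
            w, h = sizes[r][c]
            boxes.append((c - w / 2, r - h / 2, c + w / 2, r + h / 2, score))
    return boxes
```

Because each object is represented by a single peak, the local-maximum check plays the role that non-maximum suppression plays in anchor-based detectors.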

Configuration

See HSC dataset.

Faster RCNN

Faster RCNN is a two-stage detector: object detection happens after a region proposal phase. The architecture succeeds R-CNN and Fast RCNN, and its performance improvements come from the way regions of interest are proposed.

Configuration

See scripts in ai/training/frcnn.

YOLO

Models in the YOLO (You Only Look Once) family are single-stage detectors, meaning that object localization and classification are performed in just one "pass" through the network. Other popular single-stage detectors are SSD (which we used with TF OD) and RetinaNet.

Among the YOLO models, we evaluated two: YOLOv4 and YOLOv5. YOLOv4 proved difficult to get to converge, so we moved on to YOLOv5. Although YOLOv5 was not implemented by the original YOLO authors, it proved equivalent to YOLOv4 on COCO val2017. Its advantages were support for a higher input resolution, several architecture sizes, and a set of supporting scripts around the model, such as export to ONNX.

YOLOv4

YOLOv5

YOLOv5 -- Source: https://github.com/ultralytics/yolov5/issues/280#issuecomment-1001850116

  • Repository: https://github.com/ultralytics/yolov5
  • CLI:
    • Training: python train.py --img 384 --batch 8 --epochs 100 --data ../data/hsc.yaml --weights yolov5x.pt --project hsc --name v1_yolov5x

Configuration

Check hsc.yaml (the file passed to --data) and the YOLOv5 documentation.