Object Detection Architecture

May 21, 2021 (last updated: April 22, 2021)
deeplearning machinelearning AI


Abstract

In computer vision, object detection is an essential problem. There are many architectures for it: dense networks, CNNs, Transformers, and more. Today, we'll take a deep dive into them.

Transformer Based

The anchor-based approach of YOLO can improve speed, but it needs NMS (non-maximum suppression) and other post-processing to filter out noisy predictions.

Tuning these steps wastes time. A Transformer architecture can instead be used for end-to-end training and inference.
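To make the post-processing cost concrete, here is a minimal sketch of greedy NMS in plain Python. The box format `(x1, y1, x2, y2)`, the helper names, and the `iou_threshold` default are assumptions chosen for illustration, not values taken from any particular YOLO release.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score, keep a box only if it
    does not overlap an already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(overlap_ratio(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # keeps indices [0, 2]; box 1 is suppressed by box 0
```

The threshold is exactly the kind of hyperparameter that has to be tuned per dataset, which is the cost the end-to-end Transformer approach tries to avoid.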

Architecture

Detection Transformer Output:

MLP (Multi-Layer Perceptron): A simple fully connected network used within the DETR model to process features (e.g., to compute bounding box coordinates).
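As a rough illustration of such an MLP head, the sketch below runs a forward pass in plain Python. The layer sizes, the random initialization, the sigmoid on the output, and the `(cx, cy, w, h)` reading of the four outputs are assumptions for this example; a real model would use a deep learning framework and trained weights.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, bias)


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Hypothetical initializer: small uniform weights, zero bias."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x: List[float], layer: Layer) -> List[float]:
    """Fully connected layer: y = W x + b."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def box_mlp(feature: List[float], layers: List[Layer]) -> List[float]:
    """ReLU between hidden layers, sigmoid on the last layer so that the
    four outputs land in (0, 1) as normalized box coordinates."""
    h = feature
    for layer in layers[:-1]:
        h = [max(0.0, v) for v in linear(h, layer)]
    return [1.0 / (1.0 + math.exp(-v)) for v in linear(h, layers[-1])]


rng = random.Random(0)
layers = [init_layer(8, 16, rng), init_layer(16, 16, rng), init_layer(16, 4, rng)]
box = box_mlp([0.1] * 8, layers)  # four values, read here as (cx, cy, w, h)
```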

QKV:

Metrics/Loss

SetCriterion: This class calculates several losses for training the DETR model. It uses:

CNN

ResNet BackBone Yolov8

Architecture

A convolution scans the image with a kernel and learns features from it.
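The scanning step can be sketched in a few lines of plain Python. This is a minimal single-channel version with no padding and stride 1; the function name and the hand-written edge kernel are illustrative assumptions, whereas in a CNN the kernel values are learned.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (no padding, stride 1) and take the
    sum of elementwise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


# A dark-to-bright vertical edge, and a kernel that responds to such edges.
image = [[0, 0, 1, 1] for _ in range(4)]
edge_kernel = [[-1, 1]]
features = conv2d(image, edge_kernel)  # each row is [0.0, 1.0, 0.0]
```

The output is non-zero only where the kernel straddles the edge, which is the sense in which a kernel extracts a feature from the image.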

YOLO uses an anchor-based technique to stabilize learning. The model classifies each anchor box and maximizes its IoU with the ground truth (equivalently, it minimizes an IoU-based loss).
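As a hedged illustration of the IoU objective, the sketch below computes a plain `1 - IoU` loss between a predicted box and a ground-truth box. The `(x1, y1, x2, y2)` format and function names are assumptions, and real YOLO versions typically use refined IoU variants rather than this basic form.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Area of intersection divided by area of union."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred: Box, target: Box) -> float:
    """0 for a perfect match, 1 for no overlap at all."""
    return 1.0 - box_iou(pred, target)


target = (0.0, 0.0, 2.0, 2.0)
shifted = (1.0, 0.0, 3.0, 2.0)
loss = iou_loss(shifted, target)  # intersection 2, union 6, so loss = 1 - 1/3
```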

Data Augmentation

Mosaic and Mixup augmentation are used to avoid overfitting and improve accuracy on the test set.
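Here is a minimal sketch of the Mixup half only (Mosaic is not shown), written in plain Python on nested lists. The function name, the `alpha` default, and the choice to keep the boxes of both images are assumptions made for illustration.

```python
import random
from typing import List, Optional, Tuple

Image = List[List[float]]
Box = Tuple[float, float, float, float]


def mixup(
    image_a: Image,
    boxes_a: List[Box],
    image_b: Image,
    boxes_b: List[Box],
    alpha: float = 8.0,  # hypothetical default, not taken from the post
    rng: Optional[random.Random] = None,
) -> Tuple[Image, List[Box], float]:
    """Blend two same-sized images pixel by pixel with a Beta-distributed
    weight, and keep the boxes of both images as labels."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    mixed = [
        [lam * pa + (1.0 - lam) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]
    return mixed, boxes_a + boxes_b, lam


black = [[0.0, 0.0], [0.0, 0.0]]
white = [[1.0, 1.0], [1.0, 1.0]]
mixed, boxes, lam = mixup(
    black, [(0.0, 0.0, 1.0, 1.0)], white, [(1.0, 1.0, 2.0, 2.0)], rng=random.Random(0)
)
```

Because every training step sees a different blend, the model cannot simply memorize individual images, which is where the regularizing effect comes from.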

Loss

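Initialization technique

Use weight initialization to set the bias of the detection convolution to -1, so that most anchor outputs overlap with the ground truth from the start. Set the bias of the classification convolution to -3, so that the model is accurate from the start: every prediction begins as background, and only the positives have to be learned.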
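A small numeric check helps show why a negative classification bias means "background from the start". The sketch assumes a sigmoid output and near-zero initial weights, so the initial logit is roughly the bias; the function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def initial_probability(bias: float) -> float:
    """With near-zero weights the logit is approximately the bias, so this
    is roughly the score the layer predicts before any training."""
    return sigmoid(bias)


def bias_for_prior(p: float) -> float:
    """Inverse view: the bias that makes the initial score equal to p."""
    return math.log(p / (1.0 - p))


p_classify = initial_probability(-3.0)  # about 0.047, i.e. almost surely background
p_detect = initial_probability(-1.0)    # about 0.269
```

Since most anchors really are background, starting near 0.05 instead of 0.5 keeps the early loss from being dominated by easy negatives.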

Mask R-CNN/Faster R-CNN

Architecture

Yolov8

Loss

