May 02, 2021 (last updated: May 01, 2021)
deeplearning machinelearning AI

Abstract

In Computer Vision, image segmentation divides an image into segments, typically by assigning a label to every pixel, so that the image is represented in a form that is more meaningful and easier to analyze. This post covers three prominent families of segmentation architectures: U-Net based models, transformer based models, and DeepLab.

U-Net Based

Originally designed for medical image segmentation, the U-Net architecture is notable for making effective use of limited annotated data, built around a symmetric encoder-decoder structure.

Architecture

U-Net is designed with:

An encoder path to capture context and a decoder path to enable precise localization.
Skip connections that help recover spatial information lost during downsampling.
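To make the role of the skip connection concrete, here is a minimal NumPy sketch of a single encoder/decoder level. It is a toy under stated assumptions: there are no learned convolutions, max pooling stands in for the encoder, nearest-neighbour repetition stands in for the decoder's upsampling, and the helper names (`max_pool_2x2`, `upsample_2x`, `toy_unet_level`) are made up for this illustration.

```python
import numpy as np

def max_pool_2x2(x):
    """Downsample a (C, H, W) feature map by 2 using 2x2 max pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample_2x(x):
    """Upsample a (C, H, W) feature map by 2 using nearest-neighbour repetition."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def toy_unet_level(x):
    """One encoder/decoder level with a skip connection (no learned weights)."""
    skip = x                       # high-resolution encoder features
    bottleneck = max_pool_2x2(x)   # encoder: resolution is halved
    up = upsample_2x(bottleneck)   # decoder: resolution is restored, detail is not
    # Skip connection: concatenate along channels so the decoder sees both
    # the coarse upsampled features and the original spatial detail.
    return np.concatenate([skip, up], axis=0)

features = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
out = toy_unet_level(features)
print(out.shape)  # (4, 4, 4): 2 skip channels + 2 upsampled channels
```

The upsampled half of `out` is blocky because pooling discarded fine detail, while the skip half still carries it. That is the information the skip connection hands back to the decoder.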

Metrics/Loss

The performance of U-Net is often evaluated using the Intersection over Union (IoU) and Dice Coefficient metrics. Common loss functions include:

Cross-Entropy Loss, applied per pixel (binary cross-entropy when separating foreground from background).
Dice Loss, particularly useful for data with imbalanced foreground and background.
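The metrics and losses above can be sketched in a few lines of NumPy. This is an illustrative version for binary masks, not a reference implementation. The function names and the smoothing constants (`eps`) are assumptions chosen for this example.

```python
import numpy as np

def iou(pred, target):
    """Intersection over Union of two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    return np.logical_and(pred, target).sum() / union if union else 1.0

def dice(pred, target):
    """Dice coefficient of two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    total = pred.sum() + target.sum()
    return 2 * np.logical_and(pred, target).sum() / total if total else 1.0

def soft_dice_loss(probs, target, eps=1e-6):
    """1 - soft Dice, computed on predicted foreground probabilities."""
    intersection = (probs * target).sum()
    return 1.0 - (2 * intersection + eps) / (probs.sum() + target.sum() + eps)

def binary_cross_entropy(probs, target, eps=1e-7):
    """Mean per-pixel binary cross-entropy."""
    p = np.clip(probs, eps, 1 - eps)  # avoid log(0)
    return float(-(target * np.log(p) + (1 - target) * np.log(1 - p)).mean())

pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
print(round(iou(pred, target), 3), round(dice(pred, target), 3))  # 0.5 0.667
```

Dice loss is normalised by the size of the foreground rather than by the total number of pixels, which is why it tends to hold up better than plain cross-entropy when the foreground is small.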

Transformer Based

Transformers have recently been adapted for the task of image segmentation, leveraging their ability to handle global dependencies effectively.

Architecture

Transformer models for segmentation, such as SETR and SegFormer, use the transformer's self-attention mechanism to model long-range dependencies across the image.

These models often feature:

A transformer encoder to process the image as a sequence of patches.
A decoder that reconstructs the segmentation map from the encoded features.
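As a rough illustration of "an image as a sequence of patches", the sketch below splits an image into flattened patches and runs a single self-attention head over them. It assumes random, untrained projection matrices and leaves out positional embeddings, multiple heads, and the decoder. `patchify` and `self_attention` are hypothetical helper names, not functions from SETR or SegFormer.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into a sequence of flattened patches."""
    h, w, c = image.shape
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)        # (rows, cols, patch, patch, C)
    return grid.reshape(-1, patch * patch * c)  # (num_patches, patch_dim)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Every output token is a weighted mix of ALL patches: global context.
    return weights @ v, weights

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))
tokens = patchify(image, patch=4)  # 4 patches, each 4 * 4 * 3 = 48 values
d = 16                             # assumed embedding width
w_q, w_k, w_v = (rng.normal(size=(tokens.shape[1], d)) for _ in range(3))
out, weights = self_attention(tokens, w_q, w_k, w_v)
print(tokens.shape, out.shape, weights.shape)  # (4, 48) (4, 16) (4, 4)
```

The attention matrix has one row per patch and one column per patch, so even the first layer can relate opposite corners of the image. A convolution would need many stacked layers to cover the same distance.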

Metrics/Loss

For transformer-based models, standard segmentation metrics such as IoU and the Dice coefficient are used. Loss functions typically include:

Cross-Entropy Loss, calculated on a per-pixel basis.
Focal Loss, designed to address class imbalance by focusing on harder examples.
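Both losses can be written on top of the same per-pixel softmax. The sketch below is a simplified version: the focal loss omits the optional per-class weighting term, `gamma=2.0` is just a commonly used default, and the function names are assumptions for this example.

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def true_class_prob(logits, labels):
    """Probability assigned to the correct class at every pixel.

    logits: (H, W, K) float array, labels: (H, W) integer array.
    """
    probs = softmax(logits)
    p = np.take_along_axis(probs, labels[..., None], axis=-1)[..., 0]
    return np.clip(p, 1e-12, 1.0)  # avoid log(0)

def pixel_cross_entropy(logits, labels):
    """Cross-entropy averaged over all pixels."""
    return float(-np.log(true_class_prob(logits, labels)).mean())

def focal_loss(logits, labels, gamma=2.0):
    """Cross-entropy down-weighted by (1 - p)^gamma for well-classified pixels."""
    p = true_class_prob(logits, labels)
    return float((-((1.0 - p) ** gamma) * np.log(p)).mean())

logits = np.array([[[4.0, 0.0], [0.2, 0.0]]])  # one easy pixel, one hard pixel
labels = np.array([[0, 0]])
print(pixel_cross_entropy(logits, labels) > focal_loss(logits, labels))  # True
```

With `gamma=0` the focal loss reduces to ordinary cross-entropy. Larger values of `gamma` shrink the contribution of pixels the model already classifies confidently, so the hard pixels dominate the gradient.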

DeepLab

DeepLab is a series of models that excel at semantic segmentation using atrous convolutions to capture multi-scale information without losing resolution.

Architecture

DeepLab architectures utilize:

Atrous Spatial Pyramid Pooling (ASPP) to robustly segment objects at multiple scales.
An encoder-decoder structure (introduced in DeepLabv3+) with depthwise separable convolutions to reduce computation.
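The idea behind atrous convolution and ASPP can be shown with a small single-channel NumPy sketch. It is deliberately simplified: real ASPP uses separately learned kernels per branch plus additional branches and a fusion layer, whereas this toy reuses one kernel at several rates. The names `atrous_conv2d` and `aspp` and the rates `(1, 2, 4)` are assumptions for illustration.

```python
import numpy as np

def atrous_conv2d(x, kernel, rate):
    """Single-channel 2D atrous (dilated) convolution with zero 'same' padding.

    Assumes a square kernel with an odd side length.
    """
    k = kernel.shape[0]
    pad = rate * (k // 2)
    xp = np.pad(x, pad)
    out = np.zeros(x.shape, dtype=float)
    for i in range(k):
        for j in range(k):
            # Each kernel tap reads the input at an offset that is a multiple
            # of `rate`, which widens the receptive field without striding.
            out += kernel[i, j] * xp[i * rate : i * rate + x.shape[0],
                                     j * rate : j * rate + x.shape[1]]
    return out

def aspp(x, kernel, rates=(1, 2, 4)):
    """Toy ASPP: stack the responses of one kernel applied at several rates."""
    return np.stack([atrous_conv2d(x, kernel, r) for r in rates], axis=0)

x = np.zeros((9, 9))
x[4, 4] = 1.0  # a single impulse in the centre
pyramid = aspp(x, np.ones((3, 3)))
print(pyramid.shape)  # (3, 9, 9): one full-resolution map per rate
```

Every branch keeps the 9x9 resolution, but a 3x3 kernel at rate `r` spans `2r + 1` pixels per side, so the three branches respond to context at three different scales.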

Metrics/Loss

DeepLab models are typically evaluated with pixel accuracy and mean IoU (mIoU). Losses include:

Softmax cross-entropy loss for classifying each pixel.
L1 or L2 losses if depth prediction is integrated into the task.
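Pixel accuracy and mean IoU are both easy to derive from a confusion matrix. The following is a minimal sketch, assuming integer label maps and skipping classes that appear in neither the prediction nor the ground truth. The function names are chosen for this example.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    idx = target.ravel() * num_classes + pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class matches the ground truth."""
    return np.trace(cm) / cm.sum()

def mean_iou(cm):
    """IoU per class, averaged over classes that occur at least once."""
    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    valid = union > 0
    return (intersection[valid] / union[valid]).mean()

target = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
cm = confusion_matrix(pred, target, num_classes=2)
print(cm.tolist())                                      # [[1, 1], [0, 2]]
print(pixel_accuracy(cm), round(float(mean_iou(cm)), 3))  # 0.75 0.583
```

Mean IoU is lower than pixel accuracy here because it averages over classes rather than pixels, so the mistake on the smaller class 0 region weighs more heavily.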

Each of these architectures offers unique advantages in handling the complexities of image segmentation, making them suitable for a variety of applications from medical imaging to autonomous driving.

