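Among the many areas of computer vision, object detection is a fundamental task: image segmentation, object tracking, keypoint detection and similar tasks usually rely on it. As a basic task, object detection is closely related to image classification and image segmentation, so let us briefly look at how the three differ and how they relate.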

Image classification: Image classification typically deals with an input image dominated by a single object and determines which category the image belongs to. The category may be coarse, such as person or animal, or fine-grained, such as a particular kind of animal. Because this image-level task is relatively simple, it was the first to be developed and put to use.

Object detection: Object detection deals with input images that contain many objects from multiple categories. Most of the images we take or see are like this, which makes the task more complicated. The goal is to find the position of each object in the image and determine its category.

Image segmentation: Image segmentation takes the same kind of input as object detection. The difference is that it uses the pixel as the basic unit and determines the category of each pixel, which makes it a pixel-level classification task. Image segmentation and object detection are closely related, and many models and methods can be borrowed from one for the other.

1. Basic concepts of object detection

Object detection classifies every object of interest in an image and detects the position coordinates of each one.

As shown below, the detected image contains three objects (dog, bicycle, and truck), and the location of each has been identified.

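Of course, object detection can cover all kinds of categories. Whenever we want to check whether an image contains the objects we need, we can train a network on images annotated in advance with those categories so that the model learns the features of the known objects, and then use it to recognize object categories and positions in other images.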

2. Development history of object detection

Object detection initially relied on traditional algorithms based on hand-crafted features. Traditional algorithms usually split detection into three stages: region selection, feature extraction, and feature classification.

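With the progress of computing in recent years, deep learning has come into wide use, and deep-learning-based object detection is now the prevailing approach.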

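After years of research and continual improvement and optimization of network models, many excellent object detection algorithms have appeared.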

These models fall mainly into two types: two-stage (2-stage) detection models, represented by the R-CNN series, and single-stage (1-stage) detection models.

Since Faster R-CNN introduced the anchor mechanism, many later algorithms have adopted it. Models can therefore also be divided according to whether they use anchors: anchor-based and anchor-free.

1. Two-stage and one-stage

1) Two-stage

Common two-stage object detection algorithms include R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN, and R-FCN.

Among two-stage detectors, Faster R-CNN is the typical representative.

The algorithm first extracts features with the backbone network. The resulting feature map is passed through the region proposal network (RPN) to generate region proposals (RPs), that is, candidate boxes that may contain a target. Regions of interest (ROIs) are then produced from the feature map and the RPs, and these are used for the subsequent classification and regression of the box coordinates.

2) One-stage

Common one-stage object detection algorithms include OverFeat, YOLOv1 through YOLOv7, SSD, and RetinaNet.

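The most recent of these at the time of writing, YOLOv7, is reported to surpass all known object detectors in both speed and accuracy in the range of 5 to 160 FPS.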

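A one-stage detection model has no RPN. Instead, it extracts features in the convolutional network and predicts the category and position of each target in a single pass.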

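As a result, once pretrained weights for the feature extraction network are available, the whole one-stage model can be trained end to end directly.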

All in all, one-stage detectors greatly simplify the model structure, improve inference speed, and simplify the training steps.

2. Anchor-based and anchor-free

At present, deep-learning-based object detection has gradually developed into three branches: anchor-based, anchor-free, and methods that fuse the two. The difference between the first two is whether anchors are used to extract candidate boxes.

First, what is an anchor?

An anchor, also called an anchor box, is one of a preset collection of bounding boxes of different scales and aspect ratios. During training, the network learns the offset of the ground-truth box relative to these preset boxes.

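Put simply, preset boxes are placed in advance at positions where targets may appear, and fine adjustments are then made on top of them. In essence, anchors exist to solve the label assignment problem.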

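As a set of prior boxes, anchors are generated from the following parts:

(1) points on the feature map extracted by the network locate the position of each box;

(2) the anchor size sets the size of the box;

(3) the anchor aspect ratio sets the shape of the box.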
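The three parts above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the generator used by any particular detector: the function name `generate_anchors`, the `stride` parameter, the choice to centre each anchor in the middle of its feature-map cell, and the convention that `ratio` means width divided by height are all assumptions made for the example.

```python
import math

def generate_anchors(feat_h, feat_w, stride, sizes, ratios):
    """Return a list of (x1, y1, x2, y2) anchors in image coordinates.

    Hypothetical helper for illustration; real detectors differ in details.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # (1) the feature-map point fixes the anchor centre
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for size in sizes:
                # (2) the size fixes the anchor area (size * size)
                for ratio in ratios:
                    # (3) the ratio (width / height) fixes the shape
                    w = size * math.sqrt(ratio)
                    h = size / math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors

anchors = generate_anchors(feat_h=2, feat_w=2, stride=16,
                           sizes=[32, 64], ratios=[0.5, 1.0, 2.0])
print(len(anchors))  # 2 * 2 cells * 2 sizes * 3 ratios = 24
print(anchors[1])    # first cell, size 32, ratio 1.0 -> (-8.0, -8.0, 24.0, 24.0)
```

Anchors near the border extend outside the image, as the second print shows. How such anchors are handled (clipped, ignored, or kept) varies between implementations.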

1) Anchor-based

In recent years anchors have been used very widely in object detection. Many models use the anchor mechanism, including Faster R-CNN, SSD, and YOLOv2 through YOLOv7.

The process of this type of algorithm can be divided into three steps:

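(1) preset a large number of anchors (2D or 3D) in the image or point-cloud space;

(2) regress the four offsets of the target relative to the anchor;

(3) use the corresponding anchor and the regressed offsets to recover the precise target position.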
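Steps (2) and (3) can be illustrated with a small sketch. The text does not say how the four offsets are parameterized, so the code below assumes the commonly used R-CNN-style form: centre shifts scaled by the anchor width and height, and log-scale width and height ratios. The function names `encode_box` and `decode_box` are hypothetical.

```python
import math

def encode_box(anchor, gt):
    """Offsets (tx, ty, tw, th) of a ground-truth box relative to an anchor.

    Boxes are (x1, y1, x2, y2). Assumed R-CNN-style parameterization.
    """
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    acx, acy = anchor[0] + aw / 2, anchor[1] + ah / 2
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    gcx, gcy = gt[0] + gw / 2, gt[1] + gh / 2
    return ((gcx - acx) / aw, (gcy - acy) / ah,
            math.log(gw / aw), math.log(gh / ah))

def decode_box(anchor, offsets):
    """Apply regressed offsets to an anchor to recover the predicted box."""
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    acx, acy = anchor[0] + aw / 2, anchor[1] + ah / 2
    tx, ty, tw, th = offsets
    cx, cy = acx + tx * aw, acy + ty * ah      # shift the centre
    w, h = aw * math.exp(tw), ah * math.exp(th)  # rescale width and height
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

anchor = (0.0, 0.0, 10.0, 10.0)
print(decode_box(anchor, (0.1, -0.2, 0.0, 0.0)))  # (1.0, -2.0, 11.0, 8.0)
```

During training the network would be asked to predict the output of `encode_box` for each positive anchor. At test time `decode_box` turns the predictions back into boxes.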

Take a one-stage, anchor-based object detection model as an example.

Working with the model mainly involves two parts: training and testing.

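The main purpose of training is to learn the parameters of the detection network from a training dataset, which contains a large number of images together with their annotations (object positions and categories).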

Training phase: the main steps are data preprocessing, the detection network, label matching, and loss calculation.

Testing phase: the trained model predicts on the input image, and the detection result is obtained after post-processing.

(Ⅰ) Training process

(Ⅱ) Testing process


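Object detection outputs the name of each category and the position of a rectangular box. Inside the network, categories are usually represented by numbers, for example 0 for Dog and 1 for Cat, and an object's position is usually represented by a rectangular bounding box. The four corner points of the bounding box determine the position of the target.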

Non-maximum suppression (NMS)

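At prediction time, the model generates many anchor boxes for the image and predicts a category and position offsets for each. This produces many redundant boxes that do not fully contain a target, and a single target may also yield several similar predicted boxes. NMS is therefore needed to obtain the box that best matches each real target.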

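NMS compares the IoU (intersection over union) between predicted boxes and uses a threshold to remove boxes that overlap heavily with a higher-scoring one, so that within each category only the single highest-scoring box is kept for each target.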

The figure shows the definition of IoU and a schematic of the detection output boxes before and after NMS.
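The IoU computation and the greedy NMS procedure can be sketched as follows. This is a minimal single-category version in plain Python, assuming boxes in (x1, y1, x2, y2) form and an illustrative threshold of 0.5. In a multi-category detector the same routine would typically be run once per category. The names `iou` and `nms` are chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the boxes to keep, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest-scoring remaining box
        keep.append(best)
        # drop every remaining box that overlaps it too much
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: the second box duplicates the first
```

The first two boxes have an IoU of about 0.82, so the lower-scoring one is suppressed, while the third box does not overlap either and is kept.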

2) Anchor-free

Representative anchor-free algorithms include CornerNet, ExtremeNet, CenterNet, and FCOS.

Anchor-free object detection algorithms take one of two approaches:

(1) methods based on the joint representation of multiple keypoints;

(2) methods based on single center point prediction.

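The multi-keypoint approach limits the search space by locating several keypoints of the target object. For example, Grid R-CNN uses an RPN to find candidate regions and extracts a feature map for each ROI.

The feature map is passed to a fully convolutional network that outputs a probability heatmap, which is used to locate the grid points of the bounding box aligned with the target. The grid points are then used for feature map fusion, and the target's bounding box is finally determined.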

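The single-center-point approach locates the target by its center point and then predicts the distance from the center to the boundary. CenterNet, for example, detects each target as a point: it represents the target by the center of its box and predicts the center offset and the width and height (size) to recover the actual box, while the heatmap carries the classification information.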

Each category has its own heatmap. On each heatmap, if the center of a target lies at some coordinate, a keypoint (represented by a Gaussian circle) is generated at that coordinate, as shown below.
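The per-category heatmaps can be sketched as follows. This is an illustrative toy, not CenterNet's implementation: it uses a fixed `sigma` for every object, whereas a real detector would typically adapt the Gaussian radius to the object size, and the names `draw_gaussian` and `make_heatmaps` are hypothetical.

```python
import math

def draw_gaussian(heatmap, cx, cy, sigma):
    """Splat a Gaussian peak centred at (cx, cy) onto a 2D list in place."""
    for y in range(len(heatmap)):
        for x in range(len(heatmap[0])):
            value = math.exp(-((x - cx) ** 2 + (y - cy) ** 2)
                             / (2 * sigma ** 2))
            # keep the larger value where two objects' Gaussians overlap
            heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap

def make_heatmaps(num_classes, height, width, objects, sigma=1.0):
    """Build one heatmap per category from (class_id, cx, cy) centres."""
    heatmaps = [[[0.0] * width for _ in range(height)]
                for _ in range(num_classes)]
    for class_id, cx, cy in objects:
        draw_gaussian(heatmaps[class_id], cx, cy, sigma)
    return heatmaps

# one object of category 0 centred at (2, 2), one of category 1 at (4, 0)
heatmaps = make_heatmaps(2, 5, 5, [(0, 2, 2), (1, 4, 0)])
print(heatmaps[0][2][2])  # 1.0 at the centre of the category-0 object
print(heatmaps[1][0][4])  # 1.0 at the centre of the category-1 object
```

Each heatmap peaks at 1.0 exactly on an object center and decays smoothly around it, which gives the network a softer training target than a single positive pixel.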

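As the above shows, the main difference between anchor-based and anchor-free methods lies in how positive and negative samples are defined and how regression is done. In anchor-free methods, the grid cell that an object falls into is the positive sample and all other cells are negative samples. Anchor-based methods instead compute the IoU between each anchor box and the ground-truth box, and an anchor counts as a positive sample when its IoU exceeds a chosen threshold.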

For regression, anchor-free methods regress from a point, whereas anchor-based methods regress the offset between the anchor box and the ground truth.
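The two ways of defining positive samples can be contrasted in a short sketch. It is deliberately simplified and rests on several assumptions: a single ground-truth box, a single IoU threshold of 0.5 for the anchor-based case (real detectors often use separate positive and negative thresholds), and, for the anchor-free case, the reading that an object "falls into" the grid cell containing its center. The function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def assign_anchor_based(anchors, gt_box, pos_threshold=0.5):
    """Label each anchor 1 (positive) if its IoU with gt_box is high enough."""
    return [1 if iou(a, gt_box) >= pos_threshold else 0 for a in anchors]

def assign_anchor_free(grid_h, grid_w, stride, gt_box):
    """Label the grid cell containing the box centre 1, all others 0.

    Assumes the box centre lies inside the grid.
    """
    cx = (gt_box[0] + gt_box[2]) / 2
    cy = (gt_box[1] + gt_box[3]) / 2
    labels = [[0] * grid_w for _ in range(grid_h)]
    labels[int(cy // stride)][int(cx // stride)] = 1
    return labels

gt = (10, 10, 30, 30)
anchors = [(8, 8, 32, 32), (40, 40, 60, 60), (0, 0, 16, 16)]
print(assign_anchor_based(anchors, gt))   # [1, 0, 0]
print(assign_anchor_free(4, 4, 16, gt))   # only cell (row 1, col 1) is positive
```

Only the first anchor overlaps the ground truth enough (IoU of about 0.69) to be positive, while in the anchor-free case the box center (20, 20) selects a single cell on the stride-16 grid.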

This has also led to methods that fuse anchor-based and anchor-free branches, such as FSAF, SFace, and GA-RPN.

3. Application scenarios of object detection in vehicles

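Object detection is applied in every aspect of our lives. With the rapid development of autonomous driving, object detection algorithms have found extensive use in this field as well.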

Application scenarios include pedestrian and vehicle detection on the road, face detection for driver fatigue monitoring, detection of items left behind in the smart cockpit, occupant position detection, and so on.

1. Detection of pedestrians and vehicles outside the cabin

Pedestrians and vehicles moving on the road are detected so that road conditions can be observed in real time.

2. Face detection of the driver in the cabin

The position of the driver's face box is detected as the basis for monitoring the driver's state in real time.

3. Detection of items left behind in the rear of the cabin

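Items left in the cabin after the occupants get out are detected, so that the driver can be reminded to check the cabin after parking.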
