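Among the many areas of computer vision, object detection is a fundamental task: image segmentation, object tracking, keypoint detection and similar tasks usually rely on it. As a basic task, object detection is closely related to image classification and image segmentation, so let us briefly look at how they differ and how they are connected.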

Image classification: Image classification focuses on a single object in the input image and determines which category the image belongs to. The categories can be broad, such as people and animals, or fine-grained, such as different species of animals. Because these image-level tasks are relatively simple, they were the first to be developed and put into use.

Object detection: Object detection deals with input images that contain many objects from multiple categories. The images we usually take or see contain objects of several categories, which makes the task more complex. The goal is to find the position of each object in the image and determine its category.

Image segmentation: The input to image segmentation is similar to that of object detection. The difference is that segmentation treats each pixel as the basic unit and determines the category of every pixel, which makes it a pixel-level classification task. Image segmentation and object detection are generally closely related, and many models and methods from one can be borrowed by the other.

1. The basic concept of object detection


As shown below, the detected image contains three objects (a dog, a bicycle and a truck), and the location of each has been identified.


2. The development history of object detection

Object detection initially relied on traditional algorithms built on hand-crafted features. These traditional algorithms usually split detection into three stages: region selection, feature extraction, and feature classification.
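The three stages can be sketched in a few lines of Python. This is a toy illustration under our own assumptions, not any specific published detector: region selection is an exhaustive sliding window, the hand-crafted feature is a normalized intensity histogram (real systems used features such as HOG or Haar), and the classifier is a linear scorer whose `weights` and `bias` are hypothetical values that would normally come from training (for example an SVM).

```python
import numpy as np

def sliding_windows(img_h, img_w, win, stride):
    """Stage 1 (region selection): enumerate square windows (x1, y1, x2, y2)."""
    boxes = []
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            boxes.append((x, y, x + win, y + win))
    return boxes

def extract_feature(patch, bins=8):
    """Stage 2 (feature extraction): toy hand-crafted feature, a normalized intensity histogram."""
    hist, _ = np.histogram(patch, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

def classify(feature, weights, bias):
    """Stage 3 (feature classification): linear score > 0 means 'object'."""
    return float(feature @ weights + bias) > 0

def detect(image, weights, bias, win=32, stride=16):
    """Run the three stages on a 2-D grayscale image and return positive windows."""
    h, w = image.shape
    hits = []
    for (x1, y1, x2, y2) in sliding_windows(h, w, win, stride):
        feat = extract_feature(image[y1:y2, x1:x2])
        if classify(feat, weights, bias):
            hits.append((x1, y1, x2, y2))
    return hits
```

The exhaustive window scan is what made these pipelines slow, and the fixed hand-crafted feature is what limited their accuracy; deep learning replaced both.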



These models are mainly divided into two types: two-stage (2-stage) detectors, such as the R-CNN series, and single-stage (1-stage) detectors.

Since Faster R-CNN introduced the anchor mechanism, many subsequent improved algorithms have adopted it. Models can therefore also be divided by whether they use anchors: anchor-based and anchor-free.

1. Two-stage and one-stage

1) Two-stage

Common two-stage object detection algorithms include R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN, and R-FCN.

Among two-stage object detectors, Faster R-CNN is the typical representative.

The algorithm first extracts features with a backbone network. The resulting feature map is passed through a region proposal network (RPN) to generate candidate regions (region proposals, RP: suggested boxes that may contain a detection target). Regions of interest (ROI) are then formed from the feature map and the RPs, and used for the subsequent classification and regression of the location coordinates.
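To make the "feature map plus RP gives ROI" step concrete, here is a minimal single-channel sketch of RoI max pooling, the operation Fast R-CNN and the original Faster R-CNN use to turn each variable-sized proposal into a fixed-size feature. Assumptions: the box is already expressed in integer feature-map coordinates (x2 and y2 exclusive), and the name `roi_max_pool` is our own; libraries such as torchvision offer batched, multi-channel versions (`torchvision.ops.roi_pool`).

```python
import numpy as np

def roi_max_pool(feature_map, box, out_size=2):
    """Max-pool one region of a 2-D feature map into an out_size x out_size grid.

    box = (x1, y1, x2, y2) in integer feature-map cells, x2/y2 exclusive.
    """
    x1, y1, x2, y2 = box
    region = feature_map[y1:y2, x1:x2]
    h, w = region.shape
    # Split the region into out_size roughly equal bins along each axis
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)
    out = np.zeros((out_size, out_size), dtype=feature_map.dtype)
    for i in range(out_size):
        for j in range(out_size):
            # Make every bin cover at least one cell, even for tiny regions
            y_end = max(ys[i + 1], ys[i] + 1)
            x_end = max(xs[j + 1], xs[j] + 1)
            out[i, j] = region[ys[i]:y_end, xs[j]:x_end].max()
    return out
```

Every proposal, whatever its size, yields the same `out_size x out_size` output, which is what lets a single head classify and regress all ROIs.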

2) One-stage

Common one-stage object detection algorithms include OverFeat, YOLOv1 to YOLOv7, SSD, and RetinaNet.

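At the time of writing, the most recent of these, YOLOv7, is reported to surpass all known object detectors in both speed and accuracy in the range of 5 to 160 FPS.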

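One-stage detectors have no RPN. Instead, they extract features with a convolutional network and predict the category and location of objects in a single step.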
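As a rough illustration of predicting category and location in one step, the sketch below decodes a simplified YOLOv1-style output grid. The tensor layout is an assumption for illustration: each of the S x S cells predicts one box as `[tx, ty, w, h, objectness, class probabilities...]`, with `tx, ty` the center offset inside the cell and `w, h` relative to the image size. Real YOLO versions differ in the details (several boxes per cell, anchors, activation functions).

```python
import numpy as np

def decode_grid(pred, score_thresh=0.5):
    """Decode an (S, S, 5 + C) prediction grid into boxes.

    Returns a list of (class_id, score, x1, y1, x2, y2) with coordinates in [0, 1].
    """
    S = pred.shape[0]
    detections = []
    for i in range(S):          # grid row (y direction)
        for j in range(S):      # grid column (x direction)
            tx, ty, w, h, obj = pred[i, j, :5]
            class_probs = pred[i, j, 5:]
            cls = int(np.argmax(class_probs))
            score = float(obj * class_probs[cls])
            if score < score_thresh:
                continue
            # Cell-relative center -> image-relative center
            cx, cy = (j + tx) / S, (i + ty) / S
            detections.append((cls, score,
                               float(cx - w / 2), float(cy - h / 2),
                               float(cx + w / 2), float(cy + h / 2)))
    return detections
```

A single forward pass produces the whole grid, and decoding is a cheap loop; there is no separate proposal stage.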


In short, one-stage detectors greatly simplify the model architecture, improve inference speed, and simplify training.


At present, deep-learning-based object detection has developed into three types: anchor-based, anchor-free, and a fusion of the two. Anchor-based and anchor-free methods differ in whether anchors are used to generate the candidate object boxes.


An anchor (also called an anchor box) is a preset set of bounding boxes with different scales and aspect ratios. During training, the network learns the offset of the ground-truth box position relative to the preset box.
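A common way to build such a preset set, used for example by the RPN in Faster R-CNN, is to place several boxes with different scales and aspect ratios at every feature-map location. The sketch below assumes `(x1, y1, x2, y2)` pixel coordinates, anchors centered on each feature-map cell, and `ratio = h / w`; the `scales`, `ratios` and `stride` values in the test are illustrative, not taken from any specific model.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride, scales, ratios):
    """Return a (feat_h * feat_w * len(scales) * len(ratios), 4) array of anchors.

    Each anchor has area scale**2 and aspect ratio h / w = ratio.
    """
    # Base shapes (w, h), shared by every location
    shapes = []
    for s in scales:
        for r in ratios:
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            shapes.append((w, h))
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            # Center of feature-map cell (i, j) in input-image pixels
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for w, h in shapes:
                anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(anchors)
```

With 2 scales and 3 ratios on a 2 x 3 feature map this yields 36 anchors; the network then predicts, for each anchor, a score and an offset rather than a box from scratch.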







In recent years, anchors have been widely used in object detection, and many models adopt the anchor mechanism, including Faster R-CNN, SSD, and YOLOv2 to YOLOv7.

The process of this type of algorithm can be divided into three steps:




Take a one-stage, anchor-based object detection model as an example.

Obtaining the model mainly involves two parts: training and testing.


Training phase: the main steps are data preprocessing, the detection network, and label matching with loss calculation.
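Label matching decides which anchors are responsible for which ground-truth boxes. A widely used rule assigns an anchor as positive if its IoU with some ground-truth box is above a high threshold, as negative (background) if its best IoU is below a low threshold, and ignores it otherwise. The 0.7 / 0.3 thresholds below mirror the values used for the RPN in Faster R-CNN; other detectors use different values and extra rules (for example guaranteeing every ground truth at least one anchor), which this minimal sketch omits. The label encoding (-1 background, -2 ignored) is our own convention.

```python
import numpy as np

def iou_matrix(a, b):
    """Pairwise IoU between boxes a (N, 4) and b (M, 4) in (x1, y1, x2, y2) format."""
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def match_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """One label per anchor: matched GT index, -1 = background, -2 = ignored."""
    ious = iou_matrix(anchors, gt_boxes)           # (num_anchors, num_gt)
    best_gt = ious.argmax(axis=1)
    best_iou = ious.max(axis=1)
    labels = np.full(len(anchors), -2, dtype=int)  # default: ignored
    labels[best_iou < neg_thresh] = -1             # background
    pos = best_iou > pos_thresh
    labels[pos] = best_gt[pos]                     # foreground: index of matched GT
    return labels
```

Positive anchors then contribute to both the classification and box-regression losses, negative anchors only to classification, and ignored anchors to neither.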

Testing phase: the trained model is used to predict on the input image, and the detection results are obtained after post-processing.




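Object detection outputs the name of each category together with the position of a rectangular box. Inside the network, categories are usually represented by numbers, for example 0 for Dog and 1 for Cat, and an object's position is usually represented by a rectangular bounding box. The bounding box, typically described by four values such as the coordinates of its top-left and bottom-right corners, determines where the object is.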
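In code, this usually means a lookup table from class id to name plus four numbers per box. The snippet below assumes the corner format `(x1, y1, x2, y2)` as the stored representation and shows the conversion to the center format `(cx, cy, w, h)`, which many detectors regress. The `Detection` class and helper names are illustrative; the class table reuses the Dog/Cat example from the text.

```python
from dataclasses import dataclass

CLASS_NAMES = {0: "Dog", 1: "Cat"}  # class id -> name, as in the example in the text

@dataclass
class Detection:
    class_id: int
    score: float
    box: tuple  # (x1, y1, x2, y2): top-left and bottom-right corners

    @property
    def class_name(self) -> str:
        return CLASS_NAMES[self.class_id]

def xyxy_to_cxcywh(box):
    """Corner format -> center format."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)

def cxcywh_to_xyxy(box):
    """Center format -> corner format."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

d = Detection(class_id=0, score=0.92, box=(10, 20, 50, 80))
print(d.class_name, xyxy_to_cxcywh(d.box))  # Dog (30.0, 50.0, 40, 60)
```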

Non-maximum suppression (NMS)




As shown in the figure: the definition of intersection over union (IoU), and a schematic of the boxes output by the detector before and after NMS processing.
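IoU is the area of the intersection of two boxes divided by the area of their union. Greedy NMS keeps the highest-scoring box, discards every remaining box whose IoU with it exceeds a threshold, and repeats on what is left. A minimal single-class NumPy sketch, assuming `(x1, y1, x2, y2)` boxes and an illustrative threshold of 0.5 (torchvision ships an optimized `torchvision.ops.nms`):

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box (4,) and an array of boxes (N, 4), all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes, best first."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Drop the boxes that overlap the kept box too much
        order = rest[iou(boxes[best], boxes[rest]) <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with IoU 81/119 (about 0.68), so it is suppressed; the third box does not overlap and survives.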




(1) Methods based on the joint representation of multiple keypoints

(2) Methods based on single center-point prediction

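Multi-keypoint joint methods constrain the search space by locating several keypoints of the target object. For example, Grid R-CNN finds candidate regions with an RPN and extracts a feature map for each ROI.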
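The idea that a few keypoints pin down an object is easiest to see with extreme points: if a network predicts the top-most, left-most, bottom-most and right-most points of an object (the keypoints ExtremeNet uses, for example), the enclosing box follows directly. This is only an illustration of the principle; Grid R-CNN itself predicts a grid of points inside each ROI and fuses them with a more elaborate scheme that this sketch does not reproduce. The function name is our own.

```python
def box_from_keypoints(points):
    """Smallest axis-aligned box (x1, y1, x2, y2) containing all (x, y) keypoints."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

# Extreme points of one object: top, left, bottom, right
extreme_points = [(40, 10), (12, 35), (45, 90), (70, 50)]
print(box_from_keypoints(extreme_points))  # (12, 10, 70, 90)
```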



Each category has its own heatmap. On each heatmap, if an object's center point lies at a given coordinate, a keypoint (represented by a Gaussian blob) is generated at that coordinate, as shown below.
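A minimal sketch of building such a training target, in the style of CenterNet: one heatmap per class, with a 2-D Gaussian whose peak value is 1.0 placed at each object's center. The fixed `sigma` is a simplifying assumption (CenterNet derives the Gaussian radius from the object size), overlapping Gaussians of the same class are combined with an element-wise maximum, and heatmaps are indexed as `[class, y, x]`.

```python
import numpy as np

def draw_center_heatmaps(num_classes, height, width, objects, sigma=2.0):
    """objects: list of (class_id, cx, cy) with integer center coordinates.

    Returns a (num_classes, height, width) array with a peak of 1.0 at each center.
    """
    heatmaps = np.zeros((num_classes, height, width))
    ys, xs = np.mgrid[0:height, 0:width]
    for cls, cx, cy in objects:
        gaussian = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
        # Keep the larger value where two objects of the same class overlap
        heatmaps[cls] = np.maximum(heatmaps[cls], gaussian)
    return heatmaps
```

At inference time, the local maxima of each predicted heatmap are taken as object centers for that class.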


In the regression part, anchor-free methods regress from points, whereas anchor-based methods regress the offsets between the anchor boxes and the ground truth.
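The two kinds of regression target can be written down concretely. For anchor-based detectors, a common parameterization (introduced with R-CNN and used in Faster R-CNN) encodes the ground truth relative to the anchor's center and size: tx = (x - xa) / wa, ty = (y - ya) / ha, tw = log(w / wa), th = log(h / ha). For point-based anchor-free detectors, one common choice (used by FCOS) is to regress the distances from a location to the four sides of the box. Both helpers below take boxes as `(x1, y1, x2, y2)`; the function names are ours.

```python
import math

def encode_anchor_offsets(anchor, gt):
    """Anchor-based target: (tx, ty, tw, th) of the GT box relative to the anchor."""
    ax, ay = (anchor[0] + anchor[2]) / 2, (anchor[1] + anchor[3]) / 2
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    gx, gy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    return ((gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah))

def encode_point_distances(point, gt):
    """Anchor-free target (FCOS-style): distances (l, t, r, b) from a point to the box sides."""
    x, y = point
    return (x - gt[0], y - gt[1], gt[2] - x, gt[3] - y)
```

At test time the inverse transforms turn the network's predicted offsets or distances back into boxes.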

This has also led to methods that fuse anchor-based and anchor-free branches, such as FSAF, SFace, and GA-RPN.

3. Application scenarios of object detection in vehicles


Application scenarios include detecting pedestrians and vehicles on the road, face detection for driver fatigue monitoring, detecting items left behind in the smart cockpit, and occupant position detection.

1. Detection of pedestrians and vehicles outside the cabin


2. Driver face detection inside the cabin


3. Detection of items left behind in the rear seats of the cabin

