Intelligent target detection systems based on machine vision are widely used, especially in the aerospace and military industries, where real-time detection and control of high-speed targets is often required. This places stringent requirements on both the intelligence and the real-time performance of target detection. Compared with radar and sonar, a vision system offers a large amount of information, strong anti-jamming ability, flexible software processing, small volume and weight, and low cost. Its disadvantage is that transmitting and processing image data takes more time, which makes real-time requirements difficult to meet.

High-speed cameras usually transmit images to the image processor through GigE, Camera Link, USB 3.0 or similar interfaces, and this transmission consumes considerable time. The best way to solve this problem is to process the image at the near end, directly next to the sensor chip that collects it. Thanks to hardware parallelism, FPGAs are increasingly used in high-speed cameras and high-speed motion detection systems, where they greatly improve image processing speed and help ensure the speed, real-time performance and accuracy of the system. By processing at the near end of the image sensor with an FPGA, image acquisition and intelligent processing can be synchronized. The main problem to be solved is optimizing the intelligent algorithm so that it is simpler, more efficient, and occupies fewer resources.

At present, many scholars are working on high-speed visual object detection systems. Guqy et al. designed a 2000 f/s high-speed smart camera that can monitor a target intelligently and in real time. Subsequently, a high-frame-rate video mosaic system was designed, which uses an improved feature-based video mosaic algorithm and can synthesize panoramic images in real time at 500 f/s. Chen JG of MIT measured the displacement of an object on a cantilever beam with a high-speed camera (5000 f/s). The data were analyzed offline on a PC, and the resulting vibration curve was consistent with those measured by a laser vibrometer and an accelerometer. Frequency-domain analysis of the three groups of data was carried out with the FFT algorithm, yielding each resonance frequency component.

In this paper, a high-speed camera platform based on the Zynq-7000 is designed, with high frame rate and real-time performance as the key goals. It makes full use of the on-chip FPGA resources and the advantages of hardware parallelism to implement the target extraction and centroid detection algorithms. The FPGA implementation of target detection is optimized: the intermediate buffer is removed and a pipeline structure processes the image data in real time as it arrives, which improves the processing efficiency of the target detection algorithm. Position detection is completed within a limited number of clock cycles after each frame is acquired, so detection is synchronous with acquisition. Experimental results show that the system achieves real-time target detection at 560 × 480 resolution and 1100 f/s, with an accuracy of 3 pixels.

1 hardware system design

1.1 system composition

To meet the high-speed, real-time requirements, the system uses an FPGA to drive a high-speed CMOS sensor directly, achieving near-end processing. The detection system consists mainly of an FPGA main control unit, a CMOS image acquisition unit, a multi-rail power supply unit, an external interface unit and an optical imaging unit, as shown in Figure 1.

Design of high frame rate intelligent target detection system based on FPGA zynq7000

A Zynq-7020 chip is selected as the main control unit; it integrates an ARM hard core and FPGA resources on a single chip. The ARM core is responsible for configuring the CMOS sensor, while the FPGA is responsible for processing the acquired image data, implementing the target detection algorithm, and outputting the image and position information.

The CMOS image acquisition unit uses the PYTHON 300 grayscale CMOS sensor. The sensor has a resolution of 640 × 480 and a full-resolution output of 815 f/s; the frame rate can be raised further by windowing (ROI) operation.

The external interface unit includes an HDMI display interface, a serial port, a JTAG interface and other circuits for image display, position coordinate transmission, debugging and downloading.

1.2 hardware design

The hardware circuit is divided into two parts, an FPGA main control board and a high-speed baseboard, which are interconnected through standardized high-speed connectors. The main control board is an off-the-shelf high-speed FPGA core board; the baseboard is a 4-layer PCB integrating the CMOS circuit, HDMI display circuit, power supply circuit, serial port circuit, etc.

The baseboard design is mainly concerned with reasonable layout and routing of these circuits. Because the CMOS sensor outputs low-voltage differential signaling (LVDS) with a data rate of up to 720 Mb/s per channel, signal integrity must be considered in the design.

The signals receive special treatment during routing, strictly following the rules for high-speed differential pairs: the two traces of each pair are routed in parallel, keeping a constant and minimal spacing that is, as far as possible, less than the trace width; the number of vias is reduced; routing corners are greater than 90°; the differential impedance is controlled at 100 Ω to match the 100 Ω termination resistor at the receiving end and reduce signal reflection; the trace lengths within each group of differential pairs are matched as closely as possible; and a large spacing is kept between different pairs.

These measures ensure the signal integrity of the high-speed differential signals and keep the delay difference between groups of signals small.

2 software system design

The software design mainly implements two functions: enable control and register configuration. Enable control switches the clock and power supply of the CMOS sensor through I/O operations of the ARM processor. For register configuration, the ARM processor communicates with the CMOS sensor through an SPI bus IP core and configures the necessary registers, mainly window size, image depth, operation mode and image data output.

By configuring its internal registers, the CMOS sensor outputs a high-speed video stream with 8-bit depth, 560 × 480 resolution and more than 1000 f/s, which is transmitted to the FPGA through the LVDS interface for data processing and algorithm implementation.

3 FPGA implementation of signal processing and detection algorithm

3.1 principle of target detection

3.1.1 target extraction

To detect the target, it must be distinguished from the background and extracted. Given the application scene, the system uses background difference and threshold segmentation to extract the target.

First, a clear and stable background image is obtained. The difference between each pixel of the current frame and the corresponding pixel of the background image is then computed. Next, the difference is compared with a set threshold: if it is greater than the threshold, the pixel is set to 1 (moving foreground); otherwise it is set to 0 (background). The result is a binary image.
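The two steps above can be sketched in software as follows. This is only an illustrative model, not the FPGA logic itself: the function name `extract_target`, the sample pixel values and the threshold of 50 are assumptions, and the absolute difference is used on the assumption that a dark target on a bright background should also count as foreground.

```python
def extract_target(frame, background, threshold):
    """Return a binary image: 1 where |frame - background| > threshold, else 0.

    frame and background are lists of rows of gray values (0-255).
    Taking the absolute difference is an assumption of this sketch.
    """
    binary = []
    for frame_row, bg_row in zip(frame, background):
        binary.append([
            1 if abs(p - b) > threshold else 0
            for p, b in zip(frame_row, bg_row)
        ])
    return binary


# Hypothetical 3 x 4 example: bright background, three dark target pixels.
background = [
    [200, 200, 200, 200],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]
frame = [
    [200, 198, 201, 200],
    [200,  40,  35, 200],
    [199,  42, 200, 200],
]
binary = extract_target(frame, background, threshold=50)
for row in binary:
    print(row)
```

Small fluctuations such as 198 or 201 against a background of 200 stay below the threshold and are classified as background, which is why a stable background image matters for this method.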

3.1.2 centroid detection

The object to be detected by this system is a sphere. In the binarized image after threshold segmentation, it appears as a circular bright spot. Given this property of the target and the characteristics of the FPGA pipeline structure, this paper uses circle diameter detection: the diameters in the X and Y directions are located, and their intersection gives the position of the center of the circle.

The specific method is shown in Figure 2: the pixel values of each row of the binary image are summed, and the sums are compared with one another. The row containing the diameter produces the maximum sum, and the index of that row is taken as the Y coordinate of the center. The X coordinate is obtained by the same operation in the column direction.
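A minimal software sketch of this row-sum and column-sum search is given below. The function name `find_center` and the small test spot are assumptions made for illustration; when several rows or columns share the maximum, this sketch keeps the first one, which is one possible tie-breaking rule rather than a documented property of the system.

```python
def find_center(binary):
    """Locate the center of a bright circular spot in a binary image.

    The row with the largest sum contains the horizontal diameter (Y),
    and the column with the largest sum contains the vertical diameter (X).
    """
    row_sums = [sum(row) for row in binary]
    col_sums = [sum(col) for col in zip(*binary)]
    # max() with a key returns the first index that reaches the maximum.
    y = max(range(len(row_sums)), key=row_sums.__getitem__)
    x = max(range(len(col_sums)), key=col_sums.__getitem__)
    return x, y


# Hypothetical 6 x 7 binary image containing a small round spot.
spot = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
]
center = find_center(spot)
print(center)
```

Because only sums and comparisons are needed, the search requires no multiplication or division, which suits a pipelined hardware implementation.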

Detecting the center of the circle in this way with the FPGA pipeline structure reduces detection delay and improves real-time performance.

3.2 FPGA logic design

The CMOS sensor transmits image data from left to right and from bottom to top. Every 8 pixels form a group, called a kernel. Because the target frame rate is greater than 1000 f/s, the update period of each frame is less than 1 ms, and most of that time is spent acquiring the image. Caching the image and then processing it therefore cannot be completed within the current frame period.
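The timing constraint can be made concrete with a rough calculation. The sketch below assumes a frame rate of exactly 1000 f/s and assumes that readout is spread evenly over the whole frame period, ignoring blanking and other sensor overhead, so the figures are order-of-magnitude estimates only.

```python
WIDTH, HEIGHT = 560, 480      # window size used by the system
PIXELS_PER_KERNEL = 8         # one kernel = 8 pixels
FRAME_RATE = 1000             # f/s, assumed lower bound for this estimate

frame_period_us = 1e6 / FRAME_RATE
kernels_per_frame = WIDTH * HEIGHT // PIXELS_PER_KERNEL
# Average time available per kernel if readout fills the whole frame period.
kernel_budget_ns = 1e9 / (FRAME_RATE * kernels_per_frame)

print(f"frame period      : {frame_period_us:.0f} us")
print(f"kernels per frame : {kernels_per_frame}")
print(f"budget per kernel : {kernel_budget_ns:.2f} ns")
```

Under these assumptions a new kernel arrives roughly every 30 ns, so any processing that waits for a complete frame to be stored would overlap with the acquisition of the next frame.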

The system makes full use of FPGA parallelism and adopts a three-stage pipeline structure in the logic design, as shown in Figure 3, removing the intermediate buffer. As the image is read, each group of data is sent directly into the pipeline for stage-by-stage processing. The pipeline processes three groups of data at the same time, and the 8 pixels within each group are also processed simultaneously. In this way image reading and processing are synchronized, which ensures efficient, real-time data processing.

The three pipeline stages correspond to the three steps of target detection:

(1) Background subtraction

As the current kernel is obtained, the background kernel at the corresponding address of the background frame is read, the 8 pixel values are subtracted simultaneously, and the difference at each pixel position is stored in the difference register and passed to the next pipeline stage. The next kernel is then processed immediately, and so on until the complete image has been read.

(2) Threshold segmentation

After the difference register is updated, the 8 difference values are compared with the set threshold. If a difference is greater than the threshold, the corresponding pixel of the binary register is assigned the maximum value; otherwise it is assigned 0. The result is passed to the next pipeline stage, and the next kernel is then segmented.

(3) Centroid detection

The centroid detection logic is divided into two branches, which calculate the X and Y coordinates of the target centroid respectively.

For the X coordinate, 560 column accumulation registers are provided. Each time the binary register is updated, the 8 binary pixel values are added to the accumulation registers of their corresponding columns. Once the whole frame has been read, the values of the column accumulation registers are compared to find the maximum and its column index, which is the X coordinate.

For the Y coordinate, two registers are provided: one stores the sum of the pixel values in the current row, the other stores the maximum row sum so far. After a row has been read, the row sum is compared with the maximum row sum. If it is greater, the maximum is updated to the current row sum and the current row index is recorded; otherwise the maximum and its row index are left unchanged. After a whole frame has been read, the row index corresponding to the maximum row sum is the Y coordinate of the centroid.
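The three stages described above can be modeled in software as a single pass over the image, one kernel at a time, without storing the frame. The sketch below is a sequential behavioral model under several assumptions: the name `detect_streaming` is hypothetical, foreground pixels are counted as 1 rather than the maximum gray value (which does not change where the maxima occur), the absolute difference is used, and rows are visited in list order rather than the sensor's bottom-to-top order. In the FPGA the three stages run concurrently on successive kernels, which this model does not reproduce.

```python
KERNEL = 8  # pixels per kernel


def detect_streaming(frame, background, threshold):
    """Single-pass model of background subtraction, thresholding and
    centroid detection, processing one 8-pixel kernel at a time."""
    width = len(frame[0])
    col_sums = [0] * width        # one accumulator per column
    max_row_sum, y = 0, 0         # running maximum row sum and its row index

    for row_idx, (row, bg_row) in enumerate(zip(frame, background)):
        row_sum = 0
        for start in range(0, width, KERNEL):
            cur = row[start:start + KERNEL]
            bg = bg_row[start:start + KERNEL]
            # Stage 1: background subtraction on the 8 pixels of the kernel.
            diff = [abs(p - b) for p, b in zip(cur, bg)]
            # Stage 2: threshold segmentation.
            bits = [1 if d > threshold else 0 for d in diff]
            # Stage 3, X branch: accumulate into the column registers.
            for offset, bit in enumerate(bits):
                col_sums[start + offset] += bit
            # Stage 3, Y branch: accumulate the current row sum.
            row_sum += sum(bits)
        # End of row: update the maximum only on a strictly greater sum.
        if row_sum > max_row_sum:
            max_row_sum, y = row_sum, row_idx

    # End of frame: the column with the largest accumulated value gives X.
    x = max(range(width), key=col_sums.__getitem__)
    return x, y


# Hypothetical 16 x 6 frame: bright background, dark spot centered at (7, 3).
W, H = 16, 6
bg_img = [[200] * W for _ in range(H)]
cur_img = [r[:] for r in bg_img]
spans = {1: (7, 7), 2: (6, 8), 3: (5, 9), 4: (6, 8), 5: (7, 7)}
for r, (c0, c1) in spans.items():
    for c in range(c0, c1 + 1):
        cur_img[r][c] = 40
result = detect_streaming(cur_img, bg_img, threshold=50)
print(result)
```

The test spot deliberately straddles the boundary between two kernels to show that the column accumulators make the result independent of how the row is split into groups of 8 pixels.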

4 system test and result analysis

4.1 test environment

The camera is fixed to an optical plate to keep it stable. The detection target is a black carbon ball against a background of white A4 paper. The lens is an industrial lens with a focal length of 6 mm, the distance between the lens and the target is 20 cm, and a flat LED lamp provides supplementary lighting during the test. The test is divided into an accuracy test and a speed test.

4.2 accuracy test

After the camera is turned on, 500 images are collected as background frames. The target is then fixed on the background paper and sampled 10000 times in succession to test single-point acquisition accuracy; the target position is output through the serial port and plotted. The experiment is repeated 10 times, and the results are shown in Figure 4. The typical single-point accuracy is 3 × 3 pixels.

4.3 speed test

4.3.1 frame rate test

When the camera operates at 8-bit depth and 560 × 480 resolution, the theoretical frame rate is 1164 f/s. The frame rate is tested as follows: the system is put in running mode, a serial port tool is opened to receive coordinate data while the elapsed time is recorded, and the frame rate is calculated from the number of coordinates received in a given period.

The experimental results are as follows: the system runs for 10 s and receives 11871 coordinates, giving a measured frame rate of 1187 f/s. Allowing for timing error, the measured frame rate is basically consistent with the theoretical frame rate and meets the design requirements of the system.
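The arithmetic behind this comparison can be checked with a few lines. The variable names are chosen for illustration, and the "implied run time" is simply the duration that would reconcile the two figures if the whole gap were attributed to timing error, which is an assumption rather than a measured quantity.

```python
coordinates_received = 11871
run_time_s = 10.0
theoretical_fps = 1164

measured_fps = coordinates_received / run_time_s
relative_deviation = (measured_fps - theoretical_fps) / theoretical_fps
# Run time that would yield exactly the theoretical rate for the same count.
implied_run_time_s = coordinates_received / theoretical_fps

print(f"measured frame rate : {measured_fps:.1f} f/s")
print(f"relative deviation  : {relative_deviation:.1%}")
print(f"implied run time    : {implied_run_time_s:.2f} s")
```

The deviation is about 2 %, which corresponds to a timing error of roughly 0.2 s over the 10 s run.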

4.3.2 motion test

Motion detection is tested by capturing an object in free fall. The system captures the whole process and sends the real-time position to the serial port. From the received position coordinates, the motion trajectory shown in Figure 5 and the y-axis displacement versus time curve shown in Figure 6 are obtained.

It can be seen from Figure 6 that the displacement curve follows the same trend as the theoretical curve, with values slightly below it. During the test, the actual falling distance is 60 mm, for which the theoretical falling time is 0.11 s. In the actual measurement, 140 frames are collected, corresponding to a falling time of 0.12 s, which is 0.01 s longer than the theoretical time.
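These figures can be reproduced from the free-fall relation h = g·t²/2 and the frame count. The sketch below assumes g = 9.81 m/s², a release from rest, and the theoretical frame rate of 1164 f/s for converting frames to time; the variable names are illustrative.

```python
import math

G = 9.81              # m/s^2, assumed standard gravity
drop_m = 0.060        # falling distance of 60 mm
frames = 140          # frames collected during the fall
frame_rate = 1164     # f/s, theoretical frame rate of the camera

# Free fall from rest: h = g * t^2 / 2  ->  t = sqrt(2h / g)
t_theory = math.sqrt(2 * drop_m / G)
t_measured = frames / frame_rate

print(f"theoretical fall time : {t_theory:.3f} s")
print(f"measured fall time    : {t_measured:.3f} s")
print(f"difference            : {t_measured - t_theory:.3f} s")
```

With these assumptions the theoretical time is about 0.111 s and the measured time about 0.120 s, matching the rounded values of 0.11 s and 0.12 s.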

Analysis of the test results: first, air resistance acts on the falling body, so the acceleration is less than the gravitational acceleration and the displacement is less than the theoretical value. In addition, Figure 5 shows that the falling direction does not coincide exactly with the Y axis and there is some displacement in the X direction, so the displacement in the Y direction is less than predicted. Taking these two factors into account, the camera can be considered to detect the high-speed movement of the object accurately.

5 Conclusion

In this paper, a high-frame-rate visual real-time target detection system is developed, and its hardware design, software configuration and FPGA algorithm implementation are described. The system adopts near-end intelligent processing directly on the FPGA together with a pipelined processing structure, which largely resolves the real-time problem of high-speed intelligent visual inspection systems. Test results show that the system achieves real-time target detection on a 560 × 480, 1100 f/s high-speed video stream with an accuracy of 3 pixels. The system can be applied to a variety of high-speed detection scenarios, such as displacement and velocity measurement, vibration analysis, and high-speed target monitoring and control. Follow-up work will optimize the algorithm further, improve detection accuracy, extend detection from circular targets to irregular targets, and improve robustness when the background changes.

Editor in charge: GT
