The SLAM system proposed in this paper, combining stereo vision, inertial navigation, and LiDAR, achieves good performance in complex scenes such as tunnels. VIL-SLAM accomplishes this by combining tightly coupled stereo visual-inertial odometry (VIO) with LiDAR mapping and LiDAR-enhanced visual loop closure. The system generates real-time 6-DOF LiDAR poses and a near-real-time dense point cloud with 1 cm voxel resolution and loop-closure correction. Compared with state-of-the-art LiDAR methods, VIL-SLAM shows higher accuracy and robustness.

(a) Sensor suite (b) Model of an outdoor building scene

Main content

VIL-SLAM system diagram. Sensors are shown in gray and modules in green. Arrows indicate how messages flow through the system. Dark thick arrows represent the real-time outputs of the system; light thick arrows represent outputs generated by near-real-time post-processing.

The system has four modules, as shown in Figure 2. The visual front end obtains stereo images from the stereo camera. It performs frame-to-frame tracking and stereo matching, and outputs the stereo matches as visual measurements. The stereo VIO takes the stereo matches and IMU measurements, performs IMU pre-integration, and carries out tightly coupled smoothing on a pose graph; this module outputs VIO poses at IMU rate and camera rate. The LiDAR mapping module uses the VIO motion estimate, dewarps the LiDAR points, and registers each scan to the map. The loop closure module detects visual loops and estimates an initial loop constraint, which is then refined by ICP alignment of sparse point clouds. The global pose graph, constrained by all LiDAR poses, is optimized incrementally to obtain a globally corrected trajectory and real-time LiDAR pose corrections, which are sent back to the LiDAR mapping module for map updating and relocalization. In post-processing, the LiDAR scans are stitched together using the best estimated LiDAR poses to produce the dense point cloud map.

Fig. 6: The experimental setup, using two VPS to calibrate the camera

Visual front end

For temporal tracking, this paper uses the Kanade-Lucas-Tomasi (KLT) feature tracker to track all feature points from the previous stereo match, in both the left and right images; a feature is output only if it is tracked in both. A larger stereo baseline helps scale estimation and reduces the degeneracy caused by distant features, so for stereo matching a feature-based method is used, which handles long baselines better than KLT. If the number of stereo matches falls below a threshold, new features are extracted with the Shi-Tomasi corner detector, followed by a pruning step that removes any new feature whose pixel distance to an existing feature is below a threshold. ORB descriptors are then computed for all surviving features, and brute-force stereo matching is performed to obtain new matches. The system is initialized with the stereo matches of the first stereo frame.
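The minimum-pixel-distance pruning step can be sketched as follows. This is an illustrative NumPy snippet, not the paper's implementation; the function name `prune_new_features` and the 10-pixel threshold are assumptions for the example.

```python
import numpy as np

def prune_new_features(existing, candidates, min_dist=10.0):
    """Drop candidate corners that lie within min_dist pixels of any
    already-tracked feature (the pruning step described above).
    existing: (N, 2) and candidates: (M, 2) arrays of pixel coordinates."""
    if len(existing) == 0:
        return candidates
    # Pairwise distances between every candidate and every existing feature.
    d = np.linalg.norm(candidates[:, None, :] - existing[None, :, :], axis=2)
    keep = d.min(axis=1) >= min_dist
    return candidates[keep]

existing = np.array([[100.0, 100.0], [200.0, 50.0]])
candidates = np.array([[103.0, 101.0],   # ~3 px from (100, 100) -> dropped
                       [150.0, 150.0]])  # far from both -> kept
print(prune_new_features(existing, candidates, min_dist=10.0))
```

This keeps the tracked feature set spatially spread out, which benefits both KLT tracking and the geometric conditioning of the later optimization.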

Visual-inertial odometry

The goal of the stereo VIO is to provide real-time, accurate state estimation at a relatively high frequency, serving as the motion model for the LiDAR mapping algorithm. A tightly coupled fixed-lag smoother running on a pose graph is a good trade-off between accuracy and efficiency. Optimization-based methods allow multiple linearizations to approach the global minimum, and the fixed-lag pose graph optimizer further limits the maximum number of variables, so the computational cost is bounded. Since poor visual measurements can cause convergence problems, a strict outlier rejection mechanism is enforced: the system rejects outliers by checking the average reprojection error.
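The average-reprojection-error check can be sketched as below. This is a minimal illustration, assuming pinhole intrinsics `K` and world-to-camera poses `(R, t)`; the function name and the rejection threshold are hypothetical, not from the paper.

```python
import numpy as np

def mean_reprojection_error(K, poses, landmark, observations):
    """Average pixel reprojection error of one landmark over the frames
    that observe it. poses: list of (R, t) world->camera transforms;
    observations: (N, 2) measured pixels; K: 3x3 intrinsics."""
    errs = []
    for (R, t), uv in zip(poses, observations):
        p_cam = R @ landmark + t          # transform landmark into the camera
        proj = K @ p_cam                  # project with the pinhole model
        uv_hat = proj[:2] / proj[2]
        errs.append(np.linalg.norm(uv_hat - uv))
    return float(np.mean(errs))

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
I = np.eye(3)
landmark = np.array([0.0, 0.0, 5.0])
poses = [(I, np.zeros(3)), (I, np.array([-0.5, 0.0, 0.0]))]
obs = np.array([[320.0, 240.0], [270.0, 240.0]])  # consistent observations
err = mean_reprojection_error(K, poses, landmark, obs)
print(err)  # ~0 pixels -> this landmark would be kept
```

A landmark whose mean error exceeds a few pixels would be dropped before it ever enters the smoother, protecting the optimizer from bad associations.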

The algorithm uses IMU pre-integration factors and structureless vision factors as constraints.

LiDAR mapping

For LiDAR point dewarping and point cloud registration, the high-frequency IMU-rate VIO poses serve as the motion model. A scan C is the point cloud obtained from one complete LiDAR rotation. Geometric features are extracted from C: points on sharp edges and points on planar surfaces. Registering the feature points of the current scan to the map (all previous feature points) is then solved as an optimization problem that minimizes the Euclidean distance residuals formed by the feature points.
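Edge and planar points are typically separated by a local smoothness score along each scan line, in the style of LOAM; high curvature indicates an edge, low curvature a plane. A hedged NumPy sketch (the function name and the neighborhood size `k` are illustrative):

```python
import numpy as np

def smoothness(points, k=5):
    """Per-point curvature score: the norm of the summed difference vectors
    to the k neighbors on each side along the scan line, normalized by range.
    High score -> edge point; low score -> planar point."""
    n = len(points)
    c = np.full(n, np.nan)                 # borders left undefined
    for i in range(k, n - k):
        diff = np.zeros(3)
        for j in range(i - k, i + k + 1):
            diff += points[j] - points[i]
        c[i] = np.linalg.norm(diff) / (2 * k * np.linalg.norm(points[i]))
    return c

# A scan line that is flat, then bends sharply at index 10.
xs = np.arange(21, dtype=float)
ys = np.where(xs <= 10, 0.0, xs - 10.0)    # corner at x = 10
scan = np.stack([xs, ys, np.full(21, 5.0)], axis=1)
c = smoothness(scan)
print(np.nanargmax(c))  # → 10: the corner gets the highest curvature score
```

The selected edge points are matched to map edge lines and planar points to map planes, giving the point-to-line and point-to-plane distance residuals that the registration minimizes.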

LiDAR-enhanced loop detection

Loop detection is important for any SLAM system, because long-term operation leads to drift. The purpose of loop closure is to eliminate this drift by optimizing a global pose graph that combines the loop constraints with the LiDAR relative-transform information. To better assist LiDAR mapping, the corrected LiDAR poses are sent back in real time so that newly scanned feature points can be registered against the revisited map. This paper proposes visual bag-of-words loop detection with a PnP formulation of the loop constraint, further refined by ICP alignment of sparse point clouds. The incremental solver iSAM2 is used to optimize the global pose graph, achieving real-time performance.
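The effect of a loop constraint on the global pose graph can be illustrated with a deliberately tiny linear example: 1-D poses, unit-weight least squares solved in closed form. This is a conceptual stand-in for the iSAM2-optimized SE(3) pose graph, not the paper's formulation.

```python
import numpy as np

def optimize_pose_graph(n, edges, prior=0.0):
    """Least-squares solve of a 1-D pose graph: each edge (i, j, meas)
    constrains x_j - x_i = meas; x_0 is anchored to `prior`."""
    A, b = [np.eye(n)[0:1]], [np.array([prior])]   # anchor row for x_0
    for i, j, meas in edges:
        row = np.zeros(n)
        row[i], row[j] = -1.0, 1.0
        A.append(row[None, :])
        b.append(np.array([meas]))
    x, *_ = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)
    return x

# Odometry drifts: each step is measured as +1.1 m (3.3 m total), but a
# loop-closure match against the start measures the true 3.0 m displacement.
edges = [(0, 1, 1.1), (1, 2, 1.1), (2, 3, 1.1), (3, 0, -3.0)]
x = optimize_pose_graph(4, edges)
print(np.round(x, 3))  # → [0.  1.025  2.05  3.075]: drift spread evenly
```

With proper covariances each edge would be weighted by its information matrix, and iSAM2 updates this solution incrementally as new loop edges arrive instead of refactoring the whole system.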

Experimental results

VIL-SLAM is evaluated against the best real-time LiDAR-based system, LOAM, on a custom dataset. The stereo VIO sub-module (VIL-VIO) is also evaluated on the EuRoC MAV dataset.

Results are reported for five typical environments, including featureless corridors, cluttered platforms, tunnels, and outdoor scenes. Data collection for all of these sequences starts and ends at the same point. Odometry (the LiDAR mapping poses) is evaluated by final drift error (FDE). Taking Faro scans as ground truth, the mapping results are evaluated by mean registration error (MRE): the map is first aligned with the model (the Faro scan), and then the Euclidean distance from each map point to its nearest point in the model is computed. Odometry FDE and mapping results are shown in Table 1, with the better numbers in bold.
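The MRE metric described above reduces to a nearest-neighbor distance average after alignment. A minimal sketch, assuming already-aligned point clouds and using brute-force nearest neighbors (a k-d tree would be used at real map sizes):

```python
import numpy as np

def mean_registration_error(map_pts, model_pts):
    """MRE: mean Euclidean distance from each map point to its nearest
    neighbor in the ground-truth model, after the clouds are aligned."""
    d = np.linalg.norm(map_pts[:, None, :] - model_pts[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

# Toy example: the model is a 5x5 grid; the estimated map is the same grid
# shifted by 1 cm, i.e. a residual misalignment of 0.01 m per point.
gx, gy = np.meshgrid(np.arange(5.0), np.arange(5.0))
model = np.stack([gx.ravel(), gy.ravel(), np.zeros(25)], axis=1)
est_map = model + np.array([0.01, 0.0, 0.0])
print(mean_registration_error(est_map, model))  # → 0.01
```

Note that MRE is one-sided: it penalizes map points far from the model but not model regions the map failed to cover, which is why occlusion holes (as in the map comparison figure) do not inflate it.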

Map comparison: map registration errors of VIL-SLAM (right) and LOAM (left) against the model. Errors above 0.3 m are shown in red in (a)-(b); errors above 0.5 m are shown in red in (c). The discontinuous red regions within the blue and green areas are due to occlusion of the scanning equipment.


VIL-SLAM is an advanced odometry and mapping system designed to run stably in different environments over long periods. The current framework couples VIL-VIO with LiDAR mapping; we are extending it to a tightly coupled framework so that the accurate pose estimates from LiDAR mapping can be used for IMU bias correction. For loop closure, the sparse feature points between scan frames are refined by ICP to obtain better loop constraints.

