Recent advances in machine learning and deep learning have improved SLAM technology, increasing map richness, enabling semantic scene understanding, and improving positioning accuracy, map quality, and robustness.
The recent crisis has drawn growing attention to the practical benefits of autonomous robots. We have seen robots delivering food and medicine, and even assessing patients. These are compelling use cases that clearly illustrate how robots will play a bigger role in our lives from now on.
However, despite these advances, a robot's ability to automatically map its surroundings and reliably locate itself within them is still very limited. Robots are increasingly capable of performing specific tasks in planned environments, but dynamic, untrained situations remain a challenge.
What excites me is the next generation of SLAM (simultaneous localization and mapping), which will enable designers to create robots with much stronger autonomous operation across varied situations. It is under active development and has attracted investment and interest from across the industry.
We call it the "age of perception". It combines the latest advances in machine learning and deep learning to enhance SLAM: semantic scene understanding increases the richness of maps while improving localization, map quality, and robustness.
At present, most SLAM solutions take raw data from sensors and use probabilistic algorithms to estimate the robot's position and build a map. Lidar is the most commonly used sensor, but increasingly low-cost cameras provide a rich data stream for enhanced maps. Whichever sensor is used, the data forms a map composed of millions of 3D reference points, which allow the robot to calculate its position.
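To illustrate how such reference points support positioning, here is a minimal sketch (not any particular SLAM system's method, and with hand-picked landmarks) that recovers a 2D position from range measurements to a few known map points by linearizing the range equations:

```python
import numpy as np

def localize(landmarks, ranges):
    """Estimate a 2D position from ranges to known map landmarks
    by subtracting the first range equation from the rest, which
    removes the quadratic term and leaves a linear system."""
    landmarks = np.asarray(landmarks, float)
    ranges = np.asarray(ranges, float)
    p0, r0 = landmarks[0], ranges[0]
    A = 2 * (landmarks[1:] - p0)
    b = (r0**2 - ranges[1:]**2
         + np.sum(landmarks[1:]**2, axis=1) - np.sum(p0**2))
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

# Synthetic check: three landmarks, robot actually at (1, 1).
landmarks = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
true_pos = np.array([1.0, 1.0])
ranges = [np.linalg.norm(true_pos - np.array(l)) for l in landmarks]
print(localize(landmarks, ranges))  # ≈ [1. 1.]
```

A real system solves this jointly over thousands of points and six degrees of freedom, with noise models, but the principle is the same: reference points plus geometry yield a pose.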
The problem is that these 3D point clouds carry no meaning: they are simply spatial references the robot uses to compute its position. Continuously processing millions of points is also a heavy burden on the robot's processor and memory. By inserting machine learning into the processing pipeline, we can both make these maps more useful and simplify them.
This technology uses machine learning to classify sets of pixels from the camera feed into recognizable "objects". For example, the millions of pixels representing a wall can be grouped into a single object. We can also use machine learning to predict the geometry and shape of those pixels in the 3D world, so the millions of 3D points representing a wall can be summarized as a single plane.
Likewise, the millions of 3D points representing a chair can be aggregated into a shape model with just a few parameters. Decomposing the scene into 2D and 3D objects in this way reduces processor and memory costs.
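The wall-to-plane compression can be made concrete with a small sketch (assuming NumPy; the synthetic "wall" data is made up for illustration). Ten thousand 3D points are reduced to four numbers, a unit normal and an offset:

```python
import numpy as np

def fit_plane(points):
    """Fit a plane n·x = d to 3D points by total least squares:
    the normal is the direction of least variance, found via SVD
    of the centered point cloud."""
    points = np.asarray(points, float)
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    n = vt[-1]              # unit normal of the best-fit plane
    return n, n @ centroid  # plane offset d = n · centroid

# Synthetic "wall": 10,000 points scattered on the plane z = 2.
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 5.0, size=(10_000, 2))
wall = np.column_stack([xy, np.full(len(xy), 2.0)])

n, d = fit_plane(wall)
# 10,000 points (30,000 floats) compressed to 4 floats: n and d.
```

The same idea extends to parametric shape models for chairs and other furniture, only with more parameters than a plane.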
This approach provides a basis for deeper understanding of the scenes captured by the robot's sensors. With machine learning, we can classify individual objects within the scene and then write code that determines how each should be handled.
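A minimal sketch of that "write code per object class" step might look like the following. The labels and action names are hypothetical, standing in for whatever a segmentation model actually emits:

```python
from dataclasses import dataclass

@dataclass
class DetectedObject:
    label: str               # hypothetical semantic class from a model
    position: tuple          # 3D position in the map frame

def plan_action(obj: DetectedObject) -> str:
    """Decide how the robot treats each classified object:
    static structure anchors the map, people get a wide berth,
    and everything else is tracked as an obstacle."""
    if obj.label in {"wall", "floor", "ceiling"}:
        return "use_as_anchor"
    if obj.label == "person":
        return "avoid_with_wide_margin"
    return "track_as_obstacle"

print(plan_action(DetectedObject("wall", (0, 0, 0))))    # use_as_anchor
print(plan_action(DetectedObject("person", (1, 2, 0))))  # avoid_with_wide_margin
```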
The primary goal of this new capability is to remove moving objects, including people, from the map. To navigate effectively, the robot needs to reference the static elements of the scene: things that do not move, and so can serve as reliable anchors. Machine learning can teach an autonomous robot which elements of the scene to use for localization, and which to exclude from the map or classify as obstacles to avoid. Combining panoptic segmentation of the objects in a scene with the underlying map and location data will greatly improve the accuracy and capability of SLAM.
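One way to sketch that filtering step is to partition labelled landmarks into reliable anchors and obstacles before localization runs. The class names here are assumptions, not a fixed taxonomy:

```python
# Classes assumed to be static enough to anchor localization.
STATIC_CLASSES = {"wall", "floor", "pillar", "door_frame"}

def split_landmarks(labelled_points):
    """Partition (label, point) pairs into localization anchors
    and obstacles. Unknown or movable classes are conservatively
    treated as obstacles, never as anchors."""
    anchors, obstacles = [], []
    for label, point in labelled_points:
        (anchors if label in STATIC_CLASSES else obstacles).append(point)
    return anchors, obstacles

scene = [("wall", (0, 0)), ("person", (2, 1)), ("pillar", (4, 3))]
anchors, obstacles = split_landmarks(scene)
print(len(anchors), len(obstacles))  # 2 1
```

Downstream, only `anchors` would feed the pose estimator, while `obstacles` feed the path planner.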
The next exciting step will be to build a deeper understanding of individual objects on top of this classification. With machine learning as part of the SLAM system, robots will learn to distinguish the walls and floor of a room from the furniture and other objects within it. Storing these elements as separate objects means that adding or removing a chair does not require redrawing the entire map.
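The benefit of object-level storage can be sketched as follows (an illustrative data structure, not a production map format): removing a chair is a single dictionary update, and the rest of the map is untouched.

```python
class ObjectMap:
    """Map stored as separate semantic objects rather than one
    monolithic point cloud, so edits are local to one object."""

    def __init__(self):
        self.objects = {}    # object id -> (label, shape parameters)
        self.next_id = 0

    def add(self, label, params):
        oid = self.next_id
        self.objects[oid] = (label, params)
        self.next_id += 1
        return oid

    def remove(self, oid):
        # Deleting one object leaves every other object intact;
        # no global remapping is needed.
        self.objects.pop(oid, None)

m = ObjectMap()
m.add("wall", {"normal": (0, 0, 1), "offset": 2.0})
chair = m.add("chair", {"pose": (1, 2, 0)})
m.remove(chair)          # the wall is still there
print(len(m.objects))    # 1
```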
Together, these advantages are key to a major step forward in autonomous robot capability. Today's robots do not generalize well beyond their training: changes in the environment, especially fast-moving objects, corrupt the map and add substantial computation. Machine learning creates a layer of abstraction that improves the stability of the map, and its more efficient data processing leaves headroom for more sensors and more data, increasing the granularity and the information a map can contain.
By linking localization, mapping, and perception together, a robot can understand more about its surroundings and operate in more useful ways. For example, a robot that can sense the difference between a hall and a kitchen can execute a more complex set of instructions. Being able to identify and classify objects such as chairs, desks, and cabinets improves this further: it becomes straightforward to instruct the robot to go to a particular room and get a particular thing.
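A semantic map makes such instructions resolvable into concrete goals. Here is a toy sketch (the rooms, objects, and coordinates are all made up) of turning "go to the kitchen and get the cup" into a navigation target:

```python
# Hypothetical semantic map: rooms containing labelled objects
# with their 2D coordinates in the map frame.
semantic_map = {
    "kitchen": {"cup": (3.0, 1.5), "cabinet": (2.0, 0.5)},
    "hall": {"chair": (0.5, 4.0)},
}

def find_object(room: str, label: str):
    """Resolve 'go to <room> and get <label>' to a goal
    coordinate, or None if the object is not known there."""
    return semantic_map.get(room, {}).get(label)

print(find_object("kitchen", "cup"))  # (3.0, 1.5)
```

Without the semantic layer, the same instruction would have to be expressed as raw coordinates with no notion of "kitchen" or "cup" at all.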
The real revolution in robotics will come when robots begin to interact with people in more natural ways. Robots that can learn from varied situations and fold that knowledge into a model will be able to perform new tasks without retraining, drawing on the maps and objects stored in memory. Creating these models and abstractions requires the full integration of all three layers of SLAM: localization, mapping, and perception. Thanks to the efforts of the leaders in these fields, I believe the age of perception is coming.
Editor in charge: PJ