Language is the most basic form of communication between human beings, and scientists have long been interested in speech recognition. In 1952, Bell Laboratories developed the "Audrey" speech recognition system, which could recognize single spoken digits from 0 to 9. Beginning in the 1960s, Carnegie Mellon University in the United States carried out research on continuous speech recognition, and in the 1980s it produced the first speaker-independent continuous speech recognition system. Limited by the traditional classical models of the time, however, progress in speech recognition largely stalled there. Not until 2006 did the deep belief network (DBN) solve the problem of model training easily falling into local optima, and in 2011 the successful application of deep neural networks (DNN) to large-vocabulary continuous speech recognition drove a new wave of industrial adoption of speech recognition.

Of course, China was also exploring speech recognition during this period. In 1958, the Institute of Acoustics of the Chinese Academy of Sciences used vacuum-tube circuits to recognize 10 vowels, and in 1973 the institute began research on computer speech recognition. In 1986, China's "863 Plan" listed speech recognition as a dedicated research topic for the first time, and companies such as iFLYTEK, Baidu, Yunzhisheng and Spitch subsequently emerged.

Although DNN technology achieved a major breakthrough in large-vocabulary continuous speech recognition, it cannot model changes across a time sequence. In short, each word in a passage can be recognized, but the words do not connect into a smooth, meaningful whole, and the system cannot assemble a complete sentence from context. The recurrent neural network (RNN) addresses this: the output of the previous time step can be fed in as part of the input at the next time step, allowing a complete sentence to be generated. On this basis, iFLYTEK proposed the feedforward sequential memory network (FSMN), which combines the strengths of feedforward networks and the RNN approach to improve recognition accuracy, shorten the model training cycle and reduce recognition response time.
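
To make the idea concrete, the sketch below implements an FSMN-style lookback memory block in plain NumPy: each frame's memory output is a learned weighted sum of the current and past hidden states, giving a feedforward network access to temporal context without recurrent connections. This is a minimal sketch of the published FSMN idea, not iFLYTEK's implementation; the function name, tap weights and toy dimensions are illustrative.

```python
import numpy as np

def fsmn_memory_block(hidden_states, filter_weights):
    """Apply an FSMN-style lookback memory filter.

    hidden_states: (T, D) array of per-frame hidden activations.
    filter_weights: (N,) learnable taps; tap i weights the state i frames back.
    Returns a (T, D) array where each frame is a weighted sum of the
    current and up to N-1 previous hidden states.
    """
    T, _ = hidden_states.shape
    memory = np.zeros_like(hidden_states)
    for t in range(T):
        for i, weight in enumerate(filter_weights):
            if t - i >= 0:
                memory[t] += weight * hidden_states[t - i]
    return memory

# Toy usage: 10 frames of 4-dimensional hidden states, 3 lookback taps.
h = np.random.randn(10, 4)
a = np.array([0.5, 0.3, 0.2])
print(fsmn_memory_block(h, a).shape)  # (10, 4)
```

Because the memory is a fixed-length weighted sum rather than a recurrent loop, training stays as parallelizable as an ordinary feedforward network, which is where the shorter training cycle comes from.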

Face recognition may not be as well known as speech recognition, but research on the two began at almost the same time. Theoretical research on face recognition started as early as the 1950s. In the 1960s, recognition relied mainly on the geometric structure of the face: locating the feature points of facial organs and the topological relationships between them. In the 1990s, the "eigenface" method was introduced for face recognition. In 2013, researchers at Microsoft Research Asia made the first attempt at training on data at the 100,000 scale. Since 2014, the application of deep learning has greatly improved the accuracy of face recognition, and its industrialization has entered the fast lane.

Deep learning classifies and filters an object's features layer by layer. The first layer may look for simple edges, the second for sets of edges that form simple shapes such as rectangles or circles, and the third for higher-level features such as eyes and a nose; through face detection, feature point recognition, feature extraction and feature comparison, the network finally combines these features into the concept of a "face". The more layers a neural network model has, the finer its classification of facial features, the more distinguishable those features become, and the higher the accuracy of face recognition. The deep-learning network built on SenseTime's Parrots platform has 1,207 network layers and more than 1,000 layers of filtering, and its recognition accuracy can reach 99%.
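
The layer hierarchy described above can be illustrated with a toy convolutional network. The sketch below (in PyTorch, purely illustrative; nothing here reflects SenseTime's actual architecture or its 1,207 layers) stacks three convolutional stages, loosely corresponding to edges, simple shapes and part-like patterns, and ends with a compact embedding vector that can later be compared against stored faces.

```python
import torch
import torch.nn as nn

class ToyFaceNet(nn.Module):
    """A three-stage convolutional stack ending in a face embedding."""

    def __init__(self, embedding_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),   # stage 1: edges
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),  # stage 2: simple shapes
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),  # stage 3: part-like patterns
            nn.AdaptiveAvgPool2d(1),                     # pool to one vector per image
        )
        self.embed = nn.Linear(64, embedding_dim)        # compact face embedding

    def forward(self, x):
        f = self.features(x).flatten(1)
        return self.embed(f)

net = ToyFaceNet()
print(net(torch.randn(1, 3, 112, 112)).shape)  # torch.Size([1, 128])
```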

In this era of the Internet of Everything, how do iFLYTEK's speech recognition and SenseTime's face recognition empower Weima's intelligent vehicles? At present, every smart car on the market must be woken up with "Hello, so-and-so" each time you want to control a function by voice (open a window, turn on the air conditioning, and so on). A Weima car, by contrast, can greet you while you are still outside the vehicle once it is unlocked, an entrance with enough protagonist aura to turn heads. After you get in, it automatically adjusts the seats, rearview mirrors and steering wheel according to your identity and logs you in to your member benefits; it "understands" you at a glance. When you are tired, it can warn you through a prompt tone from the instrument cluster, ask by voice whether you would like "some music", or remotely control your smart home. Is that not a leap in experience?

To realize the functions above, one question must be answered first: how does the car know who you are? This depends mainly on the driver-facing camera built into the cockpit. Besides supporting safe driving through driver fatigue and distraction detection, the camera is also the entrance for face recognition: the driver's identity is established through face detection, feature point recognition, feature extraction and feature comparison.
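
The last step of that pipeline, comparing a freshly extracted feature vector against enrolled drivers, can be sketched in a few lines. This is a generic embedding-comparison sketch, not Weima's or SenseTime's code; the function names and the 0.6 threshold are assumptions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify_driver(embedding, enrolled, threshold=0.6):
    """Match a face embedding against enrolled driver profiles.

    embedding: feature vector extracted from the cabin camera frame.
    enrolled: dict of driver name -> stored embedding.
    Returns the best-matching name, or None for an unknown face.
    """
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Example: match a noisy re-capture against two enrolled drivers.
enrolled = {"alice": np.random.randn(128), "bob": np.random.randn(128)}
probe = enrolled["alice"] + 0.05 * np.random.randn(128)
print(identify_driver(probe, enrolled))  # almost certainly "alice"
```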

The camera added to the smart car opens a new human-vehicle interaction port alongside voice interaction. Voice interaction alone does not know who you are: anyone can instruct it, and it simply does what it is told, more like a robot without a soul. The biggest advantage of face-based interaction is that it can accurately identify who you are, where you come from, what you like and what it can do specifically for you. By matching function settings to the driver's identity, the car understands you at a glance and delivers an exclusive intelligent cockpit experience.
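
As a sketch of what "matching function settings to identity" might look like in code, the snippet below maps a recognized driver ID to a stored cabin profile. Everything here, the DriverProfile fields, the PROFILES store and apply_profile, is hypothetical; in a real vehicle these settings would be dispatched to the seat and mirror ECUs rather than printed.

```python
from dataclasses import dataclass

@dataclass
class DriverProfile:
    seat_position: int    # e.g. seat rail position in mm
    mirror_angle: float   # rearview mirror tilt in degrees
    playlist: str         # preferred music to queue on entry

# Hypothetical profile store keyed by the identity that
# face recognition returned (see identify_driver above).
PROFILES = {
    "alice": DriverProfile(seat_position=220, mirror_angle=12.5,
                           playlist="morning-jazz"),
}

def apply_profile(driver_id):
    profile = PROFILES.get(driver_id)
    if profile is None:
        return  # unknown face: leave the cabin in its default state
    print(f"seat -> {profile.seat_position} mm, "
          f"mirror -> {profile.mirror_angle} deg")
    print(f"queue playlist: {profile.playlist}")

apply_profile("alice")
```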

Beyond the standalone speech recognition and face recognition functions above, Weima's intelligent interactive AI, Xiaowei, deeply integrates data from all of the vehicle's sensors for big-data analysis, building functions such as emotion recognition, takeover capability and a front-vehicle start reminder. Take the front-vehicle start reminder as an example. While stopped at a traffic light, drivers usually look at their phones, send WeChat messages, make calls or change the music; in short, few people watch the road ahead at that moment. AI Xiaowei calculates the position and speed of surrounding vehicles through the front camera, derives the running state, speed and other information of its own vehicle from multiple on-board sensors, judges driver distraction through face recognition, and reminds the driver by voice or through the instrument cluster when the car ahead starts moving while the driver is not looking ahead. All of this is built on the vehicle's existing hardware: exploring the "pain points" of drivers, deeply integrating the vehicle's sensor data, and developing new functions that are then delivered through continuous OTA upgrades.
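
Stripped to its essentials, the reminder fuses three signals: the front camera's view of the lead vehicle, the car's own speed and the cabin camera's attention estimate. The sketch below is an assumed simplification of that decision logic, not Weima's actual algorithm; the signal names and the 1 km/h threshold are illustrative.

```python
def should_remind_to_start(lead_vehicle_moving, own_speed_kmh, driver_attentive):
    """Decide whether to trigger a 'car ahead has started' reminder.

    lead_vehicle_moving: from the front camera (lead car position/speed).
    own_speed_kmh: from the vehicle's own sensors.
    driver_attentive: from the cabin camera's gaze/distraction detection.
    """
    stopped = own_speed_kmh < 1.0
    return lead_vehicle_moving and stopped and not driver_attentive

# The lead car pulls away while we sit still, looking at a phone:
if should_remind_to_start(True, 0.0, False):
    print("chime + instrument cluster prompt: vehicle ahead has started")
```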

Deep neural networks in the 5G era drive breakthroughs in human-computer interaction technology

Beyond in-car intelligent interaction, Weima also breaks the boundaries of physical space by integrating Internet of Vehicles and Internet of Things (IoT) resources to seamlessly connect the car, as a third space, with the home. From inside the car you can remotely switch the home air conditioner, water heater and room lights on or off by voice, which answers both the wish to turn on the air conditioner or water heater before walking in the door and the nagging worry of having forgotten to switch appliances off after leaving. Once home, you can likewise check the car's status, mileage, remaining charge and so on, and plan your itinerary in advance according to the battery level. By joining the IoT ecosystem through smart devices, the platform already supports 90 million smart home products across 8 categories and 30 types, and Weima is still expanding the scenarios of the Internet of Vehicles and developing more practical functions.
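
The car-to-home link can be pictured as an in-car voice intent relayed to a smart-home gateway. The sketch below is purely an assumption about the shape of such a relay; none of the endpoints, field names or payloads come from Weima's actual IoT platform.

```python
import json
import urllib.request

def send_home_command(device, action, gateway_url="http://example.com/iot"):
    """Relay an in-car voice intent to a hypothetical smart-home gateway."""
    payload = json.dumps({"device": device, "action": action}).encode()
    request = urllib.request.Request(
        gateway_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status  # gateway acknowledges the command

# The voice intent "turn on the water heater before I get home" becomes:
# send_home_command("water_heater", "on")
```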

In human-vehicle interaction, Weima has not stopped at general functions that can "read your words and expressions"; it also digs into users' real needs, reading a user's current emotion, attitude and intention from facial expressions, gaze, body movements, gestures and other details, with the ultimate goal of a more personalized multi-modal interactive experience. Exploration and research are proceeding in parallel on voiceprint recognition, gesture recognition, expression recognition, gaze tracking and cabin-sensing technologies for multi-modal interaction scenarios, using deep learning to improve recognition accuracy, continuously advancing the industrialization of these technologies, and further empowering Weima's intelligent vehicles in the near future.
