In recent years, as artificial intelligence has moved from concept to real-world deployment, a wave of intelligent technology has swept the globe, greatly benefiting industries such as biometric recognition and machine vision. Biometric technologies, represented by speech recognition and face recognition, are now widely used around the world and are rapidly approaching large-scale commercial adoption.
Among them, speech recognition has matured especially quickly: products such as smart speakers, chatbots, and customer-service bots have won broad consumer acceptance. However, a recent Stanford University study suggests that the speech recognition systems of Amazon, Apple, Google, IBM, and Microsoft may perform differently across racial groups.
The study found that, in the speech recognition systems of these five major American technology companies, the error rate for white users was much lower than that for black users. In addition, up to 20% of black users' audio clips were judged unintelligible by the systems. On this basis, the Stanford researchers argue that these speech recognition systems are clearly discriminatory. So what is the truth?
Generally speaking, AI-powered products such as speech recognition systems have a self-learning ability: they train on the data resources their developers provide, continuously improving system performance and service quality and strengthening their ability to communicate with human users. In this process, the data the developers supply becomes the key. If the developers themselves hold racial prejudices and the data they select is highly skewed, the speech recognition system will inevitably acquire "habits" resembling racial prejudice during its initial learning.
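The mechanism described above can be illustrated with a deliberately tiny toy model (all numbers and group labels below are hypothetical, not from the Stanford study): a nearest-centroid "recogniser" learns one centroid per word from its training data. When one group dominates the training set, the centroids drift toward that group's pronunciations, and the under-represented group is misrecognised more often.

```python
# Toy sketch of training-data skew. A single 1-D "acoustic feature" stands
# in for accent; the values and groups are invented for illustration only.
from statistics import mean

def train(samples):
    """samples: list of (word, feature) pairs -> dict of per-word centroids."""
    by_word = {}
    for word, feat in samples:
        by_word.setdefault(word, []).append(feat)
    return {word: mean(feats) for word, feats in by_word.items()}

def recognise(centroids, feature):
    """Return the word whose centroid is closest to the observed feature."""
    return min(centroids, key=lambda w: abs(centroids[w] - feature))

# Group A pronounces "yes" near 0.0 and "no" near 1.0;
# group B (a different accent) pronounces them near 0.6 and 1.4.
skewed = [("yes", 0.0)] * 9 + [("no", 1.0)] * 9 + [("yes", 0.6), ("no", 1.4)]
balanced = [("yes", 0.0), ("yes", 0.6), ("no", 1.0), ("no", 1.4)]

m_skewed = train(skewed)      # centroids pulled toward group A
m_balanced = train(balanced)  # centroids sit between the two groups

print(recognise(m_skewed, 0.6))    # group B says "yes" -> heard as "no"
print(recognise(m_balanced, 0.6))  # group B says "yes" -> heard as "yes"
```

With 90% of the training samples coming from group A, the learned "yes" centroid sits at 0.06 rather than 0.30, so group B's "yes" falls closer to the "no" centroid and is misrecognised; the balanced model gets it right.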
The study above suggests that the training data behind these systems was not diverse enough. If a speech recognition system cannot draw on rich and varied data during training, it is difficult for it to avoid forming a "discriminatory" bias. And beyond this "instinct" acquired in initial training, the influence of later use by real users is just as critical: if the system's main users come from a single group, the source of its continuous self-learning data becomes very "monotonous".
Judging from feedback in the current U.S. market, the users of the speech recognition systems developed by the five technology giants are mostly white, with relatively few black users. With more white users and fewer black users, the data these systems collect and learn from lacks diversity, which skews recognition accuracy against specific user groups.
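Such an accuracy deviation is typically surfaced by computing word error rate (WER) separately for each demographic group, as the Stanford researchers did. A minimal sketch of that audit, using a standard word-level edit distance and invented sample transcripts:

```python
# Sketch: per-group word error rate (WER). The sample data is hypothetical;
# a real audit would use large labelled speech corpora.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: edits to turn the first i reference words into the first j
    # hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

def per_group_wer(samples):
    """samples: iterable of (group, reference, hypothesis) -> mean WER per group."""
    totals = {}
    for group, ref, hyp in samples:
        errs, count = totals.get(group, (0.0, 0))
        totals[group] = (errs + word_error_rate(ref, hyp), count + 1)
    return {group: errs / n for group, (errs, n) in totals.items()}

samples = [
    ("A", "turn on the light", "turn on the light"),
    ("A", "play some music", "play some music"),
    ("B", "turn on the light", "turn on the night"),
    ("B", "play some music", "play sum music"),
]
print(per_group_wer(samples))  # group B's WER is higher than group A's
```

A large gap between groups in this metric is exactly the kind of evidence the study reports; it measures the disparity without saying anything yet about its cause.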
Therefore, developers should pay attention to collecting and incorporating speech data from other ethnic groups, so as to prevent such recognition gaps from arising.
Beyond the diversity of the data supplied by developers and users, many commenters believe accent differences among groups also play a role. Compared with white speakers, speakers of other races differ to varying degrees in pronunciation, which may be another source of the bias in speech recognition.
In fact, in China there are recognition differences caused by local dialects. Generally speaking, northerners and southerners do not pronounce words the same way; for Putonghua (Mandarin) recognition, accuracy is typically higher for northern speakers. Dialect and pronunciation should therefore also be taken into account, rather than simply attributing the gap to racial prejudice.
Clearly, the speech recognition industry still faces many challenges as it continues to expand its market. If problems such as skewed user groups, dialect, and pronunciation cannot be overcome, the further popularization of speech recognition products and their acceptance by more users will suffer. Although the market outlook for the industry is broad, it must work hard to overcome these current difficulties before it can truly take off.