According to media reports, Google had previously said that it had made a breakthrough in speech recognition, cutting its error rate by 30%. Recently, however, IBM announced on its official website that it had set a new record of its own, with an error rate of only 5.5%, an improvement on the 6.9% it reported last year.
The result was measured on a notoriously difficult speech recognition task: recordings of everyday conversations between people on topics such as buying a car. This corpus, known as Switchboard, has been used to benchmark speech recognition systems for more than 20 years, and an error rate of 5.5% on it is very rare.
Earlier, Jeff Dean, a senior researcher at Google, said at the AI Frontiers summit that Google had reduced the word error rate (WER) of its speech recognition by more than 30% since 2012. Word error rate measures the proportion of words a system gets wrong when transcribing speech into text.
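The word error rate mentioned above can be sketched in a few lines of code. This is a minimal illustrative implementation, not Google's or IBM's internal metric code: WER is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a system's hypothesis, divided by the number of words in the reference.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table: d[i][j] is the edit distance between
    # the first i reference words and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions to reach an empty hypothesis
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions from an empty reference
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # match/substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One wrong word in a five-word reference gives a WER of 20%.
print(word_error_rate("want to buy a car", "want to buy a cart"))  # 0.2
```

By this measure, IBM's reported 5.5% means roughly one word in eighteen was transcribed incorrectly on the Switchboard conversations.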
Dean attributed the decline in word error rate to neural networks, the systems behind the deep learning used at Google and other companies. Researchers train neural networks on large amounts of data, such as speech clips, and then ask them to make inferences about new data. Google first used neural networks in speech recognition in 2012, when the Jelly Bean version of Android was released. Google does not often discuss its progress in speech recognition, a technology that affects a growing number of its products, from the Google Home smart speaker to the Gboard input method.