As an online encyclopedia that anyone can edit, Wikipedia relies on a large number of volunteer editors who spend a great deal of time and energy keeping every entry up to date. Even with so many volunteers, ensuring that thousands of pages are updated promptly every day remains a challenge.
Not long ago, MIT researchers unveiled a new AI system that can automatically correct inaccurate information in online encyclopedias, helping human editors with their work.
Darsh Shah, a PhD student in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), said, “Wikipedia articles constantly need updating. Instead of hundreds of people working to modify each article, AI can make the changes automatically, which greatly improves efficiency.”
The researchers have proposed a text-generating system that can accurately locate and replace specific information in the relevant Wikipedia sentences, while using language similar to the way humans write and edit.
When people enter an unstructured sentence containing updated information into an interface, the AI searches Wikipedia for the right page and the outdated information, then rewrites the content in a humanlike style.
There have been many other bots that can edit Wikipedia automatically, but Shah said, “These tools are more rule-based, putting some narrow information into predefined templates. The task of editing, however, is more about reasoning over the contradictory parts of two sentences and then generating coherent text.” The researchers’ model solves this problem: given a piece of unstructured information, it automatically modifies the sentence in a humanlike way.
AI identifying contradictory information
Identifying the contradictory information between two separate sentences and fusing them together is an easy task for humans, but it is a novel task for machine learning.
For example, take the original sentence: “Fund A believes that 28 of the 42 minority interests in active operating companies are particularly important to the group”, and the updated claim: “Fund A believes that 23 of the 43 minority interests are of great significance”.
Based on these two sentences, the system first finds the relevant Wikipedia text about “Fund A”, then automatically removes the outdated numbers 28 and 42 and replaces them with the new numbers 23 and 43.
Generally speaking, the system is trained on a popular data set of sentence pairs, in which one sentence is a claim and the other is a related Wikipedia sentence. Each pair is labeled in one of three ways: “agree”, meaning the sentences contain matching information; “disagree”, meaning they contain contradictory information; or “neutral”, meaning there is not enough information for either label.
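To make the three labels concrete, here is a minimal sketch of how such labeled sentence pairs might be represented. The class and function names are hypothetical, and the second and third pairs are invented toy data, not taken from the researchers’ data set.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    AGREE = "agree"        # claim and Wikipedia sentence match
    DISAGREE = "disagree"  # claim contradicts the Wikipedia sentence
    NEUTRAL = "neutral"    # not enough information for either label


@dataclass(frozen=True)
class SentencePair:
    claim: str     # the statement carrying the (possibly newer) information
    evidence: str  # the related Wikipedia sentence
    label: Label


def pairs_needing_update(pairs):
    """Return only the pairs whose Wikipedia sentence contradicts the claim."""
    return [p for p in pairs if p.label is Label.DISAGREE]


pairs = [
    SentencePair(
        "Fund A believes that 23 of the 43 minority interests are of great significance",
        "Fund A believes that 28 of the 42 minority interests in active operating "
        "companies are particularly important to the group",
        Label.DISAGREE,
    ),
    # Invented toy examples for the other two labels.
    SentencePair("The bridge opened in 1932", "The bridge opened in 1932", Label.AGREE),
    SentencePair("The bridge is painted red", "The bridge opened in 1932", Label.NEUTRAL),
]

to_update = pairs_needing_update(pairs)
print(len(to_update))
```

Only the “disagree” pair is selected, which mirrors the article’s point that the outdated sentences are the ones that need rewriting.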
The goal of the system is to make every “disagree” pair “agree” by modifying the outdated sentence to match the claim. This requires separate models working together to generate the desired output.
The first model is a fact-checking classifier, pretrained to label each sentence pair as “agree”, “disagree” or “neutral”, which focuses on the “disagree” pairs. Running alongside the classifier is a custom “neutrality masker” module, which identifies which words in the outdated sentence contradict those in the claim. It creates a binary “mask” over the outdated sentence, placing a 0 on the words most likely to need deleting and a 1 on the words to keep.
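The binary mask can be illustrated with a toy example. The sketch below does not reproduce the researchers’ learned module; it assumes a deliberately naive rule (mask any number in the outdated sentence that does not appear in the claim), and the function names are hypothetical.

```python
def toy_mask(outdated_tokens, claim_tokens):
    """Return a 0/1 list: 0 = likely to be deleted, 1 = keep.

    Naive stand-in rule: a token is masked when it is a number
    that the claim does not contain.
    """
    claim_set = set(claim_tokens)
    return [
        0 if tok.isdigit() and tok not in claim_set else 1
        for tok in outdated_tokens
    ]


def apply_mask(tokens, mask, placeholder="[MASK]"):
    """Replace every token whose mask value is 0 with a placeholder."""
    return [tok if keep else placeholder for tok, keep in zip(tokens, mask)]


outdated = (
    "Fund A believes that 28 of the 42 minority interests in active "
    "operating companies are particularly important to the group"
).split()
claim = (
    "Fund A believes that 23 of the 43 minority interests are of great significance"
).split()

mask = toy_mask(outdated, claim)
masked = apply_mask(outdated, mask)
print(" ".join(masked))
```

In this toy run the two outdated numbers receive a 0 and every other word receives a 1, so the sentence keeps its structure while the contradicted details are blanked out.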
After masking, a two-encoder-decoder framework fuses the two sentences, filling the slots left by the deleted words in the outdated sentence with the differing information from the claim.
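The fill-in step can be sketched in the same toy setting. The real system uses neural encoder-decoders; the hypothetical `fuse` function below is only an assumption-laden stand-in that fills each masked slot, in order, with claim words the kept part of the sentence does not already contain.

```python
def fuse(outdated_tokens, mask, claim_tokens):
    """Fill masked slots (mask value 0) with new words taken from the claim."""
    kept = {tok for tok, keep in zip(outdated_tokens, mask) if keep}
    # Claim words that are new relative to the kept part of the sentence.
    new_tokens = iter(tok for tok in claim_tokens if tok not in kept)
    fused = []
    for tok, keep in zip(outdated_tokens, mask):
        if keep:
            fused.append(tok)
        else:
            # Fall back to the original word if the claim offers nothing new.
            fused.append(next(new_tokens, tok))
    return fused


outdated = (
    "Fund A believes that 28 of the 42 minority interests in active "
    "operating companies are particularly important to the group"
).split()
claim = (
    "Fund A believes that 23 of the 43 minority interests are of great significance"
).split()

# Hand-written mask for this example: 0 on the two outdated numbers.
mask = [0 if tok in {"28", "42"} else 1 for tok in outdated]

updated = " ".join(fuse(outdated, mask, claim))
print(updated)
```

This simple positional fill only works because the example swaps numbers one for one; a learned model is needed when the new information changes the wording or length of the sentence.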
Compared with traditional text-generation methods, this model updates factual information more accurately, and its output reads more like human writing. In one test, researchers rated the model’s output sentences on a scale of 1 to 5 for how well they contained the factual update and how closely they matched human grammar. The model’s average score was 4 for factual updates and 3.85 for grammar, higher than all the traditional methods.
The researchers hope that AI can eventually complete the whole process automatically: searching the Internet for the latest news on a topic, replacing the text, and updating the outdated information on Wikipedia.
Expand data set and eliminate errors
The research also shows that the system can be used to augment data sets in order to eliminate bias when training detectors of “false news”.
“False news” is a form of propaganda containing false information that aims to attract attention, mislead readers or steer public opinion. Some of these detectors are trained on data sets of agree-disagree sentence pairs, learning to verify a claim by matching it against the given evidence. In these pairs, the claim either matches or contradicts the supporting “evidence” sentence from Wikipedia. The model is trained to mark a claim as “false” when the evidence refutes it, which helps identify false news.
But data sets often carry unintended biases. Shah said, “During training, models use some of the language in human-written claims as give-away phrases to mark them as false, without relying much on the corresponding evidence sentences. This reduces the model’s accuracy when evaluating real-world examples, because it does not perform fact-checking.”
Therefore, the researchers used the same deletion and fusion techniques to balance the agree-disagree pairs in the data set and help reduce bias. For some “disagree” pairs, they used the false information in the modified sentence to regenerate a fake “evidence” sentence that supports the claim. Phrases that would otherwise give a claim away as false then appear in both “agree” and “disagree” pairs, which forces the model to analyze more features and yields an expanded data set.
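The balancing idea can be sketched as a small augmentation routine. Everything here is a hypothetical illustration: the `augment` function, the tuple layout, and especially the `rewrite` callable, which stands in for the researchers’ deletion-and-fusion model and in this toy run simply echoes the claim. The example pairs are invented.

```python
def augment(pairs, rewrite):
    """Add an "agree" copy of every "disagree" pair.

    pairs:   list of (claim, evidence, label) tuples
    rewrite: callable (claim, evidence) -> new evidence that supports the claim
    """
    augmented = list(pairs)
    for claim, evidence, label in pairs:
        if label == "disagree":
            augmented.append((claim, rewrite(claim, evidence), "agree"))
    return augmented


pairs = [
    ("The tower is 90 metres tall", "The tower is 120 metres tall", "disagree"),
    ("The tower is in the city centre", "The tower is in the city centre", "agree"),
]

# Toy stand-in for the fusion model: use the claim itself as supporting evidence.
augmented = augment(pairs, rewrite=lambda claim, evidence: claim)

labels_for_claim = {label for claim, _, label in augmented if "90 metres" in claim}
print(sorted(labels_for_claim))
```

After augmentation the same claim appears with both labels, so a detector can no longer decide from the claim’s wording alone and has to consult the evidence.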
Using this method, the researchers reduced the error rate of a popular false news detector by 13%.
Wikipedia deploys AI editor
As early as 2015, Wikipedia built an artificial intelligence engine to automatically analyze changes in Wikipedia.
Since anyone can edit Wikipedia, anyone can also add false information by mistake or vandalize the site. Early on, Wikipedia therefore established a strict screening system, which kept many people from joining the ranks of its editors.
Aaron Halfaker, a senior research scientist at the Wikimedia Foundation, built his own AI engine to identify this kind of vandalism and to encourage novice participation in a friendlier way. At the same time, he admitted, “This service can’t capture all the damage, but it can capture most of the damage.”
Halfaker’s project was actually meant to increase people’s participation in Wikipedia. Today, five years later, a new text system has emerged that can automatically update Wikipedia’s information, greatly reducing the work of volunteer editors, who may themselves be on the way to being replaced.
Machines are becoming more and more intelligent, and it is increasingly common for automation to replace human work. Whether humans will be replaced by machines is a hot topic at present. Some predict that AI and robotics will replace as many as 47% of our jobs in the next 20 years, but others believe that AI will create many new jobs.