According to foreign media reports, since Amazon released the Echo smart speaker in 2014, millions of smart speakers such as the Amazon Echo, Google Home, and Apple HomePod have been sold. Large technology companies are weaving services like Amazon's Alexa, Apple's Siri, Google Assistant, Microsoft's Cortana, and Facebook's equivalents into people's lives. Juniper Research, a consulting firm, estimates that by 2023 the global smart speaker market will reach $11 billion a year, with roughly 7.4 billion voice-controlled devices in use worldwide.

Technology companies say smart speakers record only when users activate them, but in practice they are introducing always-on microphones into private spaces. Amazon and its competitors say the vast majority of voice requests are handled automatically by computers, without human review. In reality, these devices rely on thousands of low-paid workers for manual transcription, and users' private conversations have become one of the companies' most valuable data sets. All of the companies maintain that this is a reasonable way to improve their products.

We've been eavesdropping all along

Ruthy Hope Slatis couldn't believe what she was hearing. She had been hired through a temporary agency outside Boston to transcribe audio files for Amazon, a job Amazon described only vaguely. As a contractor paid just $12 an hour, she and her colleagues (officially called "data associates") listened to snippets of random conversations and typed out every word they heard on their laptops.

Amazon would only say that the work was crucial to a top-secret speech recognition product. But the audio clips contained recordings of users' intimate moments.

In the fall of 2014, Amazon launched the Echo smart speaker with Alexa, its voice-activated virtual assistant software. In the first Echo ad, Amazon presented Alexa as a miracle of artificial intelligence: a happy family orders Alexa to deliver news updates, answer trivia questions, and help the children with their homework. Slatis, however, soon began to recognize the human labor behind the product.

She remembers thinking, "God, that's what I'm doing." Amazon captures every voice command in the cloud and relies on data associates like her to train the system. At first, Slatis assumed the clips she was hearing came from paid testers who had volunteered their voice patterns in exchange for a few dollars. She soon realized she was wrong.

The recordings she and her colleagues were hearing were often intense and embarrassing; users would confess their secrets and fears in front of the speaker. As the transcription project grew and Alexa became more popular, so did the amount of private information disclosed in the recordings. Other contractors recalled hearing children share their home addresses and phone numbers, a man trying to order sex toys, and a dinner guest wondering aloud whether Amazon was eavesdropping. "Users are often just joking, but they don't know they're being listened to," said Slatis, who quit in 2016.

Technology companies say they are making corrections

Lei Feng learned that in the five years since Slatis first felt that chill, a quarter of Americans have purchased smart speakers such as the Echo, Google Home, and Apple HomePod. So far, Amazon has won the sales war: users have reportedly purchased more than 100 million Alexa-enabled devices.

But now the world's largest companies are waging a new war, embedding Alexa, Siri, Google Assistant, and Cortana into people's lives by putting microphones in mobile phones, smartwatches, televisions, refrigerators, SUVs, and other devices. Juniper Research estimates that by 2023 the global smart speaker market will reach $11 billion a year and voice-controlled devices will number about 7.4 billion, roughly one for every person on Earth.

The question now is: how do we cope with something at that scale?

According to the technology companies, these machines are not creating audio files around the clock, because smart speakers record only when users activate them. But once always-on microphones enter kitchens and bedrooms, they may inadvertently capture sounds users never meant to share.

Yet these so-called smart devices unquestionably depend on thousands of low-paid workers, who annotate the audio clips so that technology companies can upgrade their "electronic ears." Our faintest whispers have become one of the technology industry's most valuable data sets.

Earlier this year, Bloomberg first reported that the technology industry, including Apple, Amazon, and Facebook, used humans to review audio collected from users, without disclosing that fact to them. Executives and engineers said that building a huge human review network invites problems and intrusions, even though it has always been an obvious way to improve their products.

In addition, Lei Feng noted that over the past few years Apple has become more aggressive in collecting and analyzing people's voices, worried that Siri's comprehension and speed lag behind Alexa and Google Assistant. Apple treats Siri as a voice search engine, so it must be prepared for endless user queries and rely more heavily on audio analysis.

In 2015, when Apple CEO Tim Cook declared privacy a "basic human right," Apple's machines were already processing more than a billion requests a week. By then, users could enable a feature that kept the voice assistant listening, so they no longer had to press a button to activate it. In the legal terms of its user agreement, Apple said it might record and analyze voice data to improve Siri, but it never mentioned that human employees would be listening. "Listening to other people's voices made me feel very uncomfortable," one former contractor said. John Burkey, who once worked on Siri's advanced development team, countered: "This is not espionage. It's the same as when an application crashes and asks whether to send the report to Apple."

Many contractors said that although most Siri requests were mundane, they still heard sexually explicit audio and racist or homophobic speech.

Apple said that less than 0.2% of Siri requests are subject to human analysis, and a former manager dismissed the contractors' allegations as exaggerated. Tom Gruber, the Siri co-founder who once led its development team, said: "In fact, a lot of what we had to deal with was noise. It doesn't mean the machine set out to record certain sounds; in a sense it's just a matter of probability."

By 2019, after bringing Siri to products such as its wireless earbuds and the HomePod speaker, Apple was processing 15 billion voice commands a month. At 0.2%, that means human contractors handle 30 million voice commands every month, or 360 million a year. Mike Bastian, a former principal research scientist on the Siri team, said the risk of accidental recording grows with each new use case. He cited the Apple Watch's raise-to-speak feature, which activates Siri automatically when it detects the wearer's wrist being lifted. "That leads to a high false-positive rate," he said.
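The figures above can be checked with some back-of-the-envelope arithmetic (assuming, as quoted, 15 billion commands per month and a 0.2% human-review rate):

```python
# Back-of-the-envelope check of the numbers quoted above.
monthly_commands = 15_000_000_000   # Siri voice commands per month (2019)
human_review_rate = 0.002           # "less than 0.2%" reviewed by humans, per Apple

reviewed_per_month = int(monthly_commands * human_review_rate)
reviewed_per_year = reviewed_per_month * 12

print(reviewed_per_month)  # 30000000   (30 million per month)
print(reviewed_per_year)   # 360000000  (360 million per year)
```

Since Apple says "less than 0.2%," these totals are an upper bound, not an exact count.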

In 2016, Amazon created the Frequent Utterance Database (FUD) to help Alexa add answers to common requests. Former employees who worked with FUD said there was tension between product teams eager to mine data more aggressively and the security team responsible for protecting user information. In 2017, Amazon launched the camera-equipped Echo Look, billed as an AI stylist that could recommend outfit pairings. People familiar with the matter said its developers considered programming the camera to switch on automatically whenever a user asked Alexa to tell a joke, recording video of the user's face to judge whether the user laughed. Amazon ultimately shelved the idea, these people said. The company says Alexa does not currently use facial recognition technology.

Amazon has set up transcription "farms" around the world, and this year it held several recruitment events for overseas transcribers. A speech technology expert who has spent decades building recognition systems for technology companies said the recent scale of hiring suggests that the scale of Amazon's audio data analysis is staggering. Amazon said it "takes the security of customers and their recordings seriously" and that making Alexa work worldwide requires a thorough understanding of regional accents and colloquialisms.

In August, Microsoft admitted that it used humans to review voice data generated through its speech recognition technology, which companies such as BMW, HP, and Humana are integrating into their products and services. Chinese technology companies, including Alibaba, search giant Baidu, and phone maker Xiaomi, collect voice data from millions of smart speakers every quarter.

Google Search feeds Google Assistant with queries from billions of connected devices, including Android smartphones and tablets, Nest thermostats, and Sony TVs. Google has hired temporary workers overseas to transcribe audio fragments and improve the system's accuracy, and it has promised that reviewed recordings will not be associated with any personal information. But this summer a Google contractor shared more than 1,000 user recordings with the Belgian broadcaster VRT NWS. Based on what users said in the recordings, the outlet was able to identify some of them, to those users' shock. About 10% of the recordings existed because a device had misdetected its activation word and recorded without the user's consent.

As news reports kept surfacing, these large technology companies adjusted their virtual assistant programs this year.

Google suspended human transcription of Assistant audio. Apple began letting users delete their Siri history and opt out of sharing further recordings, made sharing recordings optional, and directly hired many former contractors to tighten its control over human review.

Facebook and Microsoft have added more explicit disclaimers to their privacy policies.

Amazon introduced similar disclosures and began allowing Alexa users to opt out of human review.

Some researchers say that improving smartphone processors, together with a form of computer modeling called federated learning, may eventually eliminate this kind of monitoring, because the machines will become smart enough to improve without contractors' help. For now, absent stricter laws or strong consumer pushback, the teams of humans auditing audio will almost certainly keep growing along with the proliferation of voice devices.
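The appeal of federated learning is that raw audio never has to leave the device: each phone or speaker trains on its own data locally and uploads only model weight updates, which a server averages. A minimal sketch of that idea, using toy numbers rather than any company's actual pipeline (all function and variable names here are hypothetical):

```python
# Toy federated-averaging sketch: each device nudges the shared model
# toward its own private data; the server sees only the updated weights,
# never the data itself.

def local_update(weights, local_data, lr=0.1):
    """Toy local 'training' step: move each weight toward the local data mean."""
    target = sum(local_data) / len(local_data)
    return [w + lr * (target - w) for w in weights]

def federated_average(global_weights, device_datasets):
    """Run local updates on every device, then average the uploaded weights."""
    updates = [local_update(global_weights, data) for data in device_datasets]
    n = len(updates)
    return [sum(u[i] for u in updates) / n for i in range(len(global_weights))]

# Three devices, each holding private "data" the server never observes.
devices = [[1.0, 2.0], [3.0], [5.0, 7.0]]
new_weights = federated_average([0.0, 0.0], devices)
print(new_weights)  # averaged weights; the raw lists above stayed on-device
```

Real systems (e.g. Google's Gboard deployment of federated learning) add secure aggregation and differential privacy on top of this basic scheme.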
