Today, hearing-impaired people can "touch" inaudible sound. This is made possible by "Derma," an AI system jointly developed by the University of Tokyo and Sony Computer Science Laboratories (Sony CSL). With the Derma system, sensors simply attached to the skin around the throat pick up skin vibrations from the throat and jaw, translating silently mouthed speech into audible speech.

Depending on severity, the current mainstream treatments for hearing impairment include: drug therapy, in which hormones, antibiotics, antiviral drugs, and so on are administered intravenously or locally to reduce inflammation and restore hearing as soon as possible; surgical treatment, mainly for external- and middle-ear deformities, conditions that compress the Eustachian tube, ear trauma, and the like; and assistive devices, such as hearing aids (for hearing loss ≤ 80 dB) and cochlear implants (for hearing loss > 80 dB).

Among these, cochlear implantation is currently the only effective way to restore hearing in patients with severe-to-profound deafness. As early as 1957, French scientists first implanted electrodes into the cochlea of a totally deaf patient, allowing the patient to perceive ambient sound. It was not until the 1990s that cochlear implants entered clinical use, bringing "new life" to patients with severe deafness.

In fact, the development of cochlear implants was inseparable from advances in electronics, computing, phonetics, electrophysiology, materials science, and ear microsurgery. Before these disciplines matured, scientists offered hearing-impaired patients a tactile lip-reading method called Tadoma. As the name suggests, in this method a hearing-impaired person "reads" what a speaker is saying by touching the speaker's lips, jaw, and neck with their fingers.

The inspiration for the Japanese team's AI system Derma originally came from Tadoma: the team's design automates the process of Tadoma with machine learning.

In terms of its principle, as shown in the figure below, an acceleration/angular-velocity sensor is attached to the skin around the throat to capture the skin vibrations, from jaw to throat, caused by the movement of the jaw and tongue muscles during silent articulation. Deep learning then analyzes and recognizes these signals, realizing silent speech interaction (SSI) that converts unvoiced utterances into speech input.

The sensor captures 12-dimensional skin-motion information, and a deep learning model analyzes it to recognize 35 types of vocalization. Experiments show that the accuracy of recognizing the skin-vibration information exceeds 94%. It is worth mentioning that the research team used Connectionist Temporal Classification (CTC) to train the model.
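To make the data flow concrete, here is a minimal, purely illustrative sketch (not the authors' code) of the typical first preprocessing step for such a signal: slicing a stream of 12-dimensional sensor samples into overlapping fixed-length windows before they are fed to a sequence classifier. The window length and hop size are arbitrary assumptions for illustration.

```python
# Illustrative sketch: framing a 12-channel skin-vibration signal
# into overlapping windows. Window/hop values are assumptions.

def frame_signal(samples, window=8, hop=4):
    """Split a list of 12-dimensional sensor samples into
    overlapping fixed-length windows."""
    frames = []
    for start in range(0, len(samples) - window + 1, hop):
        frames.append(samples[start:start + window])
    return frames

# Fake data: 20 time steps of 12-dimensional readings (all zeros).
signal = [[0.0] * 12 for _ in range(20)]
frames = frame_signal(signal)
print(len(frames), len(frames[0]), len(frames[0][0]))  # 4 8 12
```

Each window would then be mapped by the network to a distribution over the 35 vocalization classes (plus a CTC "blank").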

In fact, when training a speech recognizer, it is difficult to align the input and output because of factors such as speaking rate, and CTC comes in handy precisely for this problem. In terms of form factor, the device is small, lightweight, and unobtrusive compared with some existing silent-speech interaction devices. In addition, the system has low power consumption, is not easily affected by factors such as ambient brightness, and does not interfere with the wearer's daily life, which makes it very practical.

In addition, the research team said that the synthesized speech can not only serve as input to digital devices with speech recognition (such as voice assistants), but can also help people with speech impairments communicate. Going forward, the team's research direction is the integration of wearable electronics with in-body embedded computing.
