Sensing and computing devices are everywhere, and their numbers keep growing. Because many of them are always on, power consumption matters more than ever. The clearest examples are voice-controlled devices on your desk, in your pocket, and around your home. Keyword spotting is a target application for a range of neuromorphic techniques.
The winner of the 2020 Misha Mahowald Prize for Neuromorphic Engineering is Professor Shih-Chii Liu and her team, who have been working on low-latency, low-power audio sensors for speech detection. The Dynamic Audio Sensor developed by Shih-Chii Liu and her team at the Institute of Neuroinformatics (INI) in Zurich has expanded this market (Figure 1). At the core of the sensor is a biologically inspired silicon cochlea. First, a bank of analog band-pass filters separates the input sound into channels; each channel's output is then half-wave rectified. These circuits mimic the function of the hair cells in the ear.
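The front end described above can be sketched in software: a band-pass filter bank followed by half-wave rectification. This is a minimal illustration only; the digital biquad filters from SciPy, the function name, and the Q factor are stand-ins for the sensor's analog circuits, not the actual DAS design.

```python
import numpy as np
from scipy.signal import butter, lfilter

fs = 16000  # sample rate in Hz (illustrative)

def cochlea_channels(x, center_freqs, fs=fs, q=4.0):
    """Filter x into one band per center frequency, then half-wave
    rectify each band (keep the positive half, zero the rest)."""
    outputs = []
    for fc in center_freqs:
        # Band edges derived from the center frequency and Q factor.
        low, high = fc * (1 - 1 / (2 * q)), fc * (1 + 1 / (2 * q))
        b, a = butter(2, [low, high], btype="bandpass", fs=fs)
        band = lfilter(b, a, x)
        outputs.append(np.maximum(band, 0.0))  # half-wave rectification
    return np.array(outputs)

t = np.arange(0, 0.05, 1 / fs)
x = np.sin(2 * np.pi * 440 * t)  # 440 Hz test tone
channels = cochlea_channels(x, [250, 440, 1000])
```

As in the biological cochlea, the channel tuned near the tone's frequency responds strongly while the others stay comparatively quiet.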
Figure 1: A. In a conventional audio system, sound is first digitized by an analog-to-digital converter; features are then extracted by a digital fast Fourier transform (FFT) or band-pass filters (BPFs). These features are processed by a digital signal processor (DSP) running a voice activity detection (VAD) or automatic speech recognition algorithm. B. In the Dynamic Audio Sensor from the Zurich INI, the analog audio signal's frequency-band features and their changes are encoded in parallel into a stream of asynchronous spikes (events), which are then processed.
This process parallels biology, where separate channels are prepared for processing in the brain. In the human ear, ganglion cells encode the signal into trains of chemical-ionic impulses; in the silicon cochlea, it is converted into electrical spikes. This step can be implemented with the widely used integrate-and-fire (IF) neuron model or with an asynchronous delta modulator (ADM): the ADM compares the signal against two thresholds and emits an event whenever a threshold is crossed, effectively acting as a signal feature extractor. Because unchanging signals are ignored, less redundant information is passed to the next stage.
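The two-threshold event generation can be sketched as follows. This is a minimal ADM model on a sampled signal, assuming illustrative names and a fixed step size; it is not the circuit used in the DAS.

```python
def adm_events(signal, delta=0.05):
    """Emit (index, polarity) events whenever the signal moves more than
    `delta` above (+1, ON) or below (-1, OFF) the current reference level."""
    events = []
    ref = signal[0]  # reference level tracks the signal between events
    for i, x in enumerate(signal[1:], start=1):
        if x - ref >= delta:        # upper threshold crossed: ON event
            events.append((i, +1))
            ref += delta
        elif ref - x >= delta:      # lower threshold crossed: OFF event
            events.append((i, -1))
            ref -= delta
    return events

# A constant signal produces no events at all; a rising signal produces
# only ON events. This is why unchanging input costs almost nothing.
print(adm_events([0.0, 0.0, 0.0]))        # prints []
print(adm_events([0.0, 0.1, 0.2, 0.1]))   # prints [(1, 1), (2, 1)]
```

The empty output for the constant input illustrates the power argument in the next paragraph: no change, no events, almost no downstream work.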
From a power standpoint, the silicon cochlea consumes almost nothing when there is no activity, and the number of spikes grows as activity increases. This is a huge advantage for applications that listen all the time but rarely need to process anything; when the application must continuously decode content, however, there is no clear power advantage.
Because this kind of audio sensor consumes only a few microwatts, system designers can treat it as a very useful option for improving power efficiency. In addition, the sensor operates in continuous time (the intervals between spikes can be arbitrarily short or long), so it also supports a very high dynamic range.
The key to this work is demonstrating that the silicon cochlea is useful in practice: specifically, that its event stream can drive voice activity detection (the first stage of keyword spotting) and other real applications. Liu and her team did this successfully. They used the event output to create 2D data frames: histograms of spike arrivals, binned by frequency channel over 5 ms frames. The result is a cochleagram-like image that a neural network can read and decode.
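The framing step described above can be sketched as a simple binning routine. The event format `(timestamp_seconds, channel)` and the function name are assumptions for illustration, not the team's actual data format.

```python
def events_to_frames(events, n_channels, frame_ms=5.0, total_ms=None):
    """Bin an event stream into cochleagram-like frames: frames[f][c]
    counts the spikes from channel c arriving during 5 ms frame f."""
    if total_ms is None:
        total_ms = max(t for t, _ in events) * 1000.0
    n_frames = int(total_ms // frame_ms) + 1
    frames = [[0] * n_channels for _ in range(n_frames)]
    for t, ch in events:
        f = int(t * 1000.0 // frame_ms)  # which 5 ms window this spike hits
        frames[f][ch] += 1
    return frames

# Three events: two in channel 0 within the first 5 ms, one in channel 2 later.
events = [(0.001, 0), (0.004, 0), (0.007, 2)]
frames = events_to_frames(events, n_channels=4)
# frames is [[2, 0, 0, 0], [0, 0, 1, 0]]: a 2-frame, 4-channel "image"
```

Each row of the resulting 2D array is one time slice of the cochleagram-like image that the downstream network consumes.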
“The IEEE ISSCC community is very interested in running deep networks on sensors. Audio edge computing is emerging, and deep networks arrive at just the right time,” Liu said. “There are many papers on keyword spotting using low-power application-specific integrated circuits (ASICs), but they all use conventional spectrogram-like features. One of our goals is to demonstrate that a hybrid scheme (mixed analog-digital design) can provide lower power and reduced response latency.”
Last year, INI released a video demonstrating a digit recognition system. The system is still at an early stage of development and not yet fully reliable. Liu’s team also includes Minhao Yang, Chang Gao, Enea Ceolini, Adrian Huber, Jithendar Anumula, Ilya Kiselev, and Daniel Neil. Over the years, they have also studied sensor fusion, combining audio and visual information for more reliable classification, and they have been developing early design guidelines for deciding when to choose analog sensors and when to use digital ones.
Their other goal is to improve the power efficiency and performance of the Dynamic Audio Sensor (DAS), including implementing the band-pass filters with source followers and designing analog feature extractors.
Reducing the effect of analog device variability is another important research area. To address it, Liu and her team designed a hardware simulator, which they say can test these problems faster than commercial software such as Cadence Virtuoso. Training a binary neural network classifier in software rather than on hardware can accurately predict the classification performance of a variety of test chips. Liu and her team are also considering injecting noise into the system to account for variability and make the design process more robust.
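The noise-injection idea can be sketched with a toy classifier: perturb the weights with Gaussian noise on every forward pass to mimic analog mismatch, so the trained solution tolerates it. The tiny perceptron, the data, and the noise level are illustrative assumptions, not the team's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(w, x, noise_std=0.0):
    """Linear classifier score with fresh weight noise on each call,
    modeling device-to-device variability in an analog implementation."""
    w_noisy = w + rng.normal(0.0, noise_std, size=w.shape)
    return x @ w_noisy

# Toy data: the label is the sign of the first feature.
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0])

w = np.zeros(2)
lr, noise_std = 0.1, 0.05
for _ in range(500):
    i = rng.integers(len(X))
    score = forward(w, X[i], noise_std)  # noisy forward pass
    if y[i] * score <= 0:                # perceptron-style update on mistakes
        w += lr * y[i] * X[i]

# Evaluate under noise, as a mismatched chip would behave.
acc = np.mean(np.sign(forward(w, X, noise_std)) == y)
```

Because the training loop only ever sees noisy weights, it is pushed toward solutions whose accuracy survives the perturbation, which is the intuition behind making the design process robust to variability.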
Liu is one of the early researchers in neuromorphic engineering. She worked in Carver Mead’s laboratory at Caltech and was one of the founders of the Zurich Institute of Neuroinformatics, at a time when many members of the research group left California for Zurich. Mahowald (Figure 2) also worked in that laboratory.
On receiving the award, Liu said: “We are honored to receive this award, especially among so many outstanding researchers in neuromorphic engineering. This work builds on decades of earlier silicon cochlea designs and continues the work of Dick Lyon, Carver Mead, Lloyd Watts, Rahul Sarpeshkar, Eric Vittoz, and André van Schaik.”
Speaking about the importance of neuromorphic engineering, she said: “Even when Moore’s law comes to an end, the energy efficiency of digital computing will still be one thousandth that of biology. The efficiency of mixed analog-digital systems such as the DAS is therefore more important than ever.”
Editor in charge: PJ