A complete loudspeaker consists of several parts: the driver units, the crossover network, and the enclosure. We will discuss each in turn, starting with the driver unit. It is essentially a moving-coil microphone working in reverse: the electrical signal is fed to the voice coil, which sits in the gap of a magnet system. The current in the coil produces a magnetic field that varies with the signal, and the resulting force moves the voice coil back and forth, following the waveform of the sound. The voice coil in turn drives the diaphragm or cone of the driver, which pushes the air to generate sound waves. That is how the sound is produced.

It is easy to describe, but reproducing the electrical signal as sound with as little distortion of the original waveform and response as possible is another matter. The audio range from low frequency (20Hz) to high frequency (18kHz) spans nearly ten octaves. A single driver that covers this whole range is limited by its structure in how loud it can play. That said, full-range driver technology is now mature, and many full-range drivers with good performance are available on the market.
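The width of the audio range quoted above can be checked with a short calculation. This is only a sketch; the function name `octaves_between` is made up for illustration, and the 20Hz and 18kHz limits are the ones given in the text.

```python
import math

def octaves_between(f_low_hz: float, f_high_hz: float) -> float:
    """Number of octaves (frequency doublings) between two frequencies."""
    return math.log2(f_high_hz / f_low_hz)

# Limits taken from the text: 20 Hz to 18 kHz.
audio_span = octaves_between(20.0, 18_000.0)
print(f"20 Hz to 18 kHz spans {audio_span:.2f} octaves")  # prints 9.81
```

Each octave is a doubling of frequency, so a driver covering the full range has to work over a frequency ratio of 900 to 1.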

Of course, to build a loudspeaker system that can deliver both high volume and wide bandwidth, different frequency ranges have to be assigned to drivers with different characteristics: the low range (below 300Hz) to the woofer, the middle range (300Hz-2500Hz) to the midrange driver, and the high range (above 2500Hz) to the tweeter. Each driver reproduces its own band, and together they cover the complete range. Because low frequencies require moving a lot of air, the woofer needs the largest diaphragm or cone; the midrange moves less air, so its cone diameter and overall size are smaller and lighter; and the high range moves the least air, so the tweeter has the smallest and lightest diaphragm.

Basically, the larger the diameter of a driver's cone or diaphragm, the heavier it is and the more air it can push, but the greater its inertia, so it responds more slowly and is suited to lower frequencies. Conversely, the smaller and lighter the diaphragm, the faster it responds and the higher the frequencies it can reproduce, but the amount of air it can move is limited. This is why even small speakers on the market are often multi-way designs in which several drivers work together.

If so, the amplifier's electrical signal must be split into separate high and low paths, and possibly a midrange path as well. This is the so-called “frequency division”, or crossover. Generally speaking, there are two ways to do this in a speaker system. The most common is a passive crossover network, which divides the amplifier's output into bands covering different frequency ranges. Put plainly, a passive crossover network is a set of “filters” built from passive inductors, capacitors, and resistors; each filter blocks the frequencies outside the range of its driver and passes only the required band. A speaker system therefore has as many filter sections as it has ways, and together they form the crossover network that feeds the drivers responsible for the different ranges.
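To make the idea of a passive filter concrete, here is a minimal sketch of the simplest case, a first-order two-way crossover: one capacitor in series with the tweeter and one inductor in series with the woofer. It assumes the drivers behave like a pure 8 ohm resistance, which real drivers do not, and it uses the 2500Hz figure from the text as the crossover point. The function name is hypothetical.

```python
import math

def first_order_crossover(crossover_hz: float, load_ohms: float) -> tuple[float, float]:
    """Return (capacitor in farads, inductor in henries) for a first-order
    two-way crossover, assuming a purely resistive load."""
    omega = 2.0 * math.pi * crossover_hz
    capacitor_f = 1.0 / (omega * load_ohms)  # high-pass: C in series with the tweeter
    inductor_h = load_ohms / omega           # low-pass: L in series with the woofer
    return capacitor_f, inductor_h

# Assumed values: 2500 Hz crossover, nominal 8 ohm drivers.
c, l = first_order_crossover(2500.0, 8.0)
print(f"C = {c * 1e6:.2f} uF, L = {l * 1e3:.3f} mH")
```

Practical crossovers usually use higher-order filters and compensate for the driver's real impedance curve, so these values are a starting point at best.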

The other method is the “electronic crossover”: the signal is split while it is still at line level, at the output of the preamplifier, and an active electronic circuit divides it into the required bands. Generally speaking, the result is better than with a passive crossover network. However, each band then needs its own amplifier to drive the corresponding driver, which greatly increases the cost of the speaker system. Electronic crossovers are therefore usually found in relatively large speaker systems (and in professional monitor speakers, which will be introduced in another article).
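The principle of splitting the signal before amplification can be sketched digitally. The example below is an illustration only, not how any particular electronic crossover is built: it uses a one-pole low-pass filter for the low band and takes the remainder as the high band, so the two bands always add back up to the original signal. The function name and the test tones are assumptions.

```python
import numpy as np

def split_two_way(signal, sample_rate_hz, crossover_hz):
    """Split a signal into a low band (one-pole low-pass) and a high band
    (the remainder), so that low + high reconstructs the input."""
    # Smoothing coefficient of a one-pole low-pass filter.
    alpha = 1.0 - np.exp(-2.0 * np.pi * crossover_hz / sample_rate_hz)
    low = np.empty(len(signal))
    state = 0.0
    for i, sample in enumerate(signal):
        state += alpha * (sample - state)
        low[i] = state
    high = signal - low
    return low, high

sample_rate = 48_000
t = np.arange(sample_rate) / sample_rate
bass = np.sin(2 * np.pi * 100 * t)       # 100 Hz tone, belongs to the woofer
treble = np.sin(2 * np.pi * 10_000 * t)  # 10 kHz tone, belongs to the tweeter
low, high = split_two_way(bass + treble, sample_rate, crossover_hz=2500)
```

Each band would then go to its own amplifier and driver, which is where the extra cost mentioned above comes from.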

Finally, the drivers for the different ranges must be assembled into a complete speaker system, and this needs further thought. A driver produces sound by vibrating back and forth, and the sound radiated from its front and rear is “inverted”, that is, opposite in phase. If nothing is done, the two cancel each other in the listening space, so an enclosure is needed to deal with the “back wave” radiated from the rear of the driver. Generally speaking, each driver has its own space to handle its back wave. Midrange drivers and tweeters are small, so they are often built with a sealed rear chamber at the factory.

The cabinet is therefore designed mainly for the larger midrange drivers and woofers. At present there are two mainstream cabinet designs: closed (sealed) and open (vented). The most common open design is the bass reflex type, in which the volume of the bass chamber and the diameter and length of the reflex port are calculated and tuned to the low-frequency characteristics of the driver so that the system produces more (but just the right amount of) low-frequency output. The volume of a closed cabinet also has to be calculated from the driver's characteristics so that the low-frequency response extends as low as possible.
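The relationship between chamber volume, port dimensions, and tuning can be sketched with the Helmholtz resonator formula, f = (c / 2π) · sqrt(A / (V · L_eff)). This is a simplified model, not a full cabinet design procedure: it ignores the driver's own parameters, and the end correction of 1.46 port radii (one flanged end plus one free end) is an assumed textbook value. The cabinet dimensions in the example are invented.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # dry air at about 20 degrees C

def port_tuning_hz(box_volume_l, port_diameter_cm, port_length_cm, end_correction=1.46):
    """Helmholtz resonance of a ported box with one round port."""
    radius_m = port_diameter_cm / 200.0
    area_m2 = math.pi * radius_m ** 2
    volume_m3 = box_volume_l / 1000.0
    # The air just outside both port openings moves too, which lengthens the port acoustically.
    effective_length_m = port_length_cm / 100.0 + end_correction * radius_m
    return SPEED_OF_SOUND_M_S / (2.0 * math.pi) * math.sqrt(
        area_m2 / (volume_m3 * effective_length_m)
    )

# Hypothetical cabinet: 30 litres, port 7 cm in diameter and 15 cm long.
tuning = port_tuning_hz(30.0, 7.0, 15.0)
print(f"Port tuning frequency: {tuning:.1f} Hz")
```

A longer port or a larger chamber lowers the tuning frequency, which is the trade-off the designer adjusts against the woofer's characteristics.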

Bass reflex is not the only open design, however. There are many others, such as the Isobarik arrangement with two drivers, designs with multiple air chambers, and the transmission line (the inside of the cabinet is partitioned into a long duct to extend the low-frequency response). There are also many approaches to the cabinet's material and construction, aimed at stiffening it so that cabinet resonances do not affect the sound quality. The most common material is “medium density fiberboard” (MDF), which is cheap, easy to machine, and has many desirable properties. Some manufacturers, of course, build cabinets from metal or special materials to achieve better characteristics.

The above are the components of a typical loudspeaker. There are, of course, technically different designs that fall outside these categories. For example, the “plasma/ion tweeter” uses an electrical discharge to drive the air directly, and the “electrostatic speaker” uses the electric field between electrodes to drive a membrane that pushes the air, with no cabinet at all. There are indeed many other ways to convert electrical energy into sound, but the most mature and mainstream approach is still the traditional speaker built around an electromagnetic driver and mounted in a cabinet.

01 Physical neural network

I recently came across the article Deep Physical Neural Networks Trained with Backpropagation [1], published in the journal Nature. It describes building deep learning networks from multi-layer nonlinear physical systems and training them with backpropagation and stochastic gradient descent. I was astonished.

Can you imagine using a few speakers or a few field-effect transistors to form a deep physical neural network (PNN) that classifies images? The classification performance is not inferior to that of a traditional digital neural network: on MNIST handwritten digit recognition it also exceeds 97% accuracy (see the scheme below based on four-channel second-harmonic generation (SHG)).

▲ Figure 1 Physical neural networks built on a mechanical system, an electronic circuit, and an optical system, respectively

The goal of neural networks built on physical systems rather than digital processors is to surpass conventional digital computers in inference speed and energy efficiency, enabling smart sensors and efficient network inference.

I suspect that most people, like me, will have doubts on first reading this article: how can such ordinary things as speakers, transistors, and optical lenses be trained and perform inference like a deep learning network? These are, after all, common physical systems that contain nothing like a quantum computer or a neural computer.

The article covers a lot of work (the original PDF runs to more than 60 pages), and I have not finished reading it. Still, the opening explains fairly clearly why a physical neural network can implement an artificial neural network algorithm. Conventional deep learning decomposes the computation into a cascade of network layers. Each layer takes input data (Input) and network parameters (Parameters), combines them, and passes the result through the nonlinear transfer function of the neurons to form the layer's output (Output).

▲ Figure 2 The connection between artificial neural network (ANN) and physical neural network (PNN)
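The conventional layer described above (input and parameters combined, then passed through a nonlinearity) can be written in a few lines. This is a generic sketch rather than anything taken from the paper; the layer sizes, the `tanh` activation, and the random weights are arbitrary choices for illustration.

```python
import numpy as np

def dense_layer(x, weights, bias):
    """One conventional layer: combine input and parameters linearly,
    then apply the neuron's nonlinear transfer function."""
    return np.tanh(weights @ x + bias)

def forward(x, layers):
    """Cascade the layers: the output of each is the input of the next."""
    for weights, bias in layers:
        x = dense_layer(x, weights, bias)
    return x

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(8, 4)), np.zeros(8)),  # layer 1: 4 inputs -> 8 outputs
    (rng.normal(size=(3, 8)), np.zeros(3)),  # layer 2: 8 inputs -> 3 outputs
]
output = forward(rng.normal(size=4), layers)
print(output)
```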

The physical neural network is likewise a cascade of layers, for example several speakers, each of which acts as one layer of the network. The input signal is the speaker's input voltage. The network parameters are a set of controllable voltage signals, for example segments whose duration and amplitude can be changed. They are combined with the input signal (by superposition, concatenation, and so on) and then sent to the speaker. The sound output is picked up by a microphone and forms the output of the layer.

▲ Figure 1.3 Structure diagram of a layer of neural network composed of speakers
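The same cascade can be sketched for a physical layer, with the hardware replaced by a toy simulation. Everything here is an assumption made for illustration: `physical_system` is a hypothetical stand-in (a simple nonlinear system with memory), not a model of the speaker in the paper, and placing the parameter segment before the input segment is just one of the possible ways to combine them.

```python
import numpy as np

def physical_system(drive, decay=0.8):
    """Hypothetical stand-in for the hardware: a nonlinear system with memory.
    In the real setup this would be the speaker plus microphone."""
    state = 0.0
    response = np.empty(len(drive))
    for i, value in enumerate(drive):
        state = np.tanh(decay * state + value)
        response[i] = state
    return response

def pnn_layer(x, params):
    """Embed the parameters in the drive signal, run the physical system,
    and read out the part of the response recorded during the input segment."""
    drive = np.concatenate([params, x])  # parameter segment first (assumed ordering)
    response = physical_system(drive)
    return response[len(params):]

def pnn_forward(x, params_per_layer):
    """Cascade several physical layers, one parameter vector per layer."""
    for params in params_per_layer:
        x = pnn_layer(x, params)
    return x

x = np.linspace(-1.0, 1.0, 6)
out_a = pnn_forward(x, [np.zeros(3), np.zeros(3)])
out_b = pnn_forward(x, [np.ones(3), np.zeros(3)])
print(out_a)
print(out_b)
```

Changing the parameter segment changes the output for the same input, which is what makes the parameters trainable in principle.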

In the systems built from transistor amplifier circuits and from optical second-harmonic generation (SHG), the input signals, the network parameters, and the way the two are combined differ according to the characteristics of each system.

For example, in the figure below the network parameters are DC segments of different length and amplitude embedded in the input signal (A). The output (B) is formed after the signal passes through the transistor circuit. The portion of the output corresponding to the input and parameter segments is then expanded and normalized (C) to form the network output signal.

▲ Figure 3 The input signal concatenated with the network parameter signal (a DC level of controllable amplitude) in the transistor circuit, and the corresponding circuit output signal

Although I still need to understand the details of how the network is trained and how it works, the article's view of what a deep neural network algorithm fundamentally is feels refreshing. The nonlinearity between a system's input and output is what fuses the input signal with the parameter signal to process information, so none of the three systems in the article (the speaker, the transistor circuit, and the second-harmonic optical system) should be a linear time-invariant system.

Next, let’s put aside the physical neural network algorithm and first look at the characteristics of the three systems in the paper.

02 Nonlinear system

The principles and methods taught in undergraduate courses such as “Signals and Systems” and “Automatic Control Theory” are mostly aimed at linear time-invariant systems, so the first step in applying them is to determine whether a system is linear time-invariant.

Do the three physical systems (mechanical, electronic, and optical) mentioned in the previous Nature paper satisfy linear time invariance?

2.1 Transistor circuit

The transistor circuit in the article is the simplest, and its nonlinearity is also the most obvious.

The circuit contains four types of components: resistors, inductors, capacitors, and field-effect transistors. Resistors, inductors, and capacitors are all linear components; only the field-effect transistor is nonlinear, because its saturation drain current is a quadratic function of the gate voltage. The electronic system is therefore nonlinear.

▲ Figure 2.1.1 Transistor circuit
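The quadratic relationship can be used to show directly that superposition fails. The sketch below uses the idealized square-law model I_D = k · (V_GS − V_th)² for a transistor in saturation; the constant `k` and the threshold voltage `v_th` are made-up illustrative values, not taken from the paper's circuit.

```python
def drain_current(v_gs, k=1e-3, v_th=1.0):
    """Idealized square-law saturation current of a FET, in amperes.
    k (A/V^2) and v_th (V) are illustrative values only."""
    overdrive = max(v_gs - v_th, 0.0)
    return k * overdrive ** 2

i_a = drain_current(2.0)    # 1 mA
i_b = drain_current(3.0)    # 4 mA
i_sum = drain_current(5.0)  # 16 mA, not the 5 mA a linear device would give

print(i_a + i_b, i_sum)
```

For a linear component, the response to the sum of two inputs equals the sum of the responses. Here it does not, which is the nonlinearity the network relies on.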

2.2 Second harmonic generation system (SHG)

The second harmonic generation system is an optical system, and it is also the most complex system in the examples in the article.

I am not very familiar with SHG (second-harmonic generation) optical systems, but the basic principle can be understood from the literature [2]: the system exploits special physical properties of certain materials to double the frequency of the input optical signal, generating the corresponding second harmonic.

▲ Figure 2.2.1 Second harmonic generation system

How can we judge whether an unfamiliar physical system like this is linear time-invariant?

Here we need to take advantage of a property of linear time-invariant systems: linear time-invariant systems do not generate new frequency signals.

Such a system can change the amplitude and phase of the frequency components in its input, but it cannot create new ones. The SHG optical system doubles the frequency components of the input spectrum, generating new components at twice the frequency, so it is not a linear time-invariant system.
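This property is easy to check numerically. The sketch below compares a simple linear time-invariant operation (a scaled copy plus a delayed copy of the signal) with squaring, used here as a crude stand-in for a frequency-doubling nonlinearity. It is not a model of the optical system; the 50 Hz test tone and the helper name are arbitrary.

```python
import numpy as np

sample_rate = 1000  # Hz; a one-second record, so FFT bin k corresponds to k Hz
t = np.arange(sample_rate) / sample_rate
x = np.sin(2 * np.pi * 50 * t)  # a single 50 Hz tone

def dominant_frequencies(signal, threshold=0.01):
    """Return the frequencies (in Hz) whose spectral magnitude exceeds the threshold."""
    spectrum = np.abs(np.fft.rfft(signal)) / len(signal)
    return [int(f) for f in np.flatnonzero(spectrum > threshold)]

# LTI example: a scaled copy plus a (circularly) delayed copy of the input.
lti_out = 0.5 * x + 0.3 * np.roll(x, 3)

# Nonlinear example: squaring the signal.
squared = x ** 2

print(dominant_frequencies(x))        # [50]
print(dominant_frequencies(lti_out))  # [50]
print(dominant_frequencies(squared))  # [0, 100]
```

The linear system only rescales and shifts the 50 Hz component, while squaring produces a DC term and a new component at 100 Hz, twice the input frequency.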

Frequency doubling is therefore what makes this system usable as a physical neural network; a linear time-invariant optical system could not be used to build one.

2.3 Speakers

Of the three example systems in the article, the mechanical vibration system based on the loudspeaker is the most puzzling. The system consists of a speaker, an audio amplifier, and a microphone, and the speaker has to be modified.

They removed the diaphragm and dust cap of a moving-coil speaker to expose the voice coil, glued a metal screw onto it, and fixed to the screw a square titanium plate measuring 3.2cm×3.2cm and 1mm thick. On first reading, this modification looks like a needlessly roundabout thing to do.

▲ Figure 2.3.1 A mechanical oscillation system made of loudspeakers

I had assumed they wanted to introduce a nonlinear element into the speaker's mechanical system, but the metal screw and plate attached to the voice coil seem only to increase its inertial mass. This suppresses the high-frequency response, so the assembly acts as a low-pass filter. By that reasoning, the system is still linear time-invariant.

Below are the speaker input voltage, the microphone recording, and the downsampled digital signal given in the paper's supplementary material. The audio signal recorded by the microphone does indeed look like a low-pass-filtered, smoothed version of the input signal.

▲ Figure 2.3.2 Speaker input signal, microphone recording signal and downsampled digital signal

The figure below, from the article, shows the audio signal picked up by the microphone when an amplitude-controllable DC segment (equivalent to the network parameters) is embedded in a random input signal and applied to the speaker. The last panel shows that the output signal is linearly related to the input signal at different times.

▲ Figure 2.3.3 Random-noise input with an embedded controllable DC segment, and the corresponding output signal

So the question is: where is the nonlinear link in this system?

The only thing I can think of for now is that downsampling the microphone signal may break the linear time-invariant property of the system, much as a pooling layer does in a convolutional neural network.
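What downsampling does to these properties can be checked with a small experiment. The sketch below shows only the general behaviour of plain decimation (keeping every second sample): it remains linear, but it is no longer time-invariant. Whether this is what actually supplies the required behaviour in the paper's speaker system is not established here.

```python
import numpy as np

def downsample(signal, factor=2):
    """Keep every `factor`-th sample, starting with the first."""
    return signal[::factor]

x = np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0])  # a pulse at index 1
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Linearity still holds: downsampling a weighted sum equals the weighted sum
# of the downsampled signals.
print(np.allclose(downsample(2 * x + 3 * y), 2 * downsample(x) + 3 * downsample(y)))

# Time invariance fails: delaying the input by one sample changes the output
# in a way that is not simply a delay.
print(downsample(x))              # the pulse falls between the kept samples
print(downsample(np.roll(x, 1)))  # after a one-sample delay the pulse is kept
```

The pulse disappears or survives depending only on when it arrives, so a downsampled system falls outside the linear time-invariant theory even though each individual step is linear.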

※ Summary ※

This research paper from Cornell University presents a framework for using physical systems for deep network learning and inference. The three example systems in the paper are not supposed to be linear time-invariant systems. Apart from the relatively complicated SHG system, the other two (the transistor circuit and the speaker) are so simple and appealing that they invite people to build them and test their performance.


[1] Deep Physical Neural Networks Trained with Backpropagation: https://www.nature.com/articles/s41586-021-04223-6.pdf
[2] Second-harmonic generation, literature overview: https://www.sciencedirect.com/topics/chemistry/second-harmonic-generation
