When we speak of artificial intelligence, we tend to think of machine vision, fingerprint recognition, face recognition, retinal recognition, iris recognition, palmprint recognition, automatic planning, intelligent search, game playing, automatic programming, intelligent control, and so on. At its core, however, all of this is inseparable from embedded systems.
As this year's most popular CES demos and recently announced flagship smartphones show, AI on mobile devices no longer relies on cloud connectivity. Artificial intelligence has arrived on end devices and has quickly become a market selling point. Factors such as security, privacy and response time are pushing this trend onto ever more end devices. To meet the demand, almost every player in the chip industry has launched its own AI processor under a different name, such as “deep learning engine”, “neural processor”, “AI engine” and so on.
However, not all AI processors are equal. In reality, many so-called AI engines are simply traditional embedded processors with a vector processing unit bolted on. Several other capabilities are also very important for AI processing at the edge.
Optimizing workloads for embedded systems
In the cloud, floating-point arithmetic is used for training and fixed-point arithmetic for inference, so as to maximize accuracy. Large server farms must still consider the energy consumption and physical size of data processing, but those budgets are almost unlimited compared with the constraints at the edge.
In mobile devices, a viable power, performance and area (PPA) design is critical. More efficient fixed-point arithmetic is therefore preferred in embedded SoCs.
When a network is converted from floating point to fixed point, some precision is inevitably lost. With the right design, however, the accuracy loss can be minimized to the point where results are almost identical to those of the original trained network.
One way to control precision is the choice between 8-bit and 16-bit integer arithmetic. Although 8-bit precision saves bandwidth and computing resources, many commercial neural networks still need 16-bit precision to maintain accuracy.
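The precision trade-off can be illustrated with a minimal sketch of symmetric linear quantization (a common scheme, not necessarily the one any particular AI engine uses): the same weights lose far more information at 8 bits than at 16.

```python
def quantize(values, bits):
    """Symmetric linear quantization: map floats onto signed integers."""
    qmax = 2 ** (bits - 1) - 1            # 127 for 8-bit, 32767 for 16-bit
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) for v in values], scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

# Illustrative weight values, not taken from any real network.
weights = [0.91, -0.42, 0.07, 0.35, -0.88, 0.002]
for bits in (8, 16):
    q, scale = quantize(weights, bits)
    err = max(abs(w - r) for w, r in zip(weights, dequantize(q, scale)))
    print(f"{bits}-bit worst-case error: {err:.2e}")
```

The worst-case rounding error is bounded by half the quantization step, which shrinks by a factor of roughly 256 when moving from 8-bit to 16-bit integers.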
Each layer of a neural network has different constraints and different amounts of redundancy, so it is important to choose the appropriate precision for each layer.
For developers and SoC designers, a tool that can automatically generate an optimized network graph and executable, such as the CEVA Network Generator, is a huge advantage from a time-to-market perspective.
It is also important to retain the flexibility to select the precision (8-bit or 16-bit) per layer. This lets each layer strike its own balance between accuracy and performance, producing efficient and accurate embedded network inference with one click.
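A per-layer precision policy can be sketched as follows. The layer names, values and error criterion here are purely illustrative assumptions; production tools such as the one described above measure end-to-end network accuracy rather than per-tensor rounding error.

```python
def choose_bits(layer_weights, tolerance=5e-3):
    """Return the narrowest bit-width whose worst-case rounding error
    stays within tolerance; otherwise fall back to 16-bit."""
    peak = max(abs(w) for w in layer_weights)
    for bits in (8, 16):
        scale = peak / (2 ** (bits - 1) - 1)
        if scale / 2 <= tolerance:        # worst-case error of round()
            return bits
    return 16

# Hypothetical layers: names and values are for illustration only.
layers = {
    "conv1": [0.5, -0.3, 0.12],   # narrow dynamic range: 8-bit suffices
    "fc":    [4.0, -3.7, 0.001],  # wide dynamic range: needs 16-bit
}
plan = {name: choose_bits(w) for name, w in layers.items()}
print(plan)  # → {'conv1': 8, 'fc': 16}
```

The layer with the wider dynamic range gets a coarser quantization step at 8 bits, so it fails the error budget and is promoted to 16-bit, while the narrow-range layer keeps the cheaper 8-bit format.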
Dedicated hardware for real AI algorithms
VPUs (vector processing units) are flexible, but many common neural networks demand enormous bandwidth, which strains a standard processor instruction set. Dedicated hardware is therefore needed to handle these heavy computations.
For example, the NeuPro AI processor includes dedicated engines for matrix multiplication, fully connected layers, activation layers and pooling layers. This dedicated AI engine, combined with the fully programmable NeuPro VPU, supports all other layer types and neural network topologies.
Direct connections between these modules allow data to be exchanged seamlessly, eliminating round trips through memory. In addition, optimized DDR bandwidth and an advanced DMA controller with dynamic pipelining further improve speed and reduce power consumption.
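The benefit of passing data directly between engines can be pictured in software as operator fusion. This is a generic illustration in plain Python, not the actual NeuPro dataflow: the unfused path materializes an intermediate matrix, while the fused path applies the activation as each output element is produced.

```python
def matmul(a, b):
    """Naive matrix multiply for nested lists (a: m x k, b: k x n)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def relu(m):
    return [[max(0.0, x) for x in row] for row in m]

def unfused(a, b):
    # The full matmul result is materialized, then re-read by relu:
    # an extra write and read of the whole intermediate matrix.
    return relu(matmul(a, b))

def fused(a, b):
    # The activation is applied as each output element is produced,
    # so the intermediate matrix never round-trips through memory.
    return [[max(0.0, sum(a[i][t] * b[t][j] for t in range(len(b))))
             for j in range(len(b[0]))] for i in range(len(a))]
```

Both paths compute identical results; the fused version simply avoids storing and reloading the intermediate tensor, which is the same saving a direct engine-to-engine connection provides in hardware.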
Tomorrow’s unknown AI algorithms
Artificial intelligence is still an emerging and rapidly developing field. Neural network applications are multiplying, from object recognition to speech and sound analysis to 5G communications. Keeping a solution adaptable to future trends is the only way to ensure chip design success.
Dedicated hardware that handles only today’s algorithms is therefore not enough; a fully programmable platform is also required. And as algorithms continue to evolve, software simulation becomes the key tool for making design decisions based on real results while reducing time to market.
The CDNN PC simulation package allows SoC designers to evaluate design trade-offs in a PC environment before building real hardware.
Another valuable future-proofing feature is scalability. The NeuPro AI product family covers a wide range of target markets, from lightweight IoT and wearable devices (2 TOPS) to high-performance surveillance and autonomous driving applications (12.5 TOPS).
The race to bring flagship AI processors to mobile has begun. Many vendors have jumped on the trend and market artificial intelligence as a selling point, but not all products offer the same level of intelligence.
If you want to build an intelligent device that stays “smart” as the field of artificial intelligence grows, make sure to check all of the features discussed above when choosing an AI processor.
Source: China Electronic Network