Neural networks are now integrated into the image processing subsystems of almost all high-end smartphones. Experts in voice recording and speech processing will argue that what we now call artificial intelligence has been operating at the edge for years.

However, in most cases, these applications utilize SoCs and DSPs that were not designed for modern AI workloads. With advances in AI technology and deployment, some new engineering challenges have emerged:

The need for always-on, ultra-low-power systems that can run on battery power for extended periods while still providing fast inference response times

Integrated security requirements for protecting machine learning graphs from tampering or theft

The need for flexible solutions that can adapt to rapid changes in AI models and algorithms

These trends raise the stakes for IP and processor vendors looking to serve the embedded AI market, which is expected to be worth $4.6 billion by 2024. These companies are now offering highly integrated, purpose-built computing solutions to grab a share of the business.


As AI is deployed in devices as constrained as hearing aids, power consumption has become a top consideration for inference platforms. Eta Compute has incorporated patented Dynamic Voltage Frequency Scaling (DVFS) technology into its multicore SoCs to serve these use cases.

To save power, many traditional processors include a sleep function that wakes the core when a load is present. Once awake, however, most of these devices run their cores at peak frequency, which naturally draws extra power.

With DVFS, Eta Compute's devices continuously scale supply voltage and clock frequency to match the current workload, drawing only the minimum power needed to complete each task on time. The company's ECM3531 pairs an Arm Cortex-M3 core with an NXP CoolFlux DSP and can deliver 12-bit, 200-kSps sampling at just 1 µW.
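The idea behind DVFS can be illustrated with a toy software model. The sketch below is hypothetical (real silicon such as Eta Compute's implements this in on-chip hardware control loops, and the operating-point table is invented for illustration): it picks the slowest voltage/frequency pair that still finishes the pending work before its deadline, since dynamic power scales roughly with V² · f.

```python
# Toy sketch of a DVFS policy -- illustrative only, not any vendor's
# actual algorithm. Each operating point is a (frequency_mhz, voltage_v)
# pair; we pick the slowest point that still meets the deadline, because
# dynamic energy per cycle grows roughly with the square of the voltage.

OPERATING_POINTS = [  # hypothetical (MHz, V) table, sorted slow -> fast
    (10, 0.6),
    (25, 0.7),
    (50, 0.9),
    (100, 1.2),
]

def pick_operating_point(work_cycles, deadline_ms):
    """Return the lowest-power (MHz, V) point that meets the deadline."""
    for mhz, volts in OPERATING_POINTS:
        runtime_ms = work_cycles / (mhz * 1000.0)  # cycles / (cycles per ms)
        if runtime_ms <= deadline_ms:
            return (mhz, volts)
    return OPERATING_POINTS[-1]  # deadline unreachable: run flat out

def relative_energy(volts, work_cycles):
    """Dynamic energy ~ C * V^2 per cycle, in arbitrary units."""
    return volts ** 2 * work_cycles
```

For example, 200,000 cycles of work with a 20 ms deadline can run at the lowest point, (10 MHz, 0.6 V), where the same workload costs roughly (0.6/1.2)² = a quarter of the energy it would at the fastest point.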

Dataset lock

Researchers have shown that on-chip datasets referenced during inference operations can be exposed to attackers. For most AI companies, these datasets represent extremely valuable intellectual property that is worth stealing. And theft is not the only risk: changing the pixels in an image recognition dataset can make the inference engine misidentify objects or fail to identify them at all.

A well-known example occurred when researchers tricked a Google AI model into classifying a rifle as a helicopter; now imagine a self-driving car's AI mistaking a pedestrian for a garbage bag. Worse still, human engineers trying to debug the software often cannot detect such pixel-level changes.
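Why are such attacks so hard to spot? A toy sketch makes it concrete (all weights and pixel values here are invented, and the "classifier" is a bare linear score rather than a real network): nudging every pixel by a tiny amount in the right direction shifts the classifier's output enough to flip the label, yet no single pixel changes enough for a human reviewer to notice.

```python
# Toy illustration of an adversarial perturbation (numbers hypothetical).
# A linear "classifier" scores an image as sum(w[i] * pixel[i]);
# score > 0 means class A, otherwise class B. An attacker who knows the
# weights can shift every pixel by +/- epsilon toward the opposite class,
# moving the score by epsilon * sum(|w|) while each pixel barely changes.

def score(weights, pixels):
    return sum(w * p for w, p in zip(weights, pixels))

def fgsm_like_perturb(weights, pixels, epsilon):
    """Shift each pixel by +/- epsilon toward the opposite class."""
    direction = -1 if score(weights, pixels) > 0 else 1
    return [p + direction * epsilon * (1 if w > 0 else -1)
            for w, p in zip(weights, pixels)]

weights = [0.4, -0.3, 0.2, -0.1]   # hypothetical model weights
image   = [0.2, 0.1, 0.3, 0.5]     # hypothetical pixel values

adversarial = fgsm_like_perturb(weights, image, epsilon=0.12)
```

Here the original image scores 0.06 (class A) while the perturbed one scores -0.06 (class B), even though no pixel moved by more than 0.12. Real attacks on deep networks use gradients instead of raw weights, but the principle is the same.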

IP blocks such as Synopsys' DesignWare EV7x processors include a vision engine, DNN accelerators, and tightly coupled memory, delivering up to 35 TOPS of power-efficient performance. A less-publicized feature of the EV7x processors, however, is the optional AES-XTS encryption engine, which protects data as it passes from on-chip memory to the vision engine or DNN accelerator.

Flexibility for future models

From DNNs to RNNs to LSTMs, dozens of neural network types have emerged over the past few years. While these represent exciting innovations in AI software and algorithms, they also pose a real challenge for computing devices optimized for one specific type of workload.

ASICs can take from six months to two years from design to tape-out, which can accelerate the obsolescence of highly specialized solutions. It is for this reason that FPGAs have gained huge traction in AI engineering.

Xilinx devices such as the popular Zynq and MPSoC platforms are reprogrammable in both hardware and software. This means that logic blocks can be optimized for today's leading neural networks and then reconfigured months or years later as those algorithms evolve.

Going further, a feature called Dynamic Function Exchange (DFX) allows a system to download partial bit files that dynamically modify those logic blocks. This can happen at deployment or at runtime, essentially adding, removing, or changing the capabilities of a single Xilinx device in the field.
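On Zynq and MPSoC boards running Xilinx's embedded Linux, a partial bitstream produced by the DFX tool flow can be loaded through the kernel's FPGA Manager interface. The sketch below is hypothetical: the device node, attribute names, and bitstream file name follow Xilinx's PetaLinux kernels and are examples, not a universal recipe.

```shell
# Hypothetical sketch of loading a DFX partial bitstream via the Linux
# FPGA Manager sysfs interface on a Xilinx PetaLinux kernel. Paths and
# file names are examples only.

# The kernel firmware loader looks for images under /lib/firmware.
cp partial_region.bin /lib/firmware/

# Bit 0 of 'flags' requests partial (rather than full) reconfiguration.
echo 1 > /sys/class/fpga_manager/fpga0/flags

# Writing the file name triggers the actual reprogramming.
echo partial_region.bin > /sys/class/fpga_manager/fpga0/firmware

# 'state' should read back 'operating' once the region is live.
cat /sys/class/fpga_manager/fpga0/state
```

Mainline kernels may instead drive reconfiguration through device tree overlays and FPGA regions, so the exact interface depends on the kernel in use.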

Production-ready AI at the edge

Expectations for AI edge computing today are similar to what we predicted for IoT a few years ago. Just as trillions of "things" will be connected, we assume that the vast majority of them will be (artificially) intelligent.

While previous-generation solutions laid the foundation, next-generation solutions require a new set of capabilities to ensure commercial success. Processor and IP vendors are responding by integrating more and more of these functions into AI edge computing devices.

Reviewing Editor: Guo Ting
