While the acceleration of AI and ML applications is still a relatively new field, a variety of processors have sprung up to accelerate nearly any neural network workload. From the processor giants to some of the newest startups in the industry, they all offer something different—whether it’s for a different vertical market, application area, power budget, or price point. Here’s a snapshot of what’s on the market today.
Application processor
Intel Movidius Myriad X
The Myriad X was developed by Movidius, the Irish startup acquired by Intel in 2016. It is the company's third-generation vision processing unit and the first equipped with a dedicated neural network compute engine, delivering 1 tera operations per second (TOPS) of dedicated deep neural network (DNN) compute. The Neural Compute Engine interfaces directly with high-throughput intelligent memory structures to avoid memory bottlenecks when transferring data. It supports FP16 and INT8 calculations. The Myriad X also features 16 proprietary SHAVE cores and upgraded, expanded vision accelerators.
The Myriad X is available in Intel’s Neural Compute Stick 2, which is essentially an evaluation platform in the form of a USB thumb drive. It plugs into any workstation to get AI and computer vision applications up and running quickly on dedicated Movidius hardware.
NXP Semiconductors i.MX 8M Plus
The i.MX 8M Plus is a heterogeneous application processor that uses a dedicated neural network accelerator IP from VeriSilicon (Vivante VIP8000). It provides 2.3 TOPS of acceleration for inference on endpoint devices in the consumer and industrial Internet of Things (IIoT), enough for multi-object recognition, 40,000-word speech recognition, and even medical imaging (MobileNet v1 at 500 images per second).
In addition to the neural network processor, the i.MX 8M Plus features a quad-core Arm Cortex-A53 subsystem running at 2 GHz, and a Cortex-M7 real-time subsystem.
For vision applications, two image signal processors support two high-definition cameras for stereo vision or one 12-megapixel (MP) camera. For voice, the device includes an 800-MHz HiFi4 audio digital signal processor (DSP) for pre- and post-processing of the voice data.
NXP’s i.MX 8M Plus is the company’s first applications processor with a dedicated neural network accelerator. It is specially designed for IoT applications.
XMOS xcore.ai
xcore.ai is designed to enable voice control in artificial-intelligence-of-things (AIoT) applications. The device is a crossover processor (combining the performance of an application processor with the low-power, real-time operation of a microcontroller) designed for machine learning inference on speech signals.
It is based on XMOS’ proprietary Xcore architecture, which itself is built on building blocks called logical cores, which can be used for I/O, DSP, control functions or AI acceleration. There are 16 of these cores on each xcore.ai chip, and designers can choose how many to allocate to each function. Mapping different functions to logical cores in firmware allows the creation of “virtual SoCs” written entirely in software. XMOS adds vector pipeline capabilities to Xcore for machine learning workloads.
xcore.ai supports 32-bit, 16-bit, 8-bit, and 1-bit (binarized) networks, delivering 3,200 MIPS, 51.2 GMACC, and 1,600 MFLOPS. It has 1 MB of embedded SRAM and a low-power DDR interface for expansion.
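Support for 8-bit and even 1-bit networks matters because inference can usually tolerate reduced numeric precision in exchange for much higher throughput per watt. As a rough illustration of the idea (a NumPy sketch of generic symmetric INT8 quantization, not XMOS code), float weights are mapped onto the integer range [-127, 127] with a single scale factor:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: map floats onto [-127, 127]."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the INT8 representation."""
    return q.astype(np.float32) * scale

weights = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Per-element rounding error is bounded by scale / 2
```

The accelerator can then do its multiply-accumulates on small integers, which is exactly where figures like 51.2 GMACC come from.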
XMOS’ xcore.ai is based on a proprietary architecture designed for AI workloads in speech processing applications.
Automotive SoC
Texas Instruments Inc. TDA4VM
Part of the Jacinto 7 family for automotive advanced driver assistance systems (ADAS), the TDA4VM is TI's first system-on-chip (SoC) with a dedicated on-chip deep learning accelerator. The accelerator is based on a C7x DSP plus an in-house-developed matrix multiplication accelerator (MMA), enabling 8 TOPS.
The SoC can handle video streams from a front camera of up to 8 MP, or a combination of four to six 3-MP cameras, plus radar, lidar, and ultrasonic sensors. The MMA can be used, for example, to perform sensor fusion on these inputs in an automated valet parking system. The TDA4VM is designed for ADAS systems with power budgets between 5 and 20 W.
The device is still in pre-production, but development kits are available now.
TI’s TDA4VM is suitable for complex automotive ADAS systems that allow vehicles to sense their environment.
GPU
Nvidia Corp. Jetson Nano
Nvidia's well-known Jetson Nano is a small but powerful graphics processing unit (GPU) module for AI applications in endpoint devices. The GPU on the Nano module is built on Nvidia's Maxwell architecture, has 128 cores, and is capable of 0.5 TFLOPS, enough to run multiple neural networks simultaneously on data from several high-resolution image sensors, according to the company. It consumes only 5 W while in use. The module also has a quad-core Arm Cortex-A57 CPU.
Like other parts in the Nvidia range, the Jetson Nano uses CUDA-X, Nvidia's collection of neural network acceleration libraries. Inexpensive Jetson Nano development kits are readily available.
Nvidia’s Jetson Nano module contains a powerful GPU with 128 cores for AI at the edge.
Consumer coprocessor
Kneron Inc. KL520
The first product from U.S.-Taiwan startup Kneron is the KL520 neural network processor, designed for image processing and facial recognition in applications such as smart homes, security systems, and mobile devices. It is optimized for convolutional neural networks (CNNs), the type commonly used in image processing today.
The KL520 delivers 0.3 TOPS while consuming 0.5 W (equivalent to 0.6 TOPS/W), which the company says is enough for accurate facial recognition given the chip's high MAC efficiency (over 90%). The chip architecture is reconfigurable and can be tailored to different CNN models. The company's complementary compiler also uses compression techniques to run larger models within the chip's resources, helping to save power and cost.
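Efficiency figures like these follow directly from dividing throughput by power, which makes them easy to sanity-check against the quoted numbers:

```python
def tops_per_watt(tops: float, watts: float) -> float:
    """Throughput efficiency: tera-operations per second per watt."""
    return tops / watts

# KL520 figures quoted above: 0.3 TOPS at 0.5 W
kl520_efficiency = tops_per_watt(0.3, 0.5)  # 0.6 TOPS/W
```

The same ratio is how vendors across this roundup arrive at their TOPS/W claims, so it is a useful common yardstick when comparing parts.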
The KL520 is available now and can also be found on an accelerator card from the manufacturer AAEON (M2AI-2280-520).
Kneron’s KL520 uses a reconfigurable architecture and clever compression to run image processing in mobile and consumer devices.
Gyrfalcon Lightspeeur 5801
Designed specifically for the consumer electronics market, Gyrfalcon's Lightspeeur 5801 delivers 2.8 TOPS at 224 mW (equivalent to 12.6 TOPS/W) with a latency of 4 ms. The company uses a processor-in-memory technique that is particularly power-efficient compared with other architectures. Power consumption can be traded against performance by varying the clock between 50 and 200 MHz. The Lightspeeur 5801 contains 10 MB of memory, so an entire model can fit on the chip.
This part, the company's fourth production chip, is already found in LG's Q70 mid-range smartphone, where it handles inference for camera effects. A USB thumb-drive development kit, the 5801 Plai Plug, is available now.
Ultra-low power
Eta Compute ECM3532
Eta Compute's first production chip, the ECM3532, is designed for AI acceleration in battery-powered or energy-harvesting IoT designs. Always-on applications in image processing and sensor fusion can be achieved with power budgets as low as 100 µW.
The chip has two cores: an Arm Cortex-M3 microcontroller core and an NXP CoolFlux DSP. The company uses a proprietary voltage and frequency scaling technique, adjusted on every clock cycle, to wring every last drop of efficiency from both cores. Machine learning workloads can be handled by either core (some voice workloads, for example, are better suited to the DSP).
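The payoff of scaling voltage along with frequency follows from the classic CMOS dynamic-power model, P ≈ C·V²·f: because voltage enters squared, lowering it together with the clock saves much more than the clock reduction alone. The operating points below are purely illustrative, not Eta Compute's actual figures:

```python
def dynamic_power(c_eff_farads: float, volts: float, freq_hz: float) -> float:
    """Classic CMOS dynamic-power model: P = C * V^2 * f."""
    return c_eff_farads * volts ** 2 * freq_hz

# Illustrative operating points: halving the clock alone would halve power,
# but dropping the supply from 1.0 V to 0.8 V at the same time compounds it.
p_full   = dynamic_power(1e-9, 1.0, 100e6)  # full speed
p_scaled = dynamic_power(1e-9, 0.8, 50e6)   # half clock, reduced voltage
ratio = p_scaled / p_full                   # 0.32: ~3x less power for 2x less speed
```

This is why per-cycle adjustment is attractive for always-on workloads: the chip spends as much time as possible at the lowest voltage/frequency point that still meets the deadline.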
Samples of the ECM3532 are available now, with mass production expected in the second quarter of 2020.
Syntiant Corp. NDP100
The NDP100 processor from U.S. startup Syntiant is designed for machine-learning inference on voice commands in power-constrained applications. The chip, based on processor-in-memory technology, consumes less than 140 µW of active power and can run models for keyword spotting, wake-word detection, speaker recognition, or event classification. The company says the product will enable hands-free operation of consumer devices such as earbuds, hearing aids, smartwatches, and remote controls. Development kits are available now.
Syntiant’s NDP100 devices are designed for voice processing in ultra-low power applications.
GreenWaves Technologies GAP9
GAP9, the first ultra-low-power application processor from French startup GreenWaves, features a powerful compute cluster of nine RISC-V cores whose instruction set has been heavily customized to optimize power consumption. It features a bidirectional multi-channel audio interface and 1.6 MB of internal RAM.
GAP9 can handle neural network workloads for image, sound, and vibration sensing in battery-operated IoT devices. Data from GreenWaves shows that GAP9 runs MobileNet V1 on a 160 × 160 image with a channel scaling of 0.25 in 12 ms and a power consumption of 806 μW/frame/s.
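For context, that 12-ms latency translates directly into an achievable frame rate (simple arithmetic on GreenWaves' quoted figure, assuming back-to-back inferences):

```python
latency_ms = 12.0          # GreenWaves' quoted MobileNet V1 latency on GAP9
fps = 1000.0 / latency_ms  # roughly 83 frames per second
```

That is far more than an always-on battery-powered sensor typically needs, which leaves room to duty-cycle the chip and cut average power further.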