As a strong competitor to the GPU in algorithm acceleration, whether the FPGA can readily support different hardware needs is particularly important. The FPGA differs from the GPU in the configurability of its hardware, and when an FPGA runs key deep-learning subroutines (such as sliding-window computations), it can usually deliver better performance per unit of energy than a GPU. However, configuring an FPGA requires hardware-specific knowledge that many researchers and application scientists do not have. Because of this, the FPGA is often regarded as an expert-only architecture. Recently, however, FPGA toolchains have begun to adopt software-level programming models, including OpenCL, which makes them increasingly accessible to users trained in mainstream software development.
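The sliding-window computation mentioned above is the core of a convolution layer. A minimal pure-Python sketch (the function name and example values are illustrative, not from the text):

```python
# Toy 2D sliding-window ("valid") convolution, the kind of deep-learning
# kernel the text says FPGAs accelerate well per unit of energy.

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists), no padding."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            # Multiply-accumulate over one window position; on an FPGA
            # these MACs can be laid out as a parallel pipeline.
            out[i][j] = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw)
            )
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]  # sums each window's main diagonal
print(conv2d_valid(image, kernel))  # [[6, 8], [12, 14]]
```

Every window position is independent of the others, which is exactly what makes this kernel a good fit for a custom parallel datapath.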
For researchers surveying a range of design tools, the selection criteria usually come down to whether the tools offer user-friendly software development, whether the model design methods are flexible and scalable, and whether computation is fast enough to reduce the training time of large models. With the emergence of high-abstraction design tools, FPGAs are becoming easier and easier to program; their reconfigurability makes custom architectures possible, and their highly parallel computing capability speeds up execution. These qualities will draw deep-learning researchers to the FPGA.
For application scientists, although similar tool-level choices exist, hardware selection focuses on maximizing performance per unit of energy in order to reduce the cost of large-scale operation. Therefore, with its strong performance per watt and its ability to be customized for specific applications, the FPGA can also benefit application scientists working on deep learning.
The FPGA can meet the needs of both audiences, which makes it a logical choice. In this paper, we survey the current state of deep learning on FPGAs and the technical developments used to bridge the gap between the two. This paper therefore has three main purposes. First, it points out the opportunity to explore a new hardware acceleration platform in deep learning, for which the FPGA is an ideal candidate. Second, it outlines the current state of FPGA support for deep learning and its potential limitations. Finally, it offers key recommendations on future directions for FPGA hardware acceleration, to help address the problems deep learning will face.
Traditionally, evaluating acceleration on a hardware platform involves a trade-off between flexibility and performance. At one extreme, general-purpose processors (GPPs) provide high flexibility and ease of use, but their performance is relatively inefficient. These platforms are easy to obtain, can be produced at low cost, and are suitable for many uses and for reuse. At the other extreme, application-specific integrated circuits (ASICs) provide high performance, but at the cost of flexibility and harder production. These circuits are dedicated to a single application and are expensive and time-consuming to produce.
The FPGA is a compromise between these two extremes. It belongs to the broader family of programmable logic devices (PLDs) and is a reconfigurable integrated circuit. The FPGA can therefore offer both the performance advantage of an integrated circuit and the reconfigurable flexibility of a GPP. Sequential logic is easily implemented with flip-flops (FFs), and combinational logic with lookup tables (LUTs). Modern FPGAs also contain hardened components for common functions, such as full processor cores, communication cores, arithmetic cores, and block RAM (BRAM).
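Conceptually, a k-input LUT is just a 2^k-entry truth table: the FPGA stores one output bit per input combination and reads out the entry selected by the inputs. A minimal Python sketch of this idea (the helper name and the majority-vote example are hypothetical, chosen only for illustration):

```python
# Model a k-input LUT as a precomputed truth table, the way FPGA
# synthesis tools reduce any combinational function to stored bits.

def make_lut(func, k=4):
    """Precompute func's truth table over all 2**k input patterns."""
    table = [func(*((n >> b) & 1 for b in range(k))) & 1
             for n in range(2 ** k)]

    def lut(*bits):
        # The input bits simply form an index into the stored table.
        index = sum(bit << b for b, bit in enumerate(bits))
        return table[index]

    return lut

# Any 4-input combinational function fits in one such LUT,
# e.g. a majority vote over four bits:
majority = make_lut(lambda a, b, c, d: int(a + b + c + d >= 3))
print(majority(1, 1, 1, 0))  # 1
print(majority(1, 0, 0, 1))  # 0
```

The point of the model is that the logic function itself disappears at "run time": only the table remains, which is why reprogramming a LUT reconfigures the circuit.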
In addition, current FPGA design is trending toward a system-on-chip (SoC) approach, in which an ARM coprocessor and the FPGA fabric sit on the same chip. The FPGA market is currently dominated by Xilinx and Altera, which together hold more than 85% of the market share. FPGAs are also rapidly replacing ASICs and application-specific standard products (ASSPs) for fixed-function logic, and the FPGA market was expected to reach $10 billion in 2016.
For deep learning, the FPGA offers significant potential for acceleration beyond what traditional GPPs provide. Software-level execution on a GPP relies on the traditional von Neumann architecture, in which instructions and data are held in external memory and fetched when needed. This drove the emergence of caches, which greatly reduce expensive external memory operations. The bottleneck of this architecture is the communication between processor and memory, which severely limits GPP performance, especially for the memory-intensive access patterns that deep learning frequently requires. By contrast, the FPGA's programmable logic elements can implement the data and control paths of common logic functions directly, without relying on the von Neumann architecture.
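The von Neumann execution model described above can be sketched as a toy fetch-decode-execute loop in which instructions and data share one memory. The opcodes and program below are hypothetical, chosen only to make the memory traffic visible:

```python
# Toy von Neumann machine: every step fetches from the same memory
# that also holds the data, which is exactly the bottleneck the text
# describes. An FPGA datapath instead wires operands between stages.

def run(memory, pc=0):
    acc = 0
    while True:
        op, arg = memory[pc]          # fetch an instruction from memory
        pc += 1
        if op == "LOAD":              # read a data word from memory
            acc = memory[arg]
        elif op == "ADD":             # another memory read per operand
            acc += memory[arg]
        elif op == "STORE":           # write the result back to memory
            memory[arg] = acc
        elif op == "HALT":
            return memory

program = [("LOAD", 5), ("ADD", 6), ("STORE", 7), ("HALT", 0),
           None, 2, 3, 0]  # slots 5-7 hold data in the same memory
print(run(program)[7])  # 5
```

Each loop iteration touches memory at least twice (instruction fetch plus operand access), which is why caching became essential for GPPs and why bypassing this model is attractive for memory-bound deep-learning workloads.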
Most importantly, compared with the GPU, the FPGA offers a different perspective on hardware-accelerated design. GPUs and other fixed architectures follow a software execution model, built around autonomous compute units that execute tasks in parallel. Developing deep-learning techniques for the GPU therefore means adapting the algorithm to this model, so that computation completes in parallel while data interdependencies are preserved. By contrast, the FPGA architecture is customized to the application. When developing deep-learning techniques on an FPGA, less emphasis falls on adapting the algorithm to a fixed compute structure, leaving more freedom to explore algorithm-level optimizations. Techniques that require many complex low-level hardware control operations, which are difficult to implement in high-level software languages, are especially attractive candidates for FPGA implementation.