Inspur has announced the open-source release of TF2, an efficient FPGA-based AI computing framework. The framework’s inference engine adopts the world’s first DNN shift-computing technology, combined with a number of recent optimization techniques, to achieve high-performance, low-latency deployment of general deep learning models on FPGA chips. TF2 is also claimed to be the world’s first complete open-source AI framework for FPGAs that covers the full optimization pipeline from model pruning, compression, and quantization through to general model deployment. It is reported that a number of companies and research institutes, including Kwai TSE, Shanghai University, China Dazhi, remote science and technology, wisdom and wisdom, and Hua chin yuan, have joined the TF2 open-source community. The community will jointly promote the open, collaborative development of AI technology based on customizable FPGA chips, lower the barrier to high-performance AI computing, and help AI users and developers shorten their development cycles.
At present, FPGA technology, with its customizability, low latency, and high performance per watt, has become the choice of many AI users for deploying inference applications. However, FPGA development is difficult and has long cycles, which makes it hard to keep pace with fast-iterating deep learning algorithms. TF2 can quickly bring deep neural network (DNN) models trained in mainstream AI software to online inference on FPGAs, helping users fully exploit FPGA computing power and achieve high-performance, low-latency deployment. At the same time, the TF2 computing architecture can also be used to quickly carry out AI chip-level design and performance verification.
The TF2 computing acceleration process
TF2 consists of two parts. The first is the model optimization and conversion tool, TF2 Transform Kit, which can compress, prune, and quantize network models trained in frameworks such as PyTorch, TensorFlow, and Caffe, reducing the model’s computational load. For example, for the ResNet-50 model, compressing the 32-bit floating-point model into a 4-bit integer model and applying channel pruning shrinks the model file by 93.75%, with almost no loss of accuracy and while preserving the basic computing architecture of the original model.
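To make the size arithmetic concrete, here is a minimal sketch of symmetric 4-bit weight quantization. This is an illustrative example of the general technique, not TF2 Transform Kit’s actual algorithm; the function names and the per-tensor scaling scheme are assumptions.

```python
import numpy as np

def quantize_int4(weights):
    """Symmetric per-tensor quantization of float32 weights to 4-bit
    integer codes in [-8, 7], plus the scale needed to dequantize.
    Illustrative sketch only -- not TF2's actual scheme."""
    scale = np.abs(weights).max() / 7.0  # map the largest magnitude to 7
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return q.astype(np.float32) * scale

# 4-bit codes need 8x fewer bits than 32-bit floats (an 87.5% cut);
# halving the channels on top of that yields the ~93.75% figure.
w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize(q, s)
print("max abs error:", np.abs(w - w_hat).max())  # bounded by scale/2
```

The quantization error of each weight is at most half the scale step, which is why accuracy loss can stay small when the weight distribution is well behaved.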
The second part is the FPGA intelligent runtime engine, TF2 Runtime Engine, which automatically converts the optimized model file into an FPGA executable, greatly improves FPGA inference performance through the innovative DNN shift-computing technology, and effectively reduces actual running power consumption. TF2 has been tested and verified on mainstream DNN models such as ResNet-50, FaceNet, GoogLeNet, and SqueezeNet. In a test of TF2 on an Inspur F10A FPGA card with the FaceNet model (batch size = 1), the computation time for a single image after applying TF2 was 0.612 ms, a 12.8x speedup.
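The idea behind shift computing is to approximate each weight by a signed power of two, so that a hardware multiply becomes a cheap bit shift. The sketch below is a hypothetical illustration of that idea in Python; the encoding and function names are assumptions, not TF2 Runtime Engine’s actual design.

```python
import numpy as np

def quantize_to_shifts(w):
    """Encode each weight as (sign, k) with w ~= sign * 2**k, so that
    w * x reduces in hardware to shifting x by k bits. Hypothetical
    sketch of shift computing, not TF2's actual implementation."""
    sign = np.where(w >= 0, 1, -1)
    k = np.round(np.log2(np.abs(w) + 1e-12)).astype(int)
    return sign, k

def shift_dot(x, sign, k):
    """Dot product of integer activations x with power-of-two weights,
    using shifts instead of multiplies."""
    acc = 0
    for xi, si, ki in zip(x, sign, k):
        term = xi << ki if ki >= 0 else xi >> -ki  # shift replaces multiply
        acc += si * term
    return acc

x = np.array([3, 5, 2], dtype=np.int64)
w = np.array([0.5, 2.0, 4.0])  # all exact powers of two here
sign, k = quantize_to_shifts(w)
print(shift_dot(x, sign, k))  # -> 19 (3>>1 truncates 1.5 down to 1)
```

On an FPGA, replacing DSP multipliers with shift-and-add logic in this way is what allows higher throughput per chip at lower power, which is consistent with the speedup and power figures reported above.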
At the same time, Inspur’s open-source project also includes a reconfigurable chip design architecture defined by the TF2 software. This architecture fully supports the development of current CNN network models and can be quickly ported to support Transformer, LSTM, and other network models. Based on this architecture, an ASIC chip development prototype design can be further realized.
Open-source FPGA chip-level design
According to the open-source community roadmap released by Inspur, the company will continue to invest in updating TF2, developing new features such as automatic model analysis, structured pruning, arbitrary-bit quantization, AutoML-based pruning and quantization, and support for sparse computing, Transformer network models, general NLP models, and more. In addition, the community will regularly hold developer meetings and online open classes to share the latest technical progress and experience, train developers through university education programs, and provide users with porting plans and technical support for development.
Liu Jun, General Manager of AI & HPC at Inspur Group, said: “AI application deployment spans cloud, edge, and mobile devices, with a wide variety of requirements. TF2 can greatly improve the efficiency of cross-device application deployment and quickly adapt to model inference needs in different scenarios. AI users and developers are welcome to join the TF2 open-source community to jointly accelerate AI application deployment and bring more AI applications into production.”
Inspur is a leading brand in artificial intelligence computing. Its AI servers have maintained a market share of more than 50% in China, and the company maintains deep, close cooperation with leading AI technology companies on systems and applications, helping AI customers achieve order-of-magnitude improvements in application performance in voice, semantics, image, video, search, networking, and other fields. Inspur and its partners are building a “meta brain” ecosystem, sharing the three core platform capabilities of AI computing, resources, and algorithms, helping industry users develop and deploy their own “industry brains,” and accelerating the adoption of industrial AI.