AI chips are becoming increasingly complex and diverse. On one hand, chip makers publish their own measurement standards, each claiming industry-leading computing performance, energy efficiency, and compute capacity; on the other, users want to know how to judge, from the information vendors provide, whether a chip will actually meet the computing needs of their real-world scenarios.
This is why MLPerf, an industry benchmarking consortium established in May 2018, launched the MLPerf benchmark: the industry's first set of general benchmarks for measuring the performance of machine learning software and hardware, specifically the speed at which trained neural networks process new data on devices of different classes (IoT devices, smartphones, PCs, servers) across a range of applications (autonomous driving, NLP, computer vision). MLPerf now has more than 50 members, including companies such as Google, Microsoft, Facebook, and Alibaba, as well as universities including Stanford, Harvard, and the University of Toronto, and it continues to evolve alongside AI itself.
According to the recently released MLPerf benchmark results, NVIDIA's new DGX SuperPOD, built from more than 2,000 NVIDIA A100 GPUs, stood out among commercially available products, posting excellent results across the MLPerf benchmarks for large-scale computing performance. This is the third consecutive time NVIDIA has delivered the strongest showing in MLPerf training tests: in December 2018, NVIDIA set six records in its first MLPerf training run, and in July of the following year it set eight more.
The products NVIDIA submitted are based on its latest Ampere architecture as well as the Volta architecture. The A100 Tensor Core GPU delivered the fastest performance in all eight MLPerf benchmarks in the accelerator category. For the fastest overall at-scale solution, the DGX SuperPOD system, a large cluster of DGX A100 systems interconnected with HDR InfiniBand, also set eight new performance milestones.
NVIDIA was the only company to use commercially available products in all tests. Most other submissions fell into either the preview category, for products expected to ship within a few months, or the research category, for products that will not be available for some time.
A DGX SuperPOD architecture with both speed and scale
NVIDIA ran its MLPerf tests on Selene, an internal cluster based on the DGX SuperPOD, a public reference architecture for large-scale GPU clusters. The DGX SuperPOD is built from NVIDIA DGX A100 systems; each DGX A100 integrates eight A100 GPUs with NVIDIA Mellanox HDR InfiniBand networking in a 6U server, accelerating high-performance computing, data analytics, and AI workloads (both training and inference) while enabling rapid deployment.
Selene recently debuted on the TOP500 list as the fastest industrial system in the United States, with exaflops-level AI performance. It is also the second most energy-efficient system in the world on the Green500 list. Beyond its energy efficiency, Selene's rapid deployment is also impressive: using NVIDIA's modular reference architecture, engineers built Selene in less than four weeks, and four operators can assemble a 20-system DGX A100 cluster delivering 2 petaflops in under an hour.
Customers have already adopted these reference architectures to build their own DGX POD and DGX SuperPOD systems. Among them is HiPerGator, the fastest academic AI supercomputer in the United States, which will become the cornerstone of cross-disciplinary AI innovation at the University of Florida.
Meanwhile, Argonne National Laboratory, a world-leading supercomputing center, is using DGX A100 systems to search for ways to fight the COVID-19 pandemic. Argonne is one of the first six high-performance computing centers to adopt the A100 GPU.
DGX SuperPOD has also helped Continental in the automotive sector, Lockheed Martin in aerospace, and Microsoft in cloud computing achieve strong business results. The smooth operation of these systems owes much to the broad ecosystem support for NVIDIA GPUs and DGX.
Hardware and software combined: a 4x performance gain in a year and a half
MLPerf's latest benchmark suite includes two new tests and one substantially revised test, and NVIDIA achieved excellent results in all three. One benchmark measures performance on recommendation systems, a popular AI task. Another tests conversational AI using BERT, one of the most complex neural network models in use today. Finally, the reinforcement learning test uses MiniGo with a full-size 19×19 Go board; it is the most complex test in this round, covering multiple operations from game play to training.
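As a rough illustration (this is not NVIDIA's benchmark code, and all names and values are toy examples), the core of a recommendation workload is scoring user-item pairs via embedding lookups and a dot product:

```python
# Minimal embedding-lookup + dot-product scorer, the pattern at the heart
# of recommendation models; tables and dimensions are illustrative only.

user_embeddings = {
    "user_a": [0.9, 0.1, 0.3],
    "user_b": [0.2, 0.8, 0.5],
}
item_embeddings = {
    "item_1": [1.0, 0.0, 0.5],
    "item_2": [0.1, 0.9, 0.2],
}

def score(user_id: str, item_id: str) -> float:
    """Dot product of user and item embeddings; higher means a better match."""
    u = user_embeddings[user_id]
    v = item_embeddings[item_id]
    return sum(a * b for a, b in zip(u, v))

def recommend(user_id: str) -> str:
    """Return the highest-scoring item for a user."""
    return max(item_embeddings, key=lambda item: score(user_id, item))
```

Production recommenders scale this same pattern to embedding tables with billions of parameters, which is why the benchmark stresses both memory bandwidth and compute.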
The latest results reflect NVIDIA's continued development of its AI platform across processors, networking, software, and systems. For example, compared with the V100-based systems used in the first round of MLPerf training, today's DGX A100 system delivers up to a 4x performance improvement. At the same time, thanks to the latest software optimizations, the DGX-1 system based on the NVIDIA V100 can also achieve up to a 2x performance improvement.
The AI platform as a whole achieved these gains in under two years. Today, software updates for the NVIDIA A100 GPU and the CUDA-X libraries power scale-out clusters built on Mellanox HDR 200Gb/s InfiniBand networking. HDR InfiniBand delivers extremely low latency and high data throughput, and provides an in-network deep learning acceleration engine through Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) technology.
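To see why hierarchical aggregation helps, here is a minimal sketch in plain Python (not the actual SHARP protocol, which runs inside the network switches): a pairwise tree reduction combines n values in about log2(n) rounds instead of n-1 sequential steps.

```python
def tree_reduce(values, op=lambda a, b: a + b):
    """Pairwise (tree) reduction: combines n values in ceil(log2(n)) rounds
    rather than n-1 sequential steps. Returns (result, rounds)."""
    vals = list(values)
    rounds = 0
    while len(vals) > 1:
        # Each round combines adjacent pairs (conceptually in parallel).
        vals = [op(vals[i], vals[i + 1]) if i + 1 < len(vals) else vals[i]
                for i in range(0, len(vals), 2)]
        rounds += 1
    return vals[0], rounds
```

In a cluster, each round is one exchange over the fabric, so shrinking the round count directly shrinks the latency of gradient all-reduce during training.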
NVIDIA Ampere sets new records for market adoption speed
The A100 is the first processor based on the NVIDIA Ampere architecture. Thanks to its many innovations, the A100 unifies AI training and inference, with performance up to 20x higher than the previous generation.
The NVIDIA Ampere GPU is built on a 7nm process and contains a remarkable 54 billion-plus transistors. Its third-generation Tensor Cores with TF32 can boost FP32 AI performance by up to 20x without any code changes. In addition, Tensor Cores now support FP64 precision, delivering up to 2.5x more compute for HPC applications than before.
The new Ampere architecture also introduces Multi-Instance GPU (MIG), third-generation NVIDIA NVLink, structured sparsity, and other technologies. MIG can partition a single A100 GPU into as many as seven independent GPUs, providing appropriately sized compute for workloads of different scales to maximize utilization and return on investment. Third-generation NVLink doubles the high-speed connectivity between GPUs, enabling efficient performance scaling within servers, and can combine multiple A100 GPUs into one giant GPU to run larger training jobs.
The A100 not only broke performance records but also reached the market faster than any previous NVIDIA GPU. It powered NVIDIA's third-generation DGX system from the moment of its release, and only six weeks after the official launch it was available on Google Cloud.
To meet strong market demand, leading global cloud providers such as AWS, Baidu Cloud, Microsoft Azure, and Tencent Cloud, along with dozens of major server makers including Dell Technologies, HPE, Inspur, and Supermicro, have adopted the A100. Users around the world are using it to tackle the most complex challenges in AI, data science, and scientific computing, from next-generation recommender systems and conversational AI applications to research into treatments for COVID-19.
Alibaba set a sales record of $38 billion during its "Double 11" shopping event in November; its recommendation system runs on NVIDIA GPUs, handling more than 100 times as many queries per second as CPUs. Conversational AI has itself become an industry focus, driving business growth in fields from finance to healthcare.
In May this year, NVIDIA released two application frameworks: Jarvis for conversational AI and Merlin for recommendation systems. Merlin includes the HugeCTR training framework, which contributed to the latest MLPerf benchmark results. These are only part of a growing family of application frameworks that also includes NVIDIA DRIVE for the automotive market, Clara for healthcare, Isaac for robotics, and Metropolis for retail and smart cities.
NVIDIA's ecosystem empowers the AI industry
NVIDIA's GPUs have in fact become a cornerstone of artificial intelligence, thanks both to continuous GPU innovation and to the company's ecosystem. Of the nine companies that submitted results, six besides NVIDIA submitted results based on NVIDIA GPUs: three cloud service providers (Alibaba Cloud, Google Cloud, and Tencent Cloud) and three server makers (Dell, Fujitsu, and Inspur), highlighting the strength of the NVIDIA ecosystem.
Most of these partners competed using containers from NGC, NVIDIA's software hub, together with open frameworks. Nearly 20 cloud service providers and OEMs, including these MLPerf partners, have adopted or plan to adopt NVIDIA A100 GPUs to build cloud instances, servers, and PCIe cards.
Most of the software NVIDIA and its partners used in the latest MLPerf benchmarks is available today through NGC, which offers GPU-optimized containers, software scripts, pretrained models, and SDKs that help data scientists and developers accelerate AI workflows on common frameworks such as TensorFlow and PyTorch.
Artificial intelligence is becoming the core driving force of a new round of industrial transformation. The MLPerf benchmark results demonstrate the powerful AI performance of NVIDIA's latest-generation A100 GPU, giving users a reference point when selecting AI hardware, better supporting innovative AI applications, and advancing the entire AI industry chain.
Editor in charge: PJ