Today’s AI and deep learning applications rely on large datasets and fast I/O, but data storage can become a performance bottleneck. It helps to know which capabilities a storage system for AI and deep learning should have.
Artificial intelligence techniques, chiefly machine learning and deep learning, have triggered explosive growth in research and product development as companies find creative ways to apply the new algorithms to process automation and predictive insight. By their nature, machine learning and deep learning models, which often mimic the neural structure and connectivity of the brain, require acquiring, preparing, moving and processing large data sets.
Deep learning models in particular require very large data sets, which poses a unique challenge for AI and deep learning storage. A brief look at the nature of machine learning and deep learning software reveals why storage systems are so important for these algorithms to deliver timely and accurate results.
Why storage matters for AI and deep learning
Many researchers have shown that the accuracy of deep learning models improves as data sets grow. Researchers often use sophisticated data augmentation techniques to generate additional data for model training.
For example, the ImageNet dataset used to benchmark various deep learning image classification algorithms contains more than 14 million images with extensive annotations. By contrast, the ResNet-50 model, which is often used to benchmark image classification hardware, is just over 100 MB in size. A model of that size is best kept in memory, but the training data must be fed to it continuously, which often makes the storage system the bottleneck for overall performance.
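The gap between a small model and a large, continuously streamed dataset can be made concrete with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the 14 million image count comes from the text above, while the average image size, the per-GPU ingest rate and the GPU count are assumptions chosen to show the calculation, not measured values.

```python
def dataset_size_gb(num_images, avg_image_kb):
    """Approximate on-disk dataset size in GB (decimal units)."""
    return num_images * avg_image_kb / 1_000_000.0


def required_read_bandwidth_mb_s(images_per_second_per_gpu, gpus, avg_image_kb):
    """Sustained read bandwidth (MB/s) needed to keep every GPU supplied."""
    total_images_per_second = images_per_second_per_gpu * gpus
    return total_images_per_second * avg_image_kb / 1000.0


NUM_IMAGES = 14_000_000          # ImageNet figure quoted in the text
AVG_IMAGE_KB = 110               # assumption: average compressed image size
IMAGES_PER_SEC_PER_GPU = 1000    # assumption: per-GPU training ingest rate
GPUS = 8                         # assumption: GPUs in one training server

size_gb = dataset_size_gb(NUM_IMAGES, AVG_IMAGE_KB)
bandwidth_mb_s = required_read_bandwidth_mb_s(
    IMAGES_PER_SEC_PER_GPU, GPUS, AVG_IMAGE_KB
)
print(f"dataset: about {size_gb:.0f} GB")
print(f"sustained read bandwidth: about {bandwidth_mb_s:.0f} MB/s")
```

Under these assumptions the dataset is several orders of magnitude larger than the roughly 100 MB model, so it has to be streamed from storage on every pass rather than held in memory.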
Regardless of the model and application, deep learning involves two steps: model training and inference. Training is the process of calculating and optimizing model parameters through repeated, usually iterative, calculations over the training data set. Inference is where the trained model is used to classify or make predictions about new input data.
Each step stresses the storage system that serves AI and deep learning in a different way. During training, the stress comes from large data sets and the fast I/O to the compute complex (usually a distributed cluster) that acceptable performance requires. During inference, the stress comes from the real-time nature of the data, which must be processed with minimal latency.
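The two steps can be illustrated with a deliberately tiny model. The following is a minimal sketch, not a real deep learning workload: it fits a single parameter with plain gradient descent, and the data, learning rate and epoch count are made up for the example. What it does show is the access pattern that matters for storage: training re-reads the whole data set on every epoch, while inference touches only the new input.

```python
def train(samples, epochs=200, learning_rate=0.01):
    """Fit y = w * x by repeated gradient-descent passes over the data."""
    w = 0.0
    for _ in range(epochs):              # every epoch re-reads the full data set
        for x, y in samples:
            error = w * x - y
            w -= learning_rate * error * x   # gradient of 0.5 * error**2 w.r.t. w
    return w


def infer(w, x):
    """Apply the trained parameter to a single new input."""
    return w * x


# Hypothetical training data generated from y = 2x
training_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

w = train(training_data)
prediction = infer(w, 10.0)
print(f"learned w = {w:.4f}, prediction for x = 10: {prediction:.2f}")
```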
Storage performance requirements of deep learning
Deep learning algorithms rely heavily on matrix math. Unlike computer graphics, neural networks and other deep learning models do not require high-precision floating-point results, so they are usually accelerated further by a new generation of AI-optimized GPUs and CPUs that support low-precision 8-bit and 16-bit matrix calculations. Because the compute hardware then consumes data faster, this optimization can make the storage system an even bigger performance bottleneck.
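A simple model shows why faster compute shifts the bottleneck toward storage. The sketch assumes that data loading overlaps with computation, so each training step takes as long as the slower of the two. All timings are hypothetical, including the assumption that each drop in precision halves compute time.

```python
def step_time_ms(compute_ms, io_ms):
    """Time per training step when data loading overlaps with compute."""
    return max(compute_ms, io_ms)


def gpu_utilization(compute_ms, io_ms):
    """Fraction of each step the accelerator spends doing useful work."""
    return compute_ms / step_time_ms(compute_ms, io_ms)


IO_MS = 40.0  # assumption: time for storage to deliver one batch

# Assumption: compute time per batch halves with each precision step
for label, compute_ms in [("FP32", 100.0), ("FP16", 50.0), ("INT8", 25.0)]:
    utilization = gpu_utilization(compute_ms, IO_MS)
    print(f"{label}: compute {compute_ms:.0f} ms, utilization {utilization:.1%}")
```

As long as compute takes longer than I/O, the accelerator stays fully busy. Once compute time falls below the time storage needs to deliver a batch, the accelerator starts to idle even though the storage system itself has not changed.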
The diversity of deep learning models and data sources, as well as the design of distributed computing commonly used in deep learning servers, mean that systems designed to provide storage for AI must address the following issues:
· A variety of data formats, including binary large object (BLOB) data, images, video, audio, text and structured data, each with its own format and I/O characteristics.
· Scale-out system architecture, in which workloads are distributed across multiple systems: typically 4 to 16 for training and possibly hundreds or thousands for inference.
· Bandwidth and throughput that can deliver large amounts of data to the compute hardware quickly.
· IOPS that sustain high throughput regardless of data characteristics, that is, for both many small transactions and fewer large transfers.
· Minimal data-delivery latency because, as with virtual memory paging, the performance of a training algorithm drops significantly when GPUs wait for new data.
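The relationship between IOPS and throughput in the list above is simple arithmetic: throughput equals IOPS multiplied by the size of each I/O. The sketch below uses a hypothetical 2,000 MB/s target and three illustrative I/O sizes to show how the IOPS needed for the same throughput changes with transfer size.

```python
def throughput_mb_s(iops, io_size_kb):
    """Throughput in MB/s delivered by a given IOPS rate and I/O size."""
    return iops * io_size_kb / 1000.0


def iops_needed(target_mb_s, io_size_kb):
    """IOPS required to reach a target throughput at a given I/O size."""
    return target_mb_s * 1000.0 / io_size_kb


TARGET_MB_S = 2000.0  # assumption: throughput the training cluster needs

for io_size_kb in (4, 128, 1024):
    needed = iops_needed(TARGET_MB_S, io_size_kb)
    print(f"{io_size_kb:>5} KB I/Os: {needed:>9,.0f} IOPS for {TARGET_MB_S:.0f} MB/s")
```

A workload made of small files or records therefore needs far more IOPS than one made of large sequential transfers, which is why a system tuned only for bandwidth can still starve a training job.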
Deep learning storage system design must provide balanced performance across a variety of data types and deep learning models. According to an NVIDIA engineer, it is critical to verify storage system performance under a variety of load conditions.
“The complexity of the workloads, combined with the amount of data required for deep learning training, creates a challenging performance environment,” he said. “Given the complexity of these environments, it is critical to collect baseline performance data before going into production to verify that the core systems (hardware components and operating system) can deliver the expected performance under combined loads.”
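One way to start collecting such a baseline is to time a plain sequential read. The following is a minimal sketch, not a substitute for a dedicated benchmarking tool: it creates its own 16 MB scratch file (an arbitrary size chosen for the example), so the operating system's page cache will very likely serve the read and the result will overstate real device speed. A meaningful baseline would read a file much larger than RAM from the storage system under test and repeat the run under combined load.

```python
import os
import tempfile
import time


def measure_read_throughput(path, block_size=1024 * 1024):
    """Read a file sequentially; return (bytes_read, MB per second)."""
    bytes_read = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:   # unbuffered at the Python level
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            bytes_read += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return bytes_read, bytes_read / elapsed / 1_000_000


# Create a scratch file so the sketch is self-contained
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(16 * 1024 * 1024))
    scratch_path = tmp.name

try:
    total_bytes, mb_per_s = measure_read_throughput(scratch_path)
    print(f"read {total_bytes} bytes at about {mb_per_s:.0f} MB/s")
finally:
    os.remove(scratch_path)
```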
Core features of deep learning storage systems
These performance factors have prompted AI storage vendors to adopt five core features:
1. An incrementally scalable, parallel scale-out design in which I/O performance grows with capacity. A hallmark of this design is a distributed storage architecture or file system that decouples logical elements, such as objects and files, from the physical device or devices that hold them.
2. A programmable, software-defined control plane, which is key to achieving scale-out designs and automating most management tasks.
3. Enterprise-grade reliability, durability, redundancy and storage services.
4. For deep learning training systems, a tightly coupled compute-storage architecture with a nonblocking network design connecting servers and storage. Minimum link speeds are 10 Gb to 25 Gb Ethernet or EDR InfiniBand (25 Gbps per lane).
5. SSD devices, increasingly faster NVMe devices that provide higher throughput and IOPS than SATA.
   o DAS systems usually use NVMe-over-PCIe devices.
   o NAS designs typically use 10 Gb Ethernet or faster, with NVMe over Fabrics, InfiniBand or switched PCIe fabrics.
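The first feature in the list, decoupling logical objects from the physical devices that hold them, can be sketched with a hash-based placement function. This is an illustration of the general idea only and not any vendor's method: the device names are hypothetical, and the simple modulo scheme used here would reshuffle most objects whenever a device is added, which is why production systems typically use more elaborate placement algorithms.

```python
import hashlib


def place(object_name, devices, replicas=2):
    """Map a logical object name to the physical devices that store it."""
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    first = int.from_bytes(digest[:8], "big") % len(devices)
    # Store each replica on a different device, wrapping around the list
    return [devices[(first + i) % len(devices)] for i in range(replicas)]


# Hypothetical device names for a four-node scale-out cluster
devices = ["node-a/nvme0", "node-b/nvme0", "node-c/nvme0", "node-d/nvme0"]

for name in ("train/img_000001.jpg", "train/img_000002.jpg", "labels.csv"):
    print(name, "->", place(name, devices))
```

Because clients compute the location from the object name, reads spread across all nodes without a central lookup, which is one reason I/O performance can grow as nodes are added.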
NVIDIA’s DGX-2 system is an example of a high-performance system architecture for deep learning.
Customized storage products
AI is a hot technology, and vendors have responded quickly with a mix of new and updated products that target AI workloads. Given how fast the market is moving, this article does not attempt a comprehensive catalog of products optimized for or targeted at AI storage, but here are some examples:
Dell EMC offers ready-made solutions for AI that include racks, servers, storage, edge switches and a management node. The storage is Isilon H600 or F800 all-flash scale-out NAS with 40 GbE network links.
DDN A3I uses AI200 or AI400 NVMe all-flash arrays (AFAs) with 360 TB of capacity and, respectively, 750K or 1.5M IOPS and four or eight 100 GbE or EDR InfiniBand interfaces, or the DDN AI7990 hybrid storage appliance with 5.4 PB of capacity, 750K IOPS and four 100 GbE or EDR InfiniBand interfaces. DDN also bundles the products with NVIDIA DGX-1 GPU-accelerated servers and Hewlett Packard Enterprise Apollo 6500 accelerated servers.
The IBM Elastic Storage Server AFA is available in a variety of SSD-based configurations that provide up to 1.1 PB of usable capacity. IBM also has a reference system architecture that combines the Elastic Storage Server with Power Systems servers and the PowerAI Enterprise software stack.
The NetApp ONTAP AI reference architecture combines NVIDIA DGX-1 servers with NetApp AFA A800 systems and two Cisco Nexus 3K 100 GbE switches. The A800 can deliver 1 million IOPS at half a millisecond of latency, and its scale-out design can provide more than 11 million IOPS in a 24-node cluster.
Pure Storage AIRI is another DGX-1 integrated system, which uses Pure’s FlashBlade AFA system to support file and object storage. Reference systems are available with Arista, Cisco or Mellanox switches. For example, the Arista design uses 15 17 TB FlashBlades with eight 40 GbE links to an Arista 32-port 100 GbE switch.
Pure Storage AIRI system architecture
Deep learning inference systems place lighter demands on the storage subsystem, which can usually be met with local SSDs in x86 servers. Although inference platforms are typically conventional 1U and 2U server designs with local SSD or NVMe slots, they increasingly include compute accelerators such as NVIDIA T4 GPUs or FPGAs that can compile some deep learning operations into hardware.
Editor in charge: CT