In the past few years, a large number of processors have entered the market with the sole purpose of accelerating artificial intelligence and machine learning workloads. Because many different types of machine learning algorithms may be used, these processors typically focus on a few key areas, but one limitation constrains them all: how large a processor can be made.

Two years ago, Cerebras unveiled a revolution in chip design: a processor as big as your head, using as much area on a 12-inch wafer as a rectangular design allows. Built on a 16 nm process, the chip reportedly targets AI and HPC workloads simultaneously.

Cerebras unveils a new revolution in chip design

Today, the company is releasing its second-generation product, built on TSMC's 7 nm process. The core count has more than doubled, and nearly every other headline figure has more than doubled as well.

Second-Generation WSE (Wafer Scale Engine)

The new processor from Cerebras is built on TSMC's N7 process. This allows the logic to scale down, and the SRAM to shrink somewhat as well, and the new chip now holds 850,000 AI cores.

As the figure below shows, essentially every metric of the new chip has more than doubled:

Like the original processor, known as the Wafer Scale Engine (WSE-1), the new WSE-2 integrates hundreds of thousands of AI cores across 46,225 mm² of silicon. In that space Cerebras has packed 2.6 trillion transistors and built 850,000 AI cores. By contrast, the largest GPU on the market measures about 826 mm² and holds 0.054 trillion transistors. Cerebras also cites 1,000× the on-board memory, with 40 GB of SRAM against the 40 MB on the Ampere A100.

The cores are connected in a 2D mesh, each with an FMAC datapath. Cerebras achieves a 100% yield rate by designing a system that can bypass any manufacturing defect.
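The article does not describe the redundancy scheme in detail; as a rough illustration of the general idea (spare cores per row, with links that simply skip over defective neighbors), here is a toy sketch in Python. The row/spare layout and function names are assumptions for illustration, not Cerebras' actual fabric:

```python
# Toy model of yield-through-redundancy on a 2D core mesh (illustrative only;
# Cerebras' real routing fabric and spare-core scheme are not public here).

def build_logical_rows(mesh, spares_per_row):
    """Map each physical row to a logical row, skipping defective cores.

    `mesh` is a list of rows; each row is a list of booleans
    (True = core works). Each row carries `spares_per_row` extra
    physical cores, so defects can be bypassed without shrinking
    the logical row.
    """
    logical_width = len(mesh[0]) - spares_per_row
    logical = []
    for row in mesh:
        good = [i for i, ok in enumerate(row) if ok]
        if len(good) < logical_width:
            raise ValueError("too many defects in one row")
        # Keep the first `logical_width` working cores; mesh links
        # simply hop over the skipped (defective or spare) positions.
        logical.append(good[:logical_width])
    return logical

# A 4-wide logical row built from 5 physical cores (1 spare per row),
# with one defect in the middle row.
mesh = [
    [True, True, True, True, True],
    [True, False, True, True, True],   # core 1 is defective
    [True, True, True, True, True],
]
rows = build_logical_rows(mesh, spares_per_row=1)
print(rows[1])  # -> [0, 2, 3, 4]: the defective core at index 1 is skipped
```

The point of the sketch is that as long as the spare budget per row exceeds the defect count, every die yields a full-width logical array, which is how a wafer-sized part can reach 100% yield.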

Initially, Cerebras built in 1.5% additional cores to accommodate defects, but we were told that, because TSMC's process is so mature, this turned out to be more than necessary. The goal of Cerebras and the WSE is to provide, through its patented innovations, a single platform that enables much larger processors for AI compute, and one that has also been extended to a wider range of HPC workloads.


Building on the first-generation WSE

The key to the design is a custom graph compiler, which takes PyTorch or TensorFlow and maps each layer to a physical section of the chip, allowing asynchronous compute as the data flows through. Having such a large processor means the data never has to go off-chip or wait in memory, wasting power, and can continuously move on to the next stage of the computation in a pipelined fashion. The compiler and processor are also designed with sparsity in mind, enabling high utilization regardless of batch size, as well as the ability to run parameter-search algorithms concurrently.


Cerebras' first-generation WSE is packaged and sold as part of the complete CS-1 system. The company has dozens of customers with deployed systems, including a number of research laboratories, pharmaceutical companies, biotechnology firms, the military, and the oil and gas industry. Lawrence Livermore has paired a CS-1 with its 23 PFLOP "Lassen" supercomputer. The Pittsburgh Supercomputing Center purchased two systems for $5 million, attaching them to its Neocortex supercomputer for simultaneous AI and enhanced compute.

Products and partners

Cerebras sells the complete CS-1 system as a 15U box containing one WSE-1 and 12×100 GbE connectivity, with twelve 4 kW power supplies (six redundant; peak power around 23 kW), and it is paired with HPE's Superdome Flex at some institutions. The new CS-2 system shares the same configuration: although the core count has more than doubled and the on-board memory has more than doubled, the power consumption stays the same. Unlike other platforms, the processor sits vertically in the 15U design, both for easy access and for the built-in liquid cooling that such a large processor requires. Note also that the front doors are machined from a single block of aluminum.

What makes the Cerebras design unique is that it goes beyond the physical limit normally imposed in manufacturing, namely the reticle limit. A processor's design is ordinarily bounded by the maximum reticle size, because it is difficult to connect two reticle areas across the scribe line between them. This is part of the secret sauce Cerebras brings to the table: the company remains the only one offering a processor of this size, having developed and patented the cross-reticle connectivity needed to manufacture these large chips, and those patents still apply here. The second-generation WSE will ship in the CS-2 system, which is similar to the CS-1 in connectivity and appearance.

The same compiler, with updated packages, means the second system is immediately usable by any customer who has already trialed AI workloads on the first. Cerebras has been working at a higher level of abstraction, enabling customers with standardized TensorFlow and PyTorch models to port their existing GPU code very quickly, by adding three lines of code and using Cerebras' graph compiler. The compiler then divides the full 850,000 cores into segments for each layer, allowing data to flow through in a pipeline without stalling. The chip can also run multiple networks at once for parameter search.


Cerebras points out that having such a large single-chip solution removes the obstacles of distributed training methods spanning 100+ AI chips; in most cases that extra complexity simply is not needed, which is why we see single CS-1 systems deployed alongside supercomputers.

That said, Cerebras notes that two CS-2 systems would provide 1.7 million AI cores in a standard 42U rack, or three systems would provide 2.55 million AI cores in a larger 46U rack (assuming sufficient power delivery!), replacing a dozen racks of alternative compute hardware.
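The rack figures follow directly from the per-system core count; a quick sanity check, assuming the 850,000 cores per CS-2 quoted earlier:

```python
# Per-rack core counts implied by the 850,000-core CS-2.
cores_per_cs2 = 850_000

two_system_rack = 2 * cores_per_cs2    # standard 42U rack
three_system_rack = 3 * cores_per_cs2  # larger 46U rack

print(two_system_rack)    # -> 1700000  (the quoted 1.7 million)
print(three_system_rack)  # -> 2550000  (the quoted 2.55 million)
```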

At Hot Chips 2020, Sean Lie, the company's Chief Hardware Architect, said that one of the main benefits Cerebras offers customers is workload simplification: what previously required racks of GPUs/TPUs can instead run, in a compute-equivalent fashion, on a single WSE.


As a company, Cerebras has about 300 employees across Toronto, San Diego, Tokyo, and San Francisco. CEO Andrew Feldman said the company is profitable, with many customers already deploying CS-1 and more trialing CS-2 remotely ahead of the commercial systems shipping.

Beyond AI, the chip's flexibility also enables fluid dynamics and other computational simulations, and Cerebras has attracted a number of customers in traditional commercial high-performance computing markets such as oil and gas and genomics. CS-2 deployments will begin in the third quarter of this year, with the price rising from $2-3 million to "several million" dollars.

Editor in charge: PJ
