Editor’s note: Information technology, with cloud computing at its forefront, has achieved remarkable results in the prevention and control of the current epidemic. Remote video, virus analysis, epidemic risk analysis, video surveillance, and front-end collection and analysis of body temperature and facial data all depend on back-end data centers. That in turn requires massive data processing capacity, which rests on the computing power of server clusters.

No technology solves every problem once and for all. That was true of the mainframe, it is true of today’s PC server, and it will be even more true of cloud computing in the future. All we can do is keep using and refining, summarizing and exploring.

We have opened the door to the digital age and seen the tremendous changes digital technology has brought to business and society. At the level of digital infrastructure, however, the problems are far from solved. Even the most advanced hyperscale data centers have both near-term and long-term worries.

The near-term worry is the cost of building, operating and maintaining the data center; the long-term worry is the architectural challenge posed by new types of computing.

Cost – the mountain that the data center is always climbing

Even when it is sited where land and electricity are relatively cheap, a data center remains the most asset-heavy project in IT construction, and building and operating it takes enormous capital. Because of the civil engineering and supporting infrastructure required, every square meter that can be set aside for IT equipment carries a five- or six-digit price.

With construction costs per square meter comparable to housing prices in Beijing, Shanghai and Guangzhou, compute density has become a core equipment metric for every data center builder. Given this relentless pursuit of density, it is no surprise that custom server projects such as Olympus, Open19, the ODCC Scorpio Project and OpenRack keep appearing.

Beyond carefully engineering the hardware form factor, most data centers have a more direct way to raise compute density and efficiency: a computing platform with more cores and higher performance.

AMD second-generation EPYC platform: peak performance and better cost-effectiveness

Following the launch of the first-generation EPYC processors, code-named “Naples”, in 2017, AMD released its second-generation EPYC processors, code-named “Rome”, in 2019.

AMD’s second-generation EPYC series processors, code-named “Rome”

As the industry’s first x86 server processor built on a 7nm process, the second-generation EPYC offers up to 64 cores and 128 threads, 256MB of L3 cache, eight channels of DDR4-3200 memory (up to 4TB per socket), 128 PCIe 4.0 lanes and a series of new features. According to AMD, doubling the core count and optimizing instructions per cycle raises floating-point performance to about four times that of the previous generation. The 7nm process also brings higher energy efficiency: AMD puts the second-generation EPYC’s performance per watt at twice that of the previous generation. The processors lead multiple industry benchmarks and hold more than 140 world records to date.
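The memory and I/O figures above translate into theoretical peak bandwidth with simple arithmetic. The sketch below is a back-of-the-envelope estimate, not an AMD specification: it assumes a standard 64-bit DDR4 data bus per channel, the PCIe 4.0 signalling rate of 16 GT/s with 128b/130b encoding, and that the quoted 128 PCIe 4.0 figure counts lanes. The function names are made up for illustration, and real sustained bandwidth will be lower.

```python
# Theoretical peak bandwidth for one socket (illustrative estimate only).

DDR4_CHANNELS = 8                 # eight memory channels, as quoted in the article
DDR4_TRANSFERS_PER_SEC = 3200e6   # DDR4-3200: 3200 mega-transfers per second
DDR4_BYTES_PER_TRANSFER = 8       # assumption: 64-bit data bus per channel

PCIE_LANES = 128                  # assumption: the quoted "128 PCIe 4.0" counts lanes
PCIE4_TRANSFERS_PER_SEC = 16e9    # PCIe 4.0 runs at 16 GT/s per lane
PCIE4_ENCODING = 128 / 130        # 128b/130b line encoding overhead


def peak_memory_bandwidth_gbs() -> float:
    """Peak DRAM bandwidth in GB/s across all channels."""
    total = DDR4_CHANNELS * DDR4_TRANSFERS_PER_SEC * DDR4_BYTES_PER_TRANSFER
    return total / 1e9


def peak_pcie_bandwidth_gbs() -> float:
    """Peak PCIe bandwidth in GB/s, one direction, across all lanes."""
    bytes_per_lane = PCIE4_TRANSFERS_PER_SEC * PCIE4_ENCODING / 8
    return PCIE_LANES * bytes_per_lane / 1e9


if __name__ == "__main__":
    print(f"memory: {peak_memory_bandwidth_gbs():.1f} GB/s")
    print(f"PCIe:   {peak_pcie_bandwidth_gbs():.1f} GB/s per direction")
```

Under these assumptions the memory side works out to about 204.8 GB/s and the PCIe side to roughly 252 GB/s in each direction.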

Thanks to its chiplet design, the second-generation EPYC platform completed this generational upgrade in a relatively short time. Compared with a monolithic SoC, a chiplet design is more flexible: when every function is integrated on a single die, it is usually hard to optimize performance, power and area all at once. Chiplets can also shorten the product development cycle and reduce design risk to some extent. AMD calls this the “AMD Infinity” hybrid multi-chip architecture, and it reaches new heights in the second-generation EPYC processors.

On the second-generation EPYC platform, AMD builds the Core Chiplet Die (CCD) on the latest 7nm process. Each CCD contains two Core Complexes (CCX), and each CCX integrates four cores. The I/O functions, such as the memory, PCIe and disk controllers, sit on a separate I/O die built on a 14nm process to reduce cost and development time. Each I/O die can connect to up to eight CCDs, for a design of up to 64 cores.
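The core-count arithmetic in the paragraph above can be written down directly. This is a minimal sketch: the constants come from the article, while the function names are hypothetical, and it assumes every core in every populated CCD is enabled, which actual products need not do.

```python
# Core and thread counts as a function of how many CCDs are populated.

MAX_CCDS_PER_IO_DIE = 8   # one I/O die connects to at most eight CCDs
CCX_PER_CCD = 2           # two core complexes per CCD
CORES_PER_CCX = 4         # four cores per core complex
THREADS_PER_CORE = 2      # implied by the quoted 64 cores / 128 threads


def core_count(ccds: int) -> int:
    """Cores in a package with `ccds` fully enabled CCDs (hypothetical helper)."""
    if not 1 <= ccds <= MAX_CCDS_PER_IO_DIE:
        raise ValueError(f"ccds must be between 1 and {MAX_CCDS_PER_IO_DIE}")
    return ccds * CCX_PER_CCD * CORES_PER_CCX


def thread_count(ccds: int) -> int:
    """Hardware threads for the same package."""
    return core_count(ccds) * THREADS_PER_CORE


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(f"{n} CCDs -> {core_count(n)} cores / {thread_count(n)} threads")
```

With all eight CCDs populated this gives the 64 cores and 128 threads quoted for the top of the range.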

At the ISSCC 2020 conference in February, AMD explained how combining 7nm and 14nm dies cuts costs relative to a pure 7nm monolithic design. Taking the 64-core product as the baseline and comparing across the 48-, 32-, 24- and 16-core products, AMD cited savings of up to about a factor of two, with the savings growing as the core count rises.
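One common way to see why several small dies can cost less than one large die is a simple yield model. The sketch below uses the textbook Poisson yield approximation, yield = exp(-defect density × area), and is not AMD’s own cost model. Every number in it (die areas, defect densities, relative cost per square millimeter) is a made-up placeholder, and it ignores packaging cost, wafer-edge loss and reticle size limits.

```python
import math

# Hypothetical inputs, chosen only to illustrate the shape of the trade-off.
CCD_AREA_MM2 = 75.0        # placeholder area of one 7nm core die
IOD_AREA_MM2 = 400.0       # placeholder area of the 14nm I/O die
DEFECTS_PER_CM2 = 0.1      # placeholder defect density for both processes
COST_PER_MM2_7NM = 2.0     # placeholder relative wafer cost per mm^2
COST_PER_MM2_14NM = 1.0    # placeholder relative wafer cost per mm^2


def die_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Poisson yield model: fraction of dies with zero defects."""
    return math.exp(-defects_per_cm2 * area_mm2 / 100.0)


def cost_per_good_die(area_mm2: float, defects_per_cm2: float,
                      cost_per_mm2: float) -> float:
    """Silicon cost of one working die, spreading the cost of failed dies."""
    return area_mm2 * cost_per_mm2 / die_yield(area_mm2, defects_per_cm2)


def chiplet_cost(ccds: int) -> float:
    """Several small 7nm dies plus one 14nm I/O die."""
    ccd = cost_per_good_die(CCD_AREA_MM2, DEFECTS_PER_CM2, COST_PER_MM2_7NM)
    iod = cost_per_good_die(IOD_AREA_MM2, DEFECTS_PER_CM2, COST_PER_MM2_14NM)
    return ccds * ccd + iod


def monolithic_cost(ccds: int) -> float:
    """The same logic on one large 7nm die (assumes the I/O does not shrink)."""
    area = ccds * CCD_AREA_MM2 + IOD_AREA_MM2
    return cost_per_good_die(area, DEFECTS_PER_CM2, COST_PER_MM2_7NM)


if __name__ == "__main__":
    for n in (2, 4, 8):
        ratio = monolithic_cost(n) / chiplet_cost(n)
        print(f"{n} CCDs: monolithic / chiplet cost ratio = {ratio:.2f}")
```

With these placeholder inputs the monolithic design is more expensive at every size and the gap widens as dies are added, because yield falls exponentially with area. The exact ratios depend entirely on the assumed numbers and should not be read as AMD’s figures.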

The core dies, manufactured on TSMC’s 7nm process, not only greatly improve performance per watt but also shrink the core area relative to the first-generation EPYC, which lets the second-generation platform pack in more cores. Zen 2 also brings an improved branch predictor, an optimized L1 instruction cache, a floating-point unit with twice the data width, an updated store queue, twice the L1 data cache read and write bandwidth, and double the L3 cache per CCX. Together these changes give Zen 2 an IPC improvement of more than 15% over Zen 1 and roughly twice the energy efficiency. It is also worth noting that, because the chiplet design shortens the product design cycle, AMD plans to launch the third-generation EPYC, code-named “Milan”, on TSMC’s 7nm+ process, which is expected to improve energy efficiency further.
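The generational figures quoted in this article fit together with simple multiplication. The sketch below is a rough first-order estimate that assumes equal clock speeds and perfect scaling across cores, neither of which holds exactly in practice; the function name is hypothetical.

```python
# First-order per-socket scaling from Zen 1 to Zen 2 (illustrative only).

CORE_RATIO = 2.0       # core count doubles from the first to the second generation
FP_WIDTH_RATIO = 2.0   # floating-point unit data width doubles
IPC_RATIO = 1.15       # "more than 15%" IPC improvement, taken at its lower bound


def per_socket_scaling(core_ratio: float, per_core_ratio: float) -> float:
    """Throughput ratio assuming equal clocks and perfect multi-core scaling."""
    return core_ratio * per_core_ratio


if __name__ == "__main__":
    fp_peak = per_socket_scaling(CORE_RATIO, FP_WIDTH_RATIO)
    general = per_socket_scaling(CORE_RATIO, IPC_RATIO)
    print(f"peak floating-point throughput: about {fp_peak:.1f}x")
    print(f"general throughput:             about {general:.1f}x")
```

Twice the cores times twice the floating-point width gives the roughly fourfold peak floating-point figure AMD quotes, while the IPC gain alone suggests about 2.3 times the general throughput at equal clocks.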

Today the AMD EPYC platform is used by major cloud service providers including Tencent Cloud, Amazon, Microsoft, Oracle and Google. Leading OEM partners, including Dell, HPE, H3C and Lenovo, have also worked with AMD to launch a rich lineup of servers based on the second-generation EPYC platform.

From another perspective, AMD kept the same processor socket when moving the EPYC platform from the first to the second generation. Users can therefore gain more cores and more performance by updating firmware and installing new CPUs, without replacing the entire server. This is clearly attractive to some users.

Returning to the question this article began with: building a data center is usually a complex process, and in practice tearing everything down and starting over is rarely an option. The long-term goal of an enterprise data center is to keep raising computing performance at excellent energy efficiency without changing the existing ventilation, fire protection, water, power and other supporting facilities. Seen in this light, the design of the AMD EPYC platform largely protects customers’ existing investment and fits the staged-upgrade lifecycle planning of most users.

Chiplets help Moore’s Law open up more possibilities

As the types of computing multiply, servers are also becoming more scenario-specific. By installing different kinds of compute engines such as GPUs, FPGAs and ASICs, a server can often achieve better efficiency in a particular application. But diversifying computing power by adding more accelerator cards also places higher demands on the server’s power budget, cooling and physical space.

For this dilemma, chiplets, which AMD was the first to put into practice in a data center computing platform, offer an exciting solution. In the future, by integrating different chips on the same substrate, AMD and its partners could tackle the challenge of diversified computing within the few square inches of a socket. Because the CPU usually enjoys the best power delivery and cooling in the server, many of the problems caused by diversifying computing power through add-in cards would be eased as well.

In other words, AMD can easily add or remove dies on the existing substrate. This lets it offer mid-range and entry-level CPU-only products at lower cost and price. And once some CCDs are removed, the freed-up I/O bandwidth and bus links could connect to other types of dies integrated on the same substrate, creating application-specific “heterogeneous” processors.

Following this line of thinking, we can look forward to a more colorful future. For example, one or more of the CCDs on the CPU could be replaced with GPUs, with matching HBM added, to improve ML/DL training and inference performance (this approach might also offer a new way to handle data exchange and synchronization between multiple GPUs). Alternatively, CCDs could be replaced with more targeted ASICs to boost performance on other specific algorithms, creating a more scenario-specific computing platform.

Compared with the earlier monolithic approach, using chiplets to build heterogeneous chips not only lets existing IP keep delivering value, but also greatly reduces the development time and cost of new processors and computing platforms. More computing scenarios can then achieve substantial performance optimization and infrastructure simplification at lower cost.

At a broader level, chiplet technology may also prove an effective catalyst for extending Moore’s Law.

Between the decisive battles

The processor sits at the heart of the server, and indeed of the entire digital infrastructure, and designing one is a fairly complex art. It is an art not only because it requires constantly finding new balances among design, process and engineering, but also because a processor must meet the dual challenges of the present and the future.

A computing platform built around an excellent processor should not only give users visible performance gains, but also point the industry toward new directions and new ideas for the future.

From both perspectives, the second-generation EPYC processor has a special significance. The combination of 64 cores and 7nm gives users a visible increase in computing performance and density, helping data centers reach new heights in cost and performance. The application of chiplet technology offers Moore’s Law a new path forward as process scaling slows.

A product that expresses both technology and insight into the future within a few hundred square millimeters has been rare in recent years.

I am not trying to heap on praise here; not because the product does not deserve it, but because we will soon have a brand-new Milan to look forward to. It will not be too late to say more then.

