NVIDIA added programmable computing capabilities, including vertex shaders and pixel shaders, to the GeForce 256 chip, and coined the name GPU for its graphics chip products. As the name suggests, GPU stands for graphics processing unit. Unlike the previous-generation Riva architecture, which could only carry out graphics work, the GeForce 256 can be said to have redefined NVIDIA's graphics chips: graphics became just one part of what the chip does, and programmable computing power became the core that would let NVIDIA shine in the future.
However, although the programmability was there, it took roughly 10 years before it played a real role in computing. At the time, NVIDIA was leading the market on the strength of its successful graphics architecture, the market war with ATI was heated, and stream processing had not yet found good applications. NVIDIA therefore did not anticipate how much potential the GPU's computing power would turn out to have. In 2004, a Stanford University team led by Bill Dally designed a number of stream processing computing architectures for the programmable part of the GPU. These research results later became the basis of CUDA.
Building on this research, NVIDIA later launched CUDA, a general-purpose parallel computing architecture, and the Tesla product line for computing applications. Bill Dally was one of the important driving forces behind this. After he joined NVIDIA in 2009, NVIDIA's GPU computing efforts advanced rapidly.
From its beginnings as a tool used solely for basic scientific research in universities, the GPU has rapidly become the computing core of major supercomputing and data centers. A large share of the systems on the TOP500 supercomputer list use NVIDIA computing solutions. As the CUDA ecosystem matured, it also helped drive the boom in machine learning and deep learning applications that followed AI becoming a hot topic in 2016.
Tensor Core is a marketing name. Architecturally, what it really does is add some corresponding instructions. With these instructions, the GPU can perform half-precision matrix multiply-accumulate, which is the basic inner-loop logic of many deep learning algorithms. It doesn't actually change the basic concept of the GPU. Volta is still a genuine GPU, and its graphics rendering performance is still first-class. Adding Tensor Cores does not sacrifice any feature of the GPU itself; it creates a win-win situation. Volta can now target deep learning applications better while still delivering 100% of its graphics performance.
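To make "half-precision matrix multiply-accumulate" concrete, here is a minimal CPU-side sketch in plain Python. It is not NVIDIA's instruction set or API. The helper names `to_fp16` and `mma_fp16` and the 4x4 tile size are assumptions chosen for illustration, and the accumulator here is an ordinary Python float rather than a hardware register.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def mma_fp16(a, b, c):
    """Return D = A*B + C, with the entries of A and B rounded to fp16.

    Hypothetical helper: products are accumulated in a Python float,
    which stands in for a wider hardware accumulator.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = c[i][j]  # accumulation starts from C
            for p in range(inner):
                acc += to_fp16(a[i][p]) * to_fp16(b[p][j])
            d[i][j] = acc
    return d

# Illustrative 4x4 tiles; every value is exactly representable in fp16.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
a = [[0.5 * (i + j) for j in range(4)] for i in range(4)]
c = [[1.0] * 4 for _ in range(4)]

d = mma_fp16(a, identity, c)  # equals A + C, since B is the identity
```

The triple loop is the "inner loop" the paragraph refers to: every output element is a running sum of products, which is the pattern that dominates dense neural network layers.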
In fact, much of this comes down to the choice of data types and instructions. A GPU architecture is really a framework into which you can put different data types and different instructions to serve different applications. For example, the Kepler architecture does not support inference well, and it has no half-precision floating-point capability.
Early on, data type support was the key to deep learning, and Kepler had to use FP32, which made computation very expensive. Starting with Pascal, we added support for inference and for FP16 training, but you wouldn't say that Pascal is therefore not a GPU. Volta adding Tensor Cores is a similar situation. Volta is still a GPU and can still do graphics. I think the GPU architecture is very efficient, and we didn't sacrifice anything else to do that.
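The trade-off between FP32 and FP16 can be illustrated with a small sketch using only Python's standard `struct` module. It shows storage size and rounding error, not GPU throughput. The helper `round_trip`, the sample value `0.1`, and the one-million-parameter model are hypothetical choices for illustration.

```python
import struct

FP16, FP32 = "<e", "<f"  # struct format codes for half and single precision

def round_trip(value, fmt):
    """Store a float in the given format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

bytes_fp16 = struct.calcsize(FP16)
bytes_fp32 = struct.calcsize(FP32)

# Rounding error when a typical small weight is stored in each format.
weight = 0.1
err16 = abs(round_trip(weight, FP16) - weight)
err32 = abs(round_trip(weight, FP32) - weight)

# Memory needed for a hypothetical model with one million parameters.
params = 1_000_000
mem16 = params * bytes_fp16
mem32 = params * bytes_fp32

print(f"fp16: {bytes_fp16} bytes/value, error {err16:.2e}, {mem16} bytes total")
print(f"fp32: {bytes_fp32} bytes/value, error {err32:.2e}, {mem32} bytes total")
```

Half precision halves the storage and memory traffic per value at the price of a larger rounding error, which is why hardware support for the narrower type matters for cost.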
There is a great synergy between deep learning and graphics. What we have found is that deep learning can make graphics better. Our research has developed new image anti-aliasing and denoising algorithms, as well as methods that give images temporal stability, all based on deep learning. So with deep learning and inference capability, the chip actually performs better at graphics than it would without Tensor Cores.
So for anything you actually have to build out of random logic gates, an FPGA is significantly weaker than an ASIC. FPGAs do well only on problems that make heavy use of the hardwired blocks on the FPGA. For example, some FPGAs have hardwired 18-bit arithmetic units for DSP operations, and others have floating-point units.
Once you have to fall back on the FPGA's gates, performance becomes unsatisfactory. So we don't think they are very competitive.
Many start-ups are building dedicated deep learning chips, and we are certainly watching these developments. But my philosophy has always been "we should do what we think we can do best." Their choices basically limit their room to develop, so they cannot do better than us, because we are already doing our utmost at what we do best.
If we divide deep learning into three categories, they are training, inference, and inference on IoT devices. For training, what we have been doing all along is a GPU focused on deep learning. If you build a chip only for deep learning, its range of applications may be too narrow to cover other possible uses. In our architecture, thanks to the HMMA operation, the Tensor Cores integrated into Volta deliver enormous math throughput. A single instruction completes 128 floating-point operations, while the chip can still serve many other applications.
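One way to arrive at the figure of 128 floating-point operations is to count the work in a single small matrix multiply-accumulate. The sketch below assumes a 4x4x4 tile, D = A*B + C with 4x4 matrices. That tile shape is an assumption for illustration and is not stated in the text above.

```python
# Assumed tile shape: D (4x4) = A (4x4) * B (4x4) + C (4x4).
M, N, K = 4, 4, 4

multiplies = 0
adds = 0
for i in range(M):
    for j in range(N):
        # The accumulator starts from C[i][j], so each of the K products
        # costs one multiply and one add.
        for p in range(K):
            multiplies += 1
            adds += 1

fused_multiply_adds = multiplies
total_flops = multiplies + adds

print(fused_multiply_adds, "fused multiply-adds,", total_flops, "floating-point operations")
```

Under this assumption the tile needs 64 fused multiply-adds. Counting each as one multiply and one add gives 128 operations.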
We do have some extra blocks on the chip that are not much help to deep learning, such as rasterization, texture mapping, and composition for graphics rendering, but that part is not large. Someone building a dedicated chip really could remove that small number of non-compute blocks, and in theory the chip cost would be somewhat lower.
We could do this too, but it just doesn't make commercial sense. Our view is that it's best to make one chip that can do many things. Whether it's graphics or the data center, we want the chip to do as many things as possible.
People have made great progress recently. There are now machines with more than 50 qubits, and quantum states can be maintained for longer. However, this is still orders of magnitude short of what a viable commercial application requires. The advantage of quantum computing is that algorithms designed for quantum computers cannot run with the same performance on traditional computers.
On a traditional computer, running such an algorithm amounts to simulating a quantum computer. But that is not the point. The real goal is to run algorithms such as quantum chemistry simulation, or factoring a composite number into two primes in order to break encryption. Both require thousands of qubits or more, so we are still far from solving these problems.
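To see why simulating a quantum computer classically becomes impractical, consider a full state-vector simulation, which stores one complex amplitude for each of the 2^n basis states of n qubits. The sketch below assumes 16 bytes per amplitude (two 64-bit floats). Both the storage format and the choice of state-vector simulation are assumptions for illustration, and other simulation methods have different costs.

```python
BYTES_PER_AMPLITUDE = 16  # assumption: complex number stored as two 64-bit floats

def state_vector_bytes(num_qubits):
    """Memory needed to hold all 2**num_qubits amplitudes."""
    return (2 ** num_qubits) * BYTES_PER_AMPLITUDE

for n in (30, 40, 50):
    gib = state_vector_bytes(n) / 2**30
    print(f"{n} qubits: {gib:,.0f} GiB")
```

Each extra qubit doubles the memory, so 50 qubits already needs 2^54 bytes, which is 16 PiB, under these assumptions. Thousands of qubits are far beyond any classical state-vector simulation.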
Although we believe that quantum computing is not yet practical, we are still following the technology closely so that we are not caught off guard by changes.
Most importantly, as part of the DRIVE PX system, we have a complete software platform, including neural networks for perception from camera, lidar, and radar, followed by software for vehicle path planning and control.
We have tested autonomous driving on our own fleet of vehicles running NVIDIA software. We also provide the hardware and software to automobile manufacturers. We also have in-car software that was called Co-Pilot and has now been renamed DRIVE IX. Its main function is to monitor the driver. It has eye tracking and head tracking, and if it sees that the driver is distracted or fatigued, it can have the car issue an appropriate warning. It also has gesture recognition, so you can use gestures to control the car. We also provide complete autonomous driving solutions for automobile manufacturers. I think this is the most competitive solution available in the industry.