Author: Rys Sommefeldt, Senior Director of Product Management, PowerVR, Imagination Technologies
Although there are, in theory, unlimited ways to build a modern GPU, the only effective way is to understand the problems and then turn the solutions into reality. The problems of manufacturing modern high-performance semiconductor devices, and of accelerating today's programmable rasterization, reveal where the GPU hardware industry is heading.
For example, SIMD processing and fixed-function texture units are indispensable in a modern GPU, so much so that a design without them is almost certainly neither commercially viable nor practical outside of research. Even the wildest GPU visions of the past 20 years did not abandon these core principles (rest in peace, Larrabee).
For the past 15 years, real-time ray tracing acceleration has been widely regarded as the most troublesome problem in GPU design. The mainstream specification for how ray tracing should be implemented on a GPU is DXR, introduced by Microsoft, but the execution model it requires does not fit naturally into the way GPUs work, which creates serious potential problems for any GPU designer who needs to support it. The problem is all the more acute for designers who have not thought about real-time ray tracing over the past decade; Imagination has been focused on it throughout.
Key challenges for ray tracing
If you follow the DXR specification and consider what a GPU must implement to accelerate it, you will quickly arrive at the following problems, which need solving whatever design is adopted:
First, you need a way to generate and process a data structure that contains the geometry, so that rays can be traced against that geometry efficiently. Second, when tracing a ray, the GPU has to test whether the ray intersects the geometry and provide user-defined programmable hooks for what happens next. Third, traced rays can emit new rays! Implementing the DXR specification raises other issues too, but from a high-level perspective these three matter most.
Hybrid rendering with PowerVR ray tracing
Generating and using acceleration structures that efficiently represent the geometry to be tested for intersection means that the GPU may need a new execution phase. New interface functions are then needed to process those structures, test rays against them, and run programmer-controlled code depending on the result of each intersection test. GPUs are parallel by design, so what does it mean to process a batch of rays at the same time? Does that bring new challenges that differ from those of traditional parallel geometry and pixel processing?
The answer to that last question is an emphatic yes. These differences have a profound impact on how ray tracing maps onto the existing GPU execution model. GPUs have an imbalance between compute resources and memory resources, which makes memory access precious, and wasting it is one of the main causes of poor efficiency and performance.
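To make the intersection-test step concrete, here is a minimal sketch of the standard "slab" test of a ray against an axis-aligned bounding box, the kind of test typically repeated at each node while walking a bounding-volume acceleration structure. It is illustrative only: the `Ray`, `Box`, and `ray_hits_box` names are hypothetical, and this is neither the DXR API nor a description of how any particular GPU implements the test.

```python
from dataclasses import dataclass


@dataclass
class Ray:
    origin: tuple      # (x, y, z)
    direction: tuple   # (dx, dy, dz), need not be normalized


@dataclass
class Box:
    lo: tuple          # minimum corner (x, y, z)
    hi: tuple          # maximum corner (x, y, z)


def ray_hits_box(ray, box):
    """Slab test: True if the ray (parameter t >= 0) passes through the box."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(ray.origin, ray.direction, box.lo, box.hi):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        # Shrink the interval of t for which the ray is inside every slab so far.
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


unit_box = Box(lo=(-1.0, -1.0, -1.0), hi=(1.0, 1.0, 1.0))
towards = Ray(origin=(0.0, 0.0, -5.0), direction=(0.0, 0.0, 1.0))
beside = Ray(origin=(5.0, 0.0, -5.0), direction=(0.0, 0.0, 1.0))
away = Ray(origin=(0.0, 0.0, -5.0), direction=(0.0, 0.0, -1.0))
```

Each ray in a batch can end up testing a different sequence of boxes, which is where the parallelism question raised above starts to bite.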
Oh no – what did we do?
A GPU is designed to make the most of its accesses to whatever DRAM is attached to it, and it does so by exploiting the spatial or temporal locality of memory accesses. Fortunately, most common modern rasterized rendering has a very helpful property: during shading, triangles and pixels are likely to share data with their neighbors (and pixel shading is usually the dominant workload of any given frame). So whatever data you fetch from DRAM and cache in order to shade one group of pixels, the adjacent group is likely to need some or all of it as well. This holds for most of today's rasterized rendering workloads, so we can all breathe a sigh of relief and design the GPU architecture around this property.
With ray tracing, all of this breaks down. Ray tracing can make that spatial locality disappear. Let's look at why.
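The locality argument can be illustrated with a small back-of-the-envelope sketch. It counts how many cache lines a group of 32 accesses touches when the accesses are to adjacent texels versus texels that are far apart. The 64-byte line size, 4-byte texel size, and linear memory layout are assumptions chosen for illustration (real GPUs usually use tiled layouts), and `cache_lines_touched` is a hypothetical helper.

```python
LINE_BYTES = 64    # assumed cache line size
TEXEL_BYTES = 4    # assumed size of one texel (e.g. RGBA8)


def cache_lines_touched(texel_indices):
    """Number of distinct cache lines needed to serve the given texel reads."""
    return len({(i * TEXEL_BYTES) // LINE_BYTES for i in texel_indices})


# 32 neighboring texels, as adjacent pixels in a raster workload might read.
coherent = list(range(1000, 1032))

# 32 texels spread far apart, as unrelated rays might read.
scattered = [1000 + 977 * k for k in range(32)]

print(cache_lines_touched(coherent))   # 3
print(cache_lines_touched(scattered))  # 32
```

Under these assumptions the neighboring reads are served by 3 cache lines while the scattered reads need 32, roughly a tenfold difference in memory traffic for the same amount of useful data.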
Problems on the surface of objects
The easiest way to think about it is to look around and notice what light is doing in your surroundings as you sit reading this article. Because ray tracing models the behavior of light as it propagates from every source, it has to deal with what happens when light strikes any surface in the scene. Maybe we only care about which object the light hits. Maybe the surface scatters the light in a uniform direction, or maybe it scatters it completely at random. Maybe the surface absorbs all the light, so there are no secondary rays. Or perhaps the surface has a material property that makes it absorb almost all incoming light and randomly scatter the small amount it cannot capture.
Only the first scenario maps onto the way a GPU exploits memory access locality, and even then only when all the rays being processed in parallel hit the same kind of triangle.
It is this potential for severe divergence that causes the problems. If rays being processed in parallel can each behave differently from one another, including hitting different parts of the acceleration structure or emitting new rays, then the basic premise on which a GPU works efficiently is broken, and this is usually far more destructive than the divergence encountered in traditional geometry or pixel processing.
PowerVR's implementation of ray tracing hardware acceleration tracks and sorts rays in hardware, which is unique among the hardware ray tracing accelerators in the industry today. It is completely transparent to software, and it ensures that the rays traced in parallel by the hardware are likely to be similar to one another. We call this coherent aggregation.
The hardware maintains a data structure that hierarchically stores the rays that software has emitted and that are in flight, and it can select and group them by their direction and by where they are heading in the acceleration structure. When grouped rays are processed, they are therefore more likely to share the acceleration structure data being fetched from memory, with the added benefit of maximizing the number of ray-geometry intersection tests that can be processed in parallel afterwards.
By analyzing rays as the hardware schedules them, we can ensure they are grouped in a GPU-friendly way so that subsequent processing is more efficient. This is key to the success of the system: it avoids breaking the operating model the GPU industry has carefully designed for efficient rasterized rendering, and it avoids the need for a special kind of memory system for the ray tracing hardware. The result is a solution that is easier to integrate with the rest of the GPU.
The coherent aggregation mechanism itself is quite complex, because it has to track, sort, and schedule every ray submitted to the hardware quickly enough that it neither back-pressures the upstream scheduling system that emits rays nor starves the downstream hardware that consumes the sorted rays and the acceleration structure.
Without a hardware system to sort rays for the GPU, you must either rely on application or game developers to manage ray coherence somehow on the host, or add an intermediate compute pass on the GPU to sort the rays, assuming the hardware supports doing so. Neither option improves efficiency and performance on a real-time hardware platform. Imagination is the only GPU IP provider on the market with this kind of hardware ray tracing system.
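The grouping idea can be sketched in software as binning rays by a coarse key. In this sketch the key is the grid cell containing the ray origin (a stand-in for "position in the acceleration structure") plus the octant of the ray direction. This is an assumption-laden illustration, not the PowerVR hardware mechanism, whose internals the article does not describe; `direction_octant`, `cell_of`, `group_rays`, and the cell size of 4.0 are all hypothetical.

```python
from collections import defaultdict


def direction_octant(d):
    """3-bit code, one bit per axis, set when that direction component is negative."""
    return int(d[0] < 0) | (int(d[1] < 0) << 1) | (int(d[2] < 0) << 2)


def cell_of(p, cell_size):
    """Integer coordinates of the uniform grid cell containing point p."""
    return tuple(int(c // cell_size) for c in p)


def group_rays(rays, cell_size=4.0):
    """Bin ray indices by (origin cell, direction octant).

    `rays` is a list of (origin, direction) tuples. Rays that share a bin
    start close together and head the same general way, so they are more
    likely to touch the same acceleration structure data.
    """
    bins = defaultdict(list)
    for ray_id, (origin, direction) in enumerate(rays):
        key = (cell_of(origin, cell_size), direction_octant(direction))
        bins[key].append(ray_id)
    return dict(bins)


rays = [
    ((0.5, 0.5, 0.5), (1.0, 0.0, 0.0)),
    ((9.0, 9.0, 9.0), (-1.0, 0.0, 0.0)),
    ((1.0, 1.0, 1.0), (0.9, 0.1, 0.1)),
    ((9.5, 8.5, 9.0), (-1.0, 0.2, 0.0)),
]
groups = group_rays(rays)
```

Here rays 0 and 2 land in one bin and rays 1 and 3 in another, so each bin could be dispatched as a batch of similar work even though the rays arrived interleaved.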
Keep up with the trend
Imagination is the only provider of this kind of hardware ray tracing solution in the industry because we have been working on the problem for a long time. Compared with other technologies in the industry that have developed slowly, ray tracing has quickly become a widely used part of today's graphics APIs.
Our coherent aggregation feature is compatible with the way ray tracing is used in the industry today, whether a ray simply finishes and releases its stack or goes on to emit new rays. We apply coherent aggregation at every stage to get as much performance as possible out of the ray tracing hardware.
With modern hardware ray tracing systems it is tempting to focus on headline figures such as ray bundle counts, peak parallel test rate, or empty-ray launch and miss rate. These are simple ways to describe the performance of ray tracing hardware, but they are not very useful. After all, developers care about more than a high peak parallel test rate or miss rate.
Our goal is for the whole ray tracing acceleration system to be useful in practice, so that developers can budget for what they actually want to achieve with their rays. Our coherent aggregation system, together with the rest of our solution, achieves this goal, which sets it apart from other solutions in the industry.