By Rys Sommefeldt, Senior Director, PowerVR Product Management, Imagination Technologies
While there are theoretically limitless ways to implement a modern GPU, what works in practice comes from understanding the problem deeply and engineering toward it. The constraints of manufacturing modern high-performance semiconductor devices, and of accelerating today's programmable rasterization workloads, reveal where the GPU hardware industry is heading.
For example, SIMD processing and fixed-function texture units are so essential to modern GPUs that a GPU design without them is almost certainly not commercially viable or practical outside of research. Even the wildest GPU visions of the past 20 years have not abandoned these core principles (rest in peace, Larrabee, the chip code name for Intel's GPU project).
Real-time ray tracing acceleration has arguably been the thorniest problem in GPU design for the past 15 years. The mainstream specification for how ray tracing should be implemented on the GPU is Microsoft's DXR, which demands an execution model that does not map naturally onto a GPU, and that undoubtedly poses serious potential problems for any GPU designer who needs to support it. The problem would be even more pronounced for a designer who had not been thinking about real-time ray tracing for the past decade, as Imagination has.
Key Challenges for Ray Tracing
If you follow the DXR specification and consider what must be implemented in the GPU to accelerate it, you will quickly identify the following issues, which must be addressed regardless of the design approach:
First, you need a way to generate and manage a set of data structures containing the scene geometry, so that rays can be traced against that geometry efficiently. Second, when a ray is traced, the GPU must test whether it intersects the geometry, and expose a user-programmable interface for responding to the result. Third, traced rays can emit new rays! The DXR specification defines other issues an implementation must consider, but these three factors are the most important from a high-level perspective.
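The three requirements above can be sketched in miniature. The following is a toy illustration, not DXR and not any real GPU implementation: a flat list of spheres stands in for the acceleration structure, a ray-sphere test stands in for the intersection hardware, and a "closest-hit shader" that spawns a reflection ray stands in for recursive ray emission. All names here are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Sphere:
    center: tuple
    radius: float
    reflective: bool = False  # whether the closest-hit "shader" emits a new ray

def build_acceleration_structure(spheres):
    # Requirement one: real implementations build a BVH here;
    # a flat list of primitives stands in for it.
    return list(spheres)

def intersect(origin, direction, sphere):
    # Requirement two: the ray-geometry intersection test.
    # Assumes a normalized direction; returns the nearest positive
    # hit distance t, or None on a miss.
    oc = tuple(o - c for o, c in zip(origin, sphere.center))
    b = 2.0 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - sphere.radius ** 2
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 1e-6 else None

def trace(accel, origin, direction, depth=0, max_depth=3):
    # Returns how many surfaces this ray and its children hit.
    if depth >= max_depth:
        return 0
    hits = [(t, s) for s in accel
            if (t := intersect(origin, direction, s)) is not None]
    if not hits:
        return 0  # the "miss shader" would run here
    t, sphere = min(hits, key=lambda h: h[0])
    surfaces = 1
    if sphere.reflective:
        # Requirement three: the closest-hit "shader" emits a new ray,
        # here a mirror reflection about the surface normal.
        hit = tuple(o + t * d for o, d in zip(origin, direction))
        n = tuple((h - c) / sphere.radius for h, c in zip(hit, sphere.center))
        d_dot_n = sum(d * nn for d, nn in zip(direction, n))
        refl = tuple(d - 2.0 * d_dot_n * nn for d, nn in zip(direction, n))
        surfaces += trace(accel, hit, refl, depth + 1, max_depth)
    return surfaces

accel = build_acceleration_structure([
    Sphere(center=(0.0, 0.0, 5.0), radius=1.0, reflective=True),
    Sphere(center=(0.0, 0.0, -5.0), radius=1.0),
])
print(trace(accel, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))  # mirror bounce hits both spheres: 2
```

Even this tiny sketch shows why the execution model is awkward for a GPU: the recursion in `trace` means the amount of work per ray is unknown in advance, which is exactly what the rest of the article is about.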
PowerVR Ray Tracing Hybrid Rendering Effects
Generating and using acceleration data structures to efficiently represent the geometry to be tested for intersection means the GPU may need a whole new execution phase; these new data structures must then be processed through a whole new interface, intersections tested, and, under the programmer's control, some function executed based on the result of the intersection test. GPUs are designed to be parallel, so what does it mean to process a bunch of rays at the same time? Does doing so uncover new challenges, very different from those posed by traditional geometry and pixel parallelism?
The answer to that question is emphatically yes, and these differences have profound implications for how ray tracing maps onto the execution model of existing GPUs. These GPUs have an imbalance between compute resources and memory resources, which makes memory access precious; wasting it is one of the main causes of low efficiency and performance.
Oh no – what did we do?
GPUs are designed to exploit the spatial and temporal locality of accesses to the DRAM attached to them. Thankfully, the most common modern forms of rasterized rendering have a nice property: during shading (and pixel shading is usually the main workload of any given frame), triangles and pixels tend to share relevant data with their neighbors. So whatever data you pull into the cache to shade one group of pixels, chances are the next, adjacent group will need some or all of the data you have already fetched from DRAM. This holds for most rasterized rendering workloads today, so we can all breathe a sigh of relief and design GPU architectures around that property.
When we use ray tracing, all of this fails: ray tracing makes that spatial locality disappear. Let us analyze why.
The easiest way to think about it is to look around you and notice the effect of light on your environment as you sit reading this article. Since ray tracing models the properties of light as it travels from every light source, it must handle what happens when light hits any surface in the scene. Maybe we only care which objects the light hits. Maybe the surface scatters the light in a uniform direction, or completely at random. Maybe the surface absorbs all the light, so there is no secondary propagation. Maybe the surface has a material that absorbs almost all the incoming light and randomly scatters the small amount it cannot capture.
Only the first scenario maps onto the GPU's locality-exploiting mode of operation, and even then only if all the rays processed in parallel hit the same kind of triangle.
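The gap between coherent and incoherent memory access can be made concrete with a toy cache model. This is not any real GPU's cache: it is a hypothetical direct-count model comparing adjacent pixels sampling adjacent texels against divergent rays sampling texels at random. The names and the cache-line size are illustrative assumptions.

```python
import random

LINE_TEXELS = 16  # texels per cache line (illustrative assumption)

def hit_rate(texel_addresses):
    # Count an access as a hit if its cache line has been seen before.
    cached_lines, hits = set(), 0
    for addr in texel_addresses:
        line = addr // LINE_TEXELS
        if line in cached_lines:
            hits += 1
        else:
            cached_lines.add(line)
    return hits / len(texel_addresses)

# Neighbouring pixels fetch neighbouring texels...
coherent = list(range(4096))
# ...while divergent rays fetch texels scattered across a large texture.
random.seed(0)
incoherent = [random.randrange(1 << 20) for _ in range(4096)]

print(f"coherent hit rate:   {hit_rate(coherent):.2f}")    # ~0.94
print(f"incoherent hit rate: {hit_rate(incoherent):.2f}")  # near zero
```

The coherent stream reuses almost every fetched line, while the incoherent stream pays a full DRAM fetch for nearly every access, which is the efficiency cliff the article describes.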
It is this potential divergence that causes the problems: if rays processed in parallel can each behave differently, including traversing different acceleration structures or emitting new rays, then the fundamental premise under which GPUs work efficiently is broken, and this is usually more damaging than the divergence encountered in traditional geometry or pixel processing.
What PowerVR's implementation of hardware ray tracing acceleration does is sort rays in hardware, completely transparently to software, in a way that is unique among the hardware ray tracing accelerators in the industry today, ensuring there are underlying similarities between the rays being traced in parallel. We call this coherency gathering.
The hardware maintains a data structure that hierarchically stores the in-flight rays emitted by software, and can select and group them, based on their direction, by where they travel in the acceleration structure. Rays grouped this way are more likely to share the acceleration-structure data being fetched from memory as they are processed, with the added advantage of maximizing the number of ray-geometry intersection tests that can then be run in parallel.
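The idea of grouping rays by where they are headed can be sketched in a few lines. This is only an illustration of direction-based binning under assumed names; PowerVR's actual coherency gathering sorts by position in the acceleration structure and is far more sophisticated.

```python
from collections import defaultdict

def direction_octant(direction):
    # Quantize a 3D direction into one of 8 octants via its sign bits.
    # Rays in the same octant tend to traverse similar parts of an
    # acceleration structure.
    x, y, z = direction
    return (x >= 0) << 2 | (y >= 0) << 1 | (z >= 0)

def gather_rays(rays):
    # Bin (origin, direction) rays by direction octant; each bin is a
    # more coherent batch to process in parallel.
    bins = defaultdict(list)
    for origin, direction in rays:
        bins[direction_octant(direction)].append((origin, direction))
    return bins

rays = [
    ((0, 0, 0), (1, 2, 3)),    # octant 0b111
    ((1, 0, 0), (4, 1, 2)),    # octant 0b111 — batched with the ray above
    ((0, 1, 0), (-1, -2, 3)),  # octant 0b001
]
batches = gather_rays(rays)
print({octant: len(group) for octant, group in batches.items()})  # {7: 2, 1: 1}
```

A batch pulled from one bin touches a narrower region of the acceleration structure than an arbitrary mix of rays would, which is exactly the memory-friendliness the hardware is after.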
By analyzing the rays dispatched to the hardware, we can ensure they are grouped in a GPU-friendly way for more efficient subsequent processing. This is key to the system's success: it avoids breaking the mode of operation the GPU industry has carefully engineered for efficient rasterized rendering, and it avoids the need for a special memory system dedicated to ray tracing hardware, making the solution much easier to integrate with the rest of the GPU.
The coherency gathering mechanism itself is quite complex, because it must rapidly track, sort, and schedule every ray submitted to the hardware for processing, without back-pressuring the scheduling system the earlier stage uses to emit rays, and without leaving idle the later stage that takes sorted rays and acceleration structures as input.
Without a hardware system to help the GPU sort rays, it falls to the application or game developer to somehow manage ray coherence on the host, or to add an intermediate compute pass on the GPU to sort rays. Even where the hardware supports such approaches, none of them matches the efficiency and performance of dedicated hardware on a real-time platform, and Imagination is the only GPU IP vendor on the market with such a hardware ray sorting system.
Keeping Up with Trends
Imagination is the only vendor in the industry offering such a hardware ray sorting solution because we have been working on this problem for a long time. Compared to other, more slowly advancing technologies in the industry, ray tracing has become one of the most widely adopted graphics technologies in today's APIs.
Our coherency gathering feature is compatible with how rays are traced today (if a ray happens to emit a new ray, the stack is unwound and a new ray possibly emitted, and so on): coherency gathering is performed at each stage, ensuring we extract as much of the hardware's ray tracing power as possible.
It is tempting to characterize a modern hardware ray tracing system by metrics such as ray throughput, peak parallel test rate, or ray emission and miss rates. These are simple ways to describe the performance of ray tracing hardware, but they are not very useful; after all, developers care about more than peak parallel test rates or miss rates.
Our goal is full ray tracing acceleration throughout the system, so that developers can budget what useful functionality to implement with a given ray budget. Our coherency gathering system, together with the rest of our solution, achieves this goal and is unique compared to other solutions in the industry.