Moving Gears to Tier 2 Variable Rate Shading
The team at The Coalition hasn’t stopped innovating since bringing Tier 1 VRS to Gears Tactics, and has brought Tier 2 VRS support to both Gears 5 and Gears Tactics.
The team saw similarly big perf gains from VRS Tier 2 (up to 14%!), this time with no noticeable visual impact. See for yourself if you can tell which side of the first image in the blog has VRS enabled to get a perf boost, and which side doesn’t.
Because of the unprecedented alignment between PC and Xbox with DirectX 12 Ultimate, The Coalition could bring their implementation to both console and PC with ease. That’s right: their VRS Tier 2 implementation runs on the full range of DirectX 12 Ultimate-capable devices, from the Xbox Series X|S to supported AMD and NVIDIA cards on PC!
Here’s another excellent guest blog from The Coalition, where Chris Wallis shares implementation details, performance information, awesome screenshots and a useful section for developers evaluating whether or not to bring VRS to their engines.
Moving Gears to Tier 2 Variable Rate Shading
Chris Wallis, Senior Software Engineer at The Coalition
Tier 2 VRS allowed Gears 5/Tactics to see an up to 14% boost in GPU perf with no perceptible impact to visual quality. It is available on all hardware supporting DirectX 12 Ultimate, including Xbox Series X|S, AMD Radeon™ RX 6000 Series graphics cards, and NVIDIA GeForce RTX 20 Series and 30 Series GPUs.
VRS is enabled on the left and disabled on the right.
Moving to Tier 2
The Xbox Series X|S launch of Gears 5/Tactics and the story DLC added new rendering features including contact shadows and screen space global illumination, as well as an emphasis on 60 FPS cinematics. While the features had great visual results, they were costly even on high-end GPUs. We investigated ways of keeping 4K and 60 FPS while also maintaining the rich detail of our PC Ultra textures and running these new visual features on both Xbox Series X|S and PC. This led us to revisit some of the VRS work done in Gears Tactics. While the use of Tier 1 VRS in Gears Tactics offered some great performance gains, it had some small compromises to visual quality and didn’t work well with Dynamic Resolution Scaling. As a result, we investigated the extra flexibility allowed in Tier 2 to see if we could solve the Tier 1 shortcomings.
The primary difference between Tier 1 and Tier 2 VRS is granularity. Tier 1 allows you to specify a shading rate per draw. Tier 2 allows you to instead specify the shading rate in a screen space texture. The texture is not 1:1 with the render target but is instead specified in coarser VRS tiles of either 8×8 or 16×16 pixels depending on hardware. By analyzing our previous frame’s scene color, we could output a texture that uses a coarse shading rate only in sections where we’ve determined that shading can be reduced without causing any perceptible difference. For more details on the VRS API, refer to the VRS announcement.
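To make the tile granularity concrete, here is a minimal sketch of how large the screen-space shading-rate texture would be for a given render target. The function name is hypothetical, and rounding up for partially covered tiles is an assumption rather than something stated in this post.

```python
def vrs_texture_size(width, height, tile_size):
    """Return (tiles_x, tiles_y) for a shading-rate texture.

    Assumes a partially covered tile at the right/bottom edge
    still needs its own texel, hence the round-up division.
    """
    tiles_x = (width + tile_size - 1) // tile_size
    tiles_y = (height + tile_size - 1) // tile_size
    return tiles_x, tiles_y


print(vrs_texture_size(3840, 2160, 8))   # (480, 270)
print(vrs_texture_size(3840, 2160, 16))  # (240, 135)
```

Even at 4K the texture is only a few hundred texels on each side, which is why operations on it are cheap compared with anything that touches the full scene color buffer.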
Tier 1 VRS Visualization on Gears Tactics. Colored regions mark the use of coarser shading rates
Tier 2 VRS Visualization on Gears Tactics. Colored regions mark the use of coarser shading rates
VRS Texture Generation
We generated the VRS texture by running a Sobel edge detection compute shader on our final scene color buffer. The VRS texture is reprojected for use on the next frame as part of a rescale shader described later. The edge detection is run on the luminance of the sRGB color. The use of sRGB ensures that edges are detected based on the difference of colors. A configurable threshold value is passed to the shader to adjust how aggressive the edge detection should be; it is also the primary knob used for tuning the different VRS quality settings on PC.
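The idea can be sketched on the CPU in a few lines. This is an illustrative sketch only, not the shipped shader: the Rec. 709 luma weights, the border clamping, and the binary choice between "1x1" and "2x2" are all assumptions made for the example, and every function name is hypothetical.

```python
import math


def luminance(rgb):
    """Rec. 709 luma weights applied to sRGB-encoded channels in [0, 1] (assumed)."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def sobel_magnitude(lum, x, y):
    """Sobel gradient magnitude at (x, y), clamping reads at the image border."""
    h, w = len(lum), len(lum[0])

    def at(px, py):
        return lum[min(max(py, 0), h - 1)][min(max(px, 0), w - 1)]

    gx = (at(x + 1, y - 1) + 2 * at(x + 1, y) + at(x + 1, y + 1)
          - at(x - 1, y - 1) - 2 * at(x - 1, y) - at(x - 1, y + 1))
    gy = (at(x - 1, y + 1) + 2 * at(x, y + 1) + at(x + 1, y + 1)
          - at(x - 1, y - 1) - 2 * at(x, y - 1) - at(x + 1, y - 1))
    return math.hypot(gx, gy)


def tile_shading_rate(lum, tile_x, tile_y, tile_size, threshold):
    """Return "2x2" when no pixel in the tile has an edge above threshold, else "1x1"."""
    for y in range(tile_y * tile_size, (tile_y + 1) * tile_size):
        for x in range(tile_x * tile_size, (tile_x + 1) * tile_size):
            if sobel_magnitude(lum, x, y) > threshold:
                return "1x1"
    return "2x2"
```

Raising `threshold` lets more tiles fall back to coarse shading, which is the sense in which a single value can act as the quality knob.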
Screenshot from the Gears 5 Hivebusters DLC on Xbox Series X:
VRS Texture Visualization:
The relatively simple edge detection filter can find areas of lower frequency detail from a wide variety of passes. These are some common cases where the edge detection chooses to reduce the shading rate:
- Reduced visibility due to shadowing/low lighting
- Occlusion due to volumetric fog
- Dense translucent particles (i.e. the waterfall)
By having the edge detection at the end of the frame, it also caught areas blurred out due to post processing effects such as motion blur and depth of field. This is especially effective in cinematics, where a majority of the screen is often blurred due to depth of field.
We learned from Gears Tactics that coarser shading is more noticeable in some passes than others. To handle this, we generated a second, conservative VRS texture that passes can opt in to. The conservative VRS texture added no notable overhead because we generated both textures in the same edge detection shader, checking against a more conservative threshold value when computing the shading rate for the conservative VRS tile. As an example, we found our translucency pass mostly contained low frequency detail textures like water or dust particles that generally take well to more aggressive amounts of VRS. Techniques like our screen space reflections (SSR) that rely on dithering and temporal accumulation benefitted from a more conservative use of VRS.
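A minimal sketch of why the second texture is nearly free: once the edge strength for a tile has been measured, producing a second rate is just one more comparison. The function name, the two-rate output, and the example threshold values are hypothetical.

```python
def classify_tile(max_edge, threshold, conservative_threshold):
    """Return (default_rate, conservative_rate) from one edge measurement.

    Assumes conservative_threshold <= threshold, so the conservative
    texture switches to coarse shading in fewer tiles.
    """
    default_rate = "2x2" if max_edge < threshold else "1x1"
    conservative_rate = "2x2" if max_edge < conservative_threshold else "1x1"
    return default_rate, conservative_rate
```

For instance, with thresholds of 0.1 and 0.02, a tile whose strongest edge measures 0.05 would be shaded coarsely by a pass using the default texture but at full rate by a pass that opted in to the conservative one.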
We made it a priority to minimize the overhead of the edge detection shader. An unoptimized shader can potentially cause VRS to be slower than not using it at all due to the shader overhead. We made several key optimizations that got our VRS texture generation to under 0.1 ms on Xbox Series X|S and DX12 Ultimate GPUs:
Skip edge detection on the borders of a VRS tile
The Variable Rate Shading spec specifies that coarse shading will never straddle the edge of a VRS tile. As a result, we skipped running edge detection on the exterior boundaries of a VRS tile. To use an 8×8 tile size as an example, this reduces the number of pixels requiring edge detection from 64 to 36.
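The 64 to 36 figure follows from dropping the one-pixel border on every side of the tile, leaving a 6×6 interior. A small sketch of that arithmetic (the function name is hypothetical):

```python
def edge_detect_pixel_count(tile_size):
    """Pixels left to test after skipping the one-pixel border of a tile."""
    interior = tile_size - 2
    return interior * interior


print(edge_detect_pixel_count(8))   # 36 of 64 pixels
print(edge_detect_pixel_count(16))  # 196 of 256 pixels
```

The relative saving is largest for the smaller 8×8 tile, where the border makes up a bigger share of the pixels.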
Merge the VRS texture generation to be part of tonemapping
Our first iterations had the edge detection run as a standalone compute shader at the end of post processing. However, at 4K resolution this introduced a bandwidth bottleneck due to the need to read in the whole scene color buffer. We moved the VRS texture generation to be part of our tonemapping shader, the last shader in our post processing, removing a memory roundtrip for the color buffer.
Running the VRS texture generation on the Async Compute Queue.
Since VRS texture generation is run as the last step of post processing, anything in the next frame leading up to the first pass that uses the VRS texture (the base pass in our case) is a possible candidate for async compute overlap. We had already done work to move the post processing chain to overlap with the next frame’s depth pass, so this allowed an easy optimization with minimal changes.
VRS for different rendering passes
This is a list of passes that we applied VRS to:
Base Pass: Renders all opaque meshes
Screen Space Ambient Occlusion: Use screen-space information to estimate areas that should receive less light due to occlusion.
Lighting: Calculate lighting for all light sources on the visible opaque meshes.
Screen Space Global Illumination: Use screen-space information to calculate bounced lighting.
Screen Space Reflections: Use screen-space data for creating reflections.
SSR Temporal AA: Anti-aliasing for the results of the Screen Space Reflection pass.
Translucency: Renders all translucent meshes.
Many of the above match the passes we applied VRS to in Gears Tactics with our Tier 1 implementation, but one interesting pass to call out is Translucency. Tier 1 VRS applied to Translucency caused artifacts too severe to use VRS, due to the reliance on translucency for some UI effects. However, with the extra control enabled in Tier 2, we were able to bring VRS back to Translucency. Tier 2 VRS ensured UI elements in the translucency pass maintain their crispness, while particles like dust are good candidates for coarse shading.
Screen Space Global Illumination (SSGI) is a unique use-case for VRS because it is done via a compute shader. VRS is a rasterization feature, so it cannot be natively applied to compute shaders. Instead, we were able to emulate VRS behavior in a compute shader because the VRS texture can be read from as an SRV. Screen Space Global Illumination is a costly GPU pass, and global illumination results tend to take well to being composited at lower resolutions, so applying VRS seemed like a good fit. The VRS emulation works by eliminating threads in a threadgroup based on the shading rate. The remaining threads then expand their coverage to fill in for the terminated ones. One caveat was that SSGI requires a denoiser pass, and VRS can amplify the noise since it effectively reduces the number of samples being taken. To handle this, we feed the VRS texture into the denoiser, which uses the shading rate to help weight the final blur.
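The thread-elimination idea can be simulated on the CPU as a rough sketch. Which thread in each coarse block survives (the top-left one here), the `(rx, ry)` rate encoding, and all names are assumptions for illustration; the actual shader is not shown in this post.

```python
def shade_tile_emulated(shade, origin_x, origin_y, tile_size, rate):
    """Emulate coarse shading for one tile of a compute pass.

    shade: callable (x, y) -> value, standing in for the per-pixel work.
    rate:  (rx, ry) coarse block size, e.g. (1, 1) or (2, 2).
    Returns (pixel -> value dict, number of shade invocations).
    """
    rx, ry = rate
    out = {}
    invocations = 0
    for ty in range(tile_size):
        for tx in range(tile_size):
            if tx % rx or ty % ry:
                continue  # this "thread" is terminated early
            value = shade(origin_x + tx, origin_y + ty)
            invocations += 1
            # the surviving thread expands its coverage over the whole block
            for dy in range(ry):
                for dx in range(rx):
                    out[(origin_x + tx + dx, origin_y + ty + dy)] = value
    return out, invocations
```

At a 2×2 rate an 8×8 tile still writes all 64 pixels but performs only 16 shading evaluations, which is where the saving comes from and also why fewer samples reach the denoiser.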
Output of SSGI:
Working with Dynamic Resolution Scaling
Gears 5/Tactics leverages Dynamic Resolution Scaling to ensure it hits a smooth 60 FPS. If we detect we’re nearly over budget, Dynamic Resolution Scaling kicks in and renders the next frame at a lower resolution to ensure a frame isn’t dropped. We also leveraged Unreal Engine’s temporal upscaling to run post processing at full resolution, even if Dynamic Resolution is downscaling, which keeps a high-quality final image. However, this causes a problem, since the VRS texture generation is run at full resolution but could then need to be applied at a lower resolution. To resolve this, we ran a compute shader that rescaled the VRS texture to correct for dynamic resolution. Because the VRS texture is significantly smaller than the full resolution buffer, the GPU cost of this rescale ended up being very low (0.02 ms on Xbox Series X|S).
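As a rough illustration of the rescale step, the sketch below resamples a shading-rate grid to a smaller tile grid. Nearest-neighbour sampling is an assumption chosen for brevity; the post does not say which filter the rescale shader uses, and the reprojection it also performs is left out. The function name is hypothetical.

```python
def rescale_vrs_texture(src, dst_w, dst_h):
    """Resample a 2D grid of shading rates to dst_w x dst_h tiles.

    Uses nearest-neighbour lookup (an assumption, not the shipped filter).
    """
    src_h, src_w = len(src), len(src[0])
    return [
        [
            src[min(y * src_h // dst_h, src_h - 1)][min(x * src_w // dst_w, src_w - 1)]
            for x in range(dst_w)
        ]
        for y in range(dst_h)
    ]
```

Because the grid holds one entry per 8×8 or 16×16 tile rather than one per pixel, even a naive loop like this touches very little data.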
Variable Rate Shading and Dynamic Resolution Scaling are both powerful techniques with different strengths and weaknesses. Dynamic Resolution Scaling allows a scaling of resolution in the form of a percentage that can be dialed up and down at a pixel level to ensure the targeted frame rate is maintained while keeping the GPU fully utilized. The weakness, however, is that scaling down resolution must be done on the entire render target, resulting in a global reduction of resolution. Tier 2 Variable Rate Shading is a complete flip of Dynamic Resolution Scaling. Reduction in resolution is discretely controlled via the small handful of allowed shading rates, but in exchange it is flexible in what parts of the render target are affected.
We found our approach allowed us to play to the strengths of both Dynamic Resolution Scaling and Variable Rate Shading. VRS takes a first stab at applying coarse shading based on the edge detection results. Next frame, Dynamic Resolution Scaling looks at the total GPU frame time with the VRS savings factored in and adjusts the scaling if needed. As an example, VRS applied to the real-time cinematics on the Xbox Series X allowed dynamic resolution to run an average of 10% higher and, in the best cases, removed the need for any downscaling altogether.
For PC we allowed VRS to be tuned with three different video settings:
- Quality matches what is used by default on Xbox Series X|S and targets no perceptual impact.
- Balanced similarly targets a minimal amount of perceptual difference, but under scrutiny may show some differences in favor of extra performance.
- Performance is an aggressive use of VRS that makes some visible compromises but gets back the most performance.
The previous post on VRS performance in Gears Tactics focused on NVIDIA Turing hardware, so this time we will be looking at AMD’s latest RDNA2 cards instead. However, we’d like to note that we saw similar performance scaling across both AMD and NVIDIA hardware.
The below results were taken on an AMD 6900 XT at 4K resolution with all graphics settings set to
|Frametime (ms)||Savings (ms)||Savings (%)|
To push the AMD 6900 XT further, we ran another test at 4K resolution with all settings set to
and with Screen Space Global Illumination on:
|Frametime (ms)||Savings (ms)||Savings (%)|
Comparison of shading rate usage in Quality vs Balanced vs Performance
|Rendering Pass||Total Cost (ms)||Quality Savings (ms)||Balanced Savings (ms)||Performance Savings (ms)|
|Screen Space Ambient Occlusion||2.13||0.94||1||1.17|
|Screen Space Global Illumination*||3||–||–||0.64|
|Screen Space Reflections||2.67||1.27||1.27||1.49|
Is it worth implementing Tier 2 VRS for my game?
Every engine is different and not all games will benefit equally from VRS. There are two things to keep in mind when evaluating VRS:
- VRS is an optimization that reduces the number of pixel shader invocations. As such, it will only see improvement on games that are GPU bound due to pixel shader work.
- Tier 2 VRS sees higher performance gains when running at higher resolutions. While actual results will vary based on engine and content, we found that resolutions of 1080p or lower generally saw diminishing returns from Tier 2 VRS.
One of the perks of the VRS API is the ease of integration. By using Tier 1 VRS and adding RSSetShadingRate to the start of all command lists to set the shading rate to 2×2, you can quickly get a sense of the upper bound of the performance gain from VRS. We recommend taking 30-50% of the savings as an estimate of what you’d expect to get back from a proper Tier 2 implementation. It’s also important to look only at the savings of individual passes rather than the whole frame time, ignoring passes that Tier 2 VRS might not apply to. For example, our Tier 2 VRS texture couldn’t be used with a shadow pass since it’s generated from the point of view of the player camera, not the light.
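The rule of thumb above can be turned into a quick back-of-the-envelope helper. This is only a sketch: the function name, the pass names, and the example timings mentioned afterwards are hypothetical, and the 30-50% range is the guideline from this post rather than a guarantee.

```python
def estimate_tier2_savings(tier1_savings_ms, applicable_passes, low=0.3, high=0.5):
    """Estimate a (low, high) range of Tier 2 savings in milliseconds.

    tier1_savings_ms:  per-pass savings measured with a forced 2x2 Tier 1 rate.
    applicable_passes: names of passes a Tier 2 VRS texture could apply to.
    Passes not listed (e.g. shadows) are ignored entirely.
    """
    total = sum(ms for name, ms in tier1_savings_ms.items()
                if name in applicable_passes)
    return total * low, total * high
```

With made-up measurements of 2.0 ms saved in a base pass, 1.0 ms in translucency and 1.0 ms in shadows, only the first two would count, giving an estimate of roughly 0.9 to 1.5 ms.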
While we were able to implement VRS for all the passes that gave us the biggest bang for the buck, it was not plumbed into the entire engine due to time constraints. A deeper integration would allow VRS to provide even larger GPU savings.
“Software-Based Variable Rate Shading in Call of Duty” presented at SIGGRAPH 2020 (http://advances.realtimerendering.com/s2020/index.htm) has some interesting thoughts on this topic as well. They present a method leveraging how console hardware handles MSAA to emulate VRS on platforms without hardware VRS support, with extra flexibility such as smaller tile sizes. In addition, they present an optimized way to apply VRS to compute shaders that uses ExecuteDispatchIndirect to ensure only waves with actual work are dispatched, in contrast to our brute force method. However, Software-Based VRS also has some trade-offs, including implementation complexity and the overhead of a de-blocking pass. One possibility is to use a hybrid of both techniques, switching between VRS techniques based on the characteristics of the rendering pass.
Tier 2 VRS allows for a free boost in performance with minimal visual impact. As we see more adoption of 120+ FPS and higher fidelity effects, it’s become increasingly important that we spend our GPU budget in all the right places, making Tier 2 VRS a welcome tool to help tackle the next generation of rendering.