Relaxing Render Target Clear operation in Stage3D

Authored by Jing Chen

Prior to Flash Player 15.0 and AIR 15.0, the clear function had to be called on a render target before drawing to it, which meant that the previous content of a render texture could never be preserved. In Flash Player 15.0 and AIR 15.0, we optimized the internal rendering system and removed this requirement.

In this blog post, we’ll show the benefits of this feature.

Avoid invoking clear unnecessarily

Invoking the clear function fills the buffers of the render target (color, depth, and stencil) with the values passed to it. The call is therefore redundant if you know that the previous content of the target buffers is meaningless and will be overwritten, for example by a full-screen quad. The clear operation is especially expensive when used with multiple render targets or floating-point texture formats, so avoiding it can make your code more efficient.
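As a minimal sketch of the idea above, the following helper draws a full-screen quad into a render texture and makes the clear call optional. The function name, the clearFirst flag, and the quadIndices buffer are hypothetical and not part of the original post. The sketch assumes that the program, vertex buffers, and constants are already set on the context, that blending is disabled, and that the target was cleared or uploaded to at least once before its first use.

```actionscript
package
{
    import flash.display3D.Context3D;
    import flash.display3D.IndexBuffer3D;
    import flash.display3D.textures.Texture;

    /**
     * Hypothetical helper (not from the original post): draws a
     * full-screen quad into a render texture, optionally skipping clear.
     *
     * Assumes quadIndices describes two triangles covering the whole
     * target and that all other render state is already configured.
     */
    public function drawFullScreenPass(context:Context3D,
                                       target:Texture,
                                       quadIndices:IndexBuffer3D,
                                       clearFirst:Boolean = false):void
    {
        // Color-only target: no depth/stencil buffer, no anti-aliasing.
        context.setRenderToTexture(target, false, 0);

        if (clearFirst)
        {
            // Before 15.0 this call was mandatory ahead of any draw call.
            context.clear(0, 0, 0, 1);
        }

        // The quad overwrites every pixel, so the previous contents
        // never show through even when clear is skipped.
        context.drawTriangles(quadIndices, 0, 2);
    }
}
```

Keeping the clear behind a flag makes it easy to profile both paths on each device you target.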

Here is a simple test showing the time saved by omitting the clear function. We draw some content to four textures in RGBA_HALF_FLOAT format, as shown in the following figure:

Figure 1. Draw to target textures

With the clear function, the GPU duration is 24.875 ms; without it, the duration is 16.574 ms. Looking at the time spent on each phase, we can see that clearing the target textures (the highlighted line in the screenshot below) is really time consuming, taking more than 30% of the GPU time. The data was gathered using Intel Graphics Performance Analyzers.

Figure 2. Clear might be time consuming

Figure 3. GPU time without clear

Because the time spent on clear depends on the size and format of the target textures, complex scenarios with multiple, high-precision render targets benefit more when the clear operation is omitted.

Although skipping clear is offered as a way to improve performance, it may have the opposite effect on some mobile GPUs. On tile-based architectures, which are used in several popular mobile GPUs, omitting the clear operation means that the previous buffer data is repeatedly copied back, which can be disastrous for performance.

So, be cautious when you are developing cross-platform applications. The only way to truly determine the performance impact on a specific system is to test it.

If you want to read more about performance improvement on tile-based architecture, check out the article, Performance Tuning for Tile-Based Architectures.

Reusing depth and stencil buffer

Before we removed this limitation, it was hard to reuse the depth and stencil buffers after changing the render target. Now, the depth and stencil buffers are not dropped as long as the clear function is not invoked after the switch. Note that two target textures share the same depth and stencil buffers only if their dimensions match exactly.

The following figure shows a scene from a deferred shading demo before this feature was introduced. The colorful light points were added after all the other objects in the scene (character, ground) had been rendered, with lighting effects, to a texture that was cleared. As a result, the depth information couldn’t be retrieved, and the red lights were rendered on top of the character even though the character was opaque.

Figure 4. Incorrect depth buffer

With this enhancement, the depth information is preserved when switching the render target to that texture. We can therefore reuse the depth information for z-testing when drawing the light points, and the points are now culled correctly. What’s more, we can even draw a skybox based on the depth information.
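The frame outline below is a rough sketch of how such depth reuse might look, not the demo's actual code. All names (renderDeferredFrame, sceneTexture, lightTexture, and the three callbacks) are hypothetical. It assumes that both textures have identical dimensions, so that they share one depth and stencil buffer as described earlier, and that both were cleared once when they were created.

```actionscript
package
{
    import flash.display3D.Context3D;
    import flash.display3D.Context3DCompareMode;
    import flash.display3D.textures.Texture;

    /**
     * Hypothetical frame outline (not the demo's actual code).
     * drawScene, drawLitQuad, and drawLightPoints are placeholder
     * callbacks that set their own programs and buffers and then
     * call drawTriangles on the same context.
     */
    public function renderDeferredFrame(context:Context3D,
                                        sceneTexture:Texture,
                                        lightTexture:Texture,
                                        drawScene:Function,
                                        drawLitQuad:Function,
                                        drawLightPoints:Function):void
    {
        // Pass 1: render the opaque geometry and fill the depth buffer.
        context.setRenderToTexture(sceneTexture, true);
        context.clear(0, 0, 0, 1, 1, 0);
        context.setDepthTest(true, Context3DCompareMode.LESS);
        drawScene();

        // Pass 2: switch targets WITHOUT calling clear, so the depth
        // values written in pass 1 are still available.
        context.setRenderToTexture(lightTexture, true);

        // The full-screen lighting quad ignores depth and does not
        // write to it, leaving the scene depth intact.
        context.setDepthTest(false, Context3DCompareMode.ALWAYS);
        drawLitQuad();

        // Light points are z-tested against the scene depth without
        // writing to it, so points behind the character are culled.
        context.setDepthTest(false, Context3DCompareMode.LESS_EQUAL);
        drawLightPoints();
    }
}
```

The lighting quad covers the whole target, which is why the color contents of lightTexture do not need to be cleared in pass 2.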

Figure 5. Reuse depth buffer

Tips and limitations

  1. This enhancement works for all profiles, including software rendering modes, and on both desktop and mobile devices (including iOS and Android). However, you must be careful when using it on mobile devices.
  2. The relaxed clear requirement applies only when rendering to a target texture. You will still get errors if clear is not called before drawing to the back buffer.
  3. You must call the clear function or initialize the target texture (by uploading some content) before you draw to it for the first time.
  4. To make sure the rendering result is as expected, set the parameter optimizeForRenderToTexture to true when you create a texture that is likely to be used as a render target. This is highly recommended. On some platforms, Flash Player does not guarantee that contents are preserved across draw calls if the parameter is set to false.
  5. Be careful when you are using anti-aliasing on a target texture.
    Anti-aliasing on target textures has been available since Flash Player 13.0 and AIR 13.0. In general, Flash Player draws directly to the texture. However, if the anti-aliasing level is not set to 0, Flash Player draws to a temporary color buffer to perform anti-aliasing and then copies the result back to the texture. The limitation is that, if clear was not called, preserving the previous draw results requires copying them from the texture back to the temporary buffer before anti-aliasing is applied. This copy can be expensive and hurt performance.
  6. The depth and stencil buffer of the rendering targets will not be shared with the back buffer, so you are free to set the parameter enableDepthAndStencil to false when calling configureBackBuffer.
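
The setup helper below is a sketch of how tips 3, 4, 5, and 6 might be combined. The function name and its parameters are hypothetical. It assumes textureSize is a power of two, as required by createTexture, and that the BGRA format is sufficient for your content.

```actionscript
package
{
    import flash.display3D.Context3D;
    import flash.display3D.Context3DTextureFormat;
    import flash.display3D.textures.Texture;

    /**
     * Hypothetical setup helper (names and parameters are assumptions).
     * Creates a render texture whose contents are meant to be kept
     * across draw calls without clearing it every frame.
     */
    public function createPersistentTarget(context:Context3D,
                                           backBufferWidth:int,
                                           backBufferHeight:int,
                                           textureSize:int):Texture
    {
        // Tip 6: render targets do not share depth/stencil with the
        // back buffer, so the back buffer can be created without one.
        context.configureBackBuffer(backBufferWidth, backBufferHeight,
                                    0, false);

        // Tip 4: ask for a texture optimized for use as a render target.
        var target:Texture = context.createTexture(
            textureSize, textureSize,
            Context3DTextureFormat.BGRA, true);

        // Tip 5: anti-alias level 0 avoids the temporary buffer copy.
        // Tip 3: clear once before the very first draw to the texture.
        context.setRenderToTexture(target, true, 0);
        context.clear(0, 0, 0, 1);

        // Switch back so later back-buffer rendering is unaffected.
        context.setRenderToBackBuffer();
        return target;
    }
}
```

Per tip 2, the back buffer itself still needs a clear call before you draw to it.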
