Coherent Labs

Incoming Renoir backend API changes

SOFTWARE DEVELOPMENT

Dimitar Trendafilov

In this blog post we will describe the incoming changes to Renoir’s backend API and the motivation for them.


The current backend API was designed around DirectX 11, so implementing a DirectX 11 backend with it is simple and intuitive: the backend methods map almost directly onto DirectX 11 API methods. Renoir’s backend API also proved suitable for implementing backends for other graphics APIs such as DirectX 9 and OpenGL.


As the industry evolved, API vendors began shifting to lower-level graphics APIs that provide more fine-grained control over the GPU. As part of that shift, AMD announced its new graphics API, Mantle, and Microsoft followed with DirectX 12. The goals of both APIs were similar: reducing CPU overhead and allowing direct access to all GPU features. Development of AMD’s Mantle was eventually suspended due to the rising popularity and similar aims of DirectX 12 and of OpenGL’s successor, Vulkan.


Once DirectX 12 was officially released, we naturally started implementing a Renoir backend for it. Although we encountered some minor problems with the original backend API design, we managed to solve them and deliver a DirectX 12 backend in our products.


Apple decided to create its own low-level, low-overhead graphics API called Metal. A couple of years later it announced the deprecation of OpenGL on Mac and OpenGL ES on iOS. We decided it would be best to implement a Metal backend and be ready for whatever Apple decides to fully deprecate next.


Metal is designed around the principles of modern low-level compute and graphics APIs like Vulkan and DirectX 12, and it is specifically tailored to bring better performance than OpenGL to both Mac and iOS applications. A key concept in both Metal and Vulkan is the render pass, which Apple defines as “a collection of commands that updates a set of render targets”. Metal also introduces the render command encoder, defined simply as “the object to use for encoding commands for a render pass”, and a new render command encoder must be created for each logical render pass. The render pass concept exists to achieve high performance on GPUs with a tiled architecture and to keep the code close to the hardware, without much abstraction in between. To create a render command encoder in Metal you must specify the color, depth and stencil attachments you are going to draw into, whether each of them should be cleared on load, and whether an MSAA texture resolve should happen at the end of the render pass when the render targets are stored.


When we started designing and implementing the Metal backend, we encountered several problems which showed us that Renoir’s backend API is not fully suited to a graphics API designed around the render pass concept. The information needed to create a render command encoder was scattered across the backend API: some methods provided only part of it, and some required creating a new encoder even though one might already exist. Additionally, in Metal you can’t have more than one active render command encoder. These specifics of the graphics API and the backend API caused multiple problems while implementing the Metal backend. The major ones were:


  • The backend API uses two separate commands for binding and clearing render targets (SetRenderTarget and ClearRenderTarget). However, as mentioned above, when creating a render command encoder we must specify whether each attachment should be cleared on load. This is very efficient on tiled GPUs, because starting a render pass is tied together with clearing the texture. Additionally, the Metal API has no other method that can be used directly to clear a render target.
  • The ClearRenderTarget command is often issued just to clear the depth and stencil attachments in the middle of a render pass. Each time this happens we have to perform the expensive operation of resetting the current render command encoder, which begins a new render pass with different load actions for the depth and stencil attachments.
  • When MSAA textures are enabled, Renoir frequently needs to resolve them and issues the ResolveRenderTarget command to the backend with the corresponding MSAA and resolve textures. A resolve can be issued in the middle of a render pass, which again means that we need to create a new render command encoder just to resolve the textures and then restore the old one. This is because the Metal API only resolves render targets implicitly at the end of a render pass, and only if an appropriate store action was set when the render pass was created.
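The cost of these three problems can be illustrated with a small counting model. The sketch below is not Renoir or Metal code: the LegacyCommand enum and the CountEncoders function are hypothetical names, and the rules are a deliberate simplification of the behaviour described above (a clear always starts a fresh encoder because clearing is a load action, and a mid-pass resolve costs one encoder for the resolve plus one to restore the interrupted pass).

```cpp
#include <vector>

// Hypothetical stand-ins for the legacy backend commands named in the post.
enum class LegacyCommand {
    SetRenderTarget,
    ClearRenderTarget,
    Draw,
    ResolveRenderTarget
};

// Counts how many render command encoders a render-pass-based backend
// would create for a legacy command stream, under the simplified rules
// stated in the lead-in. This is a model, not a real backend.
int CountEncoders(const std::vector<LegacyCommand>& commands) {
    int encoders = 0;
    bool encoderActive = false;
    for (LegacyCommand command : commands) {
        switch (command) {
        case LegacyCommand::SetRenderTarget:
            // New targets: the active encoder (if any) must end.
            encoderActive = false;
            break;
        case LegacyCommand::ClearRenderTarget:
            // Clearing is a load action, so a fresh encoder is required.
            ++encoders;
            encoderActive = true;
            break;
        case LegacyCommand::Draw:
            // Draws reuse the active encoder, or lazily create one.
            if (!encoderActive) {
                ++encoders;
                encoderActive = true;
            }
            break;
        case LegacyCommand::ResolveRenderTarget:
            // One encoder whose store action resolves the MSAA texture,
            // plus one to restore the pass that was interrupted.
            encoders += 2;
            encoderActive = true;
            break;
        }
    }
    return encoders;
}

// A made-up frame fragment: bind, clear, draw, clear depth/stencil,
// draw, resolve, draw.
int ExampleFrameEncoderCount() {
    const std::vector<LegacyCommand> frame = {
        LegacyCommand::SetRenderTarget,
        LegacyCommand::ClearRenderTarget,
        LegacyCommand::Draw,
        LegacyCommand::ClearRenderTarget,
        LegacyCommand::Draw,
        LegacyCommand::ResolveRenderTarget,
        LegacyCommand::Draw
    };
    return CountEncoders(frame);
}
```

Under these assumptions the example fragment needs four encoders for work that targets a single set of render targets, which is the kind of overhead the list above describes.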


We managed to work around most of these problems in the Metal backend by saving the render pass state, ending the current render pass and creating a new one for the operation being issued. In the case of ResolveRenderTarget we even had to restore the old render pass at the end of the method.


These workarounds incur a CPU and GPU performance cost, which we minimized in order to provide a fully functional Metal backend in Gameface 1.1 and Prysm 1.1. However, to take full advantage of the Metal graphics API and avoid the workarounds, we decided to make some changes to the backend API and rework the Metal backend in Gameface 1.2 and Prysm 1.2.


These changes include:


  • Two new backend API commands: BeginRenderPass and EndRenderPass. They are issued only when a new backend capability is enabled in the FillCaps method. This capability tells Renoir to issue the new commands instead of the SetRenderTarget and ResolveRenderTarget commands. The parameters of the BeginRenderPass command provide all the information needed to create a render command encoder: the render targets, whether they should be cleared on load, and whether a resolve should happen at the end of the render pass. This change allows efficient clearing of the render targets and also solves the problem of resolving in the middle of a render pass.
  • Another new capability which, when enabled, tells Renoir to replace the ClearRenderTarget command with a DrawIndexed command that draws a fullscreen clear quad using a new ClearQuad vertex and pixel shader. This change solves the problem of needing to reset the render command encoder in order to clear the render targets.
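To make the first change more concrete, here is a minimal sketch of how a backend might consume such a command. The real Renoir signatures are not shown in this post, so every name below (RenderPassParams, EncoderDescriptor, DescribeEncoder, SketchBackend and their fields) is a hypothetical illustration. The load and store actions only mirror the Metal concepts discussed earlier and are not the Metal types themselves.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the load/store action concepts used by Metal.
enum class LoadAction { Load, Clear };
enum class StoreAction { Store, MultisampleResolve };

// Hypothetical parameters of a BeginRenderPass command: everything a
// render command encoder needs is known up front.
struct RenderPassParams {
    std::uint32_t ColorTarget;
    std::uint32_t DepthStencilTarget;
    bool ClearColor;
    bool ClearDepthStencil;
    bool ResolveOnEnd;
};

// Hypothetical description of the encoder the backend would create.
struct EncoderDescriptor {
    LoadAction ColorLoad;
    LoadAction DepthStencilLoad;
    StoreAction ColorStore;
};

// Translates the command parameters into load and store actions.
EncoderDescriptor DescribeEncoder(const RenderPassParams& params) {
    EncoderDescriptor descriptor;
    descriptor.ColorLoad =
        params.ClearColor ? LoadAction::Clear : LoadAction::Load;
    descriptor.DepthStencilLoad =
        params.ClearDepthStencil ? LoadAction::Clear : LoadAction::Load;
    descriptor.ColorStore = params.ResolveOnEnd
        ? StoreAction::MultisampleResolve
        : StoreAction::Store;
    return descriptor;
}

// Toy backend that only tracks encoder lifetime.
class SketchBackend {
public:
    void BeginRenderPass(const RenderPassParams& params) {
        // Only one render command encoder may be active at a time.
        assert(!m_Active && "previous render pass was not ended");
        m_Descriptor = DescribeEncoder(params);
        m_Active = true;
        ++m_EncodersCreated;
    }

    void EndRenderPass() {
        assert(m_Active && "no render pass to end");
        m_Active = false; // the resolve, if requested, happens here
    }

    int EncodersCreated() const { return m_EncodersCreated; }
    const EncoderDescriptor& LastDescriptor() const { return m_Descriptor; }

private:
    EncoderDescriptor m_Descriptor{};
    bool m_Active = false;
    int m_EncodersCreated = 0;
};
```

In this sketch a pass that clears its targets and resolves at the end creates exactly one encoder, because the clear becomes a load action and the resolve becomes a store action instead of separate commands.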


We also decided to further simplify the implementation of backends for all the modern low-level graphics APIs by making the following changes:


  • One of the problems observed during the implementation of the DirectX 12 backend was the overwriting of constant buffers. For a given frame we were binding a constant buffer and sending the commands to the GPU, and then for the next frame we were writing to the same constant buffer while the GPU was still using it. The problem was also present in other backends, such as Metal. We solved it easily by adding a ring buffer containing multiple constant buffers. However, this forced Renoir’s clients to write the ring buffer code in multiple backends, so to simplify backend implementations we moved the circular buffering scheme into the Renoir rendering library and exposed the size of the ring buffer as a configurable capability of the backend.
    When this size is set to the maximum number of buffered frames in your pipeline, updating the constant buffers’ data consists of simply finding each constant buffer object and placing the new data in it. The SDK takes care not to overwrite any buffers currently used by the GPU.

  • We decided to move the information about whether a render target uses MSAA to the CreatePipelineState command, because it is part of the pipeline state objects in DirectX 12, Metal and Vulkan.
  • The EnableColorWrites flag is removed from the SetRenderTarget command. The ColorMask field of the Pipeline object, passed to the CreatePipelineState backend method, is used instead.
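The circular buffering scheme from the first item in this list can be sketched as follows. This is only an illustration of the general technique, not Renoir’s implementation: the ConstantBufferRing class and its methods are hypothetical, and plain CPU memory stands in for GPU constant buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical ring of constant buffers. With ringSize equal to the
// maximum number of buffered frames, the slot written for frame N is
// not touched again until frame N + ringSize.
class ConstantBufferRing {
public:
    ConstantBufferRing(std::size_t ringSize, std::size_t bufferBytes)
        : m_Slots(ringSize, std::vector<unsigned char>(bufferBytes)) {
        assert(ringSize > 0 && "the ring needs at least one buffer");
    }

    // Returns the buffer that belongs to the given frame.
    std::vector<unsigned char>& SlotForFrame(std::size_t frameIndex) {
        return m_Slots[frameIndex % m_Slots.size()];
    }

    // Copies new constant data into the buffer of the given frame.
    void Update(std::size_t frameIndex, const void* data, std::size_t bytes) {
        std::vector<unsigned char>& slot = SlotForFrame(frameIndex);
        assert(bytes <= slot.size() && "data does not fit in the buffer");
        std::memcpy(slot.data(), data, bytes);
    }

private:
    std::vector<std::vector<unsigned char>> m_Slots;
};

// Reads back a float stored at the start of a frame's buffer.
float ReadFloat(ConstantBufferRing& ring, std::size_t frameIndex) {
    float value = 0.0f;
    std::memcpy(&value, ring.SlotForFrame(frameIndex).data(), sizeof(value));
    return value;
}
```

With a ring size of three, data written for frame 0 stays intact while frames 1 and 2 are recorded and is only overwritten when frame 3 reuses the slot, by which point a pipeline that buffers at most three frames is assumed to have finished with it.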


All the sample backends we currently provide have been updated for these changes, but custom backends will require modifications. The details of the required changes are covered in the transition guide, which will be added soon to the documentation of Gameface and Prysm. We are confident that these changes to Renoir’s backend API will allow you to achieve simpler and, at the same time, more performant backend implementations, so expect them in Gameface 1.2 and Prysm 1.2.


The render passes and the connections between them for the Moba UI, before and after the backend API changes. Xcode’s GPU Frame Capture tool was used to capture this screenshot on a Mac. Each block of 3 render targets (black squares in most cases) is a separate render pass. Notice that the number of render passes has been reduced by a factor of three (from 9 to 3).