Introducing Vulkan support in the Coherent Technology
7/26/2019
Dimitar Trendafilov
In this blog post, we will share the challenges we faced while implementing the Vulkan backend for our rendering library Renoir and how we overcame them. To put those challenges in context, we will start by laying out the main requirements for Vulkan support.
- It must implement the Renoir backend API, which is used for all backends and their corresponding graphics APIs. Keep in mind that our backend API is designed according to the properties of our software products, namely game UI SDKs. This results in Renoir not knowing what resources (e.g. textures) will be used in the next frame, as the user of our SDK can load and change them dynamically (e.g. through JavaScript). This problem is elaborated on later in the blog post.
- It must use our shader cross-compilation pipeline and therefore our base common HLSL shaders. We avoid significantly changing these shaders for a specific graphics API as this defeats the purpose of having one set of shaders for all APIs.
- It must run on all the Vulkan supported platforms and adhere to the hardware-specific limits of all the possible Desktop and Mobile devices our products support.
- It should be both easy to understand and performant. This is because the Vulkan backend implementation we provide is a sample one, and our clients must be able to modify it to suit their specific engine. In most cases they don't even need to modify it, as the sample we provide already matches their requirements.
- It should also provide an option for custom memory allocations while having a default implementation.
The Vulkan API also introduces some requirements and limitations, which influenced our backend design decisions.
- Vulkan gives full control of the memory management to the developer.
‣ You have to manually query the memory requirements and determine the memory type for each allocation. This is not always easy, as it heavily depends on the user's graphics card.
‣ You should not use a separate allocation for each resource, as you cannot have more than 4096 simultaneous memory allocations even on modern graphics cards. This means that you have to implement a custom allocator, which splits a few huge allocations into smaller chunks.
This is why we decided to create our own allocator interface and wrap the Vulkan Memory Allocator by default, while also enabling clients to implement and provide custom allocators.
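As an illustration, a minimal version of such an interface might look like the sketch below. The interface and the `RenderingAllocator`/`VmaBackedAllocator` names are hypothetical, not the actual Renoir API; the `vma*` calls are the real Vulkan Memory Allocator entry points.

```cpp
#include <vulkan/vulkan.h>
#include "vk_mem_alloc.h" // Vulkan Memory Allocator (VMA)

// Hypothetical interface; the actual Renoir allocator API differs in detail.
class RenderingAllocator
{
public:
    virtual ~RenderingAllocator() {}
    virtual VkResult CreateBuffer(const VkBufferCreateInfo& bufferInfo,
                                  VkBuffer& outBuffer, void*& outAllocation) = 0;
    virtual void DestroyBuffer(VkBuffer buffer, void* allocation) = 0;
};

// Default implementation forwarding to VMA, which sub-allocates from a few
// large VkDeviceMemory blocks and keeps us well under the 4096-allocation limit.
class VmaBackedAllocator : public RenderingAllocator
{
public:
    VmaBackedAllocator(VkPhysicalDevice physicalDevice, VkDevice device)
    {
        VmaAllocatorCreateInfo info = {};
        info.physicalDevice = physicalDevice;
        info.device = device;
        vmaCreateAllocator(&info, &m_Allocator);
    }
    ~VmaBackedAllocator() override { vmaDestroyAllocator(m_Allocator); }

    VkResult CreateBuffer(const VkBufferCreateInfo& bufferInfo,
                          VkBuffer& outBuffer, void*& outAllocation) override
    {
        // VMA queries the memory requirements and picks the memory type for us.
        VmaAllocationCreateInfo allocInfo = {};
        allocInfo.usage = VMA_MEMORY_USAGE_GPU_ONLY;
        VmaAllocation allocation = VK_NULL_HANDLE;
        const VkResult result = vmaCreateBuffer(m_Allocator, &bufferInfo, &allocInfo,
                                                &outBuffer, &allocation, nullptr);
        outAllocation = allocation;
        return result;
    }

    void DestroyBuffer(VkBuffer buffer, void* allocation) override
    {
        vmaDestroyBuffer(m_Allocator, buffer, static_cast<VmaAllocation>(allocation));
    }

private:
    VmaAllocator m_Allocator = VK_NULL_HANDLE;
};
```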
- Buffers and Images (Textures) in Vulkan are provided to the shaders through descriptors, and these descriptors are organized and aggregated in descriptor sets. You can't bind individual descriptors; you can only bind one or more descriptor sets.
‣ Sets are allocated from descriptor pools by providing a corresponding descriptor set layout.
‣ The descriptor set layout describes how the data is laid out in the shader by specifying each descriptor binding.
‣ You can't bind more than 4 descriptor sets on a significant number of GPUs.
- The descriptor sets are updated through the vkUpdateDescriptorSets API method, which isn't recorded in the command buffer with the other Vulkan commands but executes immediately (the sketch after this list shows the calls involved). This can lead to an invalid command buffer if the descriptor set being updated is either in use by a command buffer executing on the GPU or in the recording state.
- It is preferred to group your descriptors by binding frequency. For example, if you have multiple uniform buffers and textures in one descriptor set and only one of them changes, you still have to pay the cost of rebinding all of them.
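To make the above concrete, here is a minimal sketch of the calls involved: creating a layout with a single dynamic uniform buffer binding, allocating a set from a pool, and updating it. The `device`, `pool`, and `uniformBuffer` handles are assumed to be created elsewhere.

```cpp
#include <vulkan/vulkan.h>

// Illustrative only: one dynamic uniform buffer visible to the vertex shader.
VkDescriptorSet AllocateAndUpdateSet(VkDevice device, VkDescriptorPool pool,
                                     VkBuffer uniformBuffer)
{
    // The layout describes the single binding the shader expects at binding 0.
    VkDescriptorSetLayoutBinding binding = {};
    binding.binding = 0;
    binding.descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC;
    binding.descriptorCount = 1;
    binding.stageFlags = VK_SHADER_STAGE_VERTEX_BIT;

    VkDescriptorSetLayoutCreateInfo layoutInfo = {};
    layoutInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
    layoutInfo.bindingCount = 1;
    layoutInfo.pBindings = &binding;
    VkDescriptorSetLayout layout;
    vkCreateDescriptorSetLayout(device, &layoutInfo, nullptr, &layout);

    // Sets are allocated from a pool against that layout.
    VkDescriptorSetAllocateInfo allocInfo = {};
    allocInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO;
    allocInfo.descriptorPool = pool;
    allocInfo.descriptorSetCount = 1;
    allocInfo.pSetLayouts = &layout;
    VkDescriptorSet set;
    vkAllocateDescriptorSets(device, &allocInfo, &set);

    // vkUpdateDescriptorSets executes immediately, so `set` must not be in use
    // by a command buffer that is recording or executing on the GPU.
    VkDescriptorBufferInfo bufferInfo = { uniformBuffer, 0, 256 };
    VkWriteDescriptorSet write = {};
    write.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
    write.dstSet = set;
    write.dstBinding = 0;
    write.descriptorCount = 1;
    write.descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC;
    write.pBufferInfo = &bufferInfo;
    vkUpdateDescriptorSets(device, 1, &write, 0, nullptr);
    return set;
}
```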
The design of descriptor sets is very convenient and high performance when you know upfront what resources will be used for a given scene, or at least for a large number of frames. However, as mentioned in our requirements, we don't have that information, so we have to update and bind descriptor sets multiple times per frame, which incurs a performance cost.

In the Vulkan backend, we separate our descriptors into four types of sets: vertex shader uniform buffers, pixel shader uniform buffers, samplers, and textures. Each type of descriptor set has a corresponding descriptor set manager. This manager allocates a chunk of descriptor sets each frame for every type and allows descriptor set reuse for sets updated with the same data.
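A minimal sketch of the reuse idea follows, with a hypothetical `DescriptorSetManager` (the shipping backend differs in the details): sets are keyed by a hash of the data they were updated with, so a request with the same data returns the already-updated set.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vulkan/vulkan.h>

// Hypothetical manager for one descriptor set type; illustrative only.
class DescriptorSetManager
{
public:
    void BeginFrame(VkDescriptorPool pool)
    {
        m_Pool = pool;
        m_Cache.clear(); // sets live for one frame; the pool is reset outside
    }

    // Returns a set matching `dataHash`, allocating and updating a new one on a miss.
    VkDescriptorSet GetOrCreate(VkDevice device, VkDescriptorSetLayout layout,
                                uint64_t dataHash, const VkWriteDescriptorSet& write)
    {
        auto it = m_Cache.find(dataHash);
        if (it != m_Cache.end())
            return it->second; // reuse a set already updated with the same data

        VkDescriptorSetAllocateInfo allocInfo = {};
        allocInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO;
        allocInfo.descriptorPool = m_Pool;
        allocInfo.descriptorSetCount = 1;
        allocInfo.pSetLayouts = &layout;
        VkDescriptorSet set = VK_NULL_HANDLE;
        vkAllocateDescriptorSets(device, &allocInfo, &set);

        VkWriteDescriptorSet w = write;
        w.dstSet = set;
        vkUpdateDescriptorSets(device, 1, &w, 0, nullptr);
        m_Cache.emplace(dataHash, set);
        return set;
    }

private:
    VkDescriptorPool m_Pool = VK_NULL_HANDLE;
    std::unordered_map<uint64_t, VkDescriptorSet> m_Cache;
};
```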
We also have a circular buffer of descriptor pools with a default size of 4 (which can be modified based on the number of buffered frames in a user's pipeline). Every frame we reset the current descriptor pool and pass it to the managers so they can allocate their descriptor set chunks. The descriptor pool has a maximum number of descriptor sets it can allocate each frame, and this value can also be raised for heavier scenes. Additionally, we use dynamic uniform buffers to keep our uniform buffer updates to a minimum: we update the whole uniform buffer once and provide different dynamic offsets into it at bind time, as shown below.
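Here is a minimal sketch of the per-frame pool recycling and the dynamic offsets; `pools`, `uboSet`, and the sizes are assumed to be set up elsewhere, and the names are illustrative.

```cpp
#include <vulkan/vulkan.h>

// Recycle the next pool in the circular buffer and bind one slice of a big
// dynamic uniform buffer.
void BeginFrameAndBind(VkDevice device, VkCommandBuffer commandBuffer,
                       VkPipelineLayout pipelineLayout, VkDescriptorSet uboSet,
                       VkDescriptorPool* pools, uint32_t poolCount,
                       uint64_t frameIndex, uint32_t drawIndex,
                       uint32_t alignedUniformBlockSize)
{
    // Frees every set allocated from this pool the last time it was used.
    vkResetDescriptorPool(device, pools[frameIndex % poolCount], 0);

    // The uniform buffer was updated once for the whole frame; each draw
    // selects its slice with a dynamic offset, so the set itself never changes.
    // The offset must be a multiple of minUniformBufferOffsetAlignment.
    uint32_t dynamicOffset = drawIndex * alignedUniformBlockSize;
    vkCmdBindDescriptorSets(commandBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS,
                            pipelineLayout,
                            0,          // first set
                            1, &uboSet, // the dynamic uniform buffer set
                            1, &dynamicOffset);
}
```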
This whole scheme is acceptable for us, as we don't change the samplers and uniform buffers that often. We could optimize our texture descriptor set management by using an array of textures and a push constant to index into the array. This strategy requires adding a preprocessing step in the Vulkan backend which, based on the Renoir commands, collects all the information about the resources that are going to be used in a given frame and updates the texture array. However, this would require changing our common HLSL shaders to include the array of textures and the push constant only for the Vulkan backend, and it would complicate our sample backend implementation, which, as mentioned in our requirements, is not desirable.
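For completeness, the API side of that rejected optimization would look roughly like the sketch below; the shader change to actually index an array of textures is not shown, and the names are illustrative.

```cpp
#include <vulkan/vulkan.h>

// Push the index the fragment shader would use to pick from an array of
// textures. We decided against this approach, so this is purely illustrative.
void PushTextureIndex(VkCommandBuffer commandBuffer,
                      VkPipelineLayout pipelineLayout, uint32_t textureIndex)
{
    vkCmdPushConstants(commandBuffer, pipelineLayout,
                       VK_SHADER_STAGE_FRAGMENT_BIT,
                       0, sizeof(uint32_t), &textureIndex);
}
```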
Most of the other problems we hit were similar to ones we had already solved while implementing the backends for other graphics APIs (DirectX 12, Metal, etc.). However, there were still a couple of Vulkan-specific tricks we had to use and problems we stumbled upon:
- We were forced to use the VK_KHR_maintenance1 device extension because we need to specify a negative viewport height in order to y-flip the clip-space to framebuffer coordinate transform. This is needed because Vulkan clip coordinates have a different orientation compared to the D3D clip coordinates (motivation for this behavior). A minimal viewport sketch follows this list.
- Due to the nature of Vulkan image layouts, the backend expects every user texture or render target passed to it by the user's application or engine to be in the VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL layout. After the backend enqueues all of its commands in the user-provided command buffer, the user textures and render targets are transitioned back to the VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL layout (see the image barrier sketch after this list).
- It is usually hard to determine the right time to destroy a framebuffer. The reason is that you have to destroy it when all of its attachments are no longer used by the GPU and are destroyed; otherwise, validation layer errors are hit. Most other rendering frameworks (e.g. bgfx, The Forge) destroy them at the end of a frame (a deferred-destruction sketch follows this list).
- We had a bug in one specific blur effect test, which happened only on Nvidia graphics cards and was caused by using a texture that was not cleared immediately after its creation. We set it as a render pass color attachment with a clear load operation and a regular store operation. Then we used it to achieve a blur effect, and when we ended the render pass, random artifacts were present in the texture's blur. The synchronization of the image was correct. Additionally, RenderDoc was no help, because it always showed correct rendering with no artifacts. None of this happened on AMD graphics cards, where everything worked fine.
- There were many Nvidia and AMD quirks like the one described in the previous point. We are kind of used to that, but we hit more of these quirks in the Vulkan backend than in any other. For example, at one point in our implementation we didn't synchronize the writes to our vertex, index, and uniform buffers; there was not a single buffer barrier in the whole backend (see the buffer barrier sketch after this list). The Vulkan validation layers weren't warning about that, and as a matter of fact the rendering was working 99% of the time. The failing tests were different on AMD and Nvidia graphics cards, and again the artifact was not present in the RenderDoc replay, because RenderDoc serializes the commands and is usually unable to catch synchronization artifacts. Because the rendering almost always worked, we assumed the problem was something small, like one missing image barrier or one wrong render pass action, not general buffer synchronization, so it took some time to fix the issue.
- We tried to find more extensive validation layers in order to get ideas about some of the problems we hit. We were unable to find such layers, but there is already some work in that direction.
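Here are minimal sketches of some of the points above. First, the y-flip via a negative viewport height; this is valid only with VK_KHR_maintenance1 enabled, and the dimensions are assumed to come from the target framebuffer.

```cpp
#include <vulkan/vulkan.h>

// Valid only with VK_KHR_maintenance1 enabled on the device.
void SetYFlippedViewport(VkCommandBuffer commandBuffer,
                         uint32_t framebufferWidth, uint32_t framebufferHeight)
{
    VkViewport viewport = {};
    viewport.x = 0.0f;
    viewport.y = static_cast<float>(framebufferHeight); // origin at the bottom
    viewport.width = static_cast<float>(framebufferWidth);
    viewport.height = -static_cast<float>(framebufferHeight); // negative height flips Y
    viewport.minDepth = 0.0f;
    viewport.maxDepth = 1.0f;
    vkCmdSetViewport(commandBuffer, 0, 1, &viewport);
}
```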
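Next, a sketch of the layout transitions around user textures: the barrier below moves a user texture from the expected shader-read layout to a color attachment layout before we render into it; the inverse transition (with the layouts, access masks, and stages swapped) takes it back afterwards.

```cpp
#include <vulkan/vulkan.h>

// Move a user texture from the expected shader-read layout to a color
// attachment layout before rendering into it.
void TransitionToRenderTarget(VkCommandBuffer commandBuffer, VkImage userTexture)
{
    VkImageMemoryBarrier barrier = {};
    barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
    barrier.oldLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
    barrier.newLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    barrier.srcAccessMask = VK_ACCESS_SHADER_READ_BIT;
    barrier.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.image = userTexture;
    barrier.subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };
    vkCmdPipelineBarrier(commandBuffer,
                         VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
                         VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                         0, 0, nullptr, 0, nullptr, 1, &barrier);
}
```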
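For the framebuffer lifetime problem, one common scheme (sketched with hypothetical names below, not our exact implementation) is a deferred-destruction queue that only destroys a framebuffer once a fence proves the GPU has finished the frame that used it.

```cpp
#include <cstdint>
#include <vector>
#include <vulkan/vulkan.h>

// Hypothetical deferred-destruction queue for framebuffers.
struct PendingFramebuffer
{
    VkFramebuffer framebuffer;
    uint64_t frameUsed; // frame index when it was last referenced
};

class FramebufferGraveyard
{
public:
    void Enqueue(VkFramebuffer fb, uint64_t currentFrame)
    {
        m_Pending.push_back({ fb, currentFrame });
    }

    // Call once per frame with the newest frame whose fence has signaled.
    void Collect(VkDevice device, uint64_t lastCompletedFrame)
    {
        auto it = m_Pending.begin();
        while (it != m_Pending.end())
        {
            if (it->frameUsed <= lastCompletedFrame)
            {
                vkDestroyFramebuffer(device, it->framebuffer, nullptr);
                it = m_Pending.erase(it);
            }
            else
            {
                ++it;
            }
        }
    }

private:
    std::vector<PendingFramebuffer> m_Pending;
};
```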
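Finally, an example of the kind of buffer barrier that was missing in our early implementation: making a transfer write to a vertex buffer visible to the vertex input stage.

```cpp
#include <vulkan/vulkan.h>

// Make a transfer write to a vertex buffer visible to vertex input.
void BarrierAfterVertexBufferUpload(VkCommandBuffer commandBuffer, VkBuffer vertexBuffer)
{
    VkBufferMemoryBarrier barrier = {};
    barrier.sType = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER;
    barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT;
    barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.buffer = vertexBuffer;
    barrier.offset = 0;
    barrier.size = VK_WHOLE_SIZE;
    vkCmdPipelineBarrier(commandBuffer,
                         VK_PIPELINE_STAGE_TRANSFER_BIT,
                         VK_PIPELINE_STAGE_VERTEX_INPUT_BIT,
                         0, 0, nullptr, 1, &barrier, 0, nullptr);
}
```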
The Vulkan backend is included in the native Windows SDK in Gameface and Prysm 1.3 and will be included in the native Android SDK in Gameface and Prysm 1.4. Vulkan support for our UE4 plugin is also added in the 1.4 release.