This section outlines the changes you need to apply in order to upgrade from GT 2.6 and prior to GT 2.7 and later.
Starting with GT 2.7, the internal graphics library, Renoir, includes optimizations and capabilities for balancing the load between the CPU and the GPU. This balancing gives the best results on mobile architectures.
Several parts of the graphics backend have changed, so a backend written for a previous version must be adapted before it works with the new version.
The backend now provides a new capability, PreferCPUWorkload, which executes more work on the CPU instead of the GPU. This is beneficial on systems with fast CPUs, or if the application is already GPU-bound. The current recommendation is to set this option to false on PC/Consoles and true on mobile devices.
The CPU workload preference allows batching graphics commands of different types (e.g. draw image, draw text), whereas otherwise only commands of the same type can be grouped together. This requires some new shaders that are labeled ST_BatchedXXX.
An orthogonal optimization valid for both preferences (CPU/GPU workload) is splitting the standard shader (which included code handling many different types of graphics commands) into 2 parts - frequently used (e.g. images, rectangles, text) and rarely used (e.g. ellipses, YUV to RGB conversion, filter transform).
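As a rough illustration of the recommendation and the shader split described above, the sketch below picks a PreferCPUWorkload value per platform and classifies a ShaderType as frequently or rarely used. The Platform enum and both helper functions are hypothetical and are not part of the Renoir API. The frequently used values (0, 3, 17, 18) are the ones listed in the migration steps of this section.

```cpp
#include <cassert>

// Hypothetical platform tag; a real backend would use its own platform detection.
enum class Platform { PC, Console, Mobile };

// Recommended default from the text: false on PC/consoles, true on mobile.
inline bool RecommendedPreferCPUWorkload(Platform platform)
{
    return platform == Platform::Mobile;
}

// ShaderTypes handled by the frequently used part of the standard shader:
// 0 (draw rect), 3 (draw texture), 17 (raster text), 18 (SDF text).
// Every other value goes to the rarely used part.
inline bool IsFrequentShaderType(int shaderType)
{
    switch (shaderType)
    {
    case 0:
    case 3:
    case 17:
    case 18:
        return true;
    default:
        return false;
    }
}
```

A backend could call the first helper when filling its capabilities and treat the result only as a starting point, since the best setting depends on whether the application is CPU-bound or GPU-bound.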
Here's an overview of the changes needed to make your backend work:
- The StandardVertex structure now uses a float4 instead of float3 for the Position attribute
- If you enable the PreferCPUWorkload option in the backend, you'll need to implement the batched version of the shaders as well

Here are the steps for migrating:
- Change the vertex attribute formats from Float3, Float4, Float4 to Float4, Float4, Float4
- The Backend::FillCaps function must set the PreferCPUWorkload option
- The Backend::FillCaps function must map the shaders according to the new scheme:
  - ST_Standard, ST_StandardTexture, ST_Text, ST_TextSDF are still mapped to ST_Standard
  - The remaining shader types that were mapped to ST_Standard are now mapped to ST_StandardRare
  - ST_Stencil, ST_StencilTexture are still mapped to ST_Stencil
  - ST_StencilRRect, ST_StencilCircle are now mapped to ST_StencilRare
  - ST_StandardRare and ST_StencilRare are mapped to themselves
- New shaders to provide:
  - ST_BatchedStandard (new VS + PS)
  - ST_StandardRare (can use the ST_Standard VS, new PS)
  - ST_StencilRare (can use the ST_Standard VS, new PS)
- In the RendererBackend::CreatePipelineState call:
  - Handle ST_BatchedStandard, ST_StencilRare
  - Drop ST_Text, it is no longer used (it's now part of ST_Standard)
- In the shaders:
  - Use float4 instead of float3 for the Position attribute
  - The ST_Standard shader is now used for ShaderTypes 0 (draw rect), 3 (draw texture), 17 (raster text), 18 (SDF text)
  - ST_StandardRare handles the rest of the ShaderTypes that were previously handled by the ST_Standard one
  - The ST_BatchedXXX pixel shaders are almost the same as the ST_XXX variants, except that the ShaderType is no longer a uniform parameter, but a varying one instead (coming from the PS input data)
  - The ST_BatchedXXX vertex shader must set the varying param for the pixel shader that contains the ShaderType. It originally comes from the vertex shader input, from the Additional.w attribute
  - The ST_StencilXXX shaders are split similarly to ST_Standard: one handles the 0, 3, 17, 18 types and another handles all the others

This section outlines the changes you need to apply in order to upgrade from GT versions 2.8.1.0 and prior to 2.8.2.0.
The current backend API was designed around DirectX 11 and therefore implementing a DirectX 11 backend with the current API is simple and intuitive. However, with the advent of modern low-level APIs (e.g. Dx12, Metal and Vulkan) and concepts such as render passes, enhancements of the backend API were needed, so that the backend implementation for those graphics APIs is easier and more efficient.
You can read more about the motivation for the backend API changes here.
In GT 2.8.2.0, Renoir's backend API has been modified significantly with the introduction of 2 new commands, 4 new Renoir Core capabilities and one new shader type. There are also several changes in the graphics backend that must be adapted from previous versions in order for the new version to work.
If you prefer to make minimal changes to your backend and not use the new capabilities, here are the steps for migrating:
- In the FillCaps method, set the new capabilities and the new shader mapping:

      outCaps.ShouldUseRenderPasses = false;
      outCaps.ShouldClearRTWithClearQuad = false;
      outCaps.ConstantBufferBlocksCount = 1;
      outCaps.ConstantBufferRingSize = 1;
      ...
      outCaps.ShaderMapping[ST_ClearQuad] = ST_ClearQuad;
- Add unsigned size as the last argument of the CreateConstantBuffer method and use it for the constant buffer allocation size. This is how the CreateConstantBuffer method declaration in the backend header should look:

      virtual bool CreateConstantBuffer(CBType type, ConstantBufferObject object, unsigned size) override;

  Example of using the size parameter on constant buffer creation from the DirectX11 backend:

      bool Dx11Backend::CreateConstantBuffer(CBType, ConstantBufferObject object, unsigned size)
      {
          D3D11_BUFFER_DESC bufferDesc;
          bufferDesc.ByteWidth = size;
          ...
          // Create the constant buffer with the filled buffer description
          m_Device->CreateBuffer(&bufferDesc, ..);
      }
- In SetRenderTarget stop using the EnableColorWrites flag, as it is no longer present, and instead handle the PipelineState's ColorMask field in CreatePipelineState. This field currently only supports the values ColorWriteMask::CWM_None and ColorWriteMask::CWM_All, which correspond to the previous false and true values of EnableColorWrites. Set the appropriate value of the graphics API's render target color write mask. E.g. in DirectX11 the write mask is placed in the description of the blend state:

      bool Dx11Backend::CreatePipelineState(const PipelineState& state, PipelineStateObject object)
      {
          ...
          D3D11_BLEND_DESC desc;
          desc.RenderTarget[0].RenderTargetWriteMask = UINT8(state.ColorMask);
          ...
          // Create the blend state with the filled description
          m_Device->CreateBlendState(&desc, ...);
      }
- In ExecuteRendering add empty cases with only a break in them for BC_BeginRenderPass and BC_EndRenderPass:

      case BC_BeginRenderPass:
      {
          break;
      }
      case BC_EndRenderPass:
      {
          break;
      }
The MSAASamples field was added to the PipelineState structure, so you may start using it in CreatePipelineState.
Below we describe each new capability, how it can affect your backend, and what changes you need to make to use it.
When the ShouldUseRenderPasses capability is enabled, Renoir starts enqueuing the commands BeginRenderPass and EndRenderPass and stops issuing the SetRenderTarget and ResolveRenderTarget commands. The BeginRenderPass command provides all the information needed for starting a render pass in modern graphics APIs like Metal and Vulkan. This information includes the render targets, whether they should be cleared on render pass load, and whether they should be resolved on store. Here are the additional steps you need to take to start using this capability:
- Set ShouldUseRenderPasses to true in the FillCaps method
- Leave the SetRenderTarget and ResolveRenderTarget methods empty and add an assert that they are never called
- Implement the BeginRenderPass method, which handles the corresponding command by using the information it provides to begin a render pass in the graphics API
- Implement the EndRenderPass method, which handles the corresponding command by ending the current render pass and possibly also resetting any currently kept state of the render pass. E.g. in our Metal backend the implementation of the EndRenderPass method is the following:

      [m_State->CurrentCmdEncoder endEncoding];
      m_State->CurrentCmdEncoder = nil;
      m_State->BoundGPUState = GPUState();
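The steps above can be sketched in a backend-agnostic way. The struct below is a hypothetical stand-in, not the real Renoir backend interface (whose methods take arguments that are omitted here). It only shows the intended shape: the old entry points assert, and the two new methods bracket the render pass state.

```cpp
#include <cassert>

// Hypothetical stand-in for a backend; the real interface has different
// signatures and many more methods.
struct RenderPassBackendSketch
{
    bool InsideRenderPass = false;
    int PassesCompleted = 0;

    // With ShouldUseRenderPasses enabled, these should never be reached.
    void SetRenderTarget() { assert(false && "SetRenderTarget must not be called"); }
    void ResolveRenderTarget() { assert(false && "ResolveRenderTarget must not be called"); }

    void BeginRenderPass()
    {
        // Assumption of this sketch: render passes do not nest.
        assert(!InsideRenderPass);
        // A real backend would create the API render pass or encoder here.
        InsideRenderPass = true;
    }

    void EndRenderPass()
    {
        assert(InsideRenderPass);
        // A real backend would end the pass and reset cached pass state here.
        InsideRenderPass = false;
        ++PassesCompleted;
    }
};
```

Tracking whether a pass is open makes mismatched Begin/End commands fail loudly in debug builds instead of silently corrupting the encoder state.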
Enabling the ShouldClearRTWithClearQuad capability makes Renoir issue a fullscreen clear quad instead of calling ClearRenderTarget. The clear quad is drawn with a new vertex and pixel shader. The capability was added so that we don't need to create a new render pass to clear a render target in graphics APIs like Metal, which do not provide an easier way to do it. Here are the additional steps you need to take to start using this capability:
- Set ShouldClearRTWithClearQuad to true in the FillCaps method
- Leave the ClearRenderTarget method empty and add an assert that it is never called
- Add the ST_ClearQuad vertex and pixel shader, compile them if necessary and start using them. You can check out the example ST_ClearQuad HLSL shaders provided with the DirectX11 backend. The Metal backend is using the clear quad capability, so you can check out how to use the new shaders in its implementation.

The ConstantBufferRingSize capability allows you to set the size of the internal ring buffer, which is used to manage Renoir's constant buffers. We recommend setting this size to 4 for low-level graphics APIs like Dx12, Metal and Vulkan. The motivation for this particular size is that the maximum count of buffered frames in a standard pipeline is three, so to reliably avoid overlap of constant buffers they should be managed by a circular buffer of size 4. If you have a pipeline with a higher maximum count of buffered frames, then this value should be changed accordingly. For most high-level graphics APIs the ring buffer size should be set to 1, because their drivers handle constant buffer overlap internally and therefore a greater value for the ring buffer size is unnecessary.
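To see why a ring of 4 is enough for three buffered frames, the sketch below assigns each frame a slot in round-robin order and checks whether the frame being recorded lands on a slot that an earlier, possibly still in-flight frame uses. The function names and the frame-index scheme are assumptions made for illustration; the text does not describe how Renoir manages the ring internally.

```cpp
#include <cassert>

// Slot used by a given frame when constant buffers are cycled round-robin.
inline unsigned RingSlotForFrame(unsigned long long frameIndex, unsigned ringSize)
{
    assert(ringSize > 0);
    return static_cast<unsigned>(frameIndex % ringSize);
}

// True if the frame being recorded would reuse a slot that one of the
// `framesInFlight` previous frames may still be reading on the GPU.
inline bool SlotsOverlap(unsigned long long currentFrame, unsigned framesInFlight, unsigned ringSize)
{
    const unsigned current = RingSlotForFrame(currentFrame, ringSize);
    for (unsigned back = 1; back <= framesInFlight && back <= currentFrame; ++back)
    {
        if (RingSlotForFrame(currentFrame - back, ringSize) == current)
            return true;
    }
    return false;
}
```

Under this model a ring of 3 collides with the frame recorded three frames earlier, while a ring of 4 leaves one slot free for the frame currently being recorded.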
The only step you need to take to start using this capability is:
- Set ConstantBufferRingSize to the appropriate value in the FillCaps method

The ConstantBufferBlocksCount capability allows you to set the count of aligned constant buffer blocks for each constant buffer type. Renoir will issue a CreateConstantBuffer call with size equal to (constant buffer blocks count) * (aligned specific constant buffer size) for each constant buffer type. If the blocks count value is greater than 1 and the regular constant buffer becomes full, Renoir will make sure that a new auxiliary constant buffer is allocated. If the blocks count value is equal to 1, then Renoir won't create any auxiliary constant buffers. Auxiliary constant buffers are allocated per frame: they are allocated before ExecuteRendering is called and deallocated immediately after it. Setting the constant buffer ring size and the blocks count to values greater than 1 usually goes hand in hand, because both provide functionality that otherwise would have to be implemented explicitly in the backend for low-level graphics APIs like Dx12, Metal and Vulkan. For other APIs that don't support constant buffers but use uniform slots (e.g. OpenGL), both capabilities should be set to 1 in order to avoid unnecessary constant buffer creation.
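The size formula above can be written out as a small helper. This is only a sketch: both function names are hypothetical, and the alignment is passed in as a parameter because the text does not say which value Renoir uses (the 256 bytes in the usage note is just an example value).

```cpp
#include <cassert>
#include <cstddef>

// Round `size` up to the next multiple of `alignment`.
inline std::size_t AlignUp(std::size_t size, std::size_t alignment)
{
    assert(alignment > 0);
    return (size + alignment - 1) / alignment * alignment;
}

// Size passed to CreateConstantBuffer according to the formula in the text:
// (blocks count) * (aligned size of the specific constant buffer).
inline std::size_t ConstantBufferAllocationSize(std::size_t structSize,
                                                std::size_t alignment,
                                                unsigned blocksCount)
{
    return static_cast<std::size_t>(blocksCount) * AlignUp(structSize, alignment);
}
```

For example, a 200-byte constant buffer structure with an assumed 256-byte alignment and a blocks count of 4 results in a 1024-byte allocation, while a blocks count of 1 results in a single 256-byte block.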
The only step you need to take to start using this capability is:
- Set ConstantBufferBlocksCount to the appropriate value in the FillCaps method

GT version 2.8.10.0 introduces support for using images that do not have their alpha channel premultiplied into the other color channels. Prior to version 1.8, all images in the SDK were treated as if they were using premultiplied alpha, disregarding any image metadata that might tell otherwise.
User images (images that are preloaded by the engine, instead of decoded internally by GT) can now specify whether their alpha channel is premultiplied via the new Coherent::UIGT::ResourceResponseUIGT::UserImageData::AlphaPremultiplication property. You can set the property to the correct value in the UserImageData object that is passed to the Coherent::UIGT::ResourceResponseUIGT::ReceiveUserImage API in the Coherent::UIGT::ResourceHandler::OnResourceRead callback of your resource handler. This allows you to re-use the same image in both your engine and UI even if the engine uses a non-premultiplied alpha pipeline.
The following table describes the differences and solutions for various image formats:
Format | 1.7 and prior | 1.8 |
---|---|---|
PNG, JPG, Other RGB(A) formats | Automatically premultiplied after decode | No change - Automatically premultiplied after decode |
DDS | Assumed to have premultiplied alpha | May need to re-save - Attempts to determine if alpha is premultiplied from metadata |
KTX, ASTC, PKM | Assumed to have premultiplied alpha | Must re-save - Assumed NOT to have premultiplied alpha |
User images | Assumed to have premultiplied alpha | User controlled via Coherent::UIGT::ResourceResponseUIGT::UserImageData::AlphaPremultiplication |
Note that the output of all operations in GT is still an alpha-premultiplied texture, so blending of the resulting UI texture is not affected.
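For readers unfamiliar with the term, premultiplying means scaling each color channel by the alpha value. The sketch below does this for a single 8-bit RGBA pixel. The RGBA8 type and both functions are hypothetical helpers, not SDK API, and the rounding shown is one common choice rather than necessarily what GT does internally.

```cpp
#include <cassert>
#include <cstdint>

struct RGBA8 { std::uint8_t r, g, b, a; };

// Multiply a channel by alpha with rounding: round(c * a / 255).
inline std::uint8_t MulAlpha(std::uint8_t c, std::uint8_t a)
{
    return static_cast<std::uint8_t>((c * a + 127) / 255);
}

// Convert a non-premultiplied pixel to premultiplied alpha.
inline RGBA8 Premultiply(RGBA8 p)
{
    return { MulAlpha(p.r, p.a), MulAlpha(p.g, p.a), MulAlpha(p.b, p.a), p.a };
}
```

A fully opaque pixel is unchanged by this conversion and a fully transparent one becomes all zeros in the color channels, which is why treating a non-premultiplied image as premultiplied shows up mostly on semi-transparent edges.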
Changes to the renoir::RendererCaps structure.

There are 2 new fields in the renoir::RendererCaps structure: renoir::RendererCaps::MaxTextureWidth and renoir::RendererCaps::MaxTextureHeight. The default versions of all backends are updated to fill in these values, which represent the maximum possible width and height of a 2D texture for the device, respectively. These values are used to limit the size of temporary textures that the Renoir library creates. A good default is 8192x8192px, which is what was used internally before exposing these options.
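A minimal sketch of how such limits can be applied follows. The struct is a hypothetical stand-in for the two new fields, and clamping a requested size is an assumption about what "limit the size" means here; the text does not specify how Renoir uses the values.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the two new renoir::RendererCaps fields.
struct TextureLimitsSketch
{
    unsigned MaxTextureWidth = 8192;  // default suggested in the text
    unsigned MaxTextureHeight = 8192; // default suggested in the text
};

struct Extent { unsigned Width, Height; };

// Clamp a requested temporary texture size to the device limits.
inline Extent ClampToLimits(Extent requested, const TextureLimitsSketch& limits)
{
    return { std::min(requested.Width, limits.MaxTextureWidth),
             std::min(requested.Height, limits.MaxTextureHeight) };
}
```

Giving the fields explicit defaults, as in this stand-in, is one way to make sure they are never left uninitialized.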
Setting the renoir::RendererCaps::MaxTextureWidth and renoir::RendererCaps::MaxTextureHeight fields is required, since otherwise these limits will end up with uninitialized values and lead to undefined behavior.

renoir::DrawCustomEffectCmd has been changed to provide more information.

Previously, the structure had the renoir::float2 fields TextureCoordinates and TextureDimensions, which represented the placement of the drawn effect in the currently set render target. These values are now passed through the renoir::float4 field renoir::DrawCustomEffectCmd::TargetGeometryPositionSize.
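Code that used the two old fields can recover equivalent values from the new one. The sketch below assumes that the first two components hold the position and the last two hold the size, which the field name suggests but the text does not confirm. The Float2, Float4 and Placement types are hypothetical stand-ins for the renoir types.

```cpp
#include <cassert>

// Hypothetical stand-ins for renoir::float2 and renoir::float4.
struct Float2 { float x, y; };
struct Float4 { float x, y, z, w; };

struct Placement
{
    Float2 Position; // what TextureCoordinates used to carry
    Float2 Size;     // what TextureDimensions used to carry
};

// Assumption: xy is the position and zw is the size.
inline Placement UnpackPositionSize(const Float4& v)
{
    return { { v.x, v.y }, { v.z, v.w } };
}
```

Check the component order against the SDK headers before relying on it.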