- Summary
- Detailed Design
- Command signature Creation
- Drawing
- Bundles
- State leakage
- Obtaining buffer virtual addresses
- Feature tiers
- Implementation Details
- GPU Validation
- Test Plan
- Change log
This document describes D3D12 features needed to allow applications to generate command buffers on the GPU.
Some game developers see significant performance advantages by moving scene-traversal and culling onto the GPU. This is hard to do with the D3D API because D3D requires command buffers to be generated by the CPU. This proposal contains additions to D3D12 which would allow a limited degree of GPU-based command buffer generation.
A new API object is added to D3D12, the command signature. This object enables applications to specify:
-
The indirect argument buffer format
-
The command type that will be used (DrawInstanced, DrawIndexedInstanced, Dispatch)
-
The set of resource bindings which will change per-command call versus the set which will be inherited
At startup, an application would create a small set of command signatures. At runtime, the application would fill a buffer with commands (via whatever means that application chooses). The application would then use D3D12 command list APIs to set state (render target bindings, PSO, etc), and then use a command list API to cause the GPU to interpret the contents of the indirect argument buffer according to the format defined by a particular command signature.
For example, suppose an application wants a unique root constant to be specified per-draw call in the indirect argument buffer. The application would create a command signature that enables the indirect argument buffer to specify the following parameters per draw call:
-
Draw arguments (Vertex Count, Instance Count, ...)
-
The value of 1 root constant
The indirect argument buffer generated by the application would contain an array of fixed-size records. Each structure corresponds to 1 draw call. Each structure contains the drawing arguments, and the value of the root constant. The number of draw calls is specified in a separate GPU-visible buffer.
An example command buffer generated by the application would look like:
Command Buffer Format | |
---|---|
RootConstant (RootParameterIndex=1) | Draw structure #1 |
VertexCount | |
InstanceCount | |
StartVertexLocation | |
StartInstanceLocation | |
RootConstant (RootParameterIndex=1) | Draw structure #2 |
VertexCount | |
InstanceCount | |
StartVertexLocation | |
StartInstanceLocation | |
RootConstant (RootParameterIndex=1) | Draw structure #3 |
VertexCount | |
InstanceCount | |
StartVertexLocation | |
StartInstanceLocation | |
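As a rough illustration of the record format above, the following sketch fills an array of fixed-size records on the CPU. `DrawArguments`, `DrawRecord`, and `BuildDrawRecords` are hypothetical stand-ins written for this example (they are not taken from `d3d12.h`), and using the root constant as a per-draw ID is only an assumed use case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in mirroring the draw-argument layout described in this document.
struct DrawArguments {
    uint32_t VertexCountPerInstance;
    uint32_t InstanceCount;
    uint32_t StartVertexLocation;
    uint32_t StartInstanceLocation;
};

// Hypothetical per-draw record: one root constant followed by the draw arguments.
struct DrawRecord {
    uint32_t RootConstant;  // value for root parameter index 1
    DrawArguments Draw;
};

static_assert(sizeof(DrawRecord) == 20, "record is expected to be tightly packed");
static_assert(offsetof(DrawRecord, Draw) == 4, "draw arguments follow the root constant");

// Builds `count` records. Each draw covers its own consecutive vertex range,
// and the root constant carries the draw index (an assumption for this sketch).
std::vector<DrawRecord> BuildDrawRecords(uint32_t count, uint32_t verticesPerDraw) {
    // Guard against 32-bit overflow when computing StartVertexLocation.
    assert(count == 0 || verticesPerDraw <= UINT32_MAX / count);
    std::vector<DrawRecord> records;
    records.reserve(count);
    for (uint32_t i = 0; i < count; ++i) {
        records.push_back(DrawRecord{i, DrawArguments{verticesPerDraw, 1, i * verticesPerDraw, 0}});
    }
    return records;
}
```

The resulting vector could then be copied into a GPU-visible buffer, or the same layout could be written by a compute shader.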
The following structures define how particular arguments appear in an indirect argument buffer. These structures do not appear in any D3D12 API. Applications use these definitions when writing to an indirect argument buffer (with the CPU or GPU).
typedef struct D3D12_DRAW_ARGUMENTS
{
UINT VertexCountPerInstance;
UINT InstanceCount;
UINT StartVertexLocation;
UINT StartInstanceLocation;
} D3D12_DRAW_ARGUMENTS;
typedef struct D3D12_DRAW_INDEXED_ARGUMENTS
{
UINT IndexCountPerInstance;
UINT InstanceCount;
UINT StartIndexLocation;
INT BaseVertexLocation;
UINT StartInstanceLocation;
} D3D12_DRAW_INDEXED_ARGUMENTS;
typedef struct D3D12_DISPATCH_ARGUMENTS
{
UINT ThreadGroupCountX;
UINT ThreadGroupCountY;
UINT ThreadGroupCountZ;
} D3D12_DISPATCH_ARGUMENTS;
typedef struct D3D12_VERTEX_BUFFER_VIEW
{
D3D12_GPU_VIRTUAL_ADDRESS BufferLocation;
UINT SizeInBytes;
UINT StrideInBytes;
} D3D12_VERTEX_BUFFER_VIEW;
typedef struct D3D12_INDEX_BUFFER_VIEW
{
D3D12_GPU_VIRTUAL_ADDRESS BufferLocation;
UINT SizeInBytes;
DXGI_FORMAT Format;
} D3D12_INDEX_BUFFER_VIEW;
typedef struct D3D12_CONSTANT_BUFFER_VIEW
{
D3D12_GPU_VIRTUAL_ADDRESS BufferLocation;
UINT SizeInBytes;
UINT Padding;
} D3D12_CONSTANT_BUFFER_VIEW;
typedef struct D3D12_DISPATCH_RAYS_DESC
{
D3D12_GPU_VIRTUAL_ADDRESS_RANGE RayGenerationShaderRecord;
D3D12_GPU_VIRTUAL_ADDRESS_RANGE_AND_STRIDE MissShaderTable;
D3D12_GPU_VIRTUAL_ADDRESS_RANGE_AND_STRIDE HitGroupTable;
D3D12_GPU_VIRTUAL_ADDRESS_RANGE_AND_STRIDE CallableShaderTable;
UINT Width;
UINT Height;
UINT Depth;
} D3D12_DISPATCH_RAYS_DESC;
Applications use the following API to create a command signature.
typedef enum D3D12_INDIRECT_ARGUMENT_TYPE
{
D3D12_INDIRECT_ARGUMENT_TYPE_DRAW,
D3D12_INDIRECT_ARGUMENT_TYPE_DRAW_INDEXED,
D3D12_INDIRECT_ARGUMENT_TYPE_DISPATCH,
D3D12_INDIRECT_ARGUMENT_TYPE_VERTEX_BUFFER_VIEW,
D3D12_INDIRECT_ARGUMENT_TYPE_INDEX_BUFFER_VIEW,
D3D12_INDIRECT_ARGUMENT_TYPE_CONSTANT,
D3D12_INDIRECT_ARGUMENT_TYPE_CONSTANT_BUFFER_VIEW,
D3D12_INDIRECT_ARGUMENT_TYPE_SHADER_RESOURCE_VIEW,
D3D12_INDIRECT_ARGUMENT_TYPE_UNORDERED_ACCESS_VIEW,
D3D12_INDIRECT_ARGUMENT_TYPE_DISPATCH_RAYS,
D3D12_INDIRECT_ARGUMENT_TYPE_DISPATCH_MESH,
D3D12_INDIRECT_ARGUMENT_TYPE_INCREMENTING_CONSTANT
} D3D12_INDIRECT_ARGUMENT_TYPE;
typedef struct D3D12_INDIRECT_ARGUMENT_DESC
{
D3D12_INDIRECT_ARGUMENT_TYPE Type;
union
{
struct
{
UINT Slot;
} VertexBuffer;
struct
{
UINT RootParameterIndex;
UINT DestOffsetIn32BitValues;
UINT Num32BitValuesToSet;
} Constant;
struct
{
UINT RootParameterIndex;
} ConstantBufferView;
struct
{
UINT RootParameterIndex;
} ShaderResourceView;
struct
{
UINT RootParameterIndex;
} UnorderedAccessView;
// Tier 1.1 support
struct
{
UINT RootParameterIndex;
UINT DestOffsetIn32BitValues;
} IncrementingConstant;
};
} D3D12_INDIRECT_ARGUMENT_DESC;
typedef struct D3D12_COMMAND_SIGNATURE_DESC
{
// The number of bytes between each drawing structure
UINT ByteStride;
UINT NumArgumentDescs;
[annotation("_Field_size_full_(NumArgumentDescs)")] const D3D12_INDIRECT_ARGUMENT_DESC* pArgumentDescs;
UINT NodeMask;
} D3D12_COMMAND_SIGNATURE_DESC;
HRESULT ID3D12Device::CreateCommandSignature(
const D3D12_COMMAND_SIGNATURE_DESC* pDesc,
ID3D12RootSignature* pRootSignature,
REFIID riid, // Expected: ID3D12CommandSignature
void** ppCommandSignature
);
The ordering of arguments within an indirect argument buffer is defined to exactly match the order of arguments specified in D3D12_COMMAND_SIGNATURE_DESC::pArgumentDescs. All of the arguments for 1 draw/dispatch call within an indirect argument buffer are tightly packed. However, applications are allowed to specify an arbitrary byte stride between draw/dispatch commands in an indirect argument buffer.
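To make the packing rule concrete, here is a minimal sketch that computes the tightly packed size of one record, which is the smallest stride a command signature could use. `ArgType`, `ArgDesc`, `ArgumentByteSize`, and `PackedRecordSize` are hypothetical names for this example, the enum covers only a subset of argument types (root views are omitted for brevity), and the byte sizes are derived from the structure definitions earlier in this document, assuming 4 bytes per 32-bit root constant value.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for a subset of D3D12_INDIRECT_ARGUMENT_TYPE; not the real header.
enum class ArgType { Draw, DrawIndexed, Dispatch, VertexBufferView, IndexBufferView, Constant };

struct ArgDesc {
    ArgType Type;
    uint32_t Num32BitValuesToSet;  // only meaningful for ArgType::Constant
};

// Bytes occupied by one argument inside a record.
uint32_t ArgumentByteSize(const ArgDesc& arg) {
    switch (arg.Type) {
    case ArgType::Draw:             return 16;  // 4 x 32-bit values
    case ArgType::DrawIndexed:      return 20;  // 5 x 32-bit values
    case ArgType::Dispatch:         return 12;  // 3 x 32-bit values
    case ArgType::VertexBufferView: return 16;  // 64-bit address + size + stride
    case ArgType::IndexBufferView:  return 16;  // 64-bit address + size + format
    case ArgType::Constant:
        assert(arg.Num32BitValuesToSet > 0);    // the runtime rejects zero-sized constants
        return 4 * arg.Num32BitValuesToSet;
    }
    return 0;
}

// Sum of all argument sizes, in pArgumentDescs order, with no gaps between them.
uint32_t PackedRecordSize(const std::vector<ArgDesc>& args) {
    uint32_t total = 0;
    for (const ArgDesc& arg : args) {
        total += ArgumentByteSize(arg);
    }
    return total;
}
```

Any ByteStride that is at least this value (and 4-byte aligned) leaves the remaining bytes of each record as padding.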
D3D12_INDIRECT_ARGUMENT_TYPE_INCREMENTING_CONSTANT is a unique argument type in that it doesn't occupy any argument buffer space. See Incrementing constant.
The root signature must be specified if and only if the command signature changes one of the root arguments.
For root SRV/UAV/CBV, the application-specified size is in bytes. The debug layer will validate the following restrictions on the sizes and addresses:
-
CBV -- Address and size must be a multiple of 256 bytes
-
Raw UAV -- Address and size must be a multiple of 4 bytes
-
Typed UAV -- Address and size must be a multiple of the UAV format size
-
Structured UAV -- Address and size must be a multiple of the structure byte stride (declared in the shader)
-
SRV -- Address and size must be a multiple of the SRV format size
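The restrictions above all reduce to the same check with a different unit. A minimal sketch, assuming hypothetical helper names (`IsMultipleOf`, `IsValidRootView`) that do not exist in D3D12 or its debug layer:

```cpp
#include <cassert>
#include <cstdint>

// Units taken from the list above; format and structure sizes vary per view.
constexpr uint64_t kCbvUnit = 256;
constexpr uint64_t kRawUavUnit = 4;

bool IsMultipleOf(uint64_t value, uint64_t unit) {
    assert(unit != 0);
    return value % unit == 0;
}

// True when both the GPU virtual address and the byte size respect `unit`.
// For typed views pass the format size, for structured views the structure stride.
bool IsValidRootView(uint64_t address, uint64_t sizeInBytes, uint64_t unit) {
    return IsMultipleOf(address, unit) && IsMultipleOf(sizeInBytes, unit);
}
```

For example, a structured UAV with a 12-byte stride would pass `unit = 12`, so an address of 24 with a size of 36 would be accepted under this sketch.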
A given command signature is either a graphics or a compute command signature. If a command signature contains a drawing operation, then it is a graphics command signature. Otherwise, the command signature must contain a dispatch operation, and it is a compute command signature.
Graphics command signatures only affect graphics root arguments. Likewise, compute command signatures only affect compute root arguments.
Feature tier 1.1 adds support for D3D12_INDIRECT_ARGUMENT_TYPE_INCREMENTING_CONSTANT. This is a unique argument type in that it doesn't occupy any argument buffer space.
Instead, D3D12_INDIRECT_ARGUMENT_TYPE_INCREMENTING_CONSTANT makes the system update a specified root constant for each executed command. The configuration of the update is in the IncrementingConstant union entry of D3D12_INDIRECT_ARGUMENT_DESC (see Command signature creation):
struct
{
UINT RootParameterIndex; // Must match with a root constant in the command signature's root signature
UINT DestOffsetIn32BitValues; // Which constant in case the root constant has multiple
} IncrementingConstant;
From the shader point of view, accessing this is simply a matter of accessing the corresponding root constant.
The counter is a 32-bit UINT. For a given ExecuteIndirect call, the value starts at 0 for the first command and increments by 1 for each subsequent command.
A command signature can contain at most one incrementing constant.
After an ExecuteIndirect() invocation completes, any root constant targeted by an incrementing constant argument is reset to 0, consistent with the overall rules defined in State leakage. As stated in that section, the runtime accomplishes this by calling the driver to set the root constant to 0.
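The counter semantics can be emulated on the CPU as follows. This is only a sketch of the observable behavior, not of how a driver implements it; `RootConstants` and `EmulateIncrementingConstant` are hypothetical names, and the 4-entry root constant block is an arbitrary size chosen for the example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical root constant parameter holding four 32-bit values.
struct RootConstants {
    std::array<uint32_t, 4> values{};
};

// Returns the value each command would observe at `destOffsetIn32BitValues`
// during one emulated ExecuteIndirect call.
std::vector<uint32_t> EmulateIncrementingConstant(RootConstants& root,
                                                  uint32_t destOffsetIn32BitValues,
                                                  uint32_t commandCount) {
    assert(destOffsetIn32BitValues < root.values.size());
    std::vector<uint32_t> observed;
    observed.reserve(commandCount);
    for (uint32_t command = 0; command < commandCount; ++command) {
        root.values[destOffsetIn32BitValues] = command;  // starts at 0, increments by 1
        observed.push_back(root.values[destOffsetIn32BitValues]);
    }
    root.values[destOffsetIn32BitValues] = 0;  // reset once the call completes
    return observed;
}
```

A shader would typically use the observed value as a draw index, for instance to look up per-draw data in a structured buffer.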
In this example, the indirect argument buffer generated by the application holds an array of 36-byte structures. Each structure only contains the 5 parameters passed to DrawIndexedInstanced (plus padding).
The code to create the command signature description is:
D3D12_INDIRECT_ARGUMENT_DESC Args[1] = {};
Args[0].Type = D3D12_INDIRECT_ARGUMENT_TYPE_DRAW_INDEXED;
D3D12_COMMAND_SIGNATURE_DESC ProgramDesc = {};
ProgramDesc.ByteStride = 36; // 20 bytes of arguments + 16 bytes of padding
ProgramDesc.NumArgumentDescs = 1;
ProgramDesc.pArgumentDescs = Args;
The layout of a single structure within an indirect argument buffer is:
Bytes 0:3 | IndexCountPerInstance |
Bytes 4:7 | InstanceCount |
Bytes 8:11 | StartIndexLocation |
Bytes 12:15 | BaseVertexLocation |
Bytes 16:19 | StartInstanceLocation |
Bytes 20:35 | Padding |
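The padded layout above can be written into a byte buffer as sketched below. `DrawIndexedArguments` mirrors D3D12_DRAW_INDEXED_ARGUMENTS from this document; `kByteStride`, `WriteRecord`, and `ReadU32` are hypothetical helpers for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in mirroring D3D12_DRAW_INDEXED_ARGUMENTS as declared in this document.
struct DrawIndexedArguments {
    uint32_t IndexCountPerInstance;
    uint32_t InstanceCount;
    uint32_t StartIndexLocation;
    int32_t  BaseVertexLocation;
    uint32_t StartInstanceLocation;
};
static_assert(sizeof(DrawIndexedArguments) == 20, "five 32-bit values");

constexpr size_t kByteStride = 36;  // 20 bytes of arguments + 16 bytes of padding

// Copies one record to `index * kByteStride`; padding bytes are left untouched.
void WriteRecord(std::vector<uint8_t>& buffer, size_t index, const DrawIndexedArguments& args) {
    const size_t offset = index * kByteStride;
    assert(offset + sizeof(args) <= buffer.size());
    std::memcpy(buffer.data() + offset, &args, sizeof(args));
}

// Reads back a 32-bit value at a byte offset, e.g. to inspect a record.
uint32_t ReadU32(const std::vector<uint8_t>& buffer, size_t byteOffset) {
    assert(byteOffset + sizeof(uint32_t) <= buffer.size());
    uint32_t value = 0;
    std::memcpy(&value, buffer.data() + byteOffset, sizeof(value));
    return value;
}
```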
In this example, each structure in an indirect argument buffer changes 2 root constants, changes 1 vertex buffer binding, and performs 1 drawing non-indexed operation. There is no padding between structures.
The code to create the command signature description is:
D3D12_INDIRECT_ARGUMENT_DESC Args[4] = {};
Args[0].Type = D3D12_INDIRECT_ARGUMENT_TYPE_CONSTANT;
Args[0].Constant.RootParameterIndex = 2;
Args[0].Constant.DestOffsetIn32BitValues = 0;
Args[0].Constant.Num32BitValuesToSet = 1;
Args[1].Type = D3D12_INDIRECT_ARGUMENT_TYPE_CONSTANT;
Args[1].Constant.RootParameterIndex = 6;
Args[1].Constant.DestOffsetIn32BitValues = 0;
Args[1].Constant.Num32BitValuesToSet = 1;
Args[2].Type = D3D12_INDIRECT_ARGUMENT_TYPE_VERTEX_BUFFER_VIEW;
Args[2].VertexBuffer.Slot = 3;
Args[3].Type = D3D12_INDIRECT_ARGUMENT_TYPE_DRAW;
D3D12_COMMAND_SIGNATURE_DESC ProgramDesc = {};
ProgramDesc.ByteStride = 40; // 4 + 4 + 16 + 16 bytes, no padding
ProgramDesc.NumArgumentDescs = 4;
ProgramDesc.pArgumentDescs = Args;
The layout of a single structure within the indirect argument buffer is:
Bytes 0:3 | Data for root parameter index 2 |
Bytes 4:7 | Data for root parameter index 6 |
Bytes 8:15 | Virtual address of VB (64-bit) |
Bytes 16:19 | VB size |
Bytes 20:23 | VB stride |
Bytes 24:27 | VertexCountPerInstance |
Bytes 28:31 | InstanceCount |
Bytes 32:35 | StartVertexLocation |
Bytes 36:39 | StartInstanceLocation |
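A C++ struct matching this 40-byte record might look like the sketch below. The nested `VertexBufferView` follows the member order of the D3D12_VERTEX_BUFFER_VIEW declaration earlier in this document; `ConstantsVbDrawRecord` and `MakeRecord` are hypothetical names, and sizing the vertex buffer as `vertexCount * vertexStride` is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-ins mirroring the structure declarations in this document.
struct VertexBufferView {
    uint64_t BufferLocation;  // GPU virtual address
    uint32_t SizeInBytes;
    uint32_t StrideInBytes;
};
struct DrawArguments {
    uint32_t VertexCountPerInstance;
    uint32_t InstanceCount;
    uint32_t StartVertexLocation;
    uint32_t StartInstanceLocation;
};

// Hypothetical record: 2 root constants, 1 vertex buffer binding, 1 draw.
struct ConstantsVbDrawRecord {
    uint32_t RootParameter2;
    uint32_t RootParameter6;
    VertexBufferView VertexBuffer;
    DrawArguments Draw;
};

static_assert(offsetof(ConstantsVbDrawRecord, VertexBuffer) == 8, "VB view starts at byte 8");
static_assert(offsetof(ConstantsVbDrawRecord, Draw) == 24, "draw arguments start at byte 24");
static_assert(sizeof(ConstantsVbDrawRecord) == 40, "no padding between or after arguments");

ConstantsVbDrawRecord MakeRecord(uint32_t c2, uint32_t c6, uint64_t vbAddress,
                                 uint32_t vertexCount, uint32_t vertexStride) {
    assert(vertexStride != 0);
    ConstantsVbDrawRecord record{};
    record.RootParameter2 = c2;
    record.RootParameter6 = c6;
    record.VertexBuffer.BufferLocation = vbAddress;
    record.VertexBuffer.SizeInBytes = vertexCount * vertexStride;
    record.VertexBuffer.StrideInBytes = vertexStride;
    record.Draw = DrawArguments{vertexCount, 1, 0, 0};
    return record;
}
```

The two leading 32-bit constants happen to place the 64-bit address on an 8-byte boundary, which is why the compiler adds no padding here.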
The runtime will validate the following:
-
There is exactly 1 entry defining the draw/dispatch parameters (either D3D12_INDIRECT_ARGUMENT_TYPE_DRAW or D3D12_INDIRECT_ARGUMENT_TYPE_DRAW_INDEXED or D3D12_INDIRECT_ARGUMENT_TYPE_DISPATCH). This entry must come last.
-
A D3D12_INDIRECT_ARGUMENT_TYPE_INDEX_BUFFER_VIEW entry can only be present if there is also a D3D12_INDIRECT_ARGUMENT_TYPE_DRAW_INDEXED entry present.
-
ByteStride is 4-byte aligned
-
ByteStride is large enough to hold all data
-
All resource bindings are compatible with the root signature.
-
Each root parameter slot is defined no more than once
-
Root parameter slots do not exceed range defined by root signature
-
If there are index buffer changes, the index buffer format is valid
-
If there are vertex buffer changes, the vb slot index is within the range allowed by D3D
-
Entries that reference root parameter slots are sorted from smallest to largest root parameter index
-
Root constant entries are sorted from smallest to largest DestOffsetIn32BitValues (including no overlap)
-
If D3D12_INDIRECT_ARGUMENT_TYPE_DISPATCH is used, then VB and IB bindings cannot be changed
-
The root signature is specified if and only if the command signature indicates that 1 of the root arguments changes
-
For root constants DestOffsetIn32BitValues + Num32BitValuesToSet is within range defined by the root signature
-
No VB slot is changed more than once
-
Only 1 IB change per command signature is allowed
-
Num32BitValuesToSet must be greater than 0
-
ByteStride must be a multiple of 4 bytes
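A few of the rules above can be sketched as a standalone check. This is not the runtime's implementation: `ArgType`, `SignatureError`, and `ValidateSignature` are hypothetical, only a subset of rules is covered (operation count and position, IB/VB compatibility, stride alignment and size), and each constant entry is assumed to set a single 32-bit value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for a subset of D3D12_INDIRECT_ARGUMENT_TYPE; not the real header.
enum class ArgType { Draw, DrawIndexed, Dispatch, VertexBufferView, IndexBufferView, Constant };

enum class SignatureError {
    None, MissingOperation, OperationNotSingleAndLast,
    IndexBufferWithoutIndexedDraw, BufferBindingWithDispatch,
    StrideNotAligned, StrideTooSmall
};

bool IsOperation(ArgType t) {
    return t == ArgType::Draw || t == ArgType::DrawIndexed || t == ArgType::Dispatch;
}

// Bytes per argument; a Constant is assumed to carry one 32-bit value.
uint32_t ArgumentByteSize(ArgType t) {
    switch (t) {
    case ArgType::Draw:             return 16;
    case ArgType::DrawIndexed:      return 20;
    case ArgType::Dispatch:         return 12;
    case ArgType::VertexBufferView: return 16;
    case ArgType::IndexBufferView:  return 16;
    case ArgType::Constant:         return 4;
    }
    return 0;
}

SignatureError ValidateSignature(const std::vector<ArgType>& args, uint32_t byteStride) {
    if (args.empty() || !IsOperation(args.back())) {
        // Distinguish "no operation at all" from "operation in the wrong place".
        for (ArgType t : args) {
            if (IsOperation(t)) return SignatureError::OperationNotSingleAndLast;
        }
        return SignatureError::MissingOperation;
    }
    const ArgType op = args.back();
    assert(IsOperation(op));
    uint32_t packedSize = 0;
    for (size_t i = 0; i < args.size(); ++i) {
        const ArgType t = args[i];
        if (i + 1 < args.size() && IsOperation(t)) return SignatureError::OperationNotSingleAndLast;
        if (t == ArgType::IndexBufferView && op != ArgType::DrawIndexed)
            return SignatureError::IndexBufferWithoutIndexedDraw;
        if (op == ArgType::Dispatch && t == ArgType::VertexBufferView)
            return SignatureError::BufferBindingWithDispatch;
        packedSize += ArgumentByteSize(t);
    }
    if (byteStride % 4 != 0) return SignatureError::StrideNotAligned;
    if (byteStride < packedSize) return SignatureError::StrideTooSmall;
    return SignatureError::None;
}
```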
Applications perform indirect draws/dispatches via the following API:
void ID3D12CommandList::ExecuteIndirect(
ID3D12CommandSignature* pCommandSignature,
UINT MaxCommandCount,
ID3D12Resource* pArgumentBuffer,
UINT64 ArgumentBufferOffset,
ID3D12Resource* pCountBuffer,
UINT64 CountBufferOffset
);
There are 2 ways that command counts can be specified:
If pCountBuffer is not NULL, then MaxCommandCount specifies the maximum number of operations which will be performed. The actual number of operations to be performed is defined by the minimum of this value and a 32-bit unsigned integer contained in pCountBuffer (at the byte offset specified by CountBufferOffset).
If pCountBuffer is NULL, then MaxCommandCount specifies the exact number of operations which will be performed.
The semantics of this API are defined with the following pseudo-code:
Non-NULL pCountBuffer:
// Read command count out of count buffer
UINT CommandCount = pCountBuffer->ReadUINT32(CountBufferOffset);
CommandCount = min(CommandCount, MaxCommandCount);
// Get pointer to first command argument
BYTE* Arguments = pArgumentBuffer->GetBase() + ArgumentBufferOffset;
for(UINT CommandIndex = 0; CommandIndex < CommandCount; CommandIndex++)
{
// Interpret the data contained in *Arguments
// according to the command signature
pCommandSignature->Interpret(Arguments);
Arguments += pCommandSignature->GetByteStride();
}
NULL pCountBuffer:
// Get pointer to first command argument
BYTE* Arguments = pArgumentBuffer->GetBase() + ArgumentBufferOffset;
for(UINT CommandIndex = 0; CommandIndex < MaxCommandCount; CommandIndex++)
{
// Interpret the data contained in *Arguments
// according to the command signature
pCommandSignature->Interpret(Arguments);
Arguments += pCommandSignature->GetByteStride();
}
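The two pseudo-code variants can be folded into one runnable CPU emulation, sketched below. `ResolveCommandCount` and `EmulateExecuteIndirect` are hypothetical names; the count buffer is modeled as an optional pointer to its 32-bit value, and "interpreting" a command is reduced to reading the first 32-bit word of each record.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Null count pointer: MaxCommandCount is exact. Otherwise the counts are min'ed.
uint32_t ResolveCommandCount(uint32_t maxCommandCount, const uint32_t* countBufferValue) {
    if (countBufferValue == nullptr) return maxCommandCount;
    return std::min(*countBufferValue, maxCommandCount);
}

// Walks the argument buffer record by record and returns the first 32-bit
// word of every record that would be executed.
std::vector<uint32_t> EmulateExecuteIndirect(const std::vector<uint8_t>& argumentBuffer,
                                             uint64_t argumentBufferOffset,
                                             uint32_t byteStride,
                                             uint32_t maxCommandCount,
                                             const uint32_t* countBufferValue) {
    const uint32_t commandCount = ResolveCommandCount(maxCommandCount, countBufferValue);
    std::vector<uint32_t> firstWords;
    firstWords.reserve(commandCount);
    uint64_t offset = argumentBufferOffset;
    for (uint32_t commandIndex = 0; commandIndex < commandCount; ++commandIndex) {
        assert(offset + sizeof(uint32_t) <= argumentBuffer.size());
        uint32_t word = 0;
        std::memcpy(&word, argumentBuffer.data() + offset, sizeof(word));
        firstWords.push_back(word);
        offset += byteStride;  // advance by the command signature's stride
    }
    return firstWords;
}
```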
The debug layer will issue an error if either the count buffer or the argument buffer is not in the D3D12_RESOURCE_STATE_INDIRECT_ARGUMENT state.
The core runtime will validate:
-
CountBufferOffset and ArgumentBufferOffset are 4-byte aligned
-
pCountBuffer and pArgumentBuffer are buffer resources (any heap type)
-
The offset implied by MaxCommandCount, ArgumentBufferOffset, and the command signature's ByteStride does not exceed the bounds of pArgumentBuffer (similarly for the count buffer)
-
The command list is a direct command list or a compute command list (not a copy or JPEG decode command list)
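The alignment and bounds rules in this list can be sketched as two predicates. The helper names are hypothetical, and the exact bounds formula used by the runtime is not given in this document; the sketch conservatively assumes every command occupies a full ByteStride.

```cpp
#include <cassert>
#include <cstdint>

bool IsFourByteAligned(uint64_t offset) {
    return offset % 4 == 0;
}

// Assumes MaxCommandCount * ByteStride bytes must fit after the offset.
bool ArgumentRangeFits(uint64_t bufferSizeInBytes, uint64_t argumentBufferOffset,
                       uint32_t maxCommandCount, uint32_t byteStride) {
    assert(byteStride % 4 == 0);  // already enforced at command signature creation
    if (!IsFourByteAligned(argumentBufferOffset)) return false;
    if (argumentBufferOffset > bufferSizeInBytes) return false;
    // Cannot overflow: the product of two 32-bit values fits in 64 bits.
    const uint64_t required = static_cast<uint64_t>(maxCommandCount) * byteStride;
    return required <= bufferSizeInBytes - argumentBufferOffset;
}

// The count buffer only needs room for one 32-bit value at the given offset.
bool CountRangeFits(uint64_t bufferSizeInBytes, uint64_t countBufferOffset) {
    return IsFourByteAligned(countBufferOffset) &&
           countBufferOffset <= bufferSizeInBytes &&
           bufferSizeInBytes - countBufferOffset >= sizeof(uint32_t);
}
```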
The debug layer will validate:
- The root signature of the command list matches the root signature of the command signature
ID3D12CommandList::DrawInstancedIndirect and ID3D12CommandList::DrawIndexedInstancedIndirect are removed from the D3D12 API because they can be implemented with the features described here.
ID3D12CommandList::ExecuteIndirect is allowed inside of bundle command lists only if all of the following are true:
-
CountBuffer is NULL (CPU-specified count only)
-
The command signature contains exactly 1 operation. This implies that the command signature contains neither root argument changes nor VB/IB binding changes.
ExecuteIndirect is defined to reset all bindings affected by the ExecuteIndirect to known values. In particular:
-
If the command signature binds a VB to a particular slot, then after ExecuteIndirect is called, a NULL VB is bound to that slot
-
If the command signature binds an IB, then after ExecuteIndirect, a NULL IB is bound.
-
If the command signature sets a root constant, then after ExecuteIndirect is called, the root constant value is set to 0
-
If the command signature sets a root view (CBV/SRV/UAV), then after ExecuteIndirect is called, the root view is set to a NULL view.
This enables drivers to easily track bindings. This is implemented by the D3D12 runtime by making a series of DDI calls after the ExecuteIndirect is called.
A new API is added whereby an application can retrieve the GPU virtual address of a buffer.
typedef UINT64 D3D12_GPU_VIRTUAL_ADDRESS;
D3D12_GPU_VIRTUAL_ADDRESS ID3D12Resource::GetGPUVirtualAddress();
Applications are free to apply byte offsets to virtual addresses before placing them in an indirect argument buffer. Note that all of the D3D12 alignment requirements for VB/IB/CB still apply to the resulting GPU virtual address.
This API returns 0 for non-buffer resources.
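As an example of applying byte offsets, the sketch below computes the address of one slice of a constant buffer while keeping the 256-byte CBV alignment required earlier in this document. `GpuVirtualAddress` mirrors the typedef above; `AlignUp` and `ConstantBufferSliceAddress` are hypothetical helpers, and the base address would come from GetGPUVirtualAddress in a real application.

```cpp
#include <cassert>
#include <cstdint>

using GpuVirtualAddress = uint64_t;  // mirrors D3D12_GPU_VIRTUAL_ADDRESS

constexpr uint64_t kConstantBufferAlignment = 256;

// Rounds `value` up to the next multiple of `alignment`.
constexpr uint64_t AlignUp(uint64_t value, uint64_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Address of slice `sliceIndex` when slices are laid out back to back,
// each padded to the constant buffer alignment.
GpuVirtualAddress ConstantBufferSliceAddress(GpuVirtualAddress base,
                                             uint32_t sliceIndex,
                                             uint32_t sliceSizeInBytes) {
    assert(base % kConstantBufferAlignment == 0);
    const uint64_t paddedSize = AlignUp(sliceSizeInBytes, kConstantBufferAlignment);
    return base + sliceIndex * paddedSize;
}
```

The returned address can be written directly into the root CBV field of an indirect argument record.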
typedef enum D3D12_EXECUTE_INDIRECT_TIER
{
D3D12_EXECUTE_INDIRECT_TIER_1_0 = 10,
D3D12_EXECUTE_INDIRECT_TIER_1_1 = 11,
} D3D12_EXECUTE_INDIRECT_TIER;
All D3D12 devices support tier 1.0, which is the majority of this spec, and applications don't need to bother checking for this level of support.
Tier 1.1 adds support for Incrementing constant. To use this, applications do need to ensure device support for tier 1.1, reported in D3D12_FEATURE_D3D12_OPTIONS21, via CheckFeatureSupport().
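Once the tier has been queried, gating the incrementing constant argument type is a simple comparison. The sketch below uses a hypothetical stand-in enum with the same numeric values as D3D12_EXECUTE_INDIRECT_TIER and a hypothetical `IsArgumentTypeSupported` helper; it does not call the real CheckFeatureSupport().

```cpp
#include <cassert>
#include <cstdint>

// Stand-in with the same values as D3D12_EXECUTE_INDIRECT_TIER.
enum ExecuteIndirectTier : uint32_t { kTier_1_0 = 10, kTier_1_1 = 11 };

// Reduced stand-in for the argument types relevant to this check.
enum class ArgType { Draw, Constant, IncrementingConstant };

// Everything except the incrementing constant is available on every device.
bool IsArgumentTypeSupported(ArgType type, ExecuteIndirectTier deviceTier) {
    assert(deviceTier >= kTier_1_0);  // all D3D12 devices report at least tier 1.0
    if (type == ArgType::IncrementingConstant) {
        return deviceTier >= kTier_1_1;
    }
    return true;
}
```

An application could fall back to writing an explicit per-command root constant into the argument buffer when this returns false.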
Both of the following implementations are acceptable:
-
Make the GPU command processor interpret the indirect argument buffer contents in the application-defined format
-
Allocate enough storage in the current command buffer to hold MaxCommandCount draws. Execute a compute shader to transform from the application-specified format to a hardware-specific format in the allocated command buffer space.
Note that in the second approach, the hidden compute shader invocations associated with many ExecuteIndirect calls can be combined together. If a command list has no resource transition barrier to the D3D12_RESOURCE_STATE_INDIRECT_ARGUMENT state, then it is safe to move all of the hidden compute shader invocations to the beginning of the command list.
In order to achieve consistent behavior across machines, GPUs are expected to perform the following validation:
- The draw count specified in the indirect argument buffer is guaranteed to not exceed the MaxCommandCount specified in the ExecuteIndirect API call. This is achieved by having the GPU compute min(MaxCommandCount, CommandCount).
-
Validation during command signature creation √
-
Validation during ExecuteIndirect √
-
GetGPUVirtualAddress returns 0 for non-buffer resources √
-
Debug layer validation of buffer states √
-
Debug layer validation of buffer contents
-
The runtime sets state to correct default values after ExecuteIndirect (also the runtime should correctly select compute vs graphics root arguments to change) √
-
Debug layer validation of buffer alignments (taking into account the SRV/UAV format/structure byte stride).
-
Indirect argument structures have no padding √
-
ComandListAPITestBase::Execute is updated to call ExecuteIndirect √
-
CommandListTest::InvalidBundleAPI √
-
CommandListTest::SetCommandListError √
-
CommandSignature object takes a reference on root signature object √
-
Debug layer warning if a command signature is destroyed while there is outstanding work queued against it (like a PSO)
-
Runtime validation in GetGPUVA √
-
GetGPUVA works with placed resources √
-
GetGPUVA works with reserved resources √
-
CreateCommandSignature correctly handles the case where the first parameter changes a root argument (validation of increasing orders handles the first case). √
-
Debug layer validation during ExecuteIndirect
-
Debug layer validation of index buffer formats
-
Debug layer validation of tiled constant buffers not being supported
-
The runtime hard-coded array size of 64 constants is the correct size √
-
Drivers work correctly when CountBuffer is NULL and when CountBuffer is non-NULL
-
ExecuteIndirect works correctly inside of a bundle
-
Applications can add byte offsets to GPU virtual addresses
-
Arbitrary command signatures produce the same results as corresponding non-indirect rendering API calls
-
Arbitrary 4-byte aligned byte strides are supported
-
Indirect argument buffer & Count buffer can be in any heap type
-
GPU computes min(MaxCommandCount, CommandCount)
-
Drivers do not elide hidden shaders across ResourceBarrier(->INDIRECT_ARGS) state.
-
Predication works correctly
-
Queries work correctly
-
GetGPUVA works with committed resources, placed resources, and reserved resources
-
GetGPUVA works with opened shared resources
-
CountBufferOffset & ArgumentBufferOffset are interpreted correctly
-
When setting root constants, DestOffsetIn32BitValues and Num32BitValuesToSet are interpreted correctly
-
Changing root arguments works with both compute and graphics
-
Drivers correctly handle shader-visibility and deny flags when setting root arguments via ExecuteIndirect
-
Out-of-bounds behavior correctly applies based on the app-specified buffer sizes (for CBV/SRV/UAV)
-
ExecuteIndirect works on compute queues
-
MaxCommandCount==0 works correctly
-
Root SRVs and UAVs work with all supported formats
-
Tiled root SRVs and UAVs work correctly (with offsets)
Date | Changes |
---|---|
10/12/2023 |
|