[SPIR-V] Compiling with doubles may also implicitly enable Int64 capability (which may or may not be supported) #7038

Closed
Nielsbishere opened this issue Dec 16, 2024 · 13 comments · Fixed by #7073
Labels
bug Bug, regression, crash spirv Work related to SPIR-V

Comments

@Nielsbishere
Contributor

Description
When loading a 64-bit float from a ByteAddressBuffer (BAB) and doing arithmetic on it, DXC also emits 64-bit integer instructions, even though the source uses no 64-bit integers.

Steps to Reproduce

typedef uint U32;
typedef float F32;

typedef double3 F64x3;
typedef float3 F32x3;

typedef uint4 U32x4;

static const U32 ResourceId_mask = (1 << 17) - 1;

struct TransformPreciseDouble {		//Stride 8, Length 24
	F64x3 pos;
};

struct TransformImprecise {			//Stride 4, Length 12
	F32x3 pos;		//Relative to the camera origin
};

enum EResourceBinding {
	EResourceBinding_ObjectTransformsLocal,
	EResourceBinding_ObjectTransformsGlobal,
    EResourceBinding_ObjectTransformsLocalRW
};

#ifdef __spirv__
    #define _binding(a, b, ...) [[vk::binding(a, b)]] __VA_ARGS__
	#define _vkBinding(a, b) [[vk::binding(a, b)]]
#else
    #define _binding(a, b, ...) __VA_ARGS__ : register(space##a)
	#define _vkBinding(a, b)
#endif

_vkBinding( 0, 2) cbuffer globals {	//Globals used during the entire frame for useful information such as frame id.

	U32 _frameId;					//Can loop back to 0 after U32_MAX!
	F32 _time;						//Time since launch of app
	F32 _deltaTime;					//deltaTime since last frame.
	U32 _swapchainCount;			//How many swapchains are present (will insert ids into appData)

	U32x4 _swapchains[8];			//Descriptors of swapchains: (Read, write)[2][8]

	//Up to 368 bytes of user data, useful for supplying constant per frame data.
	//Make sure to offset to make sure.

	U32x4 _appData[23];
};

U32 getAppData1u(U32 offset) { return offset >= 92 ? 0 : _appData[offset >> 2][offset & 3]; }

#define rwBufferUniform(i) _rwBuffer[i & ResourceId_mask]
#define bufferUniform(i) _buffer[i & ResourceId_mask]

_binding( 3, 1, ByteAddressBuffer _buffer[131072]);
_binding( 4, 1, RWByteAddressBuffer _rwBuffer[131072]);

template<typename T>
void setAtUniform(U32 resourceId, U32 id, T t) {
	rwBufferUniform(resourceId).Store<T>(id, t);
}

template<typename T>
T getAtUniform(U32 resourceId, U32 id) {
	return bufferUniform(resourceId).Load<T>(id);
}

U32 bufferBytesUniform(U32 resourceId) {
	U32 bytes;
	bufferUniform(resourceId).GetDimensions(bytes);
	return bytes;
}

[[oxc::stage("compute")]]
[[oxc::extension("I64", "F64")]]	//Loading F64 might incur I64 instructions in SPIR-V
[numthreads(256, 1, 1)]
void main(U32 i : SV_DispatchThreadID) {

	U32 objectTransformGlobal = getAppData1u(EResourceBinding_ObjectTransformsGlobal);

	U32 bytes = bufferBytesUniform(objectTransformGlobal);

	TransformImprecise result;

	{

		U32 elems = bytes / sizeof(TransformPreciseDouble);

		if(i + 1 >= elems)
			return;
			
		TransformPreciseDouble cam = getAtUniform<TransformPreciseDouble>(objectTransformGlobal, 0);
		TransformPreciseDouble obj = getAtUniform<TransformPreciseDouble>(objectTransformGlobal, i + 1);
		result.pos = (float3)(obj.pos - cam.pos);
	}

	U32 objectTransformLocal = getAppData1u(EResourceBinding_ObjectTransformsLocalRW);
	setAtUniform(objectTransformLocal, i + 1, result);
}

Turns into roughly:

Section has 20486 bytes.
 Showing offset #0 with size 20486
 File contents: (ascii)
 ; SPIR-V
; Version: 1.5
; Generator: Google spiregg; 0
; Bound: 138
; Schema: 0
               OpCapability Shader                  ; 0x00000014
               OpCapability Float64                 ; 0x0000001c
               OpCapability Int64                   ; 0x00000024
               OpMemoryModel Logical GLSL450        ; 0x0000002c
               OpEntryPoint GLCompute %1 "main" %gl_GlobalInvocationID %3 %4 %5     ; 0x00000038
               OpExecutionMode %1 LocalSize 256 1 1                                 ; 0x0000005c

               ; Annotations
               OpDecorate %gl_GlobalInvocationID BuiltIn GlobalInvocationId         ; 0x00000074
               OpDecorate %3 DescriptorSet 1                                        ; 0x00000084
               OpDecorate %3 Binding 3                                              ; 0x00000094
               OpDecorate %4 DescriptorSet 1                                        ; 0x000000a4
               OpDecorate %4 Binding 4                                              ; 0x000000b4
               OpDecorate %5 DescriptorSet 2                                        ; 0x000000c4
               OpDecorate %5 Binding 0                                              ; 0x000000d4
               OpDecorate %_runtimearr_uint ArrayStride 4                           ; 0x000000e4
               OpMemberDecorate %_struct_6 0 Offset 0                               ; 0x000000f4
               OpMemberDecorate %_struct_6 0 NonWritable                            ; 0x00000108
               OpDecorate %_struct_6 Block                                          ; 0x00000118
               OpMemberDecorate %_struct_7 0 Offset 0                               ; 0x00000124
               OpDecorate %_struct_7 Block                                          ; 0x00000138
               OpDecorate %_arr_v4uint_uint_8 ArrayStride 16                        ; 0x00000144
               OpDecorate %_arr_v4uint_uint_23 ArrayStride 16                       ; 0x00000154
               OpMemberDecorate %_struct_8 0 Offset 0                               ; 0x00000164
               OpMemberDecorate %_struct_8 1 Offset 4                               ; 0x00000178
               OpMemberDecorate %_struct_8 2 Offset 8                               ; 0x0000018c
               OpMemberDecorate %_struct_8 3 Offset 12                              ; 0x000001a0
               OpMemberDecorate %_struct_8 4 Offset 16                              ; 0x000001b4
               OpMemberDecorate %_struct_8 5 Offset 144                             ; 0x000001c8
               OpDecorate %_struct_8 Block                                          ; 0x000001dc

               ; Types, variables and constants
        %int = OpTypeInt 32 1                                                       ; 0x000001e8
      %int_5 = OpConstant %int 5                                                    ; 0x000001f8
      %float = OpTypeFloat 32                                                       ; 0x00000208
       %uint = OpTypeInt 32 0                                                       ; 0x00000214
%uint_131071 = OpConstant %uint 131071                                              ; 0x00000224
    %uint_24 = OpConstant %uint 24                                                  ; 0x00000234
     %uint_1 = OpConstant %uint 1                                                   ; 0x00000244
     %uint_0 = OpConstant %uint 0                                                   ; 0x00000254
     %uint_2 = OpConstant %uint 2                                                   ; 0x00000264
     %uint_3 = OpConstant %uint 3                                                   ; 0x00000274
     %uint_4 = OpConstant %uint 4                                                   ; 0x00000284
    %uint_32 = OpConstant %uint 32                                                  ; 0x00000294
%uint_131072 = OpConstant %uint 131072                                              ; 0x000002a4
%_runtimearr_uint = OpTypeRuntimeArray %uint                                        ; 0x000002b4, ArrayStride 4
  %_struct_6 = OpTypeStruct %_runtimearr_uint                                       ; 0x000002c0, Block
%_arr__struct_6_uint_131072 = OpTypeArray %_struct_6 %uint_131072                   ; 0x000002cc
%_ptr_StorageBuffer__arr__struct_6_uint_131072 = OpTypePointer StorageBuffer %_arr__struct_6_uint_131072    ; 0x000002dc
  %_struct_7 = OpTypeStruct %_runtimearr_uint                                                               ; 0x000002ec, Block
%_arr__struct_7_uint_131072 = OpTypeArray %_struct_7 %uint_131072                                           ; 0x000002f8
%_ptr_StorageBuffer__arr__struct_7_uint_131072 = OpTypePointer StorageBuffer %_arr__struct_7_uint_131072    ; 0x00000308
     %uint_8 = OpConstant %uint 8                                                                           ; 0x00000318
     %v4uint = OpTypeVector %uint 4                                                                         ; 0x00000328
%_arr_v4uint_uint_8 = OpTypeArray %v4uint %uint_8                                                           ; 0x00000338, ArrayStride 16
    %uint_23 = OpConstant %uint 23                                                                          ; 0x00000348
%_arr_v4uint_uint_23 = OpTypeArray %v4uint %uint_23                                                         ; 0x00000358, ArrayStride 16
  %_struct_8 = OpTypeStruct %uint %float %float %uint %_arr_v4uint_uint_8 %_arr_v4uint_uint_23              ; 0x00000368, Block
%_ptr_Uniform__struct_8 = OpTypePointer Uniform %_struct_8                                                  ; 0x00000388
     %v3uint = OpTypeVector %uint 3                                                                         ; 0x00000398
%_ptr_Input_v3uint = OpTypePointer Input %v3uint                                                            ; 0x000003a8
       %void = OpTypeVoid                                                                                   ; 0x000003b8
         %36 = OpTypeFunction %void                                                                         ; 0x000003c0
    %v3float = OpTypeVector %float 3                                                                        ; 0x000003cc
     %double = OpTypeFloat 64                                                                               ; 0x000003dc
   %v3double = OpTypeVector %double 3                                                                       ; 0x000003e8
       %bool = OpTypeBool                                                                                   ; 0x000003f8
%_ptr_Uniform_uint = OpTypePointer Uniform %uint                                                            ; 0x00000400
%_ptr_StorageBuffer__struct_6 = OpTypePointer StorageBuffer %_struct_6                                      ; 0x00000410
%_ptr_StorageBuffer_uint = OpTypePointer StorageBuffer %uint                                                ; 0x00000420
      %ulong = OpTypeInt 64 0                                                                               ; 0x00000430
          %3 = OpVariable %_ptr_StorageBuffer__arr__struct_6_uint_131072 StorageBuffer                      ; 0x00000440, DescriptorSet 1, Binding 3
          %4 = OpVariable %_ptr_StorageBuffer__arr__struct_7_uint_131072 StorageBuffer                      ; 0x00000450, DescriptorSet 1, Binding 4
          %5 = OpVariable %_ptr_Uniform__struct_8 Uniform                                                   ; 0x00000460, DescriptorSet 2, Binding 0
%gl_GlobalInvocationID = OpVariable %_ptr_Input_v3uint Input                                                ; 0x00000470, BuiltIn GlobalInvocationId
     %uint_5 = OpConstant %uint 5                                                                           ; 0x00000480


               ; Function 1
          %1 = OpFunction %void None %36                                                                    ; 0x00000490

         %46 = OpLabel                                                                                      ; 0x000004a4
         %47 =   OpLoad %v3uint %gl_GlobalInvocationID                                                      ; 0x000004ac
         %48 =   OpCompositeExtract %uint %47 0                                                             ; 0x000004bc
                 OpSelectionMerge %49 None                                                                  ; 0x000004d0
                 OpSwitch %uint_0 %50                                                                       ; 0x000004dc

         %50 =     OpLabel                                                                                  ; 0x000004e8
         %51 =       OpAccessChain %_ptr_Uniform_uint %5 %int_5 %uint_4 %uint_3                             ; 0x000004f0
         %52 =       OpLoad %uint %51                                                                       ; 0x0000050c
         %53 =       OpBitwiseAnd %uint %52 %uint_131071                                                    ; 0x0000051c
         %54 =       OpAccessChain %_ptr_StorageBuffer__struct_6 %3 %53                                     ; 0x00000530
         %55 =       OpArrayLength %uint %54 0                                                              ; 0x00000544
         %56 =       OpIMul %uint %55 %uint_4                                                               ; 0x00000558
         %57 =       OpUDiv %uint %56 %uint_24                                                              ; 0x0000056c
         %58 =       OpIAdd %uint %48 %uint_1                                                               ; 0x00000580
         %59 =       OpUGreaterThanEqual %bool %58 %57                                                      ; 0x00000594
                     OpSelectionMerge %60 None                                                              ; 0x000005a8
                     OpBranchConditional %59 %61 %60                                                        ; 0x000005b4

         %61 =         OpLabel                                                                              ; 0x000005c4
                         OpBranch %49                                                                       ; 0x000005cc

         %60 =     OpLabel                                                                                  ; 0x000005d4
         %62 =       OpAccessChain %_ptr_StorageBuffer_uint %3 %53 %uint_0 %uint_0                          ; 0x000005dc
         %63 =       OpLoad %uint %62                                                                       ; 0x000005f8
         %64 =       OpAccessChain %_ptr_StorageBuffer_uint %3 %53 %uint_0 %uint_1                          ; 0x00000608
         %65 =       OpLoad %uint %64                                                                       ; 0x00000624
         %66 =       OpUConvert %ulong %63                                                                  ; 0x00000634
         %67 =       OpUConvert %ulong %65                                                                  ; 0x00000644
         %68 =       OpShiftLeftLogical %ulong %67 %uint_32                                                 ; 0x00000654
         %69 =       OpBitwiseOr %ulong %66 %68                                                             ; 0x00000668
         %70 =       OpBitcast %double %69                                                                  ; 0x0000067c
         %71 =       OpAccessChain %_ptr_StorageBuffer_uint %3 %53 %uint_0 %uint_2                          ; 0x0000068c
         %72 =       OpLoad %uint %71                                                                       ; 0x000006a8
         %73 =       OpAccessChain %_ptr_StorageBuffer_uint %3 %53 %uint_0 %uint_3                          ; 0x000006b8
         %74 =       OpLoad %uint %73                                                                       ; 0x000006d4
         %75 =       OpUConvert %ulong %72                                                                  ; 0x000006e4
         %76 =       OpUConvert %ulong %74                                                                  ; 0x000006f4
         %77 =       OpShiftLeftLogical %ulong %76 %uint_32                                                 ; 0x00000704
         %78 =       OpBitwiseOr %ulong %75 %77                                                             ; 0x00000718
         %79 =       OpBitcast %double %78                                                                  ; 0x0000072c
         %80 =       OpAccessChain %_ptr_StorageBuffer_uint %3 %53 %uint_0 %uint_4                          ; 0x0000073c
         %81 =       OpLoad %uint %80                                                                       ; 0x00000758
         %82 =       OpAccessChain %_ptr_StorageBuffer_uint %3 %53 %uint_0 %uint_5                          ; 0x00000768
         %83 =       OpLoad %uint %82                                                                       ; 0x00000784
         %84 =       OpUConvert %ulong %81                                                                  ; 0x00000794
         %85 =       OpUConvert %ulong %83                                                                  ; 0x000007a4
         %86 =       OpShiftLeftLogical %ulong %85 %uint_32                                                 ; 0x000007b4
         %87 =       OpBitwiseOr %ulong %84 %86                                                             ; 0x000007c8
         %88 =       OpBitcast %double %87                                                                  ; 0x000007dc
         %89 =       OpCompositeConstruct %v3double %70 %79 %88                                             ; 0x000007ec
         %90 =       OpShiftRightLogical %uint %58 %uint_2                                                  ; 0x00000804
         %91 =       OpAccessChain %_ptr_StorageBuffer_uint %3 %53 %uint_0 %90                              ; 0x00000818
         %92 =       OpLoad %uint %91                                                                       ; 0x00000834
         %93 =       OpIAdd %uint %90 %uint_1                                                               ; 0x00000844
         %94 =       OpAccessChain %_ptr_StorageBuffer_uint %3 %53 %uint_0 %93                              ; 0x00000858
         %95 =       OpLoad %uint %94                                                                       ; 0x00000874
         %96 =       OpUConvert %ulong %92                                                                  ; 0x00000884
         %97 =       OpUConvert %ulong %95                                                                  ; 0x00000894
         %98 =       OpShiftLeftLogical %ulong %97 %uint_32                                                 ; 0x000008a4
         %99 =       OpBitwiseOr %ulong %96 %98                                                             ; 0x000008b8
        %100 =       OpBitcast %double %99                                                                  ; 0x000008cc
        %101 =       OpIAdd %uint %90 %uint_2                                                               ; 0x000008dc
        %102 =       OpAccessChain %_ptr_StorageBuffer_uint %3 %53 %uint_0 %101                             ; 0x000008f0
        %103 =       OpLoad %uint %102                                                                      ; 0x0000090c
        %104 =       OpIAdd %uint %90 %uint_3                                                               ; 0x0000091c
        %105 =       OpAccessChain %_ptr_StorageBuffer_uint %3 %53 %uint_0 %104                             ; 0x00000930
        %106 =       OpLoad %uint %105                                                                      ; 0x0000094c
        %107 =       OpUConvert %ulong %103                                                                 ; 0x0000095c
        %108 =       OpUConvert %ulong %106                                                                 ; 0x0000096c
        %109 =       OpShiftLeftLogical %ulong %108 %uint_32                                                ; 0x0000097c
        %110 =       OpBitwiseOr %ulong %107 %109                                                           ; 0x00000990
        %111 =       OpBitcast %double %110                                                                 ; 0x000009a4
        %112 =       OpIAdd %uint %90 %uint_4                                                               ; 0x000009b4
        %113 =       OpAccessChain %_ptr_StorageBuffer_uint %3 %53 %uint_0 %112                             ; 0x000009c8
        %114 =       OpLoad %uint %113                                                                      ; 0x000009e4
        %115 =       OpIAdd %uint %90 %uint_5                                                               ; 0x000009f4
        %116 =       OpAccessChain %_ptr_StorageBuffer_uint %3 %53 %uint_0 %115                             ; 0x00000a08
        %117 =       OpLoad %uint %116                                                                      ; 0x00000a24
        %118 =       OpUConvert %ulong %114                                                                 ; 0x00000a34
        %119 =       OpUConvert %ulong %117                                                                 ; 0x00000a44
        %120 =       OpShiftLeftLogical %ulong %119 %uint_32                                                ; 0x00000a54
        %121 =       OpBitwiseOr %ulong %118 %120                                                           ; 0x00000a68
        %122 =       OpBitcast %double %121                                                                 ; 0x00000a7c
        %123 =       OpCompositeConstruct %v3double %100 %111 %122                                          ; 0x00000a8c
        %124 =       OpFSub %v3double %123 %89                                                              ; 0x00000aa4
        %125 =       OpFConvert %v3float %124                                                               ; 0x00000ab8
        %126 =       OpAccessChain %_ptr_Uniform_uint %5 %int_5 %uint_5 %uint_0                             ; 0x00000ac8
        %127 =       OpLoad %uint %126                                                                      ; 0x00000ae4
        %128 =       OpBitwiseAnd %uint %127 %uint_131071                                                   ; 0x00000af4
        %129 =       OpCompositeExtract %float %125 0                                                       ; 0x00000b08
        %130 =       OpCompositeExtract %float %125 1                                                       ; 0x00000b1c
        %131 =       OpCompositeExtract %float %125 2                                                       ; 0x00000b30
        %132 =       OpAccessChain %_ptr_StorageBuffer_uint %4 %128 %uint_0 %90                             ; 0x00000b44
        %133 =       OpBitcast %uint %129                                                                   ; 0x00000b60
                     OpStore %132 %133                                                                      ; 0x00000b70
        %134 =       OpAccessChain %_ptr_StorageBuffer_uint %4 %128 %uint_0 %93                             ; 0x00000b7c
        %135 =       OpBitcast %uint %130                                                                   ; 0x00000b98
                     OpStore %134 %135                                                                      ; 0x00000ba8
        %136 =       OpAccessChain %_ptr_StorageBuffer_uint %4 %128 %uint_0 %101                            ; 0x00000bb4
        %137 =       OpBitcast %uint %131                                                                   ; 0x00000bd0
                     OpStore %136 %137                                                                      ; 0x00000be0
                     OpBranch %49                                                                           ; 0x00000bec

         %49 = OpLabel                                                                                      ; 0x00000bf4
                 OpReturn                                                                                   ; 0x00000bfc
               OpFunctionEnd                                                                                ; 0x00000c00

E.g. %66 = OpUConvert %ulong %63 (offset 0x00000634) is emitted to assemble the double from two 32-bit words, and that 64-bit integer type is what requires the Int64 capability.

Actual Behavior
Int64 is enabled even though the shader uses no 64-bit integers. As with DXIL, the compiler shouldn't enable capabilities that the shader doesn't use (DXIL only enables the 64-bit float extension rather than both).
In the D3D12 database I can't find a device that supports F64 but not I64, but the Vulkan database is offline, so I'm not sure whether such a device exists for Vulkan (for example, some mobile device).

Environment

  • DXC version: Ahead of latest release or the version in shader playground (2024-04-29)
  • Host Operating System: N/A
@Nielsbishere Nielsbishere added bug Bug, regression, crash needs-triage Awaiting triage spirv Work related to SPIR-V labels Dec 16, 2024
@damyanp damyanp moved this to For Google in HLSL Triage Dec 17, 2024
@devshgraphicsprogramming

You will find devices missing F64 before you find devices missing I64.

From Vulkan 1.3 onwards, buffer device address (BDA) is encouraged, and DXC doesn't even emit the uvec2 variant of BDA, so one might say that targeting VK 1.3 very strongly implies Int64.

@Keenuts
Collaborator

Keenuts commented Jan 2, 2025

Hi!

Shorter repro:

// RUN: %dxc -E main -T cs_6_5 -spirv %s | FileCheck %s

ByteAddressBuffer input;
RWByteAddressBuffer output;

[numthreads(1, 1, 1)]
void main() {
  output.Store<double>(0, input.Load<double>(0));
}

This indeed emits the Int64 capability, but that's because of how ByteAddressBuffers are loaded from and stored to:

From MSDN:
https://learn.microsoft.com/en-us/windows/win32/direct3d11/direct3d-11-advanced-stages-cs-resources#byte-address-buffer

ByteAddressBuffer contents are treated as unsigned integers; the data can then be reinterpreted as another type.

This means that to handle a 32-bit float, we need to:

  • load 32 bits at a 32-bit-aligned offset
  • bitcast this integer into a float
  • do the operation
  • bitcast the result back to a uint32
  • store it in the RWByteAddressBuffer.

To handle 64-bit floats, we need to:

  • load two 32-bit integers
  • zero-extend both to 64 bits, shift the high word left by 32, and OR it with the low word
  • bitcast the 64-bit int to a double

SPIR-V doesn't allow shift operations on floats, hence using an integer is a requirement to recompose a double from 2 integers. For this reason, we have to declare the Int64 capability when reading/writing a Float64 from the ByteAddressBuffer.

The optimization we COULD possibly make is to skip the cast to a float when it isn't required, e.g. if the loaded float/double is not modified before being stored back into another buffer.
In that case, we could simply copy the raw 32-bit integers.
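
A rough HLSL sketch of the two paths described above, assuming the same input/output buffers as the shorter repro. The helper names are made up for illustration. The first helper spells out the 32-bit float round trip by hand; the second copies a double as two raw 32-bit words, which is roughly what the proposed optimization would amount to when the value is not modified.

```hlsl
ByteAddressBuffer input;
RWByteAddressBuffer output;

// Hypothetical helper: the 32-bit float round trip, written out by hand.
void scaleFloat(uint byteOffset) {
  uint raw = input.Load(byteOffset);        // load 32 bits at a 4-byte-aligned offset
  float value = asfloat(raw);               // reinterpret the bits as a float
  value *= 2.0f;                            // do the operation
  output.Store(byteOffset, asuint(value));  // reinterpret as uint and store
}

// Hypothetical helper: copy an unmodified double as two raw 32-bit words.
// No 64-bit type appears in the source at all.
void copyDoubleRaw(uint byteOffset) {
  output.Store2(byteOffset, input.Load2(byteOffset));
}

[numthreads(1, 1, 1)]
void main() {
  scaleFloat(0);
  copyDoubleRaw(8);
}
```

Since copyDoubleRaw only moves uint2 values, it should need neither Float64 nor Int64; that is an expectation to confirm in the emitted SPIR-V rather than a guarantee.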

Now, do we want to implement this now? I don't think so.
As long as there is no device/need to handle this corner case, I'd say we can safely close this as WAI.

@Keenuts Keenuts closed this as completed Jan 2, 2025
@github-project-automation github-project-automation bot moved this from For Google to Triaged in HLSL Triage Jan 2, 2025
@Keenuts Keenuts removed the needs-triage Awaiting triage label Jan 2, 2025
@Nielsbishere
Contributor Author

Fair enough, will deal with it the normal way in the compiler then. F64 will hard require I64.

@Keenuts
Collaborator

Keenuts commented Jan 2, 2025

From the Core3 docs, it seems like you only target Vulkan 1.2+.
From the VK spec, https://docs.vulkan.org/spec/latest/appendices/versions.html#versions-1.2
1.2 made VK_KHR_shader_atomic_int64 core. This extension declares the Int64Atomics capability, which implies Int64. So it looks like Int64 will be supported on all devices targeting Vulkan 1.2.

@Nielsbishere
Contributor Author

However, if Vulkan 1.2 is supported and this extension is not, the shaderBufferInt64Atomics capability is optional

I think sometimes an extension gets added but the capability itself can still be disabled. I'm going to check whether any device might have a problem with this; the Vulkan database is probably back online by now.

@Keenuts
Collaborator

Keenuts commented Jan 2, 2025

Right, there is something weird: it looks like you need to support/understand the extension, but you can still say you don't support the feature.

 However, if Vulkan 1.2 is supported and this extension is not, the shaderBufferInt64Atomics capability is optional.

@Nielsbishere
Contributor Author

I think it's fine honestly, it seems like there are only a few android devices that support F64 and they all support I64. Shouldn't cause any problems I think

@Nielsbishere
Contributor Author

Out of curiosity, is there any overhead to the casts that are being done? Is it possible that there will be a native load replacement rather than uint -> bitcast in the future? Or is that just a spirv limitation

@Keenuts
Collaborator

Keenuts commented Jan 2, 2025

Out of curiosity, is there any overhead to the casts that are being done? Is it possible that there will be a native load replacement rather than uint -> bitcast in the future? Or is that just a spirv limitation

No idea, it will all depend on the target ISA.
For example, AMD does seem to remove the bitcast and shift operations and simply load/store two 32-bit words:

https://godbolt.org/z/az3rGa5a1

@devshgraphicsprogramming

devshgraphicsprogramming commented Jan 6, 2025

SPIR-V doesn't allow shift operations on floats, hence using an integer is a requirement to recompose a double from 2 integers.

May I treat you to some Nabla-isms?
https://godbolt.org/z/4r4GYT9re
No Int64 capability emitted!

You can use OpBitcast between a uint2 and double (not sure if you need to swap the x and y components around due to endianness vs component layout)

The SPIR-V spec allows bitcasts between native SPIR-V types (scalars, vectors of scalars, etc.) as long as the operand and the result have the same total bit width. It also allows casts between uint64 and PhysicalStorageBuffer pointers, but that's with the BDA capability.

P.S. there's a slight pita with SPIR-V OpBitcast: it doesn't support the identity T==U cast, and once you get into pointers the whole thing gets fishy (the pointer type can only be to an integer or a vector of them). It's not like a C++20 bit_cast, which would be nice to have in order to get rid of custom unpack/pack of arbitrary structs to/from ByteAddressBuffer
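
For readers who can't open the godbolt link, here is a minimal sketch of how such a bitcast might be exposed from HLSL through DXC's inline SPIR-V attributes. It is not taken from Nabla: the function name bitcastToDouble is hypothetical, 124 is the OpBitcast opcode, and the word order (x as the low 32 bits) is an assumption that should be verified on the target.

```hlsl
ByteAddressBuffer input;
RWByteAddressBuffer output;

// Hypothetical wrapper: maps directly onto OpBitcast (opcode 124),
// reinterpreting a uint2 as a double without any 64-bit integer.
[[vk::ext_instruction(/* OpBitcast */ 124)]]
double bitcastToDouble(uint2 words);

[numthreads(1, 1, 1)]
void main() {
  uint2 words = input.Load2(0);            // words.x assumed to be the low 32 bits
  double value = bitcastToDouble(words);
  output.Store(0, asuint((float)value));   // store as float to keep the sketch short
}
```

This only compiles with -spirv, since vk::ext_instruction is specific to DXC's SPIR-V backend.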

@Keenuts
Collaborator

Keenuts commented Jan 6, 2025

You can use OpBitcast between a uint2 and double (not sure if you need to swap the x and y components around due to endianness vs component layout)

Oh, I completely forgot that you can mismatch the element count as long as the bit count is fine! (Modulo the rule about one component count being a multiple of the other.)
Thanks for pointing this out!

I quickly looked at the code, and it looks like the classic asdouble is correctly implemented, and only the raw buffer load/store goes through this Int64 path.
I'll re-open this issue as the fix seems small.
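
On compiler builds without the fix, the observation above suggests a possible source-level workaround: load the two 32-bit words yourself and recombine them with the asdouble intrinsic instead of calling Load<double>. This is only a sketch. The helper names are hypothetical, and whether a given DXC build really avoids Int64 on this path is an assumption based on the comment above, to be checked in the emitted SPIR-V.

```hlsl
ByteAddressBuffer input;
RWByteAddressBuffer output;

// Hypothetical helper: read a double as two 32-bit words.
double loadDouble(uint byteOffset) {
  uint2 words = input.Load2(byteOffset);
  return asdouble(words.x, words.y);  // asdouble(lowbits, highbits)
}

// Hypothetical helper: split a double back into two words and store them.
void storeDouble(uint byteOffset, double value) {
  uint lo, hi;
  asuint(value, lo, hi);  // asuint(value, out lowbits, out highbits)
  output.Store2(byteOffset, uint2(lo, hi));
}

[numthreads(1, 1, 1)]
void main() {
  double a = loadDouble(0);
  double b = loadDouble(8);
  storeDouble(0, b - a);
}
```

The low word sits at the lower address here, matching the SPIR-V in the original report, where the first loaded word is used unshifted and the second is shifted left by 32.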

@Keenuts Keenuts reopened this Jan 6, 2025
@Keenuts Keenuts self-assigned this Jan 6, 2025
@devshgraphicsprogramming

You could probably improve the SPIR-V codegen for ByteAddressBuffer a lot, because every struct is made of bool, intN_t, uintN_t, floatN_t, or a vector/matrix thereof.

With this SPIR-V bitcast, it means that you can load every final PoD member into a vector/array of the appropriate number of uintN_t and then bitcast (I think the rule about pointers of integers is to allow bitcasting with arrays of integers).

@Nielsbishere
Contributor Author

Goated, thanks guys <3

Keenuts added a commit to Keenuts/DirectXShaderCompiler that referenced this issue Jan 16, 2025
When loading a Float64 from a raw buffer, we used an Int64, which
required an additional capability, even if the code wasn't using any
Int64.
In practice, it seems most devices supporting Float64 do also support
Int64, but this doesn't have to be the case.
By changing the codegen a bit, we can avoid the Int64 value.

Fixes microsoft#7038

Signed-off-by: Nathan Gauër <[email protected]>
Keenuts added a commit to Keenuts/DirectXShaderCompiler that referenced this issue Jan 16, 2025
When loading a Float64 from a raw buffer, we used an Int64, which
required an additional capability, even if the code wasn't using any
Int64.
In practice, it seems most devices supporting Float64 do also support
Int64, but this doesn't have to be the case.
By changing the codegen a bit, we can avoid the Int64 value.

Tested the word-order using a vulkan compute shader, and checking the
returned value on the API side.
```hlsl
double tmp = buffer.Load<double>(0);
if (tmp == 12.0)
  buffer.Store<double>(0, 13.0);
```

Fixes microsoft#7038

Signed-off-by: Nathan Gauër <[email protected]>