Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPIR-V Instruction reordering resulting in false-positive read-write data race? #210

Open
richardschulze opened this issue Aug 21, 2024 · 3 comments

Comments

@richardschulze
Copy link

richardschulze commented Aug 21, 2024

First of all, many thanks for your work on Oclgrind! It has been an invaluable tool for developing OpenCL and also CUDA kernels.

I am facing weird behavior in the following (simplified) kernel, that simply copies some values around:

__kernel void test(__global float * const restrict buf_4, __global float const * const restrict buf_0, __global float * const restrict buf_1) {
  float buf_3[1];
  __local float buf_2[1];
  for (int i_1 = 0; i_1 < 1; ++i_1) {
    for (int i_2 = 0; i_2 < 2; ++i_2) {
      if (get_local_id(0) == 0) {
        buf_1[0] = buf_0[get_global_id(0)];
        buf_2[0] = buf_1[0];
      }
    } // i_2
    if (get_local_id(0) == 0) {
      buf_3[0] = buf_2[0];
    }
    if (get_local_id(0) == 0) {
      buf_4[0] = buf_3[0];
    }
  } // i_1
}

executed with oclgrind-kernel using this simfile:

kernel.cl
test
5 1 1
5 1 1

<size=4 noinit>
<size=20 range=0:1:4>
<size=4 noinit>

I get multiple read-write data races:

Read-write data race at local memory address 0x1000000000000
        Kernel: test
        
        First entity:  Global(1,0,0) Local(1,0,0) Group(0,0,0)
          %0 = load float, float addrspace(3)* @test.buf_2.0, align 4
        Debugging information not available.
        
        
        Second entity: Global(0,0,0) Local(0,0,0) Group(0,0,0)
          store float %1, float addrspace(3)* @test.buf_2.0, align 4, !dbg !69, !tbaa !62
        At line 8 (column 18) of input.cl:
          buf_2[0] = buf_1[0];

However, just by looking at the kernel, I don't think there should be any data races, since all array accesses are wrapped in an if on the local id. Also, changing one of the following things makes the data-race warnings disappear:

  • removing the i_1 loop (even though it does a single iteration only and is not even used in the kernel)
  • combining the last two ifs
  • adding a barrier(CLK_LOCAL_MEM_FENCE); after the first if
    This is weird, since those changes are syntactical only, and should not change the semantics of the program.

Any help would be greatly appreciated. Many thanks in advance!

@richardschulze
Copy link
Author

I was able to simplify the kernel a bit more, and I think I could narrow down the problem a bit further.
This kernel:

__kernel void test(__global float * const restrict buf_4, __global float const * const restrict buf_0) {
  float buf_3[1];
  __local float buf_2[2];
  for (int i_1 = 0; i_1 < 1; ++i_1) {
    buf_2[get_global_id(0)] = buf_0[get_global_id(0)];
    if (get_global_id(0) == 0) {
      buf_3[0] = buf_2[0];
    }
    if (get_global_id(0) == 0) {
      buf_4[0] = buf_3[0];
    }
  } // i_1
}

executed using oclgrind-kernel with this simfile:

kernel.cl
test
2 1 1
2 1 1

<size=4 noinit>
<size=20 range=0:1:4>

produces the following SPIR-V code:

; ModuleID = 'input.cl'
source_filename = "input.cl"
target datalayout = "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024"
target triple = "spir64-unknown-unknown"

@test.buf_2 = internal unnamed_addr addrspace(3) global [2 x float] undef, align 4, !dbg !0

; Function Attrs: convergent minsize nofree norecurse nounwind optsize
define dso_local spir_kernel void @test(float addrspace(1)* noalias nocapture %buf_4, float addrspace(1)* noalias nocapture readonly %buf_0) local_unnamed_addr #0 !dbg !2 !kernel_arg_addr_space !36 !kernel_arg_access_qual !37 !kernel_arg_type !38 !kernel_arg_base_type !38 !kernel_arg_type_qual !39 !kernel_arg_name !40 {
entry:
  br label %for.cond, !dbg !41

for.cond:                                         ; preds = %for.cond.backedge, %entry
  %buf_3.sroa.0.0 = phi float [ undef, %entry ], [ %spec.select, %for.cond.backedge ]
  %cmp = phi i1 [ true, %entry ], [ false, %for.cond.backedge ], !dbg !42
  br i1 %cmp, label %for.body, label %for.cond.cleanup, !dbg !43

for.cond.cleanup:                                 ; preds = %for.cond
  ret void, !dbg !44

for.body:                                         ; preds = %for.cond
  %call = tail call spir_func i64 @_Z13get_global_idj(i32 0) #3, !dbg !45
  %arrayidx = getelementptr inbounds float, float addrspace(1)* %buf_0, i64 %call, !dbg !48
  %0 = load float, float addrspace(1)* %arrayidx, align 4, !dbg !48, !tbaa !49
  %arrayidx2 = getelementptr inbounds [2 x float], [2 x float] addrspace(3)* @test.buf_2, i64 0, i64 %call, !dbg !53
  store float %0, float addrspace(3)* %arrayidx2, align 4, !dbg !54, !tbaa !49
  %cmp4 = icmp eq i64 %call, 0, !dbg !55
  %1 = load float, float addrspace(3)* getelementptr inbounds ([2 x float], [2 x float] addrspace(3)* @test.buf_2, i64 0, i64 0), align 4
  %spec.select = select i1 %cmp4, float %1, float %buf_3.sroa.0.0, !dbg !57
  br i1 %cmp4, label %if.then8, label %for.cond.backedge, !dbg !58

for.cond.backedge:                                ; preds = %for.body, %if.then8
  br label %for.cond, !dbg !42, !llvm.loop !59

if.then8:                                         ; preds = %for.body
  store float %spec.select, float addrspace(1)* %buf_4, align 4, !dbg !61, !tbaa !49
  br label %for.cond.backedge, !dbg !64
}
; ... shortened for readability, but I'm happy to provide the full code if necessary

The data-race is caused by lines 5 and 7 in the OpenCL kernel and lines 26 and 28 in the SPIR-V kernel.
I'm not really familiar with SPIR-V, but as far as I can tell, technically, Oclgrind is correct that the load operation in line 28 of the SPIR-V kernel does indeed produce a data-race. However, I don't think Oclgrind should report this, because the loaded value is only used (indirectly) in line 36, a block where the if condition evaluates to true.

Do you see any way to adapt Oclgrind to honor cases like this?

@richardschulze richardschulze changed the title Undefined behaviour resulting in false-positive read-write data race? SPIR-V Instruction reordering resulting in false-positive read-write data race? Aug 31, 2024
@jrprice
Copy link
Owner

jrprice commented Oct 9, 2024

Thanks for reporting this. It certainly seems like LLVM is transforming a program which has no data races into one that does, which is unfortunate.

Tracking loads to see if they are used in any significant way would likely be a large amount of effort to add to the data race detection plugin.

We might be able to figure out which LLVM pass is causing this to happen and disable it. In the meantime, you can work around this by changing the optimization level. For the example you provided, using-Os or -O3 both removed the data race report (e.g. --build-options -Os).

@richardschulze
Copy link
Author

Many thanks for the suggestion using -Os or -O3, I did not think of that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants