Possible unnecessary __syncthreads using for kernel function FindMaxCorr in matching.cu #38

Open
Lebronmydx opened this issue Aug 20, 2018 · 1 comment

@Lebronmydx

Hello there!

I am currently studying the CudaSift source code and may have found some unnecessary uses of __syncthreads() in some kernel functions in matching.cu.

For kernel function FindMaxCorr,

__global__ void FindMaxCorr(float *corrData, SiftPoint *sift1, SiftPoint *sift2, int numPts1, int corrWidth, int siftSize)
{
  .........
  if (tx==6)
    sift1[p1].score = maxScore[ty*16];
  if (tx==7)
    sift1[p1].ambiguity = maxScor2[ty*16] / (maxScore[ty*16] + 1e-6);
  if (tx==8)
    sift1[p1].match = maxIndex[ty*16];
  if (tx==9)
    sift1[p1].match_xpos = sift2[maxIndex[ty*16]].xpos;
  if (tx==10)
    sift1[p1].match_ypos = sift2[maxIndex[ty*16]].ypos;
  __syncthreads();
}

At line 160, just before the kernel function returns, FindMaxCorr calls __syncthreads(). What confuses me is that line 160 is the last statement the kernel executes, so synchronizing the threads there should be unnecessary, shouldn't it?

The same issue applies to FindMaxCorr1, FindMaxCorr2, and FindMaxCorr3.
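To illustrate the question, here is a minimal sketch rather than the actual CudaSift code. `DemoPoint`, `WriteFieldsNoTrailingSync`, and `trailingSyncIsUnneeded` are hypothetical names made up for this example. As far as I understand, __syncthreads() only orders threads within one block, and the host sees all global-memory writes once the kernel has completed and the host has synchronized (here through cudaMemcpy), so a barrier as the last statement should not change what the host reads.

```cuda
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical stand-in for SiftPoint, reduced to three of the output fields.
struct DemoPoint {
  float score;
  float ambiguity;
  int match;
};

// One block per point. Different threads write different fields, as in
// FindMaxCorr, and the kernel returns without a trailing __syncthreads().
__global__ void WriteFieldsNoTrailingSync(DemoPoint *pts, int numPts)
{
  __shared__ float best;
  __shared__ float second;
  __shared__ int bestIdx;
  const int tx = threadIdx.x;
  const int p = blockIdx.x;
  if (p >= numPts) return;  // uniform for the whole block, so safe before the barrier
  if (tx == 0) {            // synthetic values standing in for the reduction result
    best = 2.0f * p + 1.0f;
    second = 0.5f * best;
    bestIdx = p + 100;
  }
  __syncthreads();  // this barrier IS needed: other threads read the shared values
  if (tx == 6) pts[p].score = best;
  if (tx == 7) pts[p].ambiguity = second / (best + 1e-6f);
  if (tx == 8) pts[p].match = bestIdx;
  // no __syncthreads() here: nothing after this point reads another thread's data
}

// Host-side check: returns true if every field written by the kernel arrives intact.
bool trailingSyncIsUnneeded()
{
  const int numPts = 4;
  DemoPoint h_pts[numPts];
  DemoPoint *d_pts = nullptr;
  if (cudaMalloc(&d_pts, sizeof(h_pts)) != cudaSuccess) return false;
  WriteFieldsNoTrailingSync<<<numPts, 16>>>(d_pts, numPts);
  // cudaMemcpy on the default stream waits for the kernel to finish first.
  cudaError_t err = cudaMemcpy(h_pts, d_pts, sizeof(h_pts), cudaMemcpyDeviceToHost);
  cudaFree(d_pts);
  if (err != cudaSuccess) return false;
  for (int p = 0; p < numPts; p++) {
    const float best = 2.0f * p + 1.0f;
    const float amb = 0.5f * best / (best + 1e-6f);
    if (h_pts[p].score != best) return false;
    if (std::fabs(h_pts[p].ambiguity - amb) > 1e-5f) return false;
    if (h_pts[p].match != p + 100) return false;
  }
  return true;
}
```

Calling `trailingSyncIsUnneeded()` from a host `main` on a machine with a CUDA device should be enough to try this out. The barrier in the middle of the kernel stays, because threads 6 to 8 read shared values written by thread 0.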

Thanks very much! :)

@Celebrandil
Owner

Yes, there is a bit of cleaning up to do there. Sometimes when I detect oddities in the output, I add an unnecessary synchronization just in case. In fact, those final assignments should all run on the same thread, since they cannot be parallelized anyway. Thank you for pointing it out.
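One way the single-thread variant could look is sketched below. This is an assumption about the intended cleanup, not the change actually made in CudaSift. `DemoSift`, `StoreMatchSingleThread`, and `singleThreadStoreWorks` are hypothetical names, and the shared arrays are filled with synthetic values instead of a real correlation reduction.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical, trimmed-down stand-in for SiftPoint.
struct DemoSift {
  float xpos, ypos;
  float score, ambiguity;
  int match;
  float match_xpos, match_ypos;
};

// 16x16 threads per block: ty selects the point, tx == 0 stores all five fields.
__global__ void StoreMatchSingleThread(DemoSift *sift1, const DemoSift *sift2,
                                       int numPts1, int numPts2)
{
  __shared__ float maxScore[16];
  __shared__ float maxScor2[16];
  __shared__ int maxIndex[16];
  const int tx = threadIdx.x;
  const int ty = threadIdx.y;
  const int p1 = blockIdx.x * 16 + ty;
  if (tx == 15) {  // synthetic "reduction result", written by another thread
    maxScore[ty] = 1.0f + p1;
    maxScor2[ty] = 0.5f * maxScore[ty];
    maxIndex[ty] = p1 % numPts2;
  }
  __syncthreads();  // make the shared values visible to the writer thread
  if (tx == 0 && p1 < numPts1) {
    const int idx = maxIndex[ty];
    sift1[p1].score = maxScore[ty];
    sift1[p1].ambiguity = maxScor2[ty] / (maxScore[ty] + 1e-6f);
    sift1[p1].match = idx;
    sift1[p1].match_xpos = sift2[idx].xpos;
    sift1[p1].match_ypos = sift2[idx].ypos;
  }
  // kernel ends here, with no trailing barrier
}

// Host-side check: returns true if all fields match the synthetic inputs.
bool singleThreadStoreWorks()
{
  const int numPts1 = 20, numPts2 = 7;
  std::vector<DemoSift> h1(numPts1), h2(numPts2);
  for (int i = 0; i < numPts2; i++) {
    h2[i].xpos = 10.0f * i;
    h2[i].ypos = 10.0f * i + 1.0f;
  }
  DemoSift *d1 = nullptr, *d2 = nullptr;
  if (cudaMalloc(&d1, numPts1 * sizeof(DemoSift)) != cudaSuccess) return false;
  if (cudaMalloc(&d2, numPts2 * sizeof(DemoSift)) != cudaSuccess) {
    cudaFree(d1);
    return false;
  }
  cudaMemcpy(d2, h2.data(), numPts2 * sizeof(DemoSift), cudaMemcpyHostToDevice);
  const dim3 threads(16, 16);
  const dim3 blocks((numPts1 + 15) / 16);
  StoreMatchSingleThread<<<blocks, threads>>>(d1, d2, numPts1, numPts2);
  cudaError_t err = cudaMemcpy(h1.data(), d1, numPts1 * sizeof(DemoSift),
                               cudaMemcpyDeviceToHost);
  cudaFree(d1);
  cudaFree(d2);
  if (err != cudaSuccess) return false;
  for (int p = 0; p < numPts1; p++) {
    const int idx = p % numPts2;
    const float best = 1.0f + p;
    if (h1[p].score != best || h1[p].match != idx) return false;
    if (std::fabs(h1[p].ambiguity - 0.5f * best / (best + 1e-6f)) > 1e-5f) return false;
    if (h1[p].match_xpos != h2[idx].xpos || h1[p].match_ypos != h2[idx].ypos) return false;
  }
  return true;
}
```

Doing all five stores from one thread also means `maxIndex` is read once per point instead of three times, and the five `if (tx==...)` branches collapse into one.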
