Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trying to make it work on CPU #6

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Trying to make it work on CPU #6

wants to merge 5 commits into from

Conversation

tkf
Copy link
Member

@tkf tkf commented Feb 23, 2021

No description provided.

src/transduce.jl Outdated
ill = @index(Local, Linear)
igl = @index(Group, Linear)
igl = @uniform @index(Group, Linear)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
igl = @uniform @index(Group, Linear)
igl = @index(Group, Linear)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I might need to put it in @uniform since I'm using it after @synchronize?

if t == 1
@inbounds dest[igl] = shared[1]
end

I can re-fetch the index, there, though

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@index expression are always valid (and automatically inserted into the loop body).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without @uniform, I'm now getting UndefVarError: igl not defined: https://github.com/JuliaFolds/FoldsKernelAbstractions.jl/pull/6/checks?check_run_id=1958277481

Actually, I just remember putting it in @uniform since it was the dependency of other @uniform variables:

offsetb = (igl - 1) * groupsize()[1]
bound = max(0, nbasecases - offsetb)

src/transduce.jl Outdated
t = ill
s = 1
c = nextpow(2, groupsize()[1]) >> 1
@private m = ill - 1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this form of @private is rather slow until KernelCompiler.jl lands. Using the older form is likely faster

@piever
Copy link

piever commented Mar 2, 2021

Not sure if this is helpful, but I've also played with block-level reduction with KernelAbstractions. I could get it to work on GPU (code similar to the one here), but not on CPU.

I tried replacing s and c with @private Int (1,) as well, but somehow using c[1] != 0 as a condition to break the while loop errors (KernelAbstractions still seems to think that s[1] is a vector rather than a scalar for some reason). Would love to know if a solution can be found here!

@tkf
Copy link
Member Author

tkf commented Mar 3, 2021

Yeah, I wonder if we need to tweak something in KernelAbstractions.jl or maybe just wait for KernelCompiler.jl

cc @vchuravy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants