Auto-tuning workgroupsize when localmem consumption depends on it #215
Comments
This is #11: KA doesn't support dynamic shared memory.
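For illustration, a minimal sketch of what static `@localmem` looks like today, assuming the usual KA semantics and the 0.9-style launch API: the allocation size is baked in when the kernel is specialized, so it has to match a statically chosen workgroupsize rather than one picked at launch. The kernel name and the size 256 are made up for the example:

```julia
using KernelAbstractions

@kernel function copy_via_shared!(b, @Const(a))
    li = @index(Local, Linear)
    gi = @index(Global, Linear)

    # The @localmem size is fixed when the kernel is specialized; it cannot
    # track a workgroupsize that is only decided at launch (the gap #11 targets).
    tile = @localmem eltype(a) (256,)

    tile[li] = a[gi]
    @synchronize
    b[gi] = tile[li]
end

backend = CPU()
a = rand(Float32, 1024); b = zeros(Float32, 1024)

# Works only because the static workgroupsize matches the @localmem size.
kernel = copy_via_shared!(backend, 256)
kernel(b, a; ndrange=length(a))
KernelAbstractions.synchronize(backend)
```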
Does #11 have auto-tuning? I skimmed the code but couldn't find any. Or is it planned but not implemented?
No, #11 was started before we added auto-tuning, and it stalled since no one had a clear need for it.
Oh, that sounds like I need to give it a shot if I want it 😂 I'm still not clear how to implement auto-tuning with #11, though. If I write […] If these concerns are legit, maybe we still need the explicit […]
I'm particularly interested in this use case combined with pre-launch workgroupsize auto-tuning (#216).
Is auto-tuning documented? If so, I can't find it.
When the […]
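For reference, auto-tuning from the user side amounts to omitting the workgroupsize when instantiating the kernel; the backend then picks one at launch. A hedged sketch, assuming a recent KA (0.9-style API); `saxpy!` is a made-up example kernel:

```julia
using KernelAbstractions

@kernel function saxpy!(y, a, @Const(x))
    i = @index(Global, Linear)
    y[i] = a * x[i] + y[i]
end

backend = CPU()
x = ones(Float32, 1024); y = ones(Float32, 1024)

# No workgroupsize argument: the kernel gets a dynamic size and the backend
# chooses one at launch. This is the auto-tuning discussed here, and it is
# exactly the case where @localmem cannot yet size itself off the groupsize.
kernel = saxpy!(backend)
kernel(y, 2f0, x; ndrange=length(y))
KernelAbstractions.synchronize(backend)
```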
Does KernelAbstractions.jl support auto-setting workgroupsize when the kernel's local memory size depends on the groupsize? For example, `CUDA.launch_configuration` takes a `shmem` callback that maps a number of threads to the shared memory used; this is how `mapreduce` is implemented in CUDA.jl. Since the `shmem` argument of `CUDA.launch_configuration` is not used in `Kernel{CUDADevice}`, I guess it's not implemented yet? Is it related to #19?
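For concreteness, a sketch of the CUDA.jl pattern described above. `CUDA.launch_configuration` does accept a callable `shmem` mapping a candidate thread count to dynamic shared memory bytes (this is the mechanism `mapreduce` uses); the kernel `scale_kernel!` and the helper `shmem_bytes` are invented for the example:

```julia
using CUDA

# Toy kernel that stages data through dynamic shared memory.
function scale_kernel!(b, a)
    tid = threadIdx().x
    i   = (blockIdx().x - 1) * blockDim().x + tid
    tile = CuDynamicSharedArray(Float32, blockDim().x)
    if i <= length(a)
        tile[tid] = a[i]
        b[i] = 2f0 * tile[tid]
    end
    return
end

# Shared memory needed as a function of the thread count (the callback form).
shmem_bytes(threads) = threads * sizeof(Float32)

a = CUDA.rand(Float32, 10_000)
b = similar(a)

# Compile without launching so the occupancy API can inspect the kernel.
k = @cuda launch=false scale_kernel!(b, a)

# The occupancy query evaluates shmem_bytes for candidate thread counts.
config  = CUDA.launch_configuration(k.fun; shmem=shmem_bytes)
threads = min(length(a), config.threads)
blocks  = cld(length(a), threads)

k(b, a; threads, blocks, shmem=shmem_bytes(threads))
```

Nothing equivalent is wired up in `Kernel{CUDADevice}` as of this issue, which is what the question above is getting at.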