Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use KernelAbstractions backend #33

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Use KernelAbstractions backend #33

wants to merge 1 commit into from

Conversation

charleskawczynski
Copy link
Member

@charleskawczynski charleskawczynski commented Jul 19, 2024

This is a big step towards closing #32.

What else is needed:

  • Remove CUDA/Adapt extension/dependency
  • Add other backend tests?
  • Improve the performance
  • Fix Add CUDA tests #34

Right now, at least on the CPU, the KA kernels are more than 5x slower. cc @avik-pal @vchuravy

@avik-pal
Copy link

Thanks for looking into this so quickly. I generally tend not to use KA for CPU (in favor of just the @simd loop), but I think it uses threading for CPU, which might not be ideal for smaller broadcasting problems.

@charleskawczynski
Copy link
Member Author

Thanks for looking into this so quickly. I generally tend not to use KA for CPU (in favor of just the @simd loop), but I think it uses threading for CPU, which might not be ideal for smaller broadcasting problems.

Yeah, I thought I'd give it a try, it's kind of a nice solution, to be honest. The only reason CUDA is still needed in the extension is for Adapt. I think that this branch should mostly support other backends, but if Adapt is needed for other backends, then I think you'd need to add it. I don't have experience adapting AMD gpus, but I imagine the pattern is very similar.

Feel free to try things on this branch. I probably won't merge this in until we add tests (to close #34), and perhaps fix the cpu perf, or just revert things to how we previously had it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add CUDA tests
2 participants