Reorder kernels in CUDA backend fail with SZ=64 #51

Open
semi-h opened this issue Feb 22, 2024 · 0 comments
Labels: bug (Something isn't working), cuda (Related to CUDA backend), performance, priority low (Low priority issue)

Comments

semi-h (Member) commented Feb 22, 2024

This happens because we exceed the maximum thread count per block when we set SZ=64. Some of the reorder kernels launch a 2D thread block with SZ*SZ threads, which is 4096 threads for SZ=64, well above the CUDA limit of 1024 threads per block. We need to fix these kernels so that, when SZ is 64 or above, they work on tiles of 32 by 32.
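A rough host-side sketch of the tiling arithmetic, in C++ for illustration only. All names here (`reorder_tile_config`, `TileConfig`, `kMaxTile`) are hypothetical and not taken from the backend's source. It assumes the fix caps the block edge at 32 and covers the SZ by SZ region with several tiles.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical names throughout; none of this is taken from the project's code.
constexpr int kMaxThreadsPerBlock = 1024; // CUDA limit on threads per block
constexpr int kMaxTile = 32;              // largest square block edge: 32 * 32 = 1024

static_assert(kMaxTile * kMaxTile <= kMaxThreadsPerBlock,
              "a full tile must fit in one thread block");

struct TileConfig {
    int tile;          // edge length of the 2D thread block
    int tiles_per_dim; // number of tiles needed to cover one SZ edge
};

// Pick a 2D block edge that never exceeds the per-block thread limit,
// and count how many tiles are needed along each edge of an SZ-by-SZ region.
TileConfig reorder_tile_config(int sz) {
    assert(sz > 0);
    const int tile = std::min(sz, kMaxTile);
    const int tiles_per_dim = (sz + tile - 1) / tile; // ceiling division
    return {tile, tiles_per_dim};
}

// Threads launched per block for a given configuration.
int threads_per_block(const TileConfig& c) { return c.tile * c.tile; }
```

With SZ=64 this gives a 32 by 32 block and two tiles along each edge, so each block stays at 1024 threads. For SZ of 32 or below, the configuration is a single SZ by SZ tile, which matches the current launch.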

semi-h added the bug and cuda labels on Feb 22, 2024
2 participants