[NVIDIA] Add support for tensor conversion from fp16 to fp32 using ExtFOp #3874
Conversation
I'm not sure if we are interested in supporting Pascal.

In principle I am not opposed to fixing up Pascal issues if the added complexity is minimal, but could you elaborate a bit more here on why the Ampere code path is failing?
third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/ElementwiseOpToLLVM.cpp
First, Pascal GPUs have very poor performance in fp16, and there is no bf16 (I think that may be important). The crash happens in

As for the fpext that @ThomasRaoux mentioned, this can be related:

```python
import torch
import triton
import triton.language as tl


@triton.jit
def test_dot_kernel():
    # fp16 inputs; tl.dot accumulates in fp32 by default
    t1 = tl.zeros([16, 16], dtype=tl.float16)
    t2 = tl.zeros([16, 16], dtype=tl.float16)
    d = tl.dot(t1, t2)
    tl.device_print("dot:", d)


grid = lambda meta: (1, )
kernel = test_dot_kernel[grid]()
```

Generated IR
Stacktrace
Unfortunately, this is precisely the kind of workaround we meant to avoid when we dropped support for pre-A100 GPUs. On all GPUs supported by Triton,
This PR fixes

```
Unsupported conversion from f16 to f16
LLVM ERROR: Unsupported rounding mode for conversion.
```

on Pascal GPUs.
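For context, here is a minimal, hypothetical sketch of the approach the title describes: lowering the f16 -> f32 widening through a plain extension op rather than a cvt-style converter that requires a rounding mode. This is not the PR's actual diff; the pattern name `ExtF16ToF32Lowering` is made up for illustration, and the real change lives inside Triton's ElementwiseOpToLLVM.cpp with that file's own helpers.

```cpp
// Illustrative sketch only: a standalone MLIR conversion pattern showing how
// an f16 -> f32 arith.extf could be lowered directly to llvm.fpext. The real
// fix is in Triton's ElementwiseOpToLLVM.cpp and uses its existing plumbing.
#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/Transforms/DialectConversion.h"

using namespace mlir;

struct ExtF16ToF32Lowering : public OpConversionPattern<arith::ExtFOp> {
  using OpConversionPattern<arith::ExtFOp>::OpConversionPattern;

  LogicalResult
  matchAndRewrite(arith::ExtFOp op, OpAdaptor adaptor,
                  ConversionPatternRewriter &rewriter) const override {
    // Only handle the scalar f16 -> f32 case; other conversions keep
    // whatever lowering (e.g. cvt with an explicit rounding mode) they have.
    if (!op.getIn().getType().isF16() || !op.getType().isF32())
      return failure();
    // llvm.fpext carries no rounding mode, so this path cannot trigger
    // "Unsupported rounding mode for conversion."
    rewriter.replaceOpWithNewOp<LLVM::FPExtOp>(op, rewriter.getF32Type(),
                                               adaptor.getIn());
    return success();
  }
};
```

Such a pattern would be registered in the TritonGPU-to-LLVM conversion's RewritePatternSet; the actual PR presumably wires the f16 -> f32 case into the existing conversion table instead of adding a new pattern class. The design point is the same either way: fpext is available on every architecture Triton targets and needs no rounding mode, which is why it can sidestep the failure seen on Pascal.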