[BUGFIX][0.5.0-UT] Skip CSR matmat and matvec float tests on ROCm <6.4 (NaN issue with beta==0) #380
base: rocm-jaxlib-v0.5.0
tests/sparse_test.py
Outdated
self.skipTest("skipping int32 type tests")
rocm_ver = get_rocm_version()
if rocm_ver < (6, 4) and dtype in [np.float32, np.complex64]:
    self.skipTest("ROCm <6.4 bug: NaN propagation when beta==0 (fixed in ROCm 6.4)")
fixed in rocm 6.5?
tests/sparse_test.py
Outdated
self.skipTest("skipping int32 type tests")
rocm_ver = get_rocm_version()
if rocm_ver < (6, 4) and dtype in [np.float32, np.complex64]:
    self.skipTest("ROCm <6.4 bug: NaN propagation when beta==0 (fixed in ROCm 6.4)")
fixed in 6.5
I thought it was actually fixed in 6.4.0?
Does it matter if it's 6.4.0 or 6.4? The tests are skipped for versions <6.4.
Checking against (6, 4) is sufficient, I think.
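The point about (6, 4) versus (6, 4, 0) can be checked directly, since Python compares tuples element by element. The sketch below assumes `get_rocm_version()` returns a tuple of integers, as the comparison in the diff implies. The names `MIN_FIXED` and `needs_skip` are hypothetical and used only for illustration.

```python
# Assumption: the ROCm version is represented as a tuple of ints.
# MIN_FIXED and needs_skip are illustrative names, not part of the PR.
MIN_FIXED = (6, 4)


def needs_skip(rocm_ver):
    """Return True when rocm_ver sorts strictly before (6, 4)."""
    # Tuples compare lexicographically; when one is a prefix of the
    # other, the shorter tuple sorts first, so (6, 4) < (6, 4, 0).
    return tuple(rocm_ver) < MIN_FIXED


print(needs_skip((6, 3, 2)))  # True
print(needs_skip((6, 4)))     # False
print(needs_skip((6, 4, 0)))  # False
```

Under that assumption, a 6.4.0 install is not skipped whether the version is parsed as two or three components.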
tests/sparse_test.py
Outdated
from pathlib import Path


def get_rocm_version():
    version_path = Path("/opt/rocm/.info/version")
You will probably want to check ROCM_PATH from os.environ first, and then secondarily this path as the fallback.
tests/sparse_test.py
Outdated
def get_rocm_version():
    version_path = Path("/opt/rocm/.info/version")
    assert version_path.exists(), ("Expected ROCm version file")
Asserts are stripped when Python runs with optimizations enabled (python -O), so it would be better to make this an if not version_path.exists(): check and raise an exception.
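One way the two suggestions on this helper (check ROCM_PATH first, and replace the assert with an explicit check) might be combined is sketched below. It is not the code that was merged. The `default_root` parameter, the regular expression, and the assumption that the version file begins with dotted integers such as "6.4.0-12345" are illustrative guesses.

```python
import os
import re
from pathlib import Path


def get_rocm_version(default_root="/opt/rocm"):
    """Return the ROCm version as a tuple of ints, e.g. (6, 4, 0)."""
    # Prefer ROCM_PATH from the environment; fall back to the default root.
    rocm_root = Path(os.environ.get("ROCM_PATH") or default_root)
    version_path = rocm_root / ".info" / "version"

    # An explicit check still runs under `python -O`, unlike an assert.
    if not version_path.exists():
        raise FileNotFoundError(
            f"Expected ROCm version file at {version_path}")

    text = version_path.read_text().strip()
    # Assumption: the file starts with dotted integers, e.g. "6.4.0-12345".
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"Unrecognized ROCm version string: {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)
```

Returning a tuple of integers keeps the existing `rocm_ver < (6, 4)` comparison in the test unchanged.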
Can we drop the [BUGFIX] at the beginning of the commit message? I think it's pretty obvious that changes are generally fixing bugs. The rest of the message makes sense to me as well.
efd56ee to ee8b6c5
… fix in ROCm 6.4.0
7632093 to acc7cf7
Older versions of rocSPARSE (<6.4) did not zero out memory when beta==0, which could allow NaNs to propagate through