PyTorch-2.1.2-foss-2023a-CUDA-12.1.1.eb still fails with too many errors after #20156 #20222
Comments
I would not say that 3 is too many; we allow for 50 failures in the easyconfig because we know that many tests are unreliable. Are those 3 failed tests very important?
No clue, I'm more wondering why everybody else's test builds passed with only 2 errors... And we shouldn't have an EC in a release that fails to build. So we either have to increase the allowed failures or fix some of the above.
I see, this is related to recent changes from #20156
Also from the confcall:
2 might indeed be too low and was mainly intended to shake out issues in that PR, where it seemingly worked well enough.
The extra V100/A40 problems were due to not having rebuilt one of the dependencies.
@Flamefire Any ideas here?
I still get 3 errors on my builds.
A40:
WARNING: 3 test failures, 0 test errors (out of 211116):
test_jit 1/1 (1 failed, 2380 passed, 114 skipped, 12 xfailed, 2 rerun)
test_proxy_tensor 1/1 (1 failed, 2078 passed, 613 skipped, 80 xfailed, 2 rerun)
test_nn 1/1 (1 failed, 2798 passed, 128 skipped, 3 xfailed, 2 rerun)
V100:
WARNING: 3 test failures, 0 test errors (out of 210847):
inductor/test_compiled_autograd 1/1 (1 failed, 130 passed, 114 skipped, 2 rerun)
test_proxy_tensor 1/1 (1 failed, 2078 passed, 613 skipped, 80 xfailed, 2 rerun)
test_nn 1/1 (1 failed, 2556 passed, 109 skipped, 3 xfailed, 2 rerun)
A100:
WARNING: 3 test failures, 0 test errors (out of 211120):
test_optim 1/1 (2 failed, 182 passed, 2 skipped, 4 rerun)
test_nn 1/1 (1 failed, 2798 passed, 128 skipped, 3 xfailed, 2 rerun)
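For anyone comparing builds, here is a minimal sketch of tallying the per-suite lines above and checking the total against an allowed-failure limit. This is not EasyBuild's or PyTorch's actual parsing code: the helper names (`parse_suite_line`, `total_failed`, `exceeds_limit`) are hypothetical, the line format is assumed to match the summaries quoted in this issue, and treating the limit as "fail when the total is strictly greater" is also an assumption.

```python
import re

# Hypothetical helpers for illustration only; not part of EasyBuild or PyTorch.
# Assumed line format, taken from the summaries quoted in this issue:
#   <suite> <n>/<m> (<count> <label>, <count> <label>, ...)
SUITE_RE = re.compile(r"^(?P<suite>\S+) \d+/\d+ \((?P<counts>[^)]*)\)$")


def parse_suite_line(line):
    """Return (suite, {label: count}) for a per-suite line, else None."""
    match = SUITE_RE.match(line.strip())
    if match is None:
        return None
    counts = {}
    for part in match.group("counts").split(","):
        number, label = part.strip().split(" ", 1)
        counts[label] = int(number)
    return match.group("suite"), counts


def total_failed(lines):
    """Sum the 'failed' counts over all lines that look like suite summaries."""
    total = 0
    for line in lines:
        parsed = parse_suite_line(line)
        if parsed is not None:
            total += parsed[1].get("failed", 0)
    return total


def exceeds_limit(lines, max_failed):
    """Assumption: a build is rejected when failures are strictly above the limit."""
    return total_failed(lines) > max_failed


# The A100 summary from this issue, copied verbatim.
A100_LINES = [
    "A100:",
    "WARNING: 3 test failures, 0 test errors (out of 211120):",
    "test_optim 1/1 (2 failed, 182 passed, 2 skipped, 4 rerun)",
    "test_nn 1/1 (1 failed, 2798 passed, 128 skipped, 3 xfailed, 2 rerun)",
]

if __name__ == "__main__":
    print(total_failed(A100_LINES))
    print(exceeds_limit(A100_LINES, 2), exceeds_limit(A100_LINES, 50))
```

With the limits mentioned in this thread, the A100 result would be over a limit of 2 but well under a limit of 50.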