Segfault after #3821 #3856

Open
wujingyue opened this issue Feb 8, 2025 · 4 comments

@wujingyue
Collaborator

wujingyue commented Feb 8, 2025

I merged #3821 too quickly. The CI indeed showed the same error.

To reproduce this, run:

$ _bn && DEBUG_SERDE=debug pytest tests/python/test_python_frontend.py -s
Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x2a420588)
==== backtrace (tid:1495132) ====
 0  /usr/local/ucx/lib/libucs.so.0(ucs_handle_error+0x2e4) [0x79a4e79ae614]
 1  /usr/local/ucx/lib/libucs.so.0(+0x3680c) [0x79a4e79ae80c]
 2  /usr/local/ucx/lib/libucs.so.0(+0x36a48) [0x79a4e79aea48]
 3  [0x2a420588]
=================================
Fatal Python error: Segmentation fault

Current thread 0x000079a4e9ab5300 (most recent call first):
  File "/opt/pytorch/nvfuser/nvfuser/__init__.py", line 73 in segment
  File "/opt/pytorch/nvfuser/tests/python/utils.py", line 268 in check_cpp_translation
  File "/opt/pytorch/nvfuser/tests/python/utils.py", line 477 in exec_nvfuser
  File "/opt/pytorch/nvfuser/tests/python/utils.py", line 410 in inner_fn
  File "/opt/pytorch/nvfuser/tests/python/test_python_frontend.py", line 2956 in test_issue1273
  File "/usr/local/lib/python3.12/dist-packages/torch/testing/_internal/common_utils.py", line 3099 in wrapper
  File "/usr/lib/python3.12/unittest/case.py", line 589 in _callTestMethod
  File "/usr/lib/python3.12/unittest/case.py", line 634 in run
  File "/usr/local/lib/python3.12/dist-packages/torch/testing/_internal/common_utils.py", line 3206 in _run_custom
  File "/usr/local/lib/python3.12/dist-packages/torch/testing/_internal/common_utils.py", line 3234 in run
  File "/usr/lib/python3.12/unittest/case.py", line 690 in __call__
  File "/usr/local/lib/python3.12/dist-packages/_pytest/unittest.py", line 321 in runtest
  File "/usr/local/lib/python3.12/dist-packages/_pytest/runner.py", line 172 in pytest_runtest_call
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_callers.py", line 103 in _multicall
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_hooks.py", line 513 in __call__
  File "/usr/local/lib/python3.12/dist-packages/_pytest/runner.py", line 240 in <lambda>
  File "/usr/local/lib/python3.12/dist-packages/_pytest/runner.py", line 340 in from_call
  File "/usr/local/lib/python3.12/dist-packages/_pytest/runner.py", line 239 in call_and_report
  File "/usr/local/lib/python3.12/dist-packages/_pytest/runner.py", line 134 in runtestprotocol
  File "/usr/local/lib/python3.12/dist-packages/_pytest/runner.py", line 115 in pytest_runtest_protocol
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_callers.py", line 103 in _multicall
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_hooks.py", line 513 in __call__
  File "/usr/local/lib/python3.12/dist-packages/_pytest/main.py", line 364 in pytest_runtestloop
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_callers.py", line 103 in _multicall
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_hooks.py", line 513 in __call__
  File "/usr/local/lib/python3.12/dist-packages/_pytest/main.py", line 339 in _main
  File "/usr/local/lib/python3.12/dist-packages/_pytest/main.py", line 285 in wrap_session
  File "/usr/local/lib/python3.12/dist-packages/_pytest/main.py", line 332 in pytest_cmdline_main
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_callers.py", line 103 in _multicall
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_hooks.py", line 513 in __call__
  File "/usr/local/lib/python3.12/dist-packages/_pytest/config/__init__.py", line 174 in main
  File "/usr/local/lib/python3.12/dist-packages/_pytest/config/__init__.py", line 197 in console_main
  File "/usr/local/bin/pytest", line 8 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, jaxlib.cpu_feature_guard, psutil._psutil_linux, psutil._psutil_posix (total: 27)
[1]    1495132 segmentation fault (core dumped)  DEBUG_SERDE=debug pytest tests/python/test_python_frontend.py -s
@wujingyue changed the title from "Segfault after https://github.com/NVIDIA/Fuser/pull/3821" to "Segfault after #3821" on Feb 8, 2025
@cowanmeg
Collaborator

Will look into this!

@wujingyue
Collaborator Author

Thank you!

@cowanmeg
Collaborator

Hmm... this bug is a bit strange since the segfault occurs in ucx. Just confirming these python_frontend tests only test single device behavior, right?

@wujingyue
Collaborator Author

> these python_frontend tests only test single device behavior, right?

That's right!
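
For anyone picking this up, here is a minimal sketch of a helper that reruns the reproduction command from the first comment and reports whether the process died from a signal. The test file path and the `DEBUG_SERDE=debug` setting come from the report. The function names `run_and_classify` and `repro_command` are hypothetical, and narrowing the run with `-k test_issue1273` is an assumption based on the test named in the traceback; the crash may depend on state left by earlier tests, in which case the filter should be dropped.

```python
import os
import signal
import subprocess
import sys


def run_and_classify(cmd, extra_env=None):
    """Run cmd and return (returncode, signal_name or None).

    On POSIX, subprocess reports a child killed by signal N as returncode -N.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    proc = subprocess.run(cmd, env=env)
    sig_name = None
    if proc.returncode < 0:
        sig_name = signal.Signals(-proc.returncode).name
    return proc.returncode, sig_name


def repro_command():
    """Build the command and environment from the report (hypothetical helper).

    The -k filter is an assumption: it selects only the test that appears in
    the Python traceback. Remove it to run the whole file as in the report.
    """
    cmd = [
        sys.executable, "-m", "pytest",
        "tests/python/test_python_frontend.py",
        "-s",
        "-k", "test_issue1273",
    ]
    return cmd, {"DEBUG_SERDE": "debug"}
```

Calling `run_and_classify(*repro_command())` from the repository root after building should return `"SIGSEGV"` as the signal name if the crash reproduces in isolation. A zero or positive return code with no signal name would suggest the failure needs the rest of the test file to trigger.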
