CI: Occasional segfault during benchmarking #503

Open
maleadt opened this issue Dec 19, 2024 · 0 comments

As seen in https://buildkite.com/julialang/metal-dot-jl/builds/1537#0193ddfa-013b-49dd-8229-d4e3bf6151e5:

[ Info: Preparing main benchmarks
ERROR: Exception handler triggered on unmanaged thread.
[62318] signal 10 (1): Bus error: 10
in expression starting at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-macmini-aarch64-4.0/build/default-macmini-aarch64-4-0/julialang/metal-dot-jl/perf/runbenchmarks.jl:36
jl_gc_state_set at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XG3Q6T6R70.0/build/default-honeycrisp-XG3Q6T6R70-0/julialang/julia-release-1-dot-11/src/./julia_threads.h:334 [inlined]
jl_gc_state_save_and_set at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XG3Q6T6R70.0/build/default-honeycrisp-XG3Q6T6R70-0/julialang/julia-release-1-dot-11/src/./julia_threads.h:340 [inlined]
jl_delete_thread at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XG3Q6T6R70.0/build/default-honeycrisp-XG3Q6T6R70-0/julialang/julia-release-1-dot-11/src/threading.c:512
_pthread_tsd_cleanup at /usr/lib/system/libsystem_pthread.dylib (unknown line)
_pthread_exit at /usr/lib/system/libsystem_pthread.dylib (unknown line)
_pthread_wqthread_exit at /usr/lib/system/libsystem_pthread.dylib (unknown line)
_pthread_wqthread at /usr/lib/system/libsystem_pthread.dylib (unknown line)
Allocations: 197397221 (Pool: 197391790; Big: 5431); GC: 445
ERROR: Exception handler triggered on unmanaged thread.

This is surprisingly similar to what occasionally happens in CUDA.jl, e.g., https://buildkite.com/julialang/cuda-dot-jl/builds/5611#0193da04-b534-4be5-a488-960ed5c568e2:

[ Info: Preparing main benchmarks
🚨 Error: The command was interrupted by a signal: signal: trace/breakpoint trap (core dumped)

The source of the crash looks different, though:

Thread 6 "julia" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 1179041]
ijl_process_events () at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/jl_uv.c:389
389	/cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/jl_uv.c: No such file or directory.
(gdb) bt
#0  ijl_process_events () at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/jl_uv.c:389
#1  0x00007ffff71d3f37 in ijl_task_get_next (trypoptask=<optimized out>, q=<optimized out>,
    checkempty=<optimized out>) at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/scheduler.c:610
#2  0x00007fffe25a9993 in julia_poptask_66344 () at task.jl:1012
#3  0x00007fffe386f313 in julia_wait_65850 () at task.jl:1021
#4  0x00007fffe30d4494 in julia_#wait#731_65869 () at condition.jl:130
#5  0x00007fff4cf399ae in ?? ()
#6  0x00007fff16d30010 in ?? ()
#7  0x389b91a0f883f464 in ?? ()
#8  0x00007fff31b83c28 in ?? ()
#9  0x00007fff16d30080 in ?? ()
#10 0x00007fff31b83cd0 in ?? ()
#11 0x00007fff31b83ad0 in ?? ()
#12 0x389b91a0f403f464 in ?? ()
#13 0x389b6b37b033f464 in ?? ()
#14 0x00007fff00000000 in ?? ()
#15 0x00007fff103c68c0 in ?? ()
#16 0x00007fff103afa70 in ?? ()
#17 0x0000000000000004 in ?? ()
#18 0x00007fff103c5c60 in ?? ()
#19 0x0000000000000008 in ?? ()
#20 0x0000000000000040 in ?? ()
#21 0x0000000000000000 in ?? ()