Segfault and core dump on Summit #31

Open
williamfgc opened this issue Nov 18, 2020 · 6 comments

Comments

@williamfgc

I need some guidance on how to run MACSio on Summit.
To reproduce: I was able to build the MACSio binary successfully with the following dependencies:

ldd ~/opt/macsio/macsio 
	linux-vdso64.so.1 =>  (0x00007fffb6120000)
	libjson-cwx.so.2 => /ccs/home/wgodoy/opt/json-cwx/lib/libjson-cwx.so.2 (0x00007fffb60e0000)
	libmpiprofilesupport.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/lib/libmpiprofilesupport.so.3 (0x00007fffb60b0000)
	libmpi_ibm.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/lib/libmpi_ibm.so.3 (0x00007fffb5f30000)
	libstdc++.so.6 => /sw/summit/gcc/6.4.0/lib64/libstdc++.so.6 (0x00007fffb5d20000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fffb5c10000)
	libgcc_s.so.1 => /sw/summit/gcc/6.4.0/lib64/libgcc_s.so.1 (0x00007fffb5bd0000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fffb59e0000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fffb59b0000)
	libutil.so.1 => /lib64/libutil.so.1 (0x00007fffb5980000)
	libhwloc_ompi.so.15 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/lib/libhwloc_ompi.so.15 (0x00007fffb5910000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fffb58e0000)
	libevent-2.1.so.6 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/lib/libevent-2.1.so.6 (0x00007fffb5860000)
	libevent_pthreads-2.1.so.6 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/lib/libevent_pthreads-2.1.so.6 (0x00007fffb5830000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fffb57f0000)
	libopen-rte.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/lib/libopen-rte.so.3 (0x00007fffb56e0000)
	libopen-pal.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/lib/libopen-pal.so.3 (0x00007fffb55f0000)
	/lib64/ld64.so.2 (0x00007fffb6140000)

Unfortunately, any combination of input parameters results in a segfault and a core dump.

jsrun -n 2 ${MACSIO_EXEC} --interface hdf5 --parallel_file_mode MIF 2 --part_size 1M

Results:

cat output.490227 
[b28n03:147697] *** Process received signal ***
[b28n03:147697] Signal: Segmentation fault (11)
[b28n03:147697] Signal code: Address not mapped (1)
[b28n03:147697] Failing at address: 0x40
[b28n03:147697] [ 0] [0x2000000504d8]
[b28n03:147697] [ 1] [0x20000004d6b0]
[b28n03:147697] [ 2] /ccs/home/wgodoy/opt/json-cwx/lib/libjson-cwx.so.2(json_object_set_string+0x58)[0x2000000f9838]
[b28n03:147697] [ 3] /ccs/home/wgodoy/opt/json-cwx/lib/libjson-cwx.so.2(json_object_path_set_string+0x34)[0x2000000fa764]
[b28n03:147697] [ 4] /ccs/home/wgodoy/opt/macsio/macsio[0x10018cc0]
[b28n03:147697] [ 5] /ccs/home/wgodoy/opt/macsio/macsio(main+0x900)[0x10005980]
[b28n03:147697] [ 6] /lib64/libc.so.6(+0x25200)[0x200000655200]
[b28n03:147697] [ 7] /lib64/libc.so.6(__libc_start_main+0xc4)[0x2000006553f4]
[b28n03:147697] *** End of error message ***
[b28n03:147696] *** Process received signal ***
[b28n03:147696] Signal: Segmentation fault (11)
[b28n03:147696] Signal code: Address not mapped (1)
[b28n03:147696] Failing at address: 0x40
[b28n03:147696] [ 0] [0x2000000504d8]
[b28n03:147696] [ 1] [0x20000004d6b0]
[b28n03:147696] [ 2] /ccs/home/wgodoy/opt/json-cwx/lib/libjson-cwx.so.2(json_object_set_string+0x58)[0x2000000f9838]
[b28n03:147696] [ 3] /ccs/home/wgodoy/opt/json-cwx/lib/libjson-cwx.so.2(json_object_path_set_string+0x34)[0x2000000fa764]
[b28n03:147696] [ 4] /ccs/home/wgodoy/opt/macsio/macsio[0x10018cc0]
[b28n03:147696] [ 5] /ccs/home/wgodoy/opt/macsio/macsio(main+0x900)[0x10005980]
[b28n03:147696] [ 6] /lib64/libc.so.6(+0x25200)[0x200000655200]
[b28n03:147696] [ 7] /lib64/libc.so.6(__libc_start_main+0xc4)[0x2000006553f4]
[b28n03:147696] *** End of error message ***
ERROR:  One or more process (first noticed rank 0) terminated with signal 11 (core dumped)

macsio-log.log:

--------------------------------------------------------Processor 000000-------------------------------------------------------

Any help would be appreciated!
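
Frame [4] in the trace above is an unsymbolized address inside the macsio binary, so one possible next step is to map it back to a function and source line. The helper below is only a sketch: `frame_addrs` is a hypothetical name, and it assumes the backtrace lines end in `[0x...]` as they do in the log above. It prints the unique frame addresses that belong to a given binary.

```shell
# Hypothetical helper (not part of MACSio): print the unique frame addresses
# that belong to one binary in a backtrace whose lines look like
#   [host:pid] [ 4] /path/to/macsio[0x10018cc0]
#   [host:pid] [ 5] /path/to/macsio(main+0x900)[0x10005980]
# Usage: frame_addrs LOGFILE BINARY_PATH
frame_addrs() {
    log="$1"
    bin="$2"
    # -F matches the binary path as a fixed string, exactly as the log spells it.
    grep -F "$bin" "$log" |
        # Keep only the hex address inside the last [...] on each line.
        sed -n 's/.*\[\(0x[0-9a-fA-F]*\)]$/\1/p' |
        sort -u
}
```

The addresses could then be passed to binutils' `addr2line`, for example `frame_addrs output.490227 /ccs/home/wgodoy/opt/macsio/macsio | xargs addr2line -f -C -e /ccs/home/wgodoy/opt/macsio/macsio`. This only gives file and line information if the binary was built with debug info (`-g`), and the raw addresses are usable directly only if the executable is not position-independent, which the low `0x100...` addresses suggest but do not prove.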

@markcmiller86
Member

Which version of MACSio are you running? And can you possibly attach your CMakeCache.txt file here?

@williamfgc
Author

@markcmiller86 thanks for the quick response. I'm building the current master branch using gcc 6.4.0. Please find attached the CMakeCache.txt file.

@markcmiller86
Member

Do you think I should be able to duplicate this behavior on LLNL's own Lassen system?

@williamfgc
Author

@markcmiller86 that's a good idea, to make sure it's not a Summit problem. I'll try to build locally as well.

@williamfgc
Author

williamfgc commented Dec 3, 2020

@markcmiller86 just following up on this after the break. I built version 1.1 on Summit, and it doesn't have the segfault. Hope it helps.

@markcmiller86
Member

Sorry for the delay. That does help. I'm just not confident I can replicate the issue on LLNL's Lassen system and am, at the moment, up to my ears in other tasks. Please ping me again in a week.
