SIGSEGV when tracing communications #1

Open
648trindade opened this issue Jan 17, 2019 · 1 comment

Comments

@648trindade

648trindade commented Jan 17, 2019

Computer: NUMA machine with 8 nodes, each with one Intel Xeon E5-4617 (48 threads in total), 488 GB DRAM
OS: Debian Stretch
Software: GCC 7.3, Intel Pin 3.7, Ondes3D 1.0 (OpenMP version)
Also tested with GCC 6, Intel Pin 3.2 and 3.6, and the LBM 2D, LULESH, and CoMD applications

When running the tool to trace communications (-c) on the Ondes3D application (cell grid size 500x500x500), the application is killed with the following message:

C: Tool (or Pin) caused signal 11 at PC <address>
Segmentation Fault

Attaching gdb, I get the following stack trace when the error occurs:

#0  0x00007f6516832ee4 in std::hashtable<std::pair<unsigned long const, TIDlist>, unsigned long, std::hash<unsigned long>, std::priv::_UnorderedMapTraitsT<std::pair<unsigned long const, TIDlist> >, std::priv::_Select1st<std::pair<unsigned long const, TIDlist> >, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, TIDlist> > >::_S_before_begin (__n=<synthetic pointer>: 0x818ab84, __buckets=..., __elems=...) at /opt/pin-3.7/extras/stlport/include/stl/_hashtable.c:169
#1  std::hashtable<std::pair<unsigned long const, TIDlist>, unsigned long, std::hash<unsigned long>, std::priv::_UnorderedMapTraitsT<std::pair<unsigned long const, TIDlist> >, std::priv::_Select1st<std::pair<unsigned long const, TIDlist> >, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, TIDlist> > >::_M_before_begin (__n=<synthetic pointer>: 0x818ab84, this=0x7f6516c20360 <commmap>) at /opt/pin-3.7/extras/stlport/include/stl/_hashtable.c:149
#2  std::hashtable<std::pair<unsigned long const, TIDlist>, unsigned long, std::hash<unsigned long>, std::priv::_UnorderedMapTraitsT<std::pair<unsigned long const, TIDlist> >, std::priv::_Select1st<std::pair<unsigned long const, TIDlist> >, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, TIDlist> > >::_M_insert_noresize (__obj=..., __n=0x818ab84, this=0x7f6516c20360 <commmap>) at /opt/pin-3.7/extras/stlport/include/stl/_hashtable.c:187
#3  std::hashtable<std::pair<unsigned long const, TIDlist>, unsigned long, std::hash<unsigned long>, std::priv::_UnorderedMapTraitsT<std::pair<unsigned long const, TIDlist> >, std::priv::_Select1st<std::pair<unsigned long const, TIDlist> >, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, TIDlist> > >::insert_unique_noresize (__obj=..., this=0x7f6516c20360 <commmap>) at /opt/pin-3.7/extras/stlport/include/stl/_hashtable.c:223
#4  std::hashtable<std::pair<unsigned long const, TIDlist>, unsigned long, std::hash<unsigned long>, std::priv::_UnorderedMapTraitsT<std::pair<unsigned long const, TIDlist> >, std::priv::_Select1st<std::pair<unsigned long const, TIDlist> >, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, TIDlist> > >::_M_insert (this=this@entry=0x7f6516c20360 <commmap>, __obj=...) at /opt/pin-3.7/extras/stlport/include/stl/_hashtable.c:256
#5  0x00007f651682aa43 in std::tr1::unordered_map<unsigned long, TIDlist, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, TIDlist> > >::operator[] (this=0x7f6516c20360 <commmap>, __key=@0x7f641e28d518: 0x1fd901bd246) at /opt/pin-3.7/extras/stlport/include/stl/_unordered_map.h:156
#6  do_comm (addr=<optimized out>, tid=<optimized out>) at numalize.cpp:101
#7  0x00007f6504629ed3 in ?? ()
... (just addresses)
#18 0x0000000000000000 in ?? ()

The crashing source line in numalize is always the same (numalize.cpp, line 101):

THREADID a = commmap[line].first;

It always happens when the tool inserts a new entry into the map (i.e., when commmap has no mapped value for the key line). The size of commmap at the time of the crash varies slightly: between 100663361 and 100663365 elements in the executions I ran with Ondes3D. The tool works fine when the application uses smaller grid parameters.
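For context on why line 101 mutates the map: in the standard `unordered_map` interface (which STLport mirrors), `operator[]` is not a read-only lookup. When the key is absent it inserts a value-initialized element and returns a reference to it, so `commmap[line].first` is a write whenever `line` is new. The minimal sketch below illustrates this with the standard library; `Entry` is a hypothetical stand-in for numalize's `TIDlist` (only a `first` member is assumed), and the helper names are illustrative. `find()` is the non-inserting alternative.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for numalize's TIDlist; only a `first` member is assumed.
struct Entry {
    uint32_t first = 0;
    uint32_t second = 0;
};

using CommMap = std::unordered_map<uint64_t, Entry>;

// Mirrors the pattern at numalize.cpp:101: operator[] inserts a
// value-initialized Entry when `line` is not yet a key, so this is a write.
uint32_t first_via_subscript(CommMap& m, uint64_t line) {
    return m[line].first;
}

// Read-only lookup: a miss returns `fallback` and leaves the map unchanged.
uint32_t first_via_find(const CommMap& m, uint64_t line, uint32_t fallback) {
    auto it = m.find(line);
    return it == m.end() ? fallback : it->second.first;
}
```

Calling `first_via_subscript` on an empty map grows it to one element, while `first_via_find` on a missing key leaves the size unchanged.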

I believe the error is in the STLport shipped with Intel Pin, since the tool code looks OK. Changing commmap from unordered_map to map only changes the contents of the stack trace: the tool keeps crashing.
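Neither `std::map` nor `std::unordered_map` supports concurrent modification without external synchronization, which is consistent with the crash surviving the switch to `map`. One common pattern is to split the map into shards, each guarded by its own lock. The sketch below is a hypothetical `ShardedMap`, not numalize's code; it uses `std::mutex` to stay standalone, whereas a real Pin tool would likely use Pin's own locking primitives instead.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <unordered_map>

// Hypothetical replacement for a single global map: keys are split across
// kShards independent maps, each guarded by its own mutex, so an insert (and
// any rehash it triggers) never runs concurrently with another access to the
// same table.
template <typename V, std::size_t kShards = 64>
class ShardedMap {
public:
    // Runs `fn` on the (possibly newly inserted) value for `key` while
    // holding that shard's lock.
    template <typename Fn>
    void update(uint64_t key, Fn fn) {
        Shard& s = shard_for(key);
        std::lock_guard<std::mutex> guard(s.mu);
        fn(s.map[key]);
    }

    // Total element count; locks each shard in turn.
    std::size_t size() {
        std::size_t total = 0;
        for (Shard& s : shards_) {
            std::lock_guard<std::mutex> guard(s.mu);
            total += s.map.size();
        }
        return total;
    }

private:
    struct Shard {
        std::mutex mu;
        std::unordered_map<uint64_t, V> map;
    };

    Shard& shard_for(uint64_t key) {
        return shards_[std::hash<uint64_t>{}(key) % kShards];
    }

    std::array<Shard, kShards> shards_;
};
```

Usage, assuming a `ShardedMap<TIDlist> m`: `m.update(line, [&](TIDlist& t) { a = t.first; });` performs the lookup-or-insert from line 101 under the shard lock. Sharding rather than one global lock is meant to reduce contention when many application threads reach the analysis routine at once; the shard count of 64 is an arbitrary assumption.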

@matthiasdiener
Owner

matthiasdiener commented Jan 17, 2019

I believe the issue is that commmap is modified concurrently by multiple threads, which isn't officially supported by the C++ standard, but seems to work when preallocating the slots.
Can you try increasing the preallocation in numalize.cpp, line 611, to something like 1000*1000*1000? (You are currently seeing a crash just above the current preallocation of 100*1000*1000.)
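A sketch of why preallocation matters, using the standard library's `reserve()` as an assumption (the actual call at line 611 of numalize.cpp is not reproduced here): an unordered map rehashes, reallocating its bucket array and redistributing every element, once its size would exceed `bucket_count() * max_load_factor()`. With the default `max_load_factor()` of 1.0, reserving N slots means up to N inserts cause no rehash; the first insert past that point rebuilds the table, which is unsafe while other threads are accessing it. The reported crash sizes (100663361 to 100663365 elements) sit just above a 100*1000*1000 preallocation, consistent with this. The helper `rehashed_while_inserting` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical helper: inserts keys [0, n) and reports whether the bucket
// count changed (i.e. a rehash happened) at any point during the insertions.
bool rehashed_while_inserting(std::unordered_map<uint64_t, int>& m, std::size_t n) {
    const std::size_t buckets_before = m.bucket_count();
    bool changed = false;
    for (uint64_t k = 0; k < n; ++k) {
        m[k] = 0;  // new key each iteration, so every call inserts
        if (m.bucket_count() != buckets_before) changed = true;
    }
    return changed;
}
```

Raising the preallocation to 1000*1000*1000 moves the rehash point rather than removing the race, and the bucket array alone costs memory: assuming one 8-byte pointer per bucket on x86-64, 10^9 buckets take roughly 8 GB, which fits in the 488 GB machine described above.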

@matthiasdiener matthiasdiener pinned this issue Jan 22, 2019