SIGSEGV of a2ps when compiled against bdw-gc on Solaris #664

Open
l1gi opened this issue Sep 23, 2024 · 17 comments

Comments

@l1gi

l1gi commented Sep 23, 2024

Hello,

I am trying to upgrade to a recent version of a2ps where bdw-gc is a mandatory dependency. I have successfully built and tested bdw-gc on recent Solaris:

$ gmake check
...
/bin/bash ./libtool  --tag=CC   --mode=link /usr/gcc/13/bin/gcc   -fexceptions -DGC_VISIBILITY_HIDDEN_SET -fvisibility=hidden -Wall -Wextra -Wpedantic -Wno-long-long -m64 -fPIC -DPIC -O3 -ffile-prefix-map=/scratch/userland-gate/components/bdw-gc=. -fno-strict-aliasing -Wno-frame-address    -o disclaim_weakmap_test tests/disclaim_weakmap_test.o  ./libgc.la  -lpthread -lrt -ldl
libtool: link: /usr/gcc/13/bin/gcc -fexceptions -DGC_VISIBILITY_HIDDEN_SET -fvisibility=hidden -Wall -Wextra -Wpedantic -Wno-long-long -m64 -fPIC -DPIC -O3 -ffile-prefix-map=/scratch/userland-gate/components/bdw-gc=. -fno-strict-aliasing -Wno-frame-address -o .libs/disclaim_weakmap_test tests/disclaim_weakmap_test.o  ./.libs/libgc.so -lpthread -lrt -ldl -R/usr/lib/amd64
make[3]: 'libstaticrootslib_test.la' is up to date.  
make[3]: 'libstaticrootslib2_test.la' is up to date.
make[3]: Leaving directory '/scratch/userland-gate/components/bdw-gc/build/amd64'
/usr/gnu/bin/make  check-TESTS                                                                                
make[3]: Entering directory '/scratch/userland-gate/components/bdw-gc/build/amd64'
make[4]: Entering directory '/scratch/userland-gate/components/bdw-gc/build/amd64'
PASS: cordtest                                         
PASS: gctest                                           
PASS: leaktest                                         
PASS: middletest                                                                                              
PASS: smashtest                                        
PASS: hugetest                                                                                                
PASS: realloc_test                                                                                            
PASS: staticrootstest                                                                                         
PASS: test_atomic_ops                                                                                         
PASS: threadleaktest                                                                                          
PASS: threadkey_test                                                                                          
PASS: subthreadcreate_test                                                                                    
PASS: initsecondarythread_test                        
PASS: disclaim_test                                    
PASS: disclaim_bench                                   
PASS: disclaim_weakmap_test                                                                                   
============================================================================   
Testsuite summary for gc 8.2.8
============================================================================                                  
# TOTAL: 16                                                                                                   
# PASS:  16                                            
# SKIP:  0                                                                                                    
# XFAIL: 0                                                                                                    
# FAIL:  0                                             
# XPASS: 0                                                                                                    
# ERROR: 0                                                                                                    
============================================================================
make[4]: Leaving directory '/scratch/userland-gate/components/bdw-gc/build/amd64'

The a2ps configure script found libgc.so.1 and the whole package built without issues. However, running the resulting binary produces the following SIGSEGV:

Reading symbols from build/prototype/i386/usr/bin/a2ps...
(gdb) r
Starting program: /scratch/userland-gate/components/a2ps/build/prototype/i386/usr/bin/a2ps 
[Thread debugging using libthread_db enabled]
warning: could not convert 'mutex_t' from the host encoding (ISO-8859-1) to UTF-32.
This normally should not happen, please file a bug report.
[New Thread 1 (LWP 1)]

Thread 2 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1 (LWP 1)]
0x00007fdec8425491 in GC_SysVGetDataStart () from /usr/lib/64/libgc.so.1
(gdb) bt
#0  0x00007fdec8425491 in GC_SysVGetDataStart () from /usr/lib/64/libgc.so.1
#1  0x00007fdec84258d3 in GC_init () from /usr/lib/64/libgc.so.1
#2  0x00007fdec8426a32 in GC_generic_malloc_inner () from /usr/lib/64/libgc.so.1
#3  0x00007fdec8427ddf in GC_generic_malloc () from /usr/lib/64/libgc.so.1
#4  0x00007fdec84281fd in GC_malloc_kind_global () from /usr/lib/64/libgc.so.1
#5  0x00007fdec8e67a2a in rpl_malloc (n=5) at ./build/amd64/lib/malloc.c:43
#6  0x00007fdec8e63dfb in imalloc (s=5) at ./build/amd64/lib/ialloc.h:51
#7  0x00007fdec8e67432 in ximalloc (s=5) at ./build/amd64/lib/xmalloc.c:51
#8  0x00007fdec8e638df in base_name (
    name=0x7fed2e720cf0 "/scratch/userland-gate/components/a2ps/build/prototype/i386/usr/bin/a2ps")
    at ./build/amd64/lib/basename.c:53
#9  0x00007fdec8e30d01 in main (argc=1, argv=0x7fed2e720a98) at ./build/amd64/src/main.c:922
(gdb) 

Could you help me debug this issue, please? Should you need more information, don't hesitate to ask.

Thank you,
m.

@l1gi
Author

l1gi commented Sep 23, 2024

This is how the a2ps binaries are built:

/bin/bash ../libtool  --tag=CC   --mode=link /usr/gcc/13/bin/gcc   -m64 -fPIC -DPIC -O3 -ffile-prefix-map=/scratch/userland-gate/components/a2ps=. -g -O0   -o a2ps main.o read.o sshread.o ssheet.o select.o generate.o delegate.o buffer.o versions.o ffaces.o ../liba2ps/liba2ps.la ../liba2ps/libnowarnings.a libparse.a  ../lib/libgnu.la -lm -lgc -lpthread -lrt -ldl -lpaper  -lpaper
libtool: link: /usr/gcc/13/bin/gcc -m64 -fPIC -DPIC -O3 -ffile-prefix-map=/scratch/userland-gate/components/a2ps=. -g -O0 -o a2ps main.o read.o sshread.o ssheet.o select.o generate.o delegate.o buffer.o versions.o ffaces.o  ../liba2ps/.libs/liba2ps.a ../liba2ps/libnowarnings.a libparse.a ../lib/.libs/libgnu.a -lm -lgc -lpthread -lrt -ldl -lpaper

This corresponds to what the pkg-config file contains:

prefix=/usr
exec_prefix=${prefix}
libdir=/usr/lib/amd64
includedir=${prefix}/include

Name: Boehm-Demers-Weiser Conservative Garbage Collector
Description: A garbage collector for C and C++
Version: 8.2.8
Libs: -L${libdir}  -lgc -lpthread -lrt -ldl
Cflags: -I${includedir}

Thanks,
m.

@ivmai
Owner

ivmai commented Sep 23, 2024

-O3

Please recompile libgc with -O0 -g to print exact source code line and args.

@ivmai
Owner

ivmai commented Sep 23, 2024

It seems that GC_setup_temporary_fault_handler does not work properly when linked with a2ps.
The issue might be caused (my guess) by an incorrect order of the -l options.

But please provide the stack trace with line numbers.

@l1gi
Author

l1gi commented Sep 23, 2024

Will do and let you know. Thank you!

@l1gi
Author

l1gi commented Sep 25, 2024

This is the backtrace I got:

$ gdb /usr/bin/a2ps
GNU gdb (GDB) 13.1
...
Reading symbols from /usr/bin/a2ps...
(gdb) r
Starting program: /usr/bin/a2ps 
[Thread debugging using libthread_db enabled]
warning: could not convert 'mutex_t' from the host encoding (ISO-8859-1) to UTF-32.
This normally should not happen, please file a bug report.
[New Thread 1 (LWP 1)]

Thread 2 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1 (LWP 1)]
0x00007fdbe562c2ce in GC_SysVGetDataStart (max_page_size=4096, etext_addr=0x7fdbe5f87f11 "")
    at ./gc-8.2.8/extra/../os_dep.c:2018
2018    ./gc-8.2.8/extra/../os_dep.c: No such file or directory.
(gdb) bt
#0  0x00007fdbe562c2ce in GC_SysVGetDataStart (max_page_size=4096, etext_addr=0x7fdbe5f87f11 "")
    at ./gc-8.2.8/extra/../os_dep.c:2018
#1  0x00007fdbe562c314 in GC_register_data_segments () at ./gc-8.2.8/extra/../os_dep.c:2134
#2  0x00007fdbe562a79a in GC_init () at ./gc-8.2.8/extra/../misc.c:1300
#3  0x00007fdbe5621688 in GC_generic_malloc_inner (lb=5, k=1) at ./gc-8.2.8/extra/../malloc.c:176
#4  0x00007fdbe5621956 in GC_generic_malloc (lb=5, k=1) at ./gc-8.2.8/extra/../malloc.c:255
#5  0x00007fdbe5621cb3 in GC_malloc_kind_global (lb=5, k=1) at ./gc-8.2.8/extra/../malloc.c:339
#6  0x00007fdbe562cee1 in GC_malloc_kind (bytes=5, kind=1) at ./gc-8.2.8/extra/../thread_local_alloc.c:165
#7  0x00007fdbe5621cf9 in GC_malloc (lb=5) at ./gc-8.2.8/extra/../malloc.c:358
#8  0x00007fdbe5f67a2a in rpl_malloc (n=5) at ./build/amd64/lib/malloc.c:43
#9  0x00007fdbe5f63dfb in imalloc (s=5) at ./build/amd64/lib/ialloc.h:51
#10 0x00007fdbe5f67432 in ximalloc (s=5) at ./build/amd64/lib/xmalloc.c:51
#11 0x00007fdbe5f638df in base_name (name=0x7fe4e7d8d534 "/usr/bin/a2ps") at ./build/amd64/lib/basename.c:53
#12 0x00007fdbe5f30d01 in main (argc=1, argv=0x7fe4e7d8d2d8) at ./build/amd64/src/main.c:922

Let me know what you think.

Thank you,
m.

@l1gi
Author

l1gi commented Sep 25, 2024

Also, there are various warnings about implicit function declarations, incompatible implicit declarations of built-in functions, and others:

libtool: compile:  /usr/gcc/13/bin/gcc -DHAVE_CONFIG_H -I. -I/scratch/userland-gate/components/a2ps/a2ps-4.15.6/lib -I.. -DDEFAULT_TEXT_DOMAIN=\"a2ps-gnulib\" -m64 -D_REENTRANT -m64 -fPIC -DPIC -g -O0 -ffile-prefix-map=/scratch/userland-gate/components/a2ps=. -MT calloc.lo -MD -MP -MF .deps/calloc.Tpo -c calloc.c  -fPIC -DPIC -o .libs/calloc.o
calloc.c: In function 'rpl_calloc':
calloc.c:47:18: warning: implicit declaration of function 'calloc' [-Wimplicit-function-declaration]
   47 |   void *result = calloc (n, s);
      |                  ^~~~~~                                                                               
calloc.c:28:1: note: include '<stdlib.h>' or provide a declaration of 'calloc'                                
   27 | #include "xalloc-oversized.h"                                                                         
  +++ |+#include <stdlib.h>                                                                                   
   28 |                                                                                                       
calloc.c:47:18: warning: incompatible implicit declaration of built-in function 'calloc' [-Wbuiltin-declaration-mismatch]                                                                                                   
   47 |   void *result = calloc (n, s);                                                                       
      |                  ^~~~~~                                                                               
calloc.c:47:18: note: include '<stdlib.h>' or provide a declaration of 'calloc'
...
libtool: compile:  /usr/gcc/13/bin/gcc -DHAVE_CONFIG_H -I. -I/scratch/userland-gate/components/a2ps/a2ps-4.15.6/lib -I.. -DDEFAULT_TEXT_DOMAIN=\"a2ps-gnulib\" -m64 -D_REENTRANT -m64 -fPIC -DPIC -g -O0 -ffile-prefix-map=/scratch/userland-gate/components/a2ps=. -MT realloc.lo -MD -MP -MF .deps/realloc.Tpo -c realloc.c  -fPIC -DPIC -o .libs/realloc.o
realloc.c: In function 'rpl_realloc':                                                                         
realloc.c:55:18: warning: implicit declaration of function 'realloc' [-Wimplicit-function-declaration]        
   55 |   void *result = realloc (p, n);                                                                      
      |                  ^~~~~~~                                                                              
realloc.c:28:1: note: include '<stdlib.h>' or provide a declaration of 'realloc'                              
   27 | #include "xalloc-oversized.h"                                                                         
  +++ |+#include <stdlib.h>                                                                                   
   28 |                                                                                                       
realloc.c:55:18: warning: incompatible implicit declaration of built-in function 'realloc' [-Wbuiltin-declaration-mismatch]
   55 |   void *result = realloc (p, n);                                                                      
      |                  ^~~~~~~                                                                              
realloc.c:55:18: note: include '<stdlib.h>' or provide a declaration of 'realloc'
...
libtool: compile:  /usr/gcc/13/bin/gcc -DHAVE_CONFIG_H -I. -I/scratch/userland-gate/components/a2ps/a2ps-4.15.6/liba2ps -I.. -I.. -I/scratch/userland-gate/components/a2ps/a2ps-4.15.6/liba2ps -I/scratch/userland-gate/components/a2ps/a2ps-4.15.6/lib -I../lib -DLOCALEDIR=\"/usr/share/locale\" -DSYSCONFFILE=\"/etc/gnu/a2ps.cfg\" -DHAVE_CONFIG_H -m64 -D_REENTRANT -m64 -fPIC -DPIC -g -O0 -ffile-prefix-map=/scratch/userland-gate/components/a2ps=. -MT encoding.lo -MD -MP -MF .deps/encoding.Tpo -c encoding.c  -fPIC -DPIC -o .libs/encoding.o
In file included from system.h:22,
                 from a2ps.h:22,
                 from encoding.h:24,
                 from encoding.c:23:
../config.h:2011: warning: "strdup" redefined
 2011 | #define strdup GC_strdup
      | 
In file included from encoding.c:21:
../lib/string.h:1061: note: this is the location of the previous definition
 1061 | #   define strdup rpl_strdup
      | 
depbase=`echo media.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
...
libtool: compile:  /usr/gcc/13/bin/gcc -DHAVE_CONFIG_H -I. -I/scratch/userland-gate/components/a2ps/a2ps-4.15.6/liba2ps -I.. -I.. -I/scratch/userland-gate/components/a2ps/a2ps-4.15.6/liba2ps -I/scratch/userland-gate/components/a2ps/a2ps-4.15.6/lib -I../lib -DLOCALEDIR=\"/usr/share/locale\" -DSYSCONFFILE=\"/etc/gnu/a2ps.cfg\" -DHAVE_CONFIG_H -m64 -D_REENTRANT -m64 -fPIC -DPIC -g -O0 -ffile-prefix-map=/scratch/userland-gate/components/a2ps=. -MT message.lo -MD -MP -MF .deps/message.Tpo -c message.c  -fPIC -DPIC -o .libs/message.o
In file included from msg.c:25,
                 from message.c:61:
../config.h:2005: warning: "malloc" redefined
 2005 | #define malloc GC_malloc
      | 
In file included from message.c:23:
../lib/stdlib.h:1116: note: this is the location of the previous definition
 1116 | #   define malloc rpl_malloc
      | 
../config.h:2006: warning: "calloc" redefined
 2006 | #define calloc GC_calloc
      | 
../lib/stdlib.h:839: note: this is the location of the previous definition
  839 | #   define calloc rpl_calloc
      | 
../config.h:2007: warning: "realloc" redefined
 2007 | #define realloc GC_realloc
      | 
../lib/stdlib.h:1791: note: this is the location of the previous definition
 1791 | #   define realloc rpl_realloc
      | 
../config.h:2011: warning: "strdup" redefined
 2011 | #define strdup GC_strdup
      | 
In file included from message.c:25:
../lib/string.h:1061: note: this is the location of the previous definition
 1061 | #   define strdup rpl_strdup
      | 

I don't know how closely they are related.
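
The "redefined" warnings mean that two headers each try to redirect the same allocator name. Here is a small sketch of the effect, using hypothetical stand-ins (`wrapper_alloc` for `rpl_malloc`, `collector_alloc` for `GC_malloc`, `my_alloc` for `malloc`): whichever `#define` the compiler sees last decides which function later code calls. The sketch uses `#undef` so that it compiles without the warning. It is an illustration only, not a2ps or gnulib code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for rpl_malloc and GC_malloc; each counts its calls. */
int wrapper_calls;
int collector_calls;

void *wrapper_alloc(size_t n)   { wrapper_calls++;   return malloc(n); }
void *collector_alloc(size_t n) { collector_calls++; return malloc(n); }

/* First redirect, as a replacement header might install it. */
#define my_alloc wrapper_alloc

/* Code compiled at this point is routed to the wrapper. */
void *alloc_before_override(size_t n) { return my_alloc(n); }

/* Second redirect, as a later-included config header might install it. */
#undef my_alloc
#define my_alloc collector_alloc

/* Code compiled after the override is routed to the collector stand-in. */
void *alloc_after_override(size_t n) { return my_alloc(n); }
```

The same source-level name therefore resolves to different functions depending on where in the include order the call is compiled.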

@l1gi
Author

l1gi commented Sep 25, 2024

Also, the a2ps configure.log shows no CFLAGS:

BDW_GC_CFLAGS=''
BDW_GC_LIBS='-lgc -lpthread -lrt -ldl'

Could that be the reason?

@ivmai
Owner

ivmai commented Sep 25, 2024

GC_register_data_segments () at ./gc-8.2.8/extra/../os_dep.c:2134

This line contains (void)AO_fetch_and_add((volatile AO_t *)result, zero); (an attempt to write to the address), and it triggers SIGSEGV as intended, but GC_fault_handler is not called. The signal handler is set by GC_setup_temporary_fault_handler(), but someone (a2ps) changes or removes the handler.
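
A minimal sketch of the technique described here, not libgc's actual code: install a temporary SIGSEGV/SIGBUS handler, attempt a write, and recover with `siglongjmp` if the write faults. The names `probe_writable`, `probe_fault_handler` and `probe_read_only_page` are hypothetical, and a plain volatile access stands in for `AO_fetch_and_add`.

```c
#define _DEFAULT_SOURCE   /* glibc: expose sigaction, sigsetjmp and MAP_ANON */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>

static sigjmp_buf probe_env;   /* where the temporary handler jumps back to */

static void probe_fault_handler(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);  /* unwind out of the faulting access */
}

/* Return 1 if the byte at addr can be written, 0 if the access faults. */
int probe_writable(volatile char *addr)
{
    struct sigaction act, old_segv, old_bus;
    int ok;

    act.sa_handler = probe_fault_handler;
    act.sa_flags = 0;
    sigemptyset(&act.sa_mask);
    sigaction(SIGSEGV, &act, &old_segv);   /* install temporary handlers */
    sigaction(SIGBUS, &act, &old_bus);

    if (sigsetjmp(probe_env, 1) == 0) {
        *addr = *addr;                     /* write the same value back */
        ok = 1;
    } else {
        ok = 0;                            /* we arrived here via the handler */
    }

    sigaction(SIGSEGV, &old_segv, NULL);   /* restore the previous handlers */
    sigaction(SIGBUS, &old_bus, NULL);
    return ok;
}

/* Demo helper: map one read-only page and probe it. */
int probe_read_only_page(void)
{
    void *page = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE | MAP_ANON, -1, 0);
    int ok;

    assert(page != MAP_FAILED);
    ok = probe_writable((volatile char *)page);
    munmap(page, 4096);
    return ok;
}
```

If something replaces the SIGSEGV handler between installation and the write, or a debugger intercepts the signal first, the jump back never happens and the fault looks like an ordinary crash.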

@l1gi I think to find out the root cause (and understand how to fix it), you should figure out where the handler is changed for SIGSEGV.
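
One way to narrow this down, assuming a small helper can be called at suspect points in the program: query the current disposition with `sigaction()` without changing it. The enum and the function name are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

enum disposition { DISP_ERROR = -1, DISP_DEFAULT, DISP_IGNORED, DISP_HANDLER };

/* Classify how a signal is currently handled, leaving the setting untouched. */
enum disposition signal_disposition(int sig)
{
    struct sigaction cur;

    /* A NULL new action makes sigaction() a pure query. */
    if (sigaction(sig, NULL, &cur) != 0)
        return DISP_ERROR;

    if ((cur.sa_flags & SA_SIGINFO) == 0) {
        if (cur.sa_handler == SIG_DFL)
            return DISP_DEFAULT;
        if (cur.sa_handler == SIG_IGN)
            return DISP_IGNORED;
    }
    return DISP_HANDLER;   /* some user-supplied handler is installed */
}
```

Calling `signal_disposition(SIGSEGV)` before and after a suspect call shows whether that call replaced or reset the handler.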

@l1gi
Author

l1gi commented Sep 25, 2024

Added a truss output where you can see the signal related calls and faults.
truss.txt

Looking further into the source code.

@ivmai
Owner

ivmai commented Sep 25, 2024

Also, please figure out why GC_register_main_static_data() returns true.

@l1gi
Author

l1gi commented Sep 25, 2024

Will look into it.

This is strange also:

8216:       Incurred fault #6, FLTBOUNDS  %pc = 0x7FD1A5622463                                                
8216:         siginfo: SIGSEGV SEGV_MAPERR addr=0x00000020                                                    
8216:       Received signal #11, SIGSEGV [caught]                                                             
8216:         siginfo: SIGSEGV SEGV_MAPERR addr=0x00000020                                                    
8216:   lwp_sigmask(SIG_SETMASK, 0x00000400, 0x00000000, 0x00000000, 0x00000000) = 0xFFBFFEFF [0xFFFFFFFF]    
8216:   sigaction(SIGSEGV, 0x7FD9F65BC480, 0x00000000)  = 0                                                   
8216:   setcontext(0x7FD9F65BC0B0)                                                                            
8216:       Incurred fault #6, FLTBOUNDS  %pc = 0x7FD1A5622463                                                
8216:         siginfo: SIGSEGV SEGV_MAPERR addr=0x00000020                                                    
8216:       Received signal #11, SIGSEGV [default]                                                            
8216:         siginfo: SIGSEGV SEGV_MAPERR addr=0x00000020                     

The address is 0x20. It almost looks as if something is wrong in the a2ps code, which stores a pointer to 0x20 somewhere and accesses it later.
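
The siginfo lines in the truss output carry the faulting address and a reason code. Below is a sketch of how a process can capture the same information itself with an `SA_SIGINFO` handler. All names (`fault_report`, `record_fault`, `fault_on_unmapped_page`) are hypothetical, and the unmapped page exists only for the demonstration.

```c
#define _DEFAULT_SOURCE   /* glibc: expose sigaction, sigsetjmp and MAP_ANON */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>

struct fault_report {
    void *expected;   /* address we deliberately touched */
    void *addr;       /* si_addr as reported to the handler */
    int code;         /* si_code, e.g. SEGV_MAPERR */
};

static sigjmp_buf fault_env;
static struct fault_report report;

static void record_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    report.addr = info->si_addr;
    report.code = info->si_code;
    siglongjmp(fault_env, 1);      /* skip the faulting instruction */
}

/* Touch a page that was just unmapped and report what the handler saw. */
struct fault_report fault_on_unmapped_page(void)
{
    struct sigaction act, old;
    void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANON, -1, 0);

    assert(page != MAP_FAILED);
    munmap(page, 4096);            /* the address is no longer mapped */
    report.expected = page;
    report.addr = NULL;
    report.code = 0;

    act.sa_sigaction = record_fault;
    act.sa_flags = SA_SIGINFO;
    sigemptyset(&act.sa_mask);
    sigaction(SIGSEGV, &act, &old);

    if (sigsetjmp(fault_env, 1) == 0)
        *(volatile char *)page = 1;   /* faults; the handler jumps back */

    sigaction(SIGSEGV, &old, NULL);
    return report;
}
```

A fault address as small as 0x20 is typically what a field access through a null structure pointer produces, which would be a different kind of fault from a deliberate probe of a data segment boundary.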

@l1gi
Author

l1gi commented Sep 25, 2024

Hmm, it looks like under truss the process dies in a different way than what gdb/mdb sees. I will focus on your advice.

@l1gi
Author

l1gi commented Sep 26, 2024

GC_register_main_static_data

Well, in include/private/gcconfig.h I read:

# ifdef SOLARIS                                                                                               
#   define OS_TYPE "SOLARIS"
...
#   define DYNAMIC_LOADING                                                                                    
...
# endif /* SOLARIS */                                                                                         

And in dyn_load.c:

#if !defined(HAVE_REGISTER_MAIN_STATIC_DATA) && defined(DYNAMIC_LOADING)                                      
  /* Do we need to separately register the main static data segment? */                                       
  GC_INNER GC_bool GC_register_main_static_data(void)                                                         
  {                                                                                                           
    return TRUE;                                                                                              
  }                                                                                                           
#endif /* HAVE_REGISTER_MAIN_STATIC_DATA */

But I don't see dyn_load.c being compiled at all in the build process (the same is true on Linux). Could you explain this to me, please?

@ivmai
Owner

ivmai commented Sep 26, 2024

dyn_load.c is included by extra/gc.c

@ivmai
Owner

ivmai commented Sep 26, 2024

If !defined(HAVE_REGISTER_MAIN_STATIC_DATA)

I thought it should be defined somewhere above in the file. Could you please check why it is not defined on Solaris?

@l1gi
Author

l1gi commented Oct 10, 2024

To be honest, I do not see any SEGV-manipulating code in a2ps, but there could be some generic handler reset that I am not aware of.

It seems the definition of HAVE_REGISTER_MAIN_STATIC_DATA depends on a few other definitions.

#if defined(SCO_ELF) || defined(DGUX) || defined(HURD) || defined(NACL) \
    || (defined(__ELF__) && (defined(LINUX) || defined(FREEBSD) \
                             || defined(NETBSD) || defined(OPENBSD)))

#ifdef USE_PROC_FOR_LIBRARIES

All modern unixes have USE_PROC_FOR_LIBRARIES defined, but not Solaris. Solaris has its own proc, maybe it is usable. Don't know.

Then there is a chance to have it defined when:

#if defined(HAVE_DL_ITERATE_PHDR)

Also, Solaris does not have HAVE_DL_ITERATE_PHDR defined.
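
For context, here is a sketch of what the `dl_iterate_phdr` approach looks like on a glibc system: walk every loaded object and look at its writable `PT_LOAD` segments, which is where static data lives. This assumes glibc's `<link.h>` (the `ElfW` macro and the `_GNU_SOURCE` guard are glibc conventions). Whether the same code builds on Solaris is not something this thread establishes. The names `seg_stats`, `count_writable` and `scan_loaded_objects` are hypothetical.

```c
#define _GNU_SOURCE   /* glibc: exposes dl_iterate_phdr in <link.h> */
#include <assert.h>
#include <link.h>
#include <stddef.h>

struct seg_stats {
    int objects;             /* executable plus shared objects visited */
    int writable_segments;   /* PT_LOAD segments with the PF_W flag */
};

/* Callback invoked once per loaded object. */
static int count_writable(struct dl_phdr_info *info, size_t size, void *data)
{
    struct seg_stats *st = data;

    (void)size;
    assert(st != NULL);
    st->objects++;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];

        /* A collector would register [dlpi_addr + p_vaddr, + p_memsz) here. */
        if (ph->p_type == PT_LOAD && (ph->p_flags & PF_W) != 0)
            st->writable_segments++;
    }
    return 0;   /* zero means: keep iterating */
}

struct seg_stats scan_loaded_objects(void)
{
    struct seg_stats st = {0, 0};

    dl_iterate_phdr(count_writable, &st);
    return st;
}
```

With this kind of enumeration available, a collector has no need to probe for the start of the data segment by provoking faults.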

Then there is a win section and darwin section. So it finally falls back to define the variant returning TRUE.

To be honest, I am not an expert on the Solaris dynamic linker, but I have many people around who could help. Also, I am not sure how long it has been since anyone tried to build bdw-gc on Solaris, so the platform configuration may be outdated.

Could you recommend where to look and what to try next, please? What about setting USE_PROC_FOR_LIBRARIES to see whether it works the same way as on the other ELF platforms?

Thank you,
m.

@ivmai
Owner

ivmai commented Oct 15, 2024

It seems that GC_setup_temporary_fault_handler does not work properly when linked with a2ps.

Hmm, I think my earlier advice was wrong. When debugging in gdb, you should type the following before the 'r' command:
handle SIGSEGV pass noprint
That is, gdb should pass SIGSEGV through to the program without stopping, because libgc uses this signal itself (the same goes for SIGXCPU and SIGPWR).
Issue solved?
