Skip to content

ossobv/pycoredump

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Python wrapper around GDB to examine core dumps

Deadlocks in C programs are a pain to debug. But if you know your way around GDB, you may get more out of a core dump than you think.

Deadlock example

For the following example, we'll create a core dump of a program that intentionally deadlocks: examples/deadlock

That example application spawns eight threads of which five threads use locks and two of those lock each other -- because of a locking inversion.

The application has an alarm set, so it will generate SIGABRT core file after a few seconds. That's nice because now we have something to debug.

Go into the examples/ dir and type make deadlock.core. It will self-test the deadlock application and then create the file deadlock.core using gdb.

$ cd examples/
$ make deadlock.core
cc -Wall -g -O0   -c -o deadlock.o deadlock.c
cc -Wall -g -O0 -pthread  deadlock.o   -o deadlock
./deadlock nolock
...

If all went well, you now have a deadlock.core file:

$ ls -l deadlock.core
-rw-rw-r-- 1 walter walter 17448904 sep  1 21:04 deadlock.core

Manual GDB debugging

If you open the core file with GDB, you should manually be able to find out which threads lock each other; but it's tedious work.

For example:

$ gdb ./deadlock deadlock.core
(gdb) info threads
  Id   Target Id         Frame
  9    Thread 0x2aaaaaaef040 (LWP 27444) 0x00002aaaaaf24cc9 in __GI_raise
    (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
  8    Thread 0x2aaaab4b3700 (LWP 27448) __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  7    Thread 0x2aaaab6b4700 (LWP 27449) __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  6    Thread 0x2aaaab8b5700 (LWP 27450) 0x00002aaaaafaef3d in nanosleep
    () at ../sysdeps/unix/syscall-template.S:81
  5    Thread 0x2aaaabab6700 (LWP 27451) __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  4    Thread 0x2aaaabcb7700 (LWP 27452) __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  3    Thread 0x2aaaabeb8700 (LWP 27453) 0x00002aaaaafaef3d in nanosleep
    () at ../sysdeps/unix/syscall-template.S:81
  2    Thread 0x2aaaac0b9700 (LWP 27454) 0x00002aaaaafaef3d in nanosleep
    () at ../sysdeps/unix/syscall-template.S:81
* 1    Thread 0x2aaaac2ba700 (LWP 27455) __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135

Okay, 9 threads. And now?

  1. Check only the threads that are waiting for a lock: those in __lll_lock_wait.
  2. For those threads, check which locks they're waiting for, and who is holding those locks.

Somewhat like this:

(gdb) thread 8
[Switching to thread 8 (Thread 0x2aaaab4b3700 (LWP 27448))]
#0  __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
    135 in ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
(gdb) frame 2
#2  0x00002aaaaacda480 in __GI___pthread_mutex_lock (
      mutex=0x602120 <speciallock>)
    at ../nptl/pthread_mutex_lock.c:79
    79 ../nptl/pthread_mutex_lock.c: No such file or directory.
(gdb) print *(pthread_mutex_t*)mutex
$1 = {__data = {__lock = 2, __count = 0, __owner = 27452,
      __nusers = 1, __kind = 0, __spins = 0, __elision = 0,
      __list = {__prev = 0x0, __next = 0x0}},
      __size = "\002...", '\000' <repeats 26 times>, __align = 2}

There, the __owner shows that the mutex that thread 8 is waiting is waiting for, is (light weight) process id (LWP) 27452, which corresponds to thread 4.

Repeat this step for all threads that are waiting for a lock.

Tedious, huh?

Automated using pycoredump

In the same example directory, now type make debug-deadlock. It will call GDB from the pycoredump library instead.

$ make debug-deadlock
PYTHONPATH=`pwd`/../ python debug-deadlock.py \
          ./deadlock ./deadlock.core
-- all waiting threads --
<GdbThread(thno=8, thid=0x2aaaab4b3700, procid=27448, func=__lll_lock_wait)>
    waits for <GdbThread(thno=4, thid=0x2aaaabcb7700, procid=27452, func=__lll_lock_wait)>
<GdbThread(thno=7, thid=0x2aaaab6b4700, procid=27449, func=__lll_lock_wait)>
    waits for <GdbThread(thno=8, thid=0x2aaaab4b3700, procid=27448, func=__lll_lock_wait)>
<GdbThread(thno=5, thid=0x2aaaabab6700, procid=27451, func=__lll_lock_wait)>
    waits for <GdbThread(thno=8, thid=0x2aaaab4b3700, procid=27448, func=__lll_lock_wait)>
<GdbThread(thno=4, thid=0x2aaaabcb7700, procid=27452, func=__lll_lock_wait)>
    waits for <GdbThread(thno=8, thid=0x2aaaab4b3700, procid=27448, func=__lll_lock_wait)>
<GdbThread(thno=1, thid=0x2aaaac2ba700, procid=27455, func=__lll_lock_wait)>
    waits for <GdbThread(thno=8, thid=0x2aaaab4b3700, procid=27448, func=__lll_lock_wait)>

Nice, only the threads that are waiting for a lock; instead of a listing of all the threads. And as a bonus, directly a reference to the threads they're waiting for.

And it gets better, they can be filtered by relevant threads only: thread 8 and thread 4. And because we get the backtrace info directly, we can see that one of those is indeed the culprit, the one with func=inverted in frame 3.

-- relevant threads --
<GdbThread(thno=8, thid=0x2aaaab4b3700, procid=27448, func=__lll_lock_wait)>
<GdbBacktrace(
 <GdbFrame(no=0, func=__lll_lock_wait, file=../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135>
 <GdbFrame(no=1, func=_L_lock_909, file=/lib/x86_64-linux-gnu/libpthread.so.0>
 <GdbFrame(no=2, func=__GI___pthread_mutex_lock, file=../nptl/pthread_mutex_lock.c:79>
 <GdbFrame(no=3, func=normal, file=deadlock.c:24>
 <GdbFrame(no=4, func=start_thread, file=pthread_create.c:312>
 <GdbFrame(no=5, func=clone, file=../sysdeps/unix/sysv/linux/x86_64/clone.S:111>
)>
    waits for <GdbThread(thno=4, thid=0x2aaaabcb7700, procid=27452, func=__lll_lock_wait)>

<GdbThread(thno=4, thid=0x2aaaabcb7700, procid=27452, func=__lll_lock_wait)>
<GdbBacktrace(
 <GdbFrame(no=0, func=__lll_lock_wait, file=../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135>
 <GdbFrame(no=1, func=_L_lock_909, file=/lib/x86_64-linux-gnu/libpthread.so.0>
 <GdbFrame(no=2, func=__GI___pthread_mutex_lock, file=../nptl/pthread_mutex_lock.c:79>
 <GdbFrame(no=3, func=inverted, file=deadlock.c:36>
 <GdbFrame(no=4, func=start_thread, file=pthread_create.c:312>
 <GdbFrame(no=5, func=clone, file=../sysdeps/unix/sysv/linux/x86_64/clone.S:111>
)>
    waits for <GdbThread(thno=8, thid=0x2aaaab4b3700, procid=27448, func=__lll_lock_wait)>

Better, yes?

Here it clearly shows that they're waiting on each other: a classic example of a deadlock.

That's generated by just the 30 odd lines from debug-deadlock.py using the pycoredump library.

Or, for another sample invocation, try this:

sudo env PYTHONPATH=$(pwd) python examples/debug-deadlock.py \
  /usr/sbin/asterisk \
  /var/spool/asterisk/core.ua4-ams-2018-11-29T16:09:42+0100

Conclusion

Obviously the pycoredump library could use a lot of extra features, and cleanup, but this simple example demonstrates how it can make your deadlock debugging life easier.

I've used the manual technique to find the cause of a deadlock in the Asterisk PBX software. That software supports a thread-debugging mode, but that's so slow that it's unusable in production systems. Luckily, much of the needed info is there in the regular core dump; you just have to find it.

And now, finding it may have become a little bit simpler.

About

Python wrapper around GDB to examine core dumps.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published