Deadlocks in C programs are a pain to debug. But if you know your way around GDB, you may get more out of a core dump than you think.
For the following example, we'll create a core dump of a program that intentionally deadlocks: examples/deadlock.
That example application spawns eight threads, of which five use locks; two of those lock each other because of a lock-order inversion.
The application has an alarm set, so after a few seconds it aborts with SIGABRT and dumps a core file. That's nice, because now we have something to debug.
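To get a feel for what that bug looks like in code, here is a minimal sketch of such a lock-order inversion, including the alarm-and-abort trick to force a core dump. This is a simplified illustration, not the actual deadlock.c; only the function names normal and inverted are borrowed from the backtraces we'll see below.

#include <pthread.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

static void *normal(void *arg) {
    pthread_mutex_lock(&lock_a);        /* A, then B */
    sleep(1);                           /* widen the race window */
    pthread_mutex_lock(&lock_b);        /* blocks forever */
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return arg;
}

static void *inverted(void *arg) {
    pthread_mutex_lock(&lock_b);        /* B, then A: the inversion */
    sleep(1);
    pthread_mutex_lock(&lock_a);        /* blocks forever */
    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);
    return arg;
}

static void on_alarm(int sig) {
    (void)sig;
    abort();                            /* SIGABRT, so we get a core dump */
}

int main(void) {
    pthread_t t1, t2;
    signal(SIGALRM, on_alarm);
    alarm(5);                           /* give the deadlock time to happen */
    pthread_create(&t1, NULL, normal, NULL);
    pthread_create(&t2, NULL, inverted, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

Compile it with cc -Wall -g -O0 -pthread, make sure core dumps are enabled (ulimit -c unlimited), and you get the same kind of core file to play with.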
Go into the examples/ dir and type make deadlock.core. It will self-test the deadlock application and then create the file deadlock.core using gdb.
$ cd examples/
$ make deadlock.core
cc -Wall -g -O0 -c -o deadlock.o deadlock.c
cc -Wall -g -O0 -pthread deadlock.o -o deadlock
./deadlock nolock
...
If all went well, you now have a deadlock.core file:
$ ls -l deadlock.core
-rw-rw-r-- 1 walter walter 17448904 sep 1 21:04 deadlock.core
If you open the core file with GDB, you should be able to manually find out which threads block each other, but it's tedious work.
For example:
$ gdb ./deadlock deadlock.core
(gdb) info threads
Id Target Id Frame
9 Thread 0x2aaaaaaef040 (LWP 27444) 0x00002aaaaaf24cc9 in __GI_raise
(sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
8 Thread 0x2aaaab4b3700 (LWP 27448) __lll_lock_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
7 Thread 0x2aaaab6b4700 (LWP 27449) __lll_lock_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
6 Thread 0x2aaaab8b5700 (LWP 27450) 0x00002aaaaafaef3d in nanosleep
() at ../sysdeps/unix/syscall-template.S:81
5 Thread 0x2aaaabab6700 (LWP 27451) __lll_lock_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
4 Thread 0x2aaaabcb7700 (LWP 27452) __lll_lock_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
3 Thread 0x2aaaabeb8700 (LWP 27453) 0x00002aaaaafaef3d in nanosleep
() at ../sysdeps/unix/syscall-template.S:81
2 Thread 0x2aaaac0b9700 (LWP 27454) 0x00002aaaaafaef3d in nanosleep
() at ../sysdeps/unix/syscall-template.S:81
* 1 Thread 0x2aaaac2ba700 (LWP 27455) __lll_lock_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
Okay, 9 threads. And now?
- Check only the threads that are waiting for a lock: those in __lll_lock_wait.
- For those threads, check which locks they're waiting for, and who is holding those locks.
Somewhat like this:
(gdb) thread 8
[Switching to thread 8 (Thread 0x2aaaab4b3700 (LWP 27448))]
#0 __lll_lock_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135 in ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
(gdb) frame 2
#2 0x00002aaaaacda480 in __GI___pthread_mutex_lock (
mutex=0x602120 <speciallock>)
at ../nptl/pthread_mutex_lock.c:79
79 ../nptl/pthread_mutex_lock.c: No such file or directory.
(gdb) print *(pthread_mutex_t*)mutex
$1 = {__data = {__lock = 2, __count = 0, __owner = 27452,
__nusers = 1, __kind = 0, __spins = 0, __elision = 0,
__list = {__prev = 0x0, __next = 0x0}},
__size = "\002...", '\000' <repeats 26 times>, __align = 2}
There, the __owner field shows that the mutex thread 8 is waiting for is held by light-weight process (LWP) 27452, which corresponds to thread 4.
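Instead of eyeballing the thread list for that LWP, you can let gdb look it up; it should answer with something like this:

(gdb) thread find 27452
Thread 4 has target id 'Thread 0x2aaaabcb7700 (LWP 27452)'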
Repeat this step for all threads that are waiting for a lock.
Tedious, huh?
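You can take some of the repetition out of this with a user-defined command plus thread apply. A sketch, assuming every waiting thread has __GI___pthread_mutex_lock in frame 2, like in the listing above:

(gdb) define mutex_owner
Type commands for definition of "mutex_owner".
End with a line saying just "end".
>frame 2
>print mutex->__data.__owner
>end
(gdb) thread apply 1 4 5 7 8 mutex_owner

That prints the owning LWP for each waiting thread in one go, but you still have to map those LWPs back to thread numbers yourself.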
In the same example directory, now type make debug-deadlock. It will call GDB from the pycoredump library instead.
$ make debug-deadlock
PYTHONPATH=`pwd`/../ python debug-deadlock.py \
./deadlock ./deadlock.core
-- all waiting threads --
<GdbThread(thno=8, thid=0x2aaaab4b3700, procid=27448, func=__lll_lock_wait)>
waits for <GdbThread(thno=4, thid=0x2aaaabcb7700, procid=27452, func=__lll_lock_wait)>
<GdbThread(thno=7, thid=0x2aaaab6b4700, procid=27449, func=__lll_lock_wait)>
waits for <GdbThread(thno=8, thid=0x2aaaab4b3700, procid=27448, func=__lll_lock_wait)>
<GdbThread(thno=5, thid=0x2aaaabab6700, procid=27451, func=__lll_lock_wait)>
waits for <GdbThread(thno=8, thid=0x2aaaab4b3700, procid=27448, func=__lll_lock_wait)>
<GdbThread(thno=4, thid=0x2aaaabcb7700, procid=27452, func=__lll_lock_wait)>
waits for <GdbThread(thno=8, thid=0x2aaaab4b3700, procid=27448, func=__lll_lock_wait)>
<GdbThread(thno=1, thid=0x2aaaac2ba700, procid=27455, func=__lll_lock_wait)>
waits for <GdbThread(thno=8, thid=0x2aaaab4b3700, procid=27448, func=__lll_lock_wait)>
Nice: only the threads that are waiting for a lock, instead of a listing of all the threads. And as a bonus, a direct reference to the thread each of them is waiting for.
And it gets better: they can be filtered down to the relevant threads only: thread 8 and thread 4. And because we get the backtrace info directly, we can see that one of those is indeed the culprit: the one with func=inverted in frame 3.
-- relevant threads --
<GdbThread(thno=8, thid=0x2aaaab4b3700, procid=27448, func=__lll_lock_wait)>
<GdbBacktrace(
<GdbFrame(no=0, func=__lll_lock_wait, file=../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135>
<GdbFrame(no=1, func=_L_lock_909, file=/lib/x86_64-linux-gnu/libpthread.so.0>
<GdbFrame(no=2, func=__GI___pthread_mutex_lock, file=../nptl/pthread_mutex_lock.c:79>
<GdbFrame(no=3, func=normal, file=deadlock.c:24>
<GdbFrame(no=4, func=start_thread, file=pthread_create.c:312>
<GdbFrame(no=5, func=clone, file=../sysdeps/unix/sysv/linux/x86_64/clone.S:111>
)>
waits for <GdbThread(thno=4, thid=0x2aaaabcb7700, procid=27452, func=__lll_lock_wait)>
<GdbThread(thno=4, thid=0x2aaaabcb7700, procid=27452, func=__lll_lock_wait)>
<GdbBacktrace(
<GdbFrame(no=0, func=__lll_lock_wait, file=../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135>
<GdbFrame(no=1, func=_L_lock_909, file=/lib/x86_64-linux-gnu/libpthread.so.0>
<GdbFrame(no=2, func=__GI___pthread_mutex_lock, file=../nptl/pthread_mutex_lock.c:79>
<GdbFrame(no=3, func=inverted, file=deadlock.c:36>
<GdbFrame(no=4, func=start_thread, file=pthread_create.c:312>
<GdbFrame(no=5, func=clone, file=../sysdeps/unix/sysv/linux/x86_64/clone.S:111>
)>
waits for <GdbThread(thno=8, thid=0x2aaaab4b3700, procid=27448, func=__lll_lock_wait)>
Better, yes?
Here it clearly shows that they're waiting on each other: a classic example of a deadlock.
That's generated by just the 30-odd lines of debug-deadlock.py, using the pycoredump library.
Or, for another sample invocation, try this:
$ sudo env PYTHONPATH=$(pwd) python examples/debug-deadlock.py \
    /usr/sbin/asterisk \
    /var/spool/asterisk/core.ua4-ams-2018-11-29T16:09:42+0100
Obviously the pycoredump library could use a lot of extra features and cleanup, but this simple example demonstrates how it can make your deadlock-debugging life easier.
I've used the manual technique to find the cause of a deadlock in the Asterisk PBX software. That software supports a thread-debugging mode, but that's so slow that it's unusable in production systems. Luckily, much of the needed info is there in the regular core dump; you just have to find it.
And now, finding it may have become a little bit simpler.