CrashManager is a coredump handler and crash manager for Linux systems that adds features on top of existing coredump handling solutions.
Highlights:
- Coredumps are pre-processed while parsing the coredump stream from the kernel to generate IDs for easy categorization of the crashes (CrashID: based on the instruction pointer and the return address; VectorID: based on the return address only)
- Coredumps are context aware: crashes in containers can be easily identified (ContextID) and filtered, with automatic container name labelling for LXC containers
- Automatic management of coredump database size limits defined in the configuration file
- The coredump output is one file, a compressed tarball containing the coredump and the context files (including binary files) defined in the configuration file as context information per process. See the default configuration file: https://github.com/anpopa/crashmanager/blob/master/config/crashmanager.conf.in
- The coredump output is a standard compressed tarball, so no extra tooling is required to extract the information
- Dynamic content supports dumping binary files as well. This is very useful for embedding data like screenshots, textures, databases, etc.
- Support for cascade crashing, for cases where analysing a process crash requires the coredump of a peer process (e.g. generating a server coredump when a client crashes with an IPC timeout)
- The component uses libarchive to create the output, so the compression algorithm can be easily changed at build time
- A crash journal is created and maintained on target with information like the history of crashes, file transfer states, removed crashdumps, etc.
- The component provides a new tool, crashinfo, which can be used on the target to extract journal information and in the SDK to extract crash information. Obtaining the backtrace is as easy as:
% crashinfo --bt <crashdump_archive.cdh.tar.gz>
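The ID scheme from the highlights can be illustrated with a short sketch. This is not the project's implementation: the hash function (BLAKE2b truncated to 64 bits), the choice of inputs (process name, module names, and file offsets rather than absolute addresses), and the names `make_id`, `crash_id`, and `vector_id` are all assumptions made for illustration. It only shows how crashes at the same location can share a CrashID, while crashes that differ only in the faulting instruction still share a VectorID.

```python
import hashlib


def make_id(*parts: str) -> str:
    """Hash the given fields into a 16-digit uppercase hex ID (64 bits)."""
    # Assumption: BLAKE2b is a stand-in; the real hash is not specified here.
    digest = hashlib.blake2b("\0".join(parts).encode(), digest_size=8)
    return digest.hexdigest().upper()


def crash_id(proc: str, ip_off: int, ip_mod: str, ra_off: int, ra_mod: str) -> str:
    """Hypothetical CrashID: depends on both the IP and the RA location."""
    return make_id(proc, ip_mod, hex(ip_off), ra_mod, hex(ra_off))


def vector_id(proc: str, ra_off: int, ra_mod: str) -> str:
    """Hypothetical VectorID: depends on the RA location only."""
    return make_id(proc, ra_mod, hex(ra_off))


if __name__ == "__main__":
    # Illustrative values for a process, its binary, and a shared library.
    exe = "/usr/local/bin/crashtest"
    libc = "/usr/lib/x86_64-linux-gnu/libc-2.28.so"
    print(crash_id("crashtest", 0x1576, exe, 0x2409B, libc))
    print(vector_id("crashtest", 0x2409B, libc))
```

Using file offsets instead of absolute addresses is a design choice of this sketch: it keeps the IDs stable across runs when address space layout randomization moves the modules.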
Executing crashinfo without arguments on the target lists the crash history with context information about the crashes:
% crashinfo
Idx  Procname   Timestamp            CrashID           VectorID          Context  PID   TRS  REM  FILE
1    crashtest  07:52:01 2019-08-29  4747714566DE87D2  4BD0D5866D3FA284  debian   4356  1    0    crashtest.4356.1567065121.cdh.tar.gz
2    crashtest  08:24:08 2019-08-29  D170728BBE14D94F  D1E0A2C7F0051A0A  debian   4556  1    0    crashtest.4556.1567067048.cdh.tar.gz
3    crashtest  08:24:30 2019-08-29  D170728BBE14D94F  D1E0A2C7F0051A0A  debian   4576  1    0    crashtest.4576.1567067070.cdh.tar.gz
4    crashtest  08:23:02 2019-08-30  D170728BBE14D94F  D1E0A2C7F0051A0A  debian   6184  1    0    crashtest.6184.1567153382.cdh.tar.gz
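Since crashes at the same location share a CrashID, the history can be grouped to spot recurring crashes. A minimal sketch, assuming the whitespace-separated layout shown above and process names without spaces; `HISTORY` and `parse_history` are illustrative names, not part of crashinfo.

```python
from collections import Counter

# Rows copied from the crash history above (header line omitted).
HISTORY = """\
1 crashtest 07:52:01 2019-08-29 4747714566DE87D2 4BD0D5866D3FA284 debian 4356 1 0 crashtest.4356.1567065121.cdh.tar.gz
2 crashtest 08:24:08 2019-08-29 D170728BBE14D94F D1E0A2C7F0051A0A debian 4556 1 0 crashtest.4556.1567067048.cdh.tar.gz
3 crashtest 08:24:30 2019-08-29 D170728BBE14D94F D1E0A2C7F0051A0A debian 4576 1 0 crashtest.4576.1567067070.cdh.tar.gz
4 crashtest 08:23:02 2019-08-30 D170728BBE14D94F D1E0A2C7F0051A0A debian 6184 1 0 crashtest.6184.1567153382.cdh.tar.gz
"""


def parse_history(text: str) -> list:
    """Split each row into named fields; the timestamp spans two tokens."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 11:
            continue  # skip blank or unexpected lines
        idx, proc, clock, date, crash, vector, ctx, pid, trs, rem, fname = parts
        rows.append({
            "idx": int(idx),
            "procname": proc,
            "timestamp": f"{clock} {date}",
            "crash_id": crash,
            "vector_id": vector,
            "context": ctx,
            "pid": int(pid),
            "file": fname,
        })
    return rows


if __name__ == "__main__":
    # Count how many recorded crashes share each CrashID.
    counts = Counter(row["crash_id"] for row in parse_history(HISTORY))
    for crash, count in counts.most_common():
        print(crash, count)
```

In the sample history, entries 2 to 4 carry the same CrashID and VectorID, so they would be grouped as one recurring crash.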
To see the context information about a crash, use the --info argument with the crash archive name (or path):
% crashinfo --info crashtest.6184.1567153382.cdh.tar.gz
[crashdata]
ProcessName = crashtest
ProcessThread = crashtest
ProcessExe = /usr/local/bin/crashtest
LifecycleState = running
CrashTimestamp = 1567153382
ProcessID = 6184
ResidentID = 6184
CrashSignal = 11
CrashID = D170728BBE14D94F
VectorID = D1E0A2C7F0051A0A
ContextID = F18BDD746CC08FED
ContextName = debian
IP = 0x0000558ce3f57576
RA = 0x00007fffbc00df30
IPFileOffset = 0x0000000000001576
RAFileOffset = 0x000000000002409b
IPModuleName = /usr/local/bin/crashtest
RAModuleName = /usr/lib/x86_64-linux-gnu/libc-2.28.so
CoredumpSize = 380928
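The crash information uses an INI-style `key = value` layout, so scripts can consume it as well. A minimal sketch using Python's standard configparser, assuming the output keeps the format shown above; `SAMPLE` is a shortened copy of that output and `load_crashdata` is a hypothetical helper.

```python
import configparser
import signal

# A subset of the --info output shown above.
SAMPLE = """\
[crashdata]
ProcessName = crashtest
CrashSignal = 11
IP = 0x0000558ce3f57576
IPFileOffset = 0x0000000000001576
IPModuleName = /usr/local/bin/crashtest
"""


def load_crashdata(text: str) -> dict:
    """Parse the [crashdata] section into a plain dict of strings."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case; the default lowercases keys
    parser.read_string(text)
    return dict(parser["crashdata"])


if __name__ == "__main__":
    data = load_crashdata(SAMPLE)
    # Translate the numeric signal into its symbolic name on this platform.
    signame = signal.Signals(int(data["CrashSignal"])).name
    offset = int(data["IPFileOffset"], 16)
    print(f'{data["ProcessName"]}: {signame} at {data["IPModuleName"]}+{offset:#x}')
```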
The content of the archive is dynamic. The --files argument lists the files it contains, and the --print argument prints the content of any of them:
% crashinfo --files crashtest.6184.1567153382.cdh.tar.gz
root.proc.6184.cmdline
root.proc.6184.fd
root.proc.6184.ns
root.proc.6184.cgroup
root.proc.6184.stack
root.proc.6184.environ
root.proc.6184.status
root.proc.6184.sched
root.proc.6184.maps
root.proc.6184.stat
root.proc.6184.smaps
core.crashtest.6184.0000
info.crashdata
% crashinfo --print root.proc.6184.fd crashtest.6184.1567153382.cdh.tar.gz
lrwx------ 1 1000 1000 64 0 -> /dev/pts/0
lrwx------ 1 1000 1000 64 1 -> /dev/pts/0
lrwx------ 1 1000 1000 64 2 -> /dev/pts/0
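Because the crashdump is a standard compressed tarball, the same information is reachable without crashinfo. A minimal sketch using Python's standard tarfile module; `list_files` and `read_file` are illustrative helper names, and the assumption is that the archive member names match the names printed by --files.

```python
import tarfile


def list_files(archive: str) -> list:
    """Return the member names stored in a crashdump archive."""
    # "r:*" lets tarfile detect the compression transparently.
    with tarfile.open(archive, "r:*") as tar:
        return tar.getnames()


def read_file(archive: str, member: str) -> bytes:
    """Return the raw content of one member; context files may be binary."""
    with tarfile.open(archive, "r:*") as tar:
        handle = tar.extractfile(member)
        if handle is None:
            raise ValueError(f"{member} is not a regular file")
        return handle.read()
```

Under that assumption, `list_files("crashtest.6184.1567153382.cdh.tar.gz")` would be expected to return the names listed above, and `read_file` returns bytes so that binary context files survive unchanged.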
Because the crashdump now embeds both the coredump and the context information, printing the backtrace in the SDK is straightforward:
% crashinfo --bt crashtest.6184.1567153382.cdh.tar.gz
Extracting coredump with size 380928 ... Done.
New file name: /tmp/.HXMK7Z/crashtest.6184.1567153382.core
Reading symbols from /usr/local/bin/crashtest...done.
[New LWP 6184]
Core was generated by `crashtest -t2'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 main (argc=2, argv=0x7fffbc00e018) at ../testing/crashtest/crashtest.c:146
146 *(int*)0 = 2;
#0 main (argc=2, argv=0x7fffbc00e018) at ../testing/crashtest/crashtest.c:146
The build system is Meson, so make sure you have Meson installed:
% cd crashmanager
% meson setup build
% cd build
% ninja
% ninja install
For a complete list of configuration options, use meson configure:
% meson configure