Suppose you have a self-contained OCaml program test.ml that crashes, and you are working in a development repository (not an installed version of OCaml). You probably want to run test.ml against the "debug runtime", which, in particular, activates the CAMLassert debug assertions.
If you want to use the bytecode compiler:

```shell
# build the runtime
make runtime -j
# compile as usual
./ocamlc.opt -nostdlib -I stdlib test.ml -o test
# run with the debug runtime (ocamlrund)
./runtime/ocamlrund ./test
```
If you want to use the native compiler:

```shell
# build the native runtime
make runtimeopt -j
# compile with "-runtime-variant d"
./ocamlopt.opt -nostdlib -I stdlib -runtime-variant d -I runtime test.ml -o test
# run (the debug runtime is linked into the executable)
./test
```
Note that the debug runtime does extra work, so it may slow down your program, and it can sometimes make the issue you are trying to debug vanish.
The GC can print various messages about what it is doing; they are enabled with the "v" option of OCAMLRUNPARAM, whose value is a bitmask selecting which kinds of messages to print. The individual bits are (more or less) documented at https://ocaml.org/manual/runtime.html#s:ocamlrun-options. You can enable all messages with:

```shell
OCAMLRUNPARAM="v=0xffffffff" ./test
```
Note: inside the runtime, caml_gc_log can be used to print log messages prefixed with the domain (thread) number; these messages are enabled by the more specific setting v=0x800.
Another useful OCAMLRUNPARAM setting is V=1, which enables additional sanity checks on the heap during major GC cycles.

```shell
OCAMLRUNPARAM="V=1" ./test
```
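OCAMLRUNPARAM accepts several settings separated by commas, so the two debugging aids above can be combined in a single run. The sketch below wraps this in a small shell function; the name gc_debug_run is made up for this example. The GC messages are printed on standard error, so they can be redirected to a file with 2>.

```shell
#!/bin/sh
# Hypothetical helper: run a command with GC debug logging (v=0x800,
# i.e. the caml_gc_log output) and heap verification (V=1) enabled.
# OCAMLRUNPARAM settings are separated by commas.
gc_debug_run() {
  OCAMLRUNPARAM="v=0x800,V=1" "$@"
}

# Example use (commented out: requires the ./test program built above):
# gc_debug_run ./test 2> gc.log
```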
The output of a crashing OCaml program may end up like this:

```
[03] file domain.c; line 404 ### Assertion failed: domain_state->young_start == NULL
Aborted (core dumped)
```
The message "core dumped" indicates that the operating system saved a core dump (a snapshot of the process memory at the time of the crash) to disk, where a debugger can inspect it later.
On Linux, systemd-enabled systems tend to use a systemd tool (of course!) to store core dumps.
```
# ask your system how core dumps are handled.
$ cat /proc/sys/kernel/core_pattern
|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
```
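If you want to script this check, a minimal sketch can test whether the pattern pipes core dumps to systemd-coredump: a leading | in core_pattern means the kernel sends the core to that program instead of writing a file. The function name and its optional file argument are hypothetical, added so the check can be tried on a sample file. If the pattern is instead a plain file name such as core, core files are written directly, subject to your shell's ulimit -c limit.

```shell
#!/bin/sh
# Hypothetical helper: succeed if core dumps are piped to systemd-coredump.
# The pattern file defaults to the real kernel setting, but can be
# overridden (e.g. to try the check on a sample file).
uses_systemd_coredump() {
  pattern_file=${1:-/proc/sys/kernel/core_pattern}
  case "$(cat "$pattern_file" 2>/dev/null)" in
    # A leading '|' means the kernel pipes core dumps to a program.
    '|'*systemd-coredump*) return 0 ;;
    *) return 1 ;;
  esac
}

if uses_systemd_coredump; then
  echo "core dumps are handled by systemd-coredump"
else
  echo "core dumps are not handled by systemd-coredump"
fi
```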
If your system also uses systemd-coredump, then the command coredumpctl dump will show you information about the last core dump.
```
$ coredumpctl dump
PID: 678260 (Domain0)
UID: 1000 (gasche)
GID: 1000 (gasche)
Signal: 6 (ABRT)
Timestamp: Fri 2022-02-25 09:30:32 CET (4min 30s ago)
Command Line: ./test
Executable: /home/gasche/Prog/ocaml/github-max_domains/test
Control Group: [...]
[...]
Disk Size: 133.0K
Message: Process 678260 (Domain0) of user 1000 dumped core.

    Stack trace of thread 678266:
    #0  0x00007f60ee4842a2 raise (libc.so.6 + 0x3d2a2)
    #1  0x00007f60ee46d8a4 abort (libc.so.6 + 0x268a4)
    #2  0x0000000000475022 n/a (/home/gasche/Prog/ocaml/github-max_domains/test + 0x75022)
Refusing to dump core to tty (use shell redirection or specify --output).
```
You can get a full backtrace using echo bt | coredumpctl debug:
```
$ echo bt | coredumpctl debug
[...]
Core was generated by `./test'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f60ee4842a2 in raise () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7f60d77fe640 (LWP 678266))]
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.33-20.fc34.x86_64
(gdb) #0  0x00007f60ee4842a2 in raise () from /lib64/libc.so.6
#1  0x00007f60ee46d8a4 in abort () from /lib64/libc.so.6
#2  0x0000000000475022 in caml_failed_assert ( expr=expr@entry=0x488498 "domain_state->young_start == NULL", file_os=file_os@entry=0x488218 "domain.c", line=line@entry=404) at misc.c:56
#3  0x0000000000461831 in caml_free_minor_heap () at domain.c:404
#4  0x000000000046237b in caml_reallocate_minor_heap (wsize=wsize@entry=786432) at domain.c:469
#5  0x0000000000474404 in caml_set_minor_heap_size (wsize=wsize@entry=786432) at minor_gc.c:130
#6  0x00000000004696b3 in caml_gc_set (v=<optimized out>) at gc_ctrl.c:222
#7  <signal handler called>
#8  0x000000000042a3b2 in camlTest__set_gc_280 () at test.ml:17
#9  0x000000000042a818 in camlTest__fun_529 () at test.ml:39
#10 0x000000000044947a in camlStdlib__Domain__body_694 () at domain.ml:204
#11 <signal handler called>
#12 0x000000000045fe38 in caml_callback_exn (closure=<optimized out>, arg=<optimized out>, arg@entry=1) at callback.c:169
#13 0x0000000000460369 in caml_callback (closure=<optimized out>, arg=arg@entry=1) at callback.c:253
#14 0x0000000000461f6a in domain_thread_func (v=0x7ffdd7357bb0) at domain.c:1034
#15 0x00007f60ee61f299 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f60ee547353 in clone () from /lib64/libc.so.6
(gdb) quit
```
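With several domains, the thread that hit the assertion is not necessarily the only interesting one: each domain runs in its own system thread (note domain_thread_func in the trace above). The same trick can ask gdb for the backtraces of all threads, using its thread apply all bt command. The wrapper name below is hypothetical; any arguments are passed on to coredumpctl debug as match expressions (for example a PID), to select a dump other than the most recent one.

```shell
#!/bin/sh
# Hypothetical helper: print the backtrace of every thread of a core dump
# (by default the most recent one) by piping a gdb command into
# coredumpctl debug. Extra arguments are passed to coredumpctl unchanged.
all_threads_backtrace() {
  echo 'thread apply all bt' | coredumpctl debug "$@"
}

# Example use (commented out: requires a core dump to inspect):
# all_threads_backtrace
```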
There is a lot of information on how to use rr to debug the OCaml runtime on the OCaml Multicore wiki: https://github.com/ocaml-multicore/ocaml-multicore/wiki/Debugging-the-OCaml-Multicore-runtime#rr.
TODO: it would be nice to migrate some information here.
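As a starting point, here is a minimal rr session, assuming rr is installed and your machine supports it (rr relies on hardware performance counters); the helper name is hypothetical. rr record runs the program once while recording its execution, and rr replay then replays the most recent recording deterministically under gdb, where reverse-execution commands such as reverse-continue are available.

```shell
#!/bin/sh
# Hypothetical helper: record one run of a command with rr, then replay
# the most recent recording under gdb. The replay is started even if the
# recorded program crashed, since that is the run we want to inspect.
rr_record_and_replay() {
  rr record "$@"
  rr replay
}

# Example use (commented out: requires rr and the programs built above):
# rr_record_and_replay ./runtime/ocamlrund ./test
```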
You can instrument the runtime with ThreadSanitizer (TSan) to detect data races in it, by adding -fsanitize=thread to both CFLAGS and LDFLAGS. However, this makes the compiler build rather slow.
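For instance, assuming the flags are passed as variables on the configure command line (the usual autoconf convention), such a build could look like the following sketch. The helper name is hypothetical, and how configure combines these flags with its own default C flags is not shown here; both GCC and Clang accept -fsanitize=thread.

```shell
#!/bin/sh
# Hypothetical helper: configure and build the compiler with a runtime
# instrumented by ThreadSanitizer. -fsanitize=thread must appear in both
# CFLAGS (compilation) and LDFLAGS (linking). Run from the repository root.
build_tsan_runtime() {
  ./configure CFLAGS="-fsanitize=thread" LDFLAGS="-fsanitize=thread" &&
    make -j
}

# Example use (commented out: slow, and reconfigures your repository):
# build_tsan_runtime
```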
Note that this is different from passing --enable-tsan to the configure script, which instruments not only the runtime but also the code generated by ocamlopt. In addition, --enable-tsan suppresses a number of race reports from the runtime, to avoid cluttering the output of user programs, and it gives the TSan runtime a slightly altered version of the real memory accesses (see #12114).
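By contrast, a build using the dedicated configure option could be sketched as follows (the helper name is hypothetical). Programs compiled with the resulting ocamlopt are themselves instrumented, so races in the OCaml code of your program are reported as well.

```shell
#!/bin/sh
# Hypothetical helper: configure the compiler with TSan support, which
# instruments both the runtime and the code generated by ocamlopt.
# Run from the repository root.
build_with_enable_tsan() {
  ./configure --enable-tsan && make -j
}

# Example use (commented out: slow, and reconfigures your repository):
# build_with_enable_tsan
```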
TODO: I would be curious to know!
(For the brave, there are some scripts in ../tools/ci/inria/sanitizers/script, but you probably don't want to run them directly: in particular, they run git clean -xfd, which deletes untracked files, such as new files you have not committed yet, in your development repository!)