Tips on hacking the OCaml runtime system

Linking a test program with the debug runtime

Suppose you have a self-contained OCaml program test.ml that crashes, and you are working in a development repository (not with an installed version of the system). You probably want to run test.ml against the "debug runtime", which in particular activates the CAMLassert debug assertions.

If you want to use the bytecode compiler:

# build the runtime
make runtime -j

# compile as usual
./ocamlc.opt -nostdlib -I stdlib test.ml -o test

# run with the debug runtime (ocamlrund)
./runtime/ocamlrund ./test

If you want to use the native compiler:

# build the native runtime
make runtimeopt -j

# compile with "-runtime-variant d"
./ocamlopt.opt -nostdlib -I stdlib -runtime-variant d -I runtime test.ml -o test

./test

Note that the debug runtime does extra work, so it may slow down your program — and sometimes make the issue you are trying to debug vanish.
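The two recipes above differ only in the build target, the compiler, and the way the debug runtime is selected. The following shell sketch prints the commands for either mode without running them. The function name debug_runtime_commands and its bytecode/native arguments are made up for illustration, and the paths assume that you are at the root of the development repository.

```shell
# Hypothetical helper (not part of the OCaml tree): print, but do not run,
# the commands from the two recipes above.
# Usage: debug_runtime_commands bytecode|native FILE.ml
debug_runtime_commands() {
  mode=$1
  src=$2
  exe=${src%.ml}   # test.ml -> test
  case $mode in
    bytecode)
      echo "make runtime -j"
      echo "./ocamlc.opt -nostdlib -I stdlib $src -o $exe"
      echo "./runtime/ocamlrund ./$exe"
      ;;
    native)
      echo "make runtimeopt -j"
      echo "./ocamlopt.opt -nostdlib -I stdlib -runtime-variant d -I runtime $src -o $exe"
      echo "./$exe"
      ;;
    *)
      echo "usage: debug_runtime_commands bytecode|native FILE.ml" >&2
      return 2
      ;;
  esac
}

debug_runtime_commands native test.ml
```

If the printed commands look right, you can pipe them to sh -ex to actually run them.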

GC messages

The GC can send various messages about what it is doing, enabled with the "v" option of OCAMLRUNPARAM. Various options are more or less documented in https://ocaml.org/manual/runtime.html#s:ocamlrun-options. You can enable all printing with

OCAMLRUNPARAM="v=0xffffffff" ./test

Note: caml_gc_log can be used to show log messages prefixed with the thread number, and it corresponds to the more precise setting v=0x800.
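Since the "v" value is a bitmask, you can combine individual bits instead of enabling everything. Here is a small sketch of a helper that ORs bits together; the name gc_verbosity is hypothetical, and only 0x800 comes from the note above (0x001 is, as far as the manual goes, the bit for the start and end of major GC cycles, but check the manual for your version).

```shell
# Hypothetical helper: combine verbosity bits into a single "v=0x..." setting.
gc_verbosity() {
  mask=0
  for bit in "$@"; do
    # $bit is expanded first, so hexadecimal arguments such as 0x800 work.
    mask=$(( mask | $bit ))
  done
  printf 'v=0x%x\n' "$mask"
}

# 0x800: caml_gc_log messages (from the note above).
# 0x001: start and end of major GC cycles (assumption, see the manual).
gc_verbosity 0x001 0x800
# prints: v=0x801
```

You would then use it as OCAMLRUNPARAM="$(gc_verbosity 0x800)" ./test.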

Heap verification

Another useful OCAMLRUNPARAM setting is V=1, which enables additional sanity checks on the heap during major GC cycles.

OCAMLRUNPARAM="V=1" ./test
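OCAMLRUNPARAM takes a comma-separated list of settings, so heap verification can be combined with GC messages. The sketch below joins settings with commas; the helper name join_runparam is made up for illustration.

```shell
# Hypothetical helper: join OCAMLRUNPARAM settings with commas.
# The body runs in a subshell so that changing IFS does not leak out.
join_runparam() (
  IFS=,
  printf '%s\n' "$*"
)

join_runparam v=0x800 V=1
# prints: v=0x800,V=1
```

For example: OCAMLRUNPARAM="$(join_runparam v=0x800 V=1)" ./test.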

Getting stack traces after assertion failures (Linux)

The output of a crashing OCaml program may end up like this:

[03] file domain.c; line 404 ### Assertion failed: domain_state->young_start == NULL
Aborted (core dumped)

The message "core dumped" indicates that some debugging information was kept on the disk.
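The assertion message itself already names a source location, which is handy when you want to jump to the failing CAMLassert in an editor. The following sketch extracts it as file:line; the function name assertion_location is hypothetical, and the pattern assumes that the message has exactly the shape shown above.

```shell
# Hypothetical helper: read program output on stdin and print "file:line"
# for each line that looks like the assertion failure message above.
assertion_location() {
  sed -n 's/^\[[0-9]*] file \(.*\); line \([0-9][0-9]*\) ### Assertion failed: .*$/\1:\2/p'
}

printf '%s\n' \
  '[03] file domain.c; line 404 ### Assertion failed: domain_state->young_start == NULL' \
  | assertion_location
# prints: domain.c:404
```

In practice you would feed it the program's error output, for example ./test 2>&1 | assertion_location.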

On Linux, systemd-enabled systems tend to use a systemd tool (of course!) to store core dumps.

# ask your system how core dumps are handled.
$ cat /proc/sys/kernel/core_pattern
|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
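If you want to script this check, a sketch could look as follows. The function name uses_systemd_coredump is made up for illustration, and the test is deliberately loose: it only looks for a pipe pattern that mentions systemd-coredump.

```shell
# Hypothetical helper: succeed if the given core_pattern pipes core dumps
# to systemd-coredump.
uses_systemd_coredump() {
  case $1 in
    '|'*systemd-coredump*) return 0 ;;
    *) return 1 ;;
  esac
}

# On systems without /proc/sys/kernel/core_pattern the pattern is empty
# and we fall through to the second branch.
if uses_systemd_coredump "$(cat /proc/sys/kernel/core_pattern 2>/dev/null)"; then
  echo "core dumps go to systemd-coredump; try: coredumpctl dump"
else
  echo "core dumps are handled some other way"
fi
```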

If your system is also using systemd-coredump, then the command coredumpctl dump will show you information about the last "core dump".

$ coredumpctl dump
           PID: 678260 (Domain0)
           UID: 1000 (gasche)
           GID: 1000 (gasche)
        Signal: 6 (ABRT)
     Timestamp: Fri 2022-02-25 09:30:32 CET (4min 30s ago)
  Command Line: ./test
    Executable: /home/gasche/Prog/ocaml/github-max_domains/test
 Control Group: [...]
                [...]
     Disk Size: 133.0K
       Message: Process 678260 (Domain0) of user 1000 dumped core.

                Stack trace of thread 678266:
                #0  0x00007f60ee4842a2 raise (libc.so.6 + 0x3d2a2)
                #1  0x00007f60ee46d8a4 abort (libc.so.6 + 0x268a4)
                #2  0x0000000000475022 n/a (/home/gasche/Prog/ocaml/github-max_domains/test + 0x75022)
Refusing to dump core to tty (use shell redirection or specify --output).

You can get a full backtrace using echo bt | coredumpctl debug:

$ echo bt | coredumpctl debug
[...]
Core was generated by `./test'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f60ee4842a2 in raise () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7f60d77fe640 (LWP 678266))]
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.33-20.fc34.x86_64
(gdb) #0  0x00007f60ee4842a2 in raise () from /lib64/libc.so.6
#1  0x00007f60ee46d8a4 in abort () from /lib64/libc.so.6
#2  0x0000000000475022 in caml_failed_assert (
    expr=expr@entry=0x488498 "domain_state->young_start == NULL",
    file_os=file_os@entry=0x488218 "domain.c", line=line@entry=404) at misc.c:56
#3  0x0000000000461831 in caml_free_minor_heap () at domain.c:404
#4  0x000000000046237b in caml_reallocate_minor_heap (wsize=wsize@entry=786432) at domain.c:469
#5  0x0000000000474404 in caml_set_minor_heap_size (wsize=wsize@entry=786432) at minor_gc.c:130
#6  0x00000000004696b3 in caml_gc_set (v=<optimized out>) at gc_ctrl.c:222
#7  <signal handler called>
#8  0x000000000042a3b2 in camlTest__set_gc_280 () at test.ml:17
#9  0x000000000042a818 in camlTest__fun_529 () at test.ml:39
#10 0x000000000044947a in camlStdlib__Domain__body_694 () at domain.ml:204
#11 <signal handler called>
#12 0x000000000045fe38 in caml_callback_exn (closure=<optimized out>, arg=<optimized out>, arg@entry=1) at callback.c:169
#13 0x0000000000460369 in caml_callback (closure=<optimized out>, arg=arg@entry=1) at callback.c:253
#14 0x0000000000461f6a in domain_thread_func (v=0x7ffdd7357bb0) at domain.c:1034
#15 0x00007f60ee61f299 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f60ee547353 in clone () from /lib64/libc.so.6
(gdb) quit

Using rr for deterministic replay debugging

There is a lot of information on how to use rr to debug the OCaml runtime on the OCaml Multicore wiki: https://github.com/ocaml-multicore/ocaml-multicore/wiki/Debugging-the-OCaml-Multicore-runtime#rr.

TODO: it would be nice to migrate some information here.

Compiling with sanitizers

ThreadSanitizer

You can instrument the runtime to detect data races in it, by adding -fsanitize=thread to both CFLAGS and LDFLAGS. It will however make the compiler build rather slow.
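As a sketch, the flag can be appended to whatever CFLAGS and LDFLAGS you already use. The helper name append_flag is hypothetical, and how these variables reach the build (for instance by rerunning ./configure with them set) is an assumption that you should check against your setup.

```shell
# Hypothetical helper: append a flag to a space-separated flag list,
# without adding a leading space when the list is empty.
append_flag() {
  if [ -n "$1" ]; then
    printf '%s %s\n' "$1" "$2"
  else
    printf '%s\n' "$2"
  fi
}

CFLAGS=$(append_flag "${CFLAGS:-}" -fsanitize=thread)
LDFLAGS=$(append_flag "${LDFLAGS:-}" -fsanitize=thread)
export CFLAGS LDFLAGS

echo "CFLAGS=$CFLAGS"
echo "LDFLAGS=$LDFLAGS"
```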

Note that this is different from passing --enable-tsan to the configure script. --enable-tsan instruments not only the runtime but also the code generated by ocamlopt. In addition, it suppresses a number of race reports from the runtime to avoid clogging the output of user programs, and it gives the TSan runtime a slightly altered view of the real memory accesses (see #12114).

Other sanitizers

TODO: I would be curious to know!

(For the brave, there are some scripts in ../tools/ci/inria/sanitizers/script, but you probably don’t want to run them directly: in particular, they run git clean -xfd, which destroys changed/uncommitted files in your development repository!)