Eystein Måløy Stenberg edited this page Sep 23, 2016 · 9 revisions

Troubleshoot CFEngine

Please follow the steps below before submitting a bug report.

cf-agent appears to hang

It is important to determine what is hanging, i.e. whether it is CFEngine itself or something that CFEngine is interacting with. Run the program with the -v (verbose) and -d (debug) flags to see whether it has gone into an infinite loop or is waiting for something.

Common causes of hanging processes:

  • DBM database corruption: try deleting the *.lmdb files in /var/cfengine.
  • Command processes that do not properly close their own file descriptors or those of their child processes. Try running the commands with a shell enabled (use_shell => "yes") and use </dev/null >/dev/null to close the descriptors.
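
The descriptor problem can be reproduced outside CFEngine. The following is a minimal sketch, assuming a POSIX shell; the `sleep 3` background child is a hypothetical stand-in for a daemon started by a command promise. The caller reads the command's output through a pipe and cannot finish until every process holding that pipe has closed it.

```shell
# Hypothetical demo: "sleep 3 &" stands in for a child that a command leaves behind.

# Case 1: the background child inherits the output pipe, so the caller
# blocks until the child exits (about 3 seconds).
start=$(date +%s)
out=$(sh -c 'sleep 3 &')
inherited=$(( $(date +%s) - start ))

# Case 2: the child's descriptors are redirected to /dev/null, so the
# pipe is closed as soon as the parent shell exits.
start=$(date +%s)
out=$(sh -c 'sleep 3 </dev/null >/dev/null 2>&1 &')
redirected=$(( $(date +%s) - start ))

echo "inherited: ${inherited}s, redirected: ${redirected}s"
```

The first case mimics what cf-agent experiences when a command's child keeps the descriptors open; the second shows why the redirection suggested above should let the command return promptly.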

CFEngine generates a segmentation fault

Segfaults in CFEngine may be caused by an incorrect build environment or by bugs in CFEngine or the libraries it uses.

  • Install GDB.
      • Use the --args option of GDB to pass options to the component.
      • For all components, add the --verbose option.
      • For daemons, add the --no-fork option.

For example:

% gdb --args ./cf-agent/.libs/lt-cf-agent -KI -f POLICYFILE

Clients time out as remote cf-serverd is overloaded or waits for resources

The symptom here is that CFEngine clients (e.g. cf-agent) do not get a timely response from a remote cf-serverd, e.g. when asking for new policy. You would see messages similar to the following on the CFEngine clients:

Failed to establish TLS connection: underlying network error (Connection reset by peer)
No suitable server responded to hail

If this happens on many of the clients, it is likely due to cf-serverd not being able to handle incoming connections fast enough so they start to pile up.

Sometimes cf-serverd is seen to use a lot of CPU time or memory, but it might also be using close to zero CPU. In these cases it is important to understand why cf-serverd is not able to handle the connections fast enough.

To see what the threads of cf-serverd are doing at a given time, use the following commands:

gdb -batch -p $(pgrep cf-serverd) -ex 'info threads' > info_threads.txt
gdb -batch -p $(pgrep cf-serverd) -ex 'thread apply all bt' > backtrace.txt
gdb -batch -p $(pgrep cf-serverd) -ex 'thread apply all bt full' > backtrace-full.txt

With this debugging information you can see whether the process is spending its time waiting for DBM files or executing a hot part of the code.
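
With many threads, the backtrace file can be long. As a rough sketch, the innermost frame (#0) of each thread can be tallied to see where most threads are sitting. The helper name `top_frames` is hypothetical, and the parsing assumes GDB's usual frame layout, `#0  ADDRESS in FUNCTION (...)` or `#0  FUNCTION (...)`.

```shell
# Count how many threads are currently in each function (frame #0 only).
# Hypothetical helper; takes the path of a backtrace file as its argument.
top_frames() {
    awk '$1 == "#0" {
             # With an address: "#0  0x... in FUNCTION (...)"; without: "#0  FUNCTION (...)"
             fn = ($3 == "in") ? $4 : $2
             count[fn]++
         }
         END { for (fn in count) print count[fn], fn }' "$1" | sort -rn
}

# Usage: top_frames backtrace.txt
```

If most threads turn out to be in the same locking or database function, that points at waiting for a resource rather than at CPU-bound work.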

Memory leak

  • Install Valgrind.
  • Run the leaking CFEngine component inside Valgrind:
      • For all components, add the --verbose option.
      • For daemons, add the --no-fork option.

For example:

valgrind --leak-check=full \
  /var/cfengine/bin/cf-serverd --no-fork 2>/root/valgrind-cf-serverd.txt &
  • If you are debugging a daemon, let it run for such a long time that you are confident that the consumed memory is a bug (remember that valgrind also consumes memory).
  • Send SIGINT to the Valgrind process (its name in the process table starts with memcheck):
# ps -e|grep mem
 2194 pts/0    00:00:24 memcheck-x86-li
# kill -SIGINT 2194

After a successful memory trace has been obtained in /root/valgrind-*.txt, check the end of the trace to verify that at least 10-20 MB are lost. Otherwise, rerun the tracing for a longer period of time to gather enough data.
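
The check on the lost amount can be scripted. This is a sketch under assumptions: it relies on Memcheck's leak summary containing a line of the form `definitely lost: N bytes in M blocks`, the helper names `lost_bytes` and `enough_leak_data` are hypothetical, and the 10 MB threshold is simply the lower bound mentioned above.

```shell
# Print the "definitely lost" byte count from a Valgrind Memcheck log.
# Hypothetical helper; takes the path of the trace file as its argument.
lost_bytes() {
    awk '/definitely lost:/ { gsub(",", "", $4); bytes = $4 }
         END { print (bytes == "" ? 0 : bytes) }' "$1"
}

# Succeed only if at least 10 MB are reported as definitely lost.
enough_leak_data() {
    [ "$(lost_bytes "$1")" -ge $((10 * 1024 * 1024)) ]
}

# Usage:
#   enough_leak_data /path/to/trace && echo "enough data" || echo "trace for longer"
```

Only the "definitely lost" figure is considered here; indirectly or possibly lost memory is ignored, which keeps the check conservative.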

Lots of cf-agents are piling up in the process table

CFEngine is probably getting stuck on a long-running task. Kill the existing processes, then run cf-agent -v to see what is going on.

Promises are not evaluated on second run

This is not a bug, but a feature. Have a look at the "Locks" section of the Reference Manual: https://cfengine.com/manuals/cf3-reference#When-and-where-are-promises-made_003f
