Troubleshooting
Please follow the steps below before submitting a bug report.
It is important to determine what is hanging, i.e. whether it is CFEngine itself or something that CFEngine is interacting with. Run the program with the `-v` and `-d` flags to see if it has gone into an infinite loop, or if it is waiting for something.
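For example, to run the agent in the foreground with verbose and debug output (a sketch; `-K` makes the agent ignore existing locks so the run is not skipped):

% /var/cfengine/bin/cf-agent -K -v -d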
Common causes of hanging processes:
- DBM database corruption: try to delete the `*.lmdb` files in `/var/cfengine`
- Command processes that do not properly close the file descriptors of their child processes. Try running the commands with a shell enabled (`use_shell => "yes"`) and append `</dev/null >/dev/null` to close the descriptors (see the sketch after this list).
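As a sketch of the second point above, a commands promise along these lines runs the command through a shell and redirects its descriptors. The bundle name and command path are hypothetical, and `detached_shell` is a hand-written contain body (the standard library ships a similar `in_shell` body):

bundle agent run_detached
{
  commands:
      # Hypothetical long-running command; the shell redirections close
      # the child's stdin and stdout so it does not keep cf-agent's
      # file descriptors open.
      "/usr/local/bin/start-myservice </dev/null >/dev/null"
        contain => detached_shell;
}

body contain detached_shell
{
      # Run the command via a shell so that the redirections take effect
      useshell => "useshell";
}

The redirections are part of the command string, so they only work when the command is actually run through a shell.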
Segfaults in CFEngine may be caused by an incorrect build environment or by bugs in CFEngine or the libraries it uses.
- Install GDB.
- Run the crashing CFEngine component inside GDB:
  - use the `--args` option for GDB to pass options to the component
  - for all components, add the `--verbose` option
  - for daemons, add the `--no-fork` option

For example:
% gdb --args ./cf-agent/.libs/lt-cf-agent -KI -f POLICYFILE
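For a daemon, the same approach (sketched here with the installed cf-serverd binary) looks like:

% gdb --args /var/cfengine/bin/cf-serverd --no-fork --verbose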
Another common symptom is that CFEngine clients (e.g. cf-agent) do not get a timely response from a remote cf-serverd, e.g. when asking for new policy. You would see messages similar to the following on the CFEngine clients:
Failed to establish TLS connection: underlying network error (Connection reset by peer)
No suitable server responded to hail
If this happens on many of the clients, it is likely due to cf-serverd not being able to handle incoming connections fast enough so they start to pile up.
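One way to confirm this is to list the established connections on the cf-serverd port (5308 by default) on the server, for example with ss (a sketch; netstat shows the same information):

# ss -tn state established '( sport = :5308 )'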
Sometimes cf-serverd is seen to use a lot of CPU time or memory, but it might also be using close to zero CPU. In these cases it is important to understand why cf-serverd is not able to handle the connections fast enough.
To see where the threads of cf-serverd are running at a given time, the following commands can be used:
gdb -batch -p $(pgrep cf-serverd) -ex 'info threads' > info_threads.txt
gdb -batch -p $(pgrep cf-serverd) -ex 'thread apply all bt' > backtrace.txt
gdb -batch -p $(pgrep cf-serverd) -ex 'thread apply all bt full' > backtrace-full.txt
With this debugging information you can see whether the process is spending time waiting for DBM files, or executing a hot part of the code.
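For instance, threads blocked on the LMDB databases typically show up as `mdb_`-prefixed frames in the backtraces (a sketch; the exact frame names depend on the build):

# grep -c mdb_ backtrace.txt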
- Install Valgrind.
- Run the leaking CFEngine component inside Valgrind:
  - for all components, add the `--verbose` option
  - for daemons, add the `--no-fork` option
For example:
valgrind --leak-check=full \
/var/cfengine/bin/cf-serverd --no-fork 2>/root/valgrind-cf-serverd &
- If you are debugging a daemon, let it run long enough that you are confident that the consumed memory is due to a bug (remember that Valgrind itself also consumes memory).
- Send SIGINT to the Valgrind process (its process name starts with memcheck):
# ps -e|grep mem
2194 pts/0 00:00:24 memcheck-x86-li
# kill -SIGINT 2194
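Since pgrep is already used above, the same can be done in one step (ps truncates the name, so match on the memcheck prefix):

# kill -INT $(pgrep memcheck)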
After a successful memory trace has been obtained in `/root/valgrind-*.txt`, check the end of the trace to verify that at least 10-20 MB are lost. Otherwise, rerun the tracing for a longer period of time to gather enough data.
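Valgrind prints a LEAK SUMMARY at the end of the trace; the "definitely lost" figure is the one to check. For example, for the cf-serverd trace produced above:

# grep -A5 'LEAK SUMMARY' /root/valgrind-cf-serverd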
CFEngine is probably getting stuck on a long-running task. Kill the existing processes and run `cf-agent -v` to see what is going on.
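For example (a sketch; adjust to whichever component is stuck):

# pkill cf-agent
# /var/cfengine/bin/cf-agent -v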
If promises are not executed as often as you expect, this is not a bug, but a feature. Have a look at the [["Locks" section|https://cfengine.com/manuals/cf3-reference#When-and-where-are-promises-made_003f]] in the Reference Manual.