SIGABRT on launch, no backtrace #18

Open
dsd opened this issue Sep 7, 2018 · 8 comments

dsd commented Sep 7, 2018

Thanks for removing the libfolly dependency that caused me trouble before.

Looking again now, I'm trying to get it working on Debian with systemd-239. The build and install went OK. On first launch of oomd_bin it segfaulted (with no other error that I could see), and some tracing with gdb indicated that this was because I needed to put a config file in place, so I added /etc/oomd.json:

{
    "cgroups": [
        {
            "target": "system.slice",
            "oomdetector": "default",
            "oomkiller": "default"
        }
    ],
    "version": "0.2.0"
}

With that in place, it aborts with no logged error. From reading the source code and stepping through with gdb, I concluded that this was because the memory controller was not active. I set DefaultMemoryAccounting=yes in /etc/systemd/system.conf and rebooted. (If that's correct, maybe you can add it to the README?)

Now when I run it I get this:

# oomd_bin -v
[../Main.cpp:112] oomd running with conf_file=/etc/oomd.json dry=0 verbose=1
[../Config.cpp:119] target_=/sys/fs/cgroup/system.slice
[../../oomd/util/Fs.h:119] Unable to open /etc/oomd_tunables.override
[../shared/Tunables.cpp:32] OOMD_INTERVAL=5
[../shared/Tunables.cpp:32] OOMD_VERBOSE_INTERVAL=300
[../shared/Tunables.cpp:32] OOMD_POST_KILL_DELAY=15
[../shared/Tunables.cpp:32] OOMD_THRESHOLD=60
[../shared/Tunables.cpp:32] OOMD_HIGH_THRESHOLD=80
[../shared/Tunables.cpp:32] OOMD_HIGH_THRESHOLD_DURATION=10
[../shared/Tunables.cpp:32] OOMD_LARGER_THAN=50
[../shared/Tunables.cpp:32] OOMD_GROWTH_ABOVE=80
[../shared/Tunables.cpp:32] OOMD_AVERAGE_SIZE_DECAY=4
[../shared/Tunables.cpp:32] OOMD_FAST_FALL_RATIO=0.85
[../shared/Tunables.cpp:32] OOMD_MIN_SWAP_PCT=15
[../shared/Tunables.cpp:32] OOMD_FBTAX2_WORKLOAD_THRESHOLD=0
Aborted (core dumped)

and for some reason gdb can't figure out the backtrace beyond __GI_abort.

Any idea what I'm doing wrong?

Contributor

danobi commented Sep 11, 2018

Was there a message or line number that came with the abort? I may have to rethink the async logging a bit, as log messages may not be making it out of the queue in time before the crash.

Author

dsd commented Sep 13, 2018

No, there wasn't anything printed apart from what was pasted.

Contributor

danobi commented Sep 17, 2018

Sorry about the spotty response time. I'm going to commit a few patches that enable inline logging (i.e. not async), so we can debug this better. I think we're crashing before the logging thread can flush everything.

Contributor

danobi commented Sep 17, 2018

BTW I could not reproduce on my machine, but that doesn't surprise me. (Grumble grumble something about early stage projects)

Contributor

danobi commented Sep 17, 2018

#21

Contributor

danobi commented Sep 17, 2018

Could you update your build to fe8363f and then try:

$ rm -rf build
$ CPPFLAGS=-DINLINE_LOGGING meson build
$ ninja -C build

and get it to crash again?

I think we only explicitly generate SIGABRTs in the codebase. There should always be a log message before a SIGABRT. Hopefully this patch gets us some more info.

Contributor

danobi commented Sep 18, 2018

I changed my mind on the compile option. It's an env var now (bfd75af):

INLINE_LOGGING=1 sudo -E ./oomd_bin <blah>

jprvita commented Oct 2, 2018

Hello! I work with @dsd and will be following up on his previous comments here. Thanks for the inline logging fix, I can now see error messages when oomd fails and trace back to the failing code.

The first error we were getting was "Unable to open /sys/fs/cgroup/system.slice/cgroup.subtree_control", which happens because on Debian systems the cgroups2 hierarchy is mounted on /sys/fs/cgroup/unified instead of /sys/fs/cgroup. I was able to work around the problem by changing the target in /etc/oomd.json to unified/system.slice, but you may want to make this a bit more generic. One idea would be to detect at run time where the cgroups2 hierarchy is mounted.
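
As a rough illustration of the run-time detection idea, the mount point can be read from the kernel's mount table. This is only a minimal sketch and not oomd code: the function name find_cgroup2_mount is hypothetical, and it assumes the usual /proc/mounts layout, where the second field is the mount point and the third is the filesystem type.

```shell
# Hypothetical helper (not part of oomd): print the mount point of the
# first cgroup2 filesystem listed in a mounts table.
# Usage: find_cgroup2_mount [mounts-file]   (defaults to /proc/mounts)
find_cgroup2_mount() {
    # Field 2 is the mount point, field 3 is the filesystem type.
    # Exit status is 0 if a cgroup2 entry was found, 1 otherwise.
    awk '$3 == "cgroup2" { print $2; found = 1; exit }
         END { exit !found }' "${1:-/proc/mounts}"
}
```

On the Debian layout described above, calling find_cgroup2_mount with no argument would be expected to print /sys/fs/cgroup/unified, and the target path could then be resolved relative to that directory.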

Another problem I ran into was "FATAL: cgroup memory controller not enabled on /sys/fs/cgroup/unified/system.slice", because all cgroup controllers were bound to the cgroups v1 hierarchy mounted by default on Debian systems. Passing cgroup_no_v1=all to the kernel command line and then manually binding the memory controller to the cgroups2 hierarchy worked around that problem.
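
Whether the memory controller is usable on the v2 hierarchy can be checked before starting oomd by looking at the cgroup control files. The sketch below is an assumption about how such a check might look, not something oomd ships: has_controller is a hypothetical name, and it relies on cgroup.controllers and cgroup.subtree_control each holding one space-separated line of controller names.

```shell
# Hypothetical helper (not part of oomd): succeed if the named controller
# is listed in a cgroup v2 control file such as cgroup.controllers.
# Usage: has_controller <controller> <control-file>
has_controller() {
    # Split the space-separated list into one name per line, then look
    # for an exact, whole-line, fixed-string match.
    tr ' ' '\n' < "$2" | grep -qxF -- "$1"
}
```

For example, has_controller memory /sys/fs/cgroup/unified/cgroup.controllers reports whether the controller is available to the v2 hierarchy at all, and the same check against cgroup.subtree_control reports whether it is enabled for child cgroups. Enabling it for children is done by writing +memory to cgroup.subtree_control, which is presumably the manual binding step mentioned above.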

Finally, most general-purpose distros do not enable CONFIG_MEMCG_SWAP_ENABLED since it increases memory consumption, so oomd failed with "Unable to open /sys/fs/cgroup/unified/system.slice/memory.swap.current". Enabling swap accounting at boot with swapaccount=1 on the kernel command line avoids the problem, and I now have oomd running.
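
A quick way to tell whether swap accounting is active is to test for the file oomd failed to open. This is a minimal sketch with a hypothetical function name, has_swap_accounting, and it assumes that the presence of memory.swap.current in the target cgroup directory is a sufficient indicator.

```shell
# Hypothetical helper (not part of oomd): succeed if the given cgroup
# directory exposes a readable memory.swap.current file.
# Usage: has_swap_accounting <cgroup-dir>
has_swap_accounting() {
    [ -r "$1/memory.swap.current" ]
}
```

With the paths from this thread, has_swap_accounting /sys/fs/cgroup/unified/system.slice would fail until the system is booted with swapaccount=1.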

It would be great to have these points fixed to make oomd more compatible with general-purpose distros. Also, it would be really nice to have more documentation on how to use it in such setups: for example, how to specify an overall memory pressure threshold for the target cgroup (let's say, user.slice) above which any process in that cgroup should be killed. Unless I have missed something, it is only possible to specify thresholds per subgroup of the target cgroup.
